New Big Data systems and advanced technologies are revolutionizing how businesses analyze their data assets and discover new value and insights across their business practices. Barry Zane, Senior Vice President, and John Rueter, Vice President of Marketing at Cambridge Semantics, recently sat down with DATAVERSITY® to discuss how companies are bringing technologies such as Hadoop, Online Transaction Processing (OLTP), and Online Analytical Processing (OLAP) into their data discovery processes.
When Zane first got into the graph database space, he already knew that graphs excelled at processes that were difficult or impossible to do with a relational approach. What made him a convert was the discovery that “pretty much anything that you can do with relational, you can also do with the graph model.”
Rueter, who has worked in the data and information space for more than two decades, views the graph approach of working with data as the closest thing to the “Holy Grail” for solving data and information challenges.
“Given the forecast of the advancements and the volume of data in the future, graph has emerged in our view as the way forward,” Rueter said. He also noted that the fundamentally different approach that graph databases take to data analysis now makes it possible to go beyond the historic limitations of relational databases.
Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP)
According to Zane, the difference between OLTP systems and OLAP systems is that OLTPs are great at fetching or writing a single-point piece of information. “I call it, ‘Tell me about Steve,’ whereas analytics (OLAP) is much more geared to understanding patterns of whole populations of people, not individuals.”
OLTP systems also excel at thousands of “Tell me about Steve” queries in a second, but doing analytics (“Tell me about people”) in a matter of seconds has not been available in the graph model until now.
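To make Zane's distinction concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The `customers` table, its columns, and its rows are invented for illustration and do not come from the interview. The first function is a "Tell me about Steve" point lookup; the second is a "Tell me about people" aggregate over the whole population.

```python
import sqlite3

# Hypothetical customer table; names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, city TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Steve", "Boston", 120.0),
        ("Maria", "Boston", 80.0),
        ("Chen", "Denver", 200.0),
        ("Aisha", "Denver", 100.0),
    ],
)


def tell_me_about(name):
    """OLTP-style point lookup: fetch one row by its key."""
    return conn.execute(
        "SELECT name, city, spend FROM customers WHERE name = ?", (name,)
    ).fetchone()


def tell_me_about_people():
    """OLAP-style aggregate: summarize the whole population by city."""
    return conn.execute(
        "SELECT city, COUNT(*), AVG(spend) FROM customers "
        "GROUP BY city ORDER BY city"
    ).fetchall()


steve = tell_me_about("Steve")
by_city = tell_me_about_people()
```

The lookup touches a single row through its key, which is why a transactional system can answer thousands of them per second. The aggregate has to read every row, which is the kind of work an analytical engine is built to speed up.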
Rueter noted that the majority of graph products in the marketplace are currently focused solely on OLTP. Combining OLTP with analytics and a robust, straightforward interface designed for businesses and decision makers “really drives the choice of the graph model.”
Zane coined the term “GOLAP,” which stands for Graph Online Analytical Processing. “GOLAP is much more around the analytical side of it. That’s where the graph engine is doing the analytical process in a fundamentally different and better way.”
Compared to other approaches for dealing with Big Data, Zane remarked that “it turns out that graph is a much simpler and more flexible model than what you see with relational – let alone the Hadoop model.” If answering questions in a batch relational model takes 24 hours, Hadoop puts the answer a step even farther away, he said. With graph, by contrast, the time between asking a question and getting an answer is usually measured in seconds. Zane believes that one of the reasons the use of graph is showing so much growth is that it’s closer to how people think. “People don’t really think in terms of rectangular tables that link to each other. They think in terms of relationships.”
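As a rough illustration of what thinking "in terms of relationships" can look like in practice, the following Python sketch stores facts as (subject, relationship, object) triples and answers a question by following edges. It is not AnzoGraph's API or data format; the triples, the relationship names, and the `colleagues_purchases` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical facts stored as (subject, relationship, object) triples.
triples = [
    ("Steve", "works_at", "Acme"),
    ("Maria", "works_at", "Acme"),
    ("Steve", "bought", "tent"),
    ("Maria", "bought", "stove"),
    ("Chen", "works_at", "Globex"),
]

# Index edges by (subject, relationship) and by (relationship, object)
# so that a question can be answered by hopping in either direction.
out_edges = defaultdict(set)
in_edges = defaultdict(set)
for s, rel, o in triples:
    out_edges[(s, rel)].add(o)
    in_edges[(rel, o)].add(s)


def colleagues_purchases(person):
    """What did people who work at the same place as `person` buy?"""
    result = set()
    for employer in out_edges[(person, "works_at")]:
        for colleague in in_edges[("works_at", employer)] - {person}:
            result |= out_edges[(colleague, "bought")]
    return result
```

In this toy model, recording a new kind of relationship only means appending another triple; no table has to be redesigned first, which is one way to read the "simpler and more flexible" claim.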
Rueter sees the potential promise of lower storage costs with Hadoop as a driver toward its popularity, “but Hadoop alone still doesn’t get at the issue of what you want to do with your data.” Deeper analysis, iterative questioning processes, and creating context for data require additional tools. “The potential is there with Hadoop, but graph really delivers on that promise.”
The roadblock to widespread adoption of graph has been the performance needs of Big Data operations. Zane said there were similar concerns about relational databases until technology was developed to address speed issues. “That’s why we got into this space, because the people involved had deep experience in getting very high performance out of relational.” By addressing large-scale performance issues, he saw a way to bring graph into the mainstream, as had been done with relational.
“Anything is going to perform with a million pieces of data, but when you go to a trillion pieces of data, that really requires a design that leverages parallel computing.” Zane added that they benchmarked a trillion almost two years ago.
Advancements in OLTP and OLAP
Zane noted that there are now places where customer data is not just being collected and retained by a company, but also strategically used to provide a better, more holistic customer experience. “We’re now seeing the data go in the other direction,” back to the customer to encourage buying behavior, or to assist in a sale.
Amazon, a leader in the Big Data space, uses information gleaned from other customers’ buying habits to inform suggestions such as “customers who bought this item also bought this item,” or “80 percent of users who did this same search bought this item,” said Zane. The results of analytics showing the relationships between co-related products are then routed back into the OLTP system. The Amazon customer, in essence, is sending a simple query to the OLTP system, asking what other things people buy who bought a particular item, “but the analytics behind it were done, in this case, typically overnight by the warehouse system.”
Zane is seeing an increasing number of companies feed analytics results back into the OLTP system in this way, so that the system can respond to queries from millions of people online. “It’s done offline and fed back into the OLTP systems so that they can handle thousands of those per second,” he said.
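The pattern Zane describes, heavy analytics done offline with the results served by simple lookups, can be sketched in a few lines of Python. This is an illustrative toy, not Amazon's method: the order data, the `build_also_bought` function, and the `top_n` parameter are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical order history: each set is one customer's basket.
orders = [
    {"tent", "stove", "lantern"},
    {"tent", "stove"},
    {"tent", "sleeping bag"},
    {"stove", "lantern"},
]


def build_also_bought(orders, top_n=2):
    """Batch (OLAP-style) step: count co-purchases across all orders."""
    counts = defaultdict(Counter)
    for basket in orders:
        for item, other in permutations(basket, 2):
            counts[item][other] += 1
    # Keep the top_n companions per item; ties are broken alphabetically.
    return {
        item: [
            other
            for other, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        ]
        for item, c in counts.items()
    }


# Run "overnight", then load the result into the serving store (a dict here).
also_bought = build_also_bought(orders)


def customers_also_bought(item):
    """Online (OLTP-style) step: a single key lookup, no analytics at request time."""
    return also_bought.get(item, [])
```

The expensive pass over every order happens once per batch cycle. Each customer request then costs one dictionary lookup, which is what lets the online system answer thousands of them per second.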
Machine Learning and Artificial Intelligence (AI) models are similarly trained by analytics performed on OLAP systems, but graph OLAP systems inherently can perform faster iterative discovery and help inform decision-making processes, while also understanding more sophisticated questions and relationships.
The faster you can respond to the first question in real time, the faster the person is going to be able to ask the next question, and the next question, and the next question, Zane said. Many technologies are still geared toward batch operation, but if the batch cycle time between questions is 24 hours, “it’s pretty straightforward – you can’t do human exploration and discovery where you’re only able to post your question once a day.”
Rueter said of Cambridge Semantics’ Anzo platform, which is an end-to-end approach for dealing with data, “It’s the ingestion, the loading, the transformation, the ETL, the security, the provenance, etc., all the way out to discovering analytics all in a single environment.”
Rueter remarked that until about six months ago, AnzoGraph had been “buried” within Cambridge Semantics’ Anzo offering. AnzoGraph was a foundational part of the ability to scale Semantic Technology and do analytics with it. “We realized that in some ways, we were under-serving the market by not better publicizing it, so we introduced AnzoGraph to the market as a standalone graph warehouse.”
Zane added that very few customers want responsibility for creating all the necessary tooling for ETL, modeling, and mapping data as it is being ingested. “With Anzo, you’ve got the ability to interact with the data without even knowing a query language.” Users can interact with their data through a graphical point-and-click interface, which builds complex queries under the covers and forwards them to the database. “That ability to create and ask interesting questions with a graphical user interface without knowing the query language – let alone a programming language – is a huge value proposition to business users.”
What makes AnzoGraph unique, said Zane, is the ability to take data from outside sources – relational or graph – and present it “so that it makes sense with the business user’s notion of the world.” Rueter added that users are able to create provenance for their data all the way through, “and they can feel confident in the kind of decisions that they are making.”
Enterprises need the promise of faster and deeper insights at a fraction of the cost of traditional approaches, within a new data-oriented world of constantly evolving systems, data types, and new technologies. The latest developments in graph database technology are now providing a reliable solution to those problems.