Big Data is a hot topic as waves of digital information issue forth from smartphones, tablets, social networks, online videos, wireless sensors and more.
Information technology managers need to meld these new Web and cloud data sources with their older, in-house software systems. Companies are grappling with how to manage and best use this rising flood of data without drowning in the sheer volume.
Data integration software extracts data from one source, transforms it into a shared format and then loads it into a system that works with other forms of data.
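The extract, transform and load steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a picture of how Informatica's products work: the sample CSV and JSON inputs, the field names and the in-memory `warehouse` list are all hypothetical.

```python
import csv
import io
import json

# Hypothetical sample inputs; real sources would be ERP exports and Web feeds.
ORDERS_CSV = "order_id,customer,amount\n1001,Acme,250.00\n1002,Globex,75.50\n"
SOCIAL_JSON = '[{"user": "acme_fan", "text": "Great service!"}]'


def extract_orders(csv_text):
    """Extract: read structured rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def extract_posts(json_text):
    """Extract: read posts from a JSON feed."""
    return json.loads(json_text)


def transform(orders, posts):
    """Transform: map both sources onto one shared record layout."""
    records = []
    for row in orders:
        records.append({
            "source": "erp",
            "who": row["customer"],
            "detail": f"order {row['order_id']}",
            "amount": float(row["amount"]),
        })
    for post in posts:
        records.append({
            "source": "social",
            "who": post["user"],
            "detail": post["text"],
            "amount": None,  # social posts carry no monetary amount
        })
    return records


def load(records, target):
    """Load: append the shared-format records to the target store."""
    target.extend(records)
    return len(records)


warehouse = []  # stand-in for a real target system
loaded = load(
    transform(extract_orders(ORDERS_CSV), extract_posts(SOCIAL_JSON)),
    warehouse,
)
```

After the run, `warehouse` holds the order rows and the social post side by side in the same layout, which is the point of the transform step.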
For IT managers, the surge of Big Data offers both risks and opportunities, says Markarian.
Last week, Informatica launched its latest product suite, Informatica 9.1, which aims to lower risks and make good on opportunities.
Markarian recently spoke with IBD about Big Data, from company headquarters in Redwood City, Calif.
IBD: How do you define Big Data?
Markarian: Informatica defines Big Data as a confluence of three trends: big transaction data, big interaction data and Big Data processing.
Big transaction data is the most accessible idea. It comes from big ERP (enterprise resource planning) systems with data from order entries, general ledgers or HR data.
All of this is structured information (organized into rows and columns) that’s inside ERP applications.
That data grew up being stored in standard relational databases. Then as data volumes have gotten bigger, they’ve gone into data warehousing appliances from Teradata (NYSE:TDC), IBM’s (NYSE:IBM) Netezza and Microsoft’s (NASDAQ:MSFT) DATAllegro, for better analysis of structured data.
Now we’re seeing lots of different types of (unstructured) data from Facebook, Twitter, RFID (radio-frequency identification) tags and human genome data.
All of these new data formats look and feel very different from the data in ERP systems, where the information provides its own context.
IBD: So are you saying the problem comes in merging structured transactional data with unstructured social feeds?
Markarian: Yes. The social part presents unique challenges.
If you could guarantee that you would hear only the social messages you care about, the problem would be more tractable.
But the problem is that a really big net catches everything that’s said.
So you need to analyze if it’s something positive or negative, or if something needs to be done or if it can be ignored.
These are big challenges. In a world of structured information, if something is entered into your general ledger, you inherently care about it.
But in the case of social media, you have no idea, and the amount of processing required to give you an idea is enormous.
IBD: And that leads us to the third trend, Big Data processing?
Markarian: There’s a whole new category for Big Data processing between structured and unstructured data.
The vanguard of Big Data processing is called Hadoop. It’s an open-source technology that grew up around a programming model called MapReduce, which was introduced by Google (NASDAQ:GOOG) and adopted in open-source form by Yahoo (NASDAQ:YHOO), to process massive data feeds and improve search engine speeds.
Hadoop is a programming framework and processing environment that works in a massively parallel way.
For instance, Yahoo has a 40,000-node Hadoop cluster, which is spectacularly large, to process these unstructured data feeds, as opposed to the structured data held in relational databases.
It’s a way of tackling massive data problems in a shared-nothing environment, which means each node doesn’t need to know what’s going on in any other node to do its job.
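The shared-nothing idea can be illustrated with a toy word count in plain Python. This is only a sketch of the map-and-reduce pattern, not Hadoop's actual API: the function names, the three-way split and the sample posts are assumptions made for illustration, and worker threads stand in for cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: one 'node' counts words in its own chunk only."""
    counts = defaultdict(int)
    for line in chunk:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def reduce_counts(partials):
    """Reduce step: merge the per-node partial counts into one total."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


def word_count(lines, nodes=3):
    # Split the input so each worker sees a disjoint slice of the data.
    chunks = [lines[i::nodes] for i in range(nodes)]
    # Workers never exchange state; each returns only its own result.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical sample feed
posts = ["love the new app", "the app crashed again", "love love love it"]
result = word_count(posts)
```

No worker reads another worker's chunk or counts. The only coordination is the final merge, which is why this style of processing can spread across many machines.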
Informatica has partnered with EMC and Cloudera around their Hadoop versions.
You can use Hadoop to handle structured information too. So our customers are layering it around their big data warehouse appliance environments.
In this way, they can bring in some unstructured information, or use it as a more cost-effective means to manage their existing data.
IBD: How can companies benefit from better management of Big Data?
Markarian: Let’s take two customers, HealthNow (New York) and Station Casinos.
HealthNow may want to combine traditional transactional data about their health care customers, plus what’s being said about their services in social media.
The way to do that is to listen to Twitter feeds and Facebook posts from their customers.
They would use unstructured data processing to understand what’s being said, and then correlate the identity of Facebook users with their customers who are saying something about their company, or about their insurance provider.
That could give them an opportunity to win a new customer, or provide warnings about the danger of losing an existing customer.
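The correlation step Markarian describes amounts to matching social posts against customer records. The sketch below assumes, purely for illustration, that both sides expose an email address to match on. Real identity matching is far messier, and the records, field names and the `correlate` helper are hypothetical.

```python
def normalize(email):
    """Normalize an email so trivially different spellings still match."""
    return email.strip().lower()


# Hypothetical customer records and social posts
customers = [
    {"customer_id": 1, "email": "Pat@Example.com"},
    {"customer_id": 2, "email": "lee@example.com"},
]
posts = [
    {"email": "pat@example.com ", "text": "Claim was paid quickly, thanks"},
    {"email": "sam@example.com", "text": "Thinking about switching insurers"},
]


def correlate(customers, posts):
    """Split posts into those from known customers and those from prospects."""
    by_email = {normalize(c["email"]): c["customer_id"] for c in customers}
    matched, prospects = [], []
    for post in posts:
        customer_id = by_email.get(normalize(post["email"]))
        if customer_id is None:
            prospects.append(post["text"])  # possible new customer
        else:
            matched.append((customer_id, post["text"]))  # existing customer
    return matched, prospects


matched, prospects = correlate(customers, posts)
```

Posts that match a customer record can feed retention warnings, while unmatched posts are candidates for winning new business.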
Station Casinos takes all the information from its slot machines and ATMs, and combines that with things like hotel records and social media updates, so they understand what customers are doing and thinking.
That can be used to directly market to customers about their stay at the casino, or even to influence the customers’ own network of friends.
Figuring out what might inspire their friends to visit a Station Casino is really about understanding the second-order effects of social media on their business.
That’s the big problem for customers to solve: How to combine social interaction data with older transactional data to gain an advantage.
IT managers have never tried to solve problems like this before, so it’s a brave new world for these guys. Informatica’s technology can help solve this new generation of problems.