All of a sudden, everyone has money for Big Data. From small start-ups to mid-sized companies and large enterprises, businesses are now keen to invest in and build Big Data solutions that extract more intelligence from their data. So what is Big Data all about?
In my opinion, Big Data is the new buzzword for data mining technology that has been around for quite some time. Data analysts and business managers are rapidly adopting techniques such as predictive analysis, recommendation services, and clickstream analysis. These techniques were once at the core of data processing, but they have been ignored or lost in the rush to implement modern relational database systems and structured data storage. Big Data encompasses a range of technologies and techniques that let you extract useful, previously hidden information from large quantities of data that might otherwise have been left dormant and, ultimately, thrown away because storing it was too costly.
Big Data solutions aim to provide data storage and querying functionality for situations that are, for various reasons, beyond the capabilities of traditional database systems. For example, analyzing social media sentiment toward a brand has become a key parameter for judging that brand’s success. Big Data solutions provide a mechanism for organizations to extract meaningful, useful, and often vital information from the vast stores of data that they are collecting. Big Data is often described as a solution to the “three V’s problem”:
Variety: It’s common for 85 percent of your new data to not match any existing data schema. Not only that, it might very well also be semi-structured or even unstructured data. This means that applying schemas to the data before or during storage is no longer a practical option.
Volume: Big Data solutions typically store and query thousands of terabytes of data, and the total volume of data is probably growing tenfold every five years. Storage solutions must be able to manage this volume, be easily expandable, and work efficiently across distributed systems.
Velocity: Data is collected from many new types of devices, from a growing number of users, and from an increasing number of devices and applications per user. Data is also emitted at a high rate from certain modern devices and gadgets. The design and implementation of storage and processing must happen quickly and efficiently.
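The Variety point above is often handled by storing records as they arrive and applying a schema only when the data is read. Here is a minimal sketch of that idea in Python. The sample records, the field names, and the `read_with_schema` helper are all hypothetical and serve only as illustration. They do not come from any particular Big Data product.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no schema enforced).
raw_lines = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "device": {"os": "android"}, "temp_c": 21.5}',
    'not json at all: free-text log line',
]


def read_with_schema(lines, fields):
    """Parse each stored line at query time and keep only the requested fields.

    Records that lack a field, or that are not JSON objects at all,
    yield None for that field instead of being rejected at write time.
    """
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = {}
        if not isinstance(record, dict):
            record = {}
        rows.append({field: record.get(field) for field in fields})
    return rows


# The "schema" is chosen by the query, not by the storage layer.
rows = read_with_schema(raw_lines, ["user", "action"])
for row in rows:
    print(row)
```

A different query could pass a different field list over the same stored lines, which is the practical difference from fixing the schema before storage.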
There is a striking gap between the speed at which data is generated and the speed at which it is consumed, and it has always been like this. For example, a standard international flight today generates around 0.5 terabytes of operational data. That is during a single flight! Big Data solutions were implemented long ago, when search engines such as Google, Yahoo, and Bing were developed, but they were limited to large enterprises because of the cost of the hardware needed to support them. This is no longer an issue, because hardware and storage costs are dropping faster than ever before. New types of questions are now being asked, and data solutions are used to answer them and drive businesses more successfully. These questions fall into the following categories:
- Questions regarding social and Web analytics: Examples of these types of questions include the following: What is the sentiment toward our brand and products? How effective are our advertisements and online campaigns? Which gender, age group, and other demographics are we trying to reach? How can we optimize our message, broaden our customer base, or target the correct audience?
- Questions that require connecting to live data feeds: Examples of this include the following: a large shipping company that uses live weather feeds and traffic patterns to fine-tune its ship and truck routes to improve delivery times and generate cost savings; retailers that analyze sales, pricing, economic, demographic, and live weather data to tailor product selections at particular stores and determine the timing of price markdowns.
The diagram below illustrates the industries in which Big Data can play a significant role in storage and analytics in the near future.
There is a lot of debate currently about relational vs. non-relational technologies. “Should I use relational or non-relational technologies for my application requirements?” is the wrong question. Both technologies are storage mechanisms designed to meet very different needs. Big Data is not here to replace any of the existing relational model-based data storage or mining engines; rather, it will be complementary to these traditional systems, enabling people to combine the power of the two and take data analytics to new heights.
The first question to ask here is, “Do I even need Big Data?” Social media analytics have produced great insights into what consumers think about a product. For example, Microsoft can analyze Facebook posts or Twitter sentiment to determine how Windows 8.1, its latest operating system, has been received in the industry and the community. Big Data solutions can parse huge unstructured data sources (such as posts, feeds, tweets, and logs) and generate intelligent analytics so that businesses can make better decisions and more accurate predictions.
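To make the sentiment example concrete, here is a toy sketch of how unstructured posts could be reduced to a simple positive/negative/neutral tally. It assumes a tiny hand-written word list and made-up posts. Real sentiment analysis uses far richer models and runs over distributed storage, so treat this only as an illustration of the idea.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicons (assumption, not a real model).
POSITIVE = {"love", "great", "fast", "good"}
NEGATIVE = {"hate", "slow", "broken", "bad"}


def score_post(text):
    """Return (# positive words) - (# negative words) found in the post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up posts standing in for a social media feed.
posts = [
    "Love the new start screen, great update!",
    "Update is slow and the installer is broken",
    "Installed it yesterday",
]

summary = Counter(label(score_post(post)) for post in posts)
print(summary)
```

The same per-post scoring function could be applied independently to each record in a much larger feed, which is what makes this kind of analysis a natural fit for distributed processing.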