August 29, 2016


Big data: Everyone seems to be talking about it, but what is big data really? Where is this data coming from, how is it being processed, and how are the results being used?

There is no hard and fast rule about how large your database needs to be in order to be considered “big”. Big data is data that exceeds the processing capacity of conventional database systems: the data is too big, moves too fast, or doesn’t fit the structures of your database architectures. The amount, speed, and value of data sources are rapidly increasing.

Previously, data came from well-organised databases with controlled schemas and strong validation conditions. Nowadays data is messy in structure and arrives in many forms, such as log files, spreadsheets, and audio. It is often non-uniform. Easy access to the internet means data comes from a wide range of sources, with many more contributors.

Previously we got data from computers, but now there are many more devices. For example, in Africa 98% of Internet access is from mobile devices. Apart from computers and mobiles, data now comes from many other sources, such as sensors, social media, airlines, and e-shopping. Even vehicles now run a large amount of code: they connect to the internet, give navigation information to the driver, use GPS to detect the vehicle’s movement, and so on.

The data really is big. Here are a few examples:

  1. Walmart: one million transactions every hour.
  2. eBay: 50 petabytes of data every day.
  3. YouTube: 100 hours of video every minute.
  4. Google: 40,000 search queries per second.
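
These figures use different time units, so they are hard to compare at a glance. As a rough sketch, the Python snippet below converts the rates from the list above to per-day totals. The input numbers are the ones quoted in the list; the variable names are purely illustrative.

```python
# Convert the rates quoted in the list above to a common per-day scale.
# The input figures come from the list; the names are illustrative only.

HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

per_day = {
    "Walmart transactions": 1_000_000 * HOURS_PER_DAY,
    "YouTube hours of video": 100 * MINUTES_PER_DAY,
    "Google search queries": 40_000 * SECONDS_PER_DAY,
}

for name, total in per_day.items():
    print(f"{name}: {total:,} per day")
```

On these numbers, that works out to 24 million transactions, 144,000 hours of video, and roughly 3.5 billion search queries per day.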

While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:

1. Volume. Organizations collect data from a variety of sources, including business transactions, social media and information from sensor or machine-to-machine data. In the past, storing it would’ve been a problem – but new technologies (such as Hadoop) have eased the burden.
2. Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.
3. Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
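
To make the "variety" point a little more concrete, here is a minimal Python sketch of the same kind of event arriving as a CSV row, a JSON record, and a free-text log line, and being normalised into one common shape. The sample data, field names (`user`, `amount`), and function names are all invented for illustration; real pipelines will look different.

```python
import csv
import io
import json
import re

# Hypothetical inputs: the same kind of event arriving in three formats.
csv_text = "user,amount\nalice,30\n"
json_text = '{"user": "bob", "amount": 12.5}'
log_line = "2016-08-29 10:15:00 user=carol amount=7"


def from_csv(text):
    # Structured: column names come from the header row.
    reader = csv.DictReader(io.StringIO(text))
    return [{"user": row["user"], "amount": float(row["amount"])} for row in reader]


def from_json(text):
    # Semi-structured: keys are embedded in the record itself.
    record = json.loads(text)
    return [{"user": record["user"], "amount": float(record["amount"])}]


def from_log(line):
    # Unstructured text: fields must be pulled out with a pattern.
    match = re.search(r"user=(\w+) amount=([\d.]+)", line)
    if match is None:
        return []
    return [{"user": match.group(1), "amount": float(match.group(2))}]


# Merge all three sources into one uniform list of records.
records = from_csv(csv_text) + from_json(json_text) + from_log(log_line)
for record in records:
    print(record)
```

Each source needs its own parsing step, and the less structured the input, the more fragile that step tends to be. That is a large part of what makes variety hard.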


Big Data is not only big, but it’s valuable too.

This infographic from Informatica walks through the risks and opportunities associated with leveraging big data in corporations.

  1. Big Data is Timely – Knowledge workers spend 60% of each workday attempting to find and manage data.
  2. Big Data is Accessible – Half of senior executives report that accessing the right data is difficult.
  3. Big Data is Trustworthy – 29% of companies measure the monetary cost of poor data quality. Things as simple as monitoring multiple systems for customer contact information updates can save millions of dollars.
  4. Big Data is Relevant – 43% of companies are dissatisfied with their tools’ ability to filter out irrelevant data. Something as simple as filtering customers from your web analytics can provide a ton of insight into your acquisition efforts.

In my next tutorial, we will explore big data in more detail.
Stay tuned!