Understanding Data Science, Data Analytics and Big Data

Just another day

It’s Tuesday morning and your alarm rings at 5:30 AM. You turn on your geyser and brush your teeth. Then, you check your emails as you wait for the iron to heat up, but the current goes off. You make do with a somewhat crumpled shirt. For your spouse, the coffee and toast are a bit of a challenge without electricity, so a last-minute change to the breakfast menu: corn flakes and cold milk. You decide to skip exercising altogether and head for the shower.

After a quick breakfast and a hurried conversation, at precisely 8:15 AM, you get into your car and begin the commute to work. On the way, you stumble into a bumper-to-bumper traffic jam stretching forever with no exit. From a chat with a fellow commuter, you realize that there is a procession happening and they have blocked one of the lanes.

The radio plays another ad for a new luxury home that promises a 15 minute drive to your office, when you wonder what happened to those days when this road was empty. Then, a hot new Bollywood number comes up and you start humming along.

Finally, after an hour and a half in excruciating traffic, you reach your office just in time for the daily meeting, but you’re stressed and somewhat tired from the long drive to work.

The way things are

This is the typical day for many office goers in India. They get up, get ready and go to work. Along the way, they make a few decisions, but largely go with the flow. They are usually in reactive mode and, unfortunately, focus on just getting through the day.

But it doesn’t have to be that way.

Imagine this

It’s Tuesday morning and the alarm rings at 5:10 AM instead of 5:30 AM as usual. You had read about the scheduled power cuts and adapted your routine accordingly. As you wake up, you turn on the iron and then the geyser. While you brush your teeth, your spouse already has the toaster running and is making French toast. As you finish ironing your shirt, you can smell the hot cup of coffee waiting for you.

Suddenly, without warning, the current goes off. You smile and step out for your morning run.

After your exercise and shower, you have a nice, hot breakfast and coffee along with some fun conversation. Then, you get ready and leave home at 8:30 AM. You take a slightly longer route but reach work within 40 minutes – with plenty of time to spare before your daily meeting.

What’s different?

In the first scenario, you went passively with the flow. You did things out of habit and routine. You accepted things as they were. Before planning your day, you did not factor in the parameters that made you fall behind schedule: the power outage and the traffic jam. You applied a stock approach to a unique situation – and expected the usual results.

In the second scenario, you considered the different things that could affect your routine and altered your schedule accordingly. You knew about the power cut, so you woke a few minutes earlier than usual to turn on the geyser and iron. Your spouse got the toaster and coffee maker going a few minutes ahead of schedule as well. Then, having factored in the day’s traffic conditions, you decided to take an alternate route.

You had information that you gained insight from. You adapted your actions accordingly and achieved a significantly better outcome. However unwittingly, what you did was to utilize the power of analytics.

Welcome to the world of data science.

What is data science?

Data science is the term given to the collective application of data collection, processing, manipulation and interpretation using tools and techniques from mathematics, statistics, computation and domain expertise.

In other words, data science deals with the process of working with data to solve problems. It extends all the way from the collection of data to deriving insights from the data you captured.

Applying data science

Let’s consider the story you read above. Hypothetically, you avoided a repeat of the first scenario by using the techniques and concepts integral to data science: you analysed why your mornings were so hurried, and then applied the resulting insights to streamline your day.

To start things off, you would have to ask yourself the question: “What do I need in order to have a great day?”

The list is likely to include the following variables:

  • Electricity
  • Sleep
  • Hot Water
  • Clothes
  • Breakfast
  • Transportation
  • Traffic
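
The variables listed above could be captured as one structured record per morning. The following is a minimal sketch, not a prescribed method: the `MorningRecord` class, its field names and the `arrival_time` helper are all illustrative assumptions. The values come from the two scenarios in the story, and anything the story does not state is left as `None`.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class MorningRecord:
    """One morning's observations. All field names are illustrative assumptions."""
    wake_time: time
    leave_time: time
    commute_minutes: int                 # Traffic
    knew_about_power_cut: bool           # Electricity
    shirt_ironed: bool                   # Clothes
    breakfast: str                       # Breakfast
    route: str                           # Transportation
    hours_slept: Optional[float] = None  # Sleep: not stated in the story
    hot_water: Optional[bool] = None     # Hot water: not stated in the story


def arrival_time(record: MorningRecord) -> time:
    """Clock time of arrival at the office: leave time plus commute duration."""
    total = (record.leave_time.hour * 60
             + record.leave_time.minute
             + record.commute_minutes)
    return time(total // 60, total % 60)


# First scenario: left at 8:15 AM, an hour and a half in traffic.
first_scenario = MorningRecord(
    wake_time=time(5, 30), leave_time=time(8, 15), commute_minutes=90,
    knew_about_power_cut=False, shirt_ironed=False,
    breakfast="corn flakes and cold milk", route="usual",
)

# Second scenario: left at 8:30 AM; 40 minutes is the upper bound given in the story.
second_scenario = MorningRecord(
    wake_time=time(5, 10), leave_time=time(8, 30), commute_minutes=40,
    knew_about_power_cut=True, shirt_ironed=True,
    breakfast="French toast and coffee", route="alternate",
)

print(arrival_time(first_scenario), arrival_time(second_scenario))
```

Once each morning is recorded in the same shape, the two scenarios can be compared field by field instead of from memory.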

This constellation of factors is what determines the kind of data you will have to collect, process, prune, and evaluate to gain insight into how to optimise your daily routine. Data science is what will help you analyse the individual and combined impact of these variables.

Data or ‘Big Data’?

In our simple example of a morning routine, we have considered seven parameters. The resultant insights could make your day significantly better.

But, what if you wanted more? What if you had a complex enough model where you accounted for every single parameter of significance (and not just seven)?

Then, you wouldn’t just be dealing with data – you’d be grappling with big data.

Big data, as defined by Wikipedia, is this: “Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.”

If we were to simplify it, big data is all about working with huge datasets and deriving insights from them. These datasets are so huge that traditional methods don’t work. You need specially designed methods of collection, storage, processing and analysis.

Generally speaking, the bigger the dataset, the better the results – provided that the quality of the dataset is good.

For example, an ecommerce website collects a whole host of information – referring sites, time spent on site, bounce rate, landing page, visitor flow etc. The store tracks this information on a visitor-by-visitor basis; consequently, over a span of a few years, it can build a massive dataset that cannot be handled by traditional methods. That’s when the store knows that it is dealing with ‘Big Data’.

So, in our example of the morning routine, you could have an exceptionally large dataset with many more parameters that you can process and analyze. You could have collected data from thousands or millions of people in your city. You could have collected this data over a long time and have recorded several more parameters such as weather, time of day, traffic updates, tweets, household income etc. that you could use in your analysis.

Here’s another way to put the scale of datasets in perspective: if you were to print out a normal-sized dataset, it could be as thick and big as the daily newspaper. If you were to print out a ‘big data’ dataset, you would need 50 warehouses full of telephone directories.

When dealing with data this voluminous, traditional tools and methods will not help: you would need sophisticated software specifically designed to handle this. Check out our courses on Big Data Analytics – The Fastest Growing Branch of Analytics here.

Analyze this

Having collected all this data about your morning, you would need to explore and study it to develop your insights; this is called data analysis. In our example, you might find that watching ‘Saas Bhi Kabhi Bahu Thi’ on Monday night makes you wake up later on Tuesday mornings. Or, that doing your laundry on Saturday instead of Sunday gives you an extra ironed shirt that you can use on Tuesday.
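
The kind of exploration described above can be as simple as splitting your records into two groups and comparing averages. The sketch below assumes a hypothetical log in which each entry records whether you watched late-night TV and how many minutes past the alarm you got up. The numbers are made up purely for illustration, and `average_delay_by_group` is an illustrative helper, not part of any library.

```python
from statistics import mean

# Made-up log for illustration only:
# (watched late-night TV the previous evening, minutes past the alarm before getting up)
wake_log = [
    (True, 25),
    (False, 5),
    (True, 35),
    (False, 0),
    (False, 10),
    (True, 30),
]


def average_delay_by_group(entries):
    """Group wake-up delays by the TV flag and return the mean delay of each group."""
    groups = {}
    for watched_tv, delay_minutes in entries:
        groups.setdefault(watched_tv, []).append(delay_minutes)
    return {watched_tv: mean(delays) for watched_tv, delays in groups.items()}


averages = average_delay_by_group(wake_log)
print("After late-night TV:", averages[True], "minutes late")
print("After an early night:", averages[False], "minutes late")
```

A gap between the two averages is only a hint, not proof: with so few entries, and without ruling out other causes, it tells you where to look next rather than what to conclude.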

But, what if you decided to look for more detailed, intricate patterns across multiple data sets? Then you would be practicing data analytics.

Data analytics is the application of a sequence of steps (algorithms) or transformations to generate insights from processed datasets.

In our morning routine example, you would look at the complex interplay of specific details. For instance, by comparing daily temperature and car usage, you might discover that temperature plays a significant role in determining car usage. With a little more digging, you discover that this simplistic model holds true only in the summer months; during the rainy season, car usage is at its maximum regardless of temperature. Armed with this knowledge, you would see that higher-than-average rainfall is predicted for the next day and deduce that traffic will be heavier.
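
One way to check whether a relationship like the one described above holds only in certain months is to compute the correlation separately for each season. The sketch below uses the Pearson correlation coefficient, which is our choice for illustration rather than anything prescribed here. The observations are made up, and `pearson` and `correlation_by_season` are illustrative helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up observations for illustration only:
# (season, temperature in degrees Celsius, percentage of commuters travelling by car)
observations = [
    ("summer", 30, 40), ("summer", 33, 46), ("summer", 36, 52), ("summer", 39, 58),
    ("rainy", 24, 90), ("rainy", 26, 91), ("rainy", 28, 91), ("rainy", 30, 90),
]


def correlation_by_season(rows):
    """Split rows by season and correlate temperature with car usage in each."""
    by_season = {}
    for season, temperature, car_share in rows:
        temps, shares = by_season.setdefault(season, ([], []))
        temps.append(temperature)
        shares.append(car_share)
    return {season: pearson(temps, shares)
            for season, (temps, shares) in by_season.items()}


results = correlation_by_season(observations)
for season, r in results.items():
    print(f"{season}: r = {r:.2f}")
```

In this toy data, temperature tracks car usage closely in summer, while in the rainy season usage stays high whatever the temperature, which is the pattern the paragraph describes. A coefficient computed over the whole year at once would blur the two together.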

That’s data analytics at work. Deciding to set out earlier than usual because traffic will be heavier is that insight put into practice.

In Summary

Analytics, big data and data science are industry buzzwords that are often used interchangeably and incorrectly. While data science is the domain in which you would operate, data analysis is one of the core processes that adds value to the data you collect. And, when you’re dealing with massive sets of data that cannot be processed with traditional tools and approaches, you’re dealing with big data.

What do you think of our definition? Does it match yours? And, do the ‘daily routine’ examples work? Want to share your own example? Let us know in the comments section.

At SQTL, we pride ourselves on delivering the best services to meet your data science education needs. Explore our courses here or read about others just like you who have found success with the power of analytics.