Data Alone is Powerless

The goal of data science is to unravel the science behind finding patterns in data: the patterns that human eyes fail to find, the intricate patterns that contain the secrets to life and its manifestations.

What if we ask this fundamental question: "Is data alone enough?"

To answer that question, let's look into some data.

Here is a data set consisting of a picture of a horse, an audio recording of the national anthem of India, and the yearly financial data of an export-import business. The question is: what is the pattern in this data set?

The answer is none. There is no pattern inside the data, because nothing connects these three items: no question guided their collection. This shows that data alone is powerless. So, what exactly are we missing?

Data always comes with question(s).

Before I give you an example, let me show you this flowchart.

This is exactly how scientists have solved problems using data for hundreds of years. First, you ask a question. Then, based on that question, you collect data. The step of data collection is lovingly called a sample survey. After data collection, you transform the data according to your needs to find patterns. Finally, you use advanced scientific tools to answer your initial question, based on how much pattern you have extracted and how you have extracted it.

Now, let me give you an example.

Let's say you first ask whether there is any relationship between the pressure and the volume of a gas. To answer your question, you collect data on the pressure and volume of a particular gas, as shown above. This is a very natural step. This is exactly how you collect evidence. Then, you transform and pre-process the data to make a chart out of it. You can also fit advanced models to actually answer your original question. In this case, you can easily see from the chart that as the volume increases, the pressure of the gas decreases. Thus, you use this chart to answer your original question: "Yes, there is an interesting relationship between the volume and the pressure of a gas. As the volume increases, the pressure of the gas decreases." Now, if you ask the further question "How fast does it decrease?", then you will have to use mathematics and other computational tools to answer it. But the process of answering the question remains intact.
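To make the "How fast does it decrease?" step concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the numbers are not real measurements but values I generated from an assumed inverse relationship P = 100 / V (temperature held constant), and the helper name fit_power_law is hypothetical. The idea is to fit a straight line to log(P) against log(V); the slope of that line estimates how fast the pressure falls as the volume grows.

```python
import math


def fit_power_law(volumes, pressures):
    """Least-squares fit of log(P) = log(k) + b * log(V).

    Returns (k, b), so that the fitted curve is P = k * V**b.
    """
    xs = [math.log(v) for v in volumes]
    ys = [math.log(p) for p in pressures]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept on the log-log scale.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    k = math.exp(mean_y - b * mean_x)
    return k, b


# Hypothetical data, NOT real measurements: generated from P = 100 / V.
volumes = [1.0, 2.0, 4.0, 5.0, 10.0]
pressures = [100.0 / v for v in volumes]

k, b = fit_power_law(volumes, pressures)
print(f"estimated exponent b = {b:.3f}, estimated constant k = {k:.1f}")
```

On this made-up data the estimated exponent comes out close to -1, which is just the inverse relationship the data was generated from. With real measurements the estimate would carry noise, and you would still need the question first to know that this is the quantity worth estimating.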

I have mentioned the terms human intelligence and artificial intelligence. I will explain those terms in an upcoming post, and I will link it here when I write it. However, you may now ask what questions you should ask before collecting data. This brings us to the next part of the discussion: The Two Fundamental Questions.

CR Rao passed away on this day. This post is therefore dedicated to him. His book "Statistics & Truth" is undeniably one of the top three books that have inspired me to teach you the truth of data science: the true understanding of data science. Salute to him for his contribution to the development of statistical science. May he rest in peace.
