3 Mathematical Laws that Make Data Science Fun

If you don’t understand these fundamentals, you won’t understand data science. This is the reason they are referred to as the “core” of a data science career.

What are they?

Value of Mathematical Concepts in Data Science

Data science and machine learning are built on core concepts from statistics and probability. These concepts are vital for building ML models and for in-depth data engineering and analytics.

Among these concepts, three mathematical laws stand out, and data science professionals should know them:

• The Law of Large Numbers
• Zipf’s Law, and
• Benford’s Law

You may wonder why mathematical concepts matter so much when the job of a data scientist is to work with data. Data scientists don’t need 360-degree knowledge of mathematical theory, but they do need to know the laws that data projects depend on. That understanding is what lets them solve business problems efficiently with machine learning.

From developing machine learning models to analyzing and interpreting data, some mathematical laws cannot be overlooked. Here are the ones that help data science professionals most.

3 Fundamental Laws for Data Science Professionals

1. Law of Large Numbers (LLN)

This is one of the most intuitive laws in mathematics and probability, and it plays a large role in data science.

According to the law of large numbers, as the number of trials increases, the average of the results gets closer to the expected value.

Suppose we have a random variable X whose population mean (or expected value) is Y.

Then, according to this law, if we take many independent observations of this random variable, the average of those observations will tend toward Y as the number of observations grows.

For instance, suppose you roll a die. There are six equally likely outcomes, so the expected value for a fair six-sided die is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. If we keep rolling, the average of the results will get closer to 3.5.
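The die example can be checked with a short simulation. This is a minimal sketch in Python, not part of the original article: the function name running_mean_of_die_rolls, the seed, and the number of rolls are illustrative assumptions.

```python
import random


def running_mean_of_die_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the running averages."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        means.append(total / i)     # average of the first i rolls
    return means


means = running_mean_of_die_rolls(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} rolls: mean = {means[n - 1]:.4f}")
```

The early averages can sit well away from 3.5, while the later ones settle close to it. The exact numbers depend on the seed.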

A practical catch is that the law only helps when data science professionals have many trials or observations to work with. This is one reason it is so useful in big data, where samples are large.

This law applies to many situations, and for that reason it is also among the most misunderstood (and misapplied) laws. It is often confused with the so-called Law of Averages, the belief that the outcomes of a random event will even out in the short run. The law of large numbers makes no such promise: in a small sample, the average can be far from the expected value.

2. Zipf’s Law

This law is used in quantitative linguistics. For instance, McCowan and Doyle have used it to analyze dolphin and human infant vocalizations in work connected to the search for extraterrestrial intelligence. What does the law state?

It says that, given a corpus of natural language, the frequency of any word is approximately inversely proportional to its rank in the frequency table.

Therefore, the most frequent word will occur about twice as often as the second most frequent word, three times as often as the third, and so on.
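To make the rank and frequency relationship concrete, here is a minimal Python sketch, not taken from the original article. The helper names rank_frequency and zipf_prediction, the exponent parameter s, and the toy sentence are all illustrative assumptions. A real test of the law needs a large corpus.

```python
from collections import Counter


def rank_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    words = text.lower().split()  # naive whitespace tokenizer (an assumption)
    return Counter(words).most_common()


def zipf_prediction(top_count, n_ranks, s=1.0):
    """Counts an idealized Zipf curve predicts for ranks 1..n_ranks.

    With s = 1, the count at rank r is top_count / r.
    """
    return [top_count / rank ** s for rank in range(1, n_ranks + 1)]


# Toy input only; far too small to show Zipf's law by itself.
table = rank_frequency("the cat saw the dog and the dog saw the bird")
predicted = zipf_prediction(top_count=table[0][1], n_ranks=len(table))

for rank, ((word, count), expected) in enumerate(zip(table, predicted), start=1):
    print(f"rank {rank}: {word!r} observed={count} zipf={expected:.2f}")
```

On a large body of text, plotting observed counts against rank on log-log axes is a common way to see how closely the data follow the predicted curve.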

3. Benford’s Law

It is a counterintuitive phenomenon exhibited by numbers in the real world. It is also called the ‘First-digit law’. According to the law, the leading digits of numbers in many real-world datasets do not occur with equal frequency. This goes against the naive expectation that every digit is equally likely to appear first. Intuitively, the digit 1 should have the same probability as the digit 9.

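But it is not so. The Newcomb-Benford Law states that in many real-world datasets smaller leading digits are more common: a number starts with 1 about 30% of the time and with 9 less than 5% of the time.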

Say you take the populations of countries and group them according to their first digit. Every country whose population starts with 1 goes into the first group, and so on. The groups will not be the same size: lower leading digits are more common than higher ones, with 1 the most common of all.
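The grouping described above can be sketched in Python. This example is not from the original article and rests on two assumptions stated plainly: it uses the standard Benford formula P(d) = log10(1 + 1/d) for the expected share of leading digit d, and it uses the first 1,000 powers of 2 as a stand-in dataset, since no real population figures are included here. The function names are illustrative.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(values):
    """Observed share of each leading digit among non-zero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Stand-in data (an assumption): powers of 2 are known to follow the law closely.
data = [2 ** n for n in range(1, 1001)]
expected = benford_expected()
observed = leading_digit_shares(data)

for d in range(1, 10):
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

To try it on real figures, replace data with a list of integers such as country populations. Large gaps between the observed and expected shares are the kind of signal that fraud checks look for, although a gap alone does not prove anything.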

Real-world applications of this law in data science include fraud detection in tax forms, economic figures, accounting records, and election results.

Probability is an important foundation for data science. These laws, backed by extensive theory and experiment, make that foundation richer, and a thorough understanding of them can make data science more fun and more valuable for you and the companies you work with.