COVID-19 Predictions, Dunning-Kruger Effect, and the Hippocratic Oath of a Data Scientist

Data sources on COVID-19 are easy to find. R and Python libraries make it easy to create attractive graphics, models, predictions, insights, and recommendations. I’ve seen recommendations on business, politics, and health from people who clearly have no background in these areas. We have all seen how “data is promoted”.

Some good friends asked me whether I was looking at the COVID-19 datasets.

Yes, I am looking at these datasets. However, my exploration is driven by curiosity, not by a wish to publish my own predictions or recommendations. I am not planning to publish any of my analyses of the COVID-19 datasets, because I am convinced that I am not qualified to do so.

Let me digress a little. I promise I’ll come back and connect the dots.

Pittsburgh, 1995: Two men rob a bank in broad daylight without any disguise or mask, even smiling at the cameras as they leave. Later that night, police arrest one of the robbers. The man and his accomplice believed that rubbing lemon juice on their skin would make them invisible to the cameras, as long as they stayed away from any source of heat. You’d think they were mentally ill or on drugs. However, this was not the case. It was a case of inflated self-assessment of competence.

Inspired by the Pittsburgh robbery, Kruger and Dunning of Cornell University decided to study how people misjudge and overrate their own abilities and skills. The study was published in 1999 as “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments”.

The Dunning-Kruger Effect is the cognitive bias that leads to inflated self-assessment. People with less knowledge, experience, or competence not only make mistakes but also fail to recognize them. Experts, with more knowledge and experience, tend instead to be self-critical and aware of their shortcomings.

The power of state-of-the-art libraries is amazing. In a few lines of code, you can produce an impressive visualization or model without worrying about the complexity of the implementation. I call these libraries both a blessing and a curse: a blessing for those who are experts or who “know what they do not know”, and a curse for those who “do not know what they do not know”. Around the halfway point of our Data Science and Data Engineering Bootcamp, our attendees reach a high level of confidence. Why shouldn’t they? With all of the solid R and Python libraries, as well as toy datasets, anyone would feel the same. Many of them are amazed at how easy data science, AI, and machine learning seem to be.

About two-thirds of the way through the Bootcamp, when attendees are asked to improve their models through feature engineering and parameter tuning, the newly won confidence begins to wane. One attendee once blurted out in frustration:

“How is this machine learning? Why should I do all of the feature engineering, data cleaning, and preprocessing myself? Why can’t the machine do that?”

That is the time to explain the Dunning-Kruger Effect in class. (I usually tell it as a joke, so that no attendee feels they are being called “stupid”. I keep using this example.) I tell them that data science and machine learning go way beyond libraries, tools, and techniques. Domain knowledge of the problem at hand is important. Garbage in, garbage out. Now let me connect the dots.

With the advent of COVID-19, many people have started sharing their work on the available data sources. I appreciate the creativity and the effort. I’ve seen good visualizations built with every possible tool. I’ve seen models that predict how many cases a country will have the next day, week, or month. Time and time again, I have found these findings and conclusions to be not only alarming but also unfounded.
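To illustrate how easily such a forecast can go wrong, here is a minimal sketch in plain Python. It is not any published model and not an epidemiological method: the “outbreak” is synthetic, generated from a logistic curve with made-up parameters (`cap`, `rate`, `midpoint`), and the function names are hypothetical. It fits a naive exponential trend to the first 15 days and extrapolates to day 60, where the true curve has already flattened.

```python
import math


def logistic_cases(day, cap=10_000.0, rate=0.25, midpoint=30.0):
    """Synthetic cumulative cases on a logistic curve (made-up parameters)."""
    return cap / (1.0 + math.exp(-rate * (day - midpoint)))


def fit_exponential(days, counts):
    """Least-squares fit of log(count) = a + b * day; returns (a, b)."""
    n = len(days)
    logs = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_exponential(a, b, day):
    """Extrapolate the fitted exponential trend to a given day."""
    return math.exp(a + b * day)


# "Observed" data: only the first 15 days of the synthetic outbreak.
train_days = list(range(15))
train_counts = [logistic_cases(d) for d in train_days]
a, b = fit_exponential(train_days, train_counts)

# Extrapolate far beyond the observed window and compare with the
# value the synthetic curve actually reaches.
forecast_day = 60
naive_forecast = predict_exponential(a, b, forecast_day)
actual = logistic_cases(forecast_day)
overshoot = naive_forecast / actual

print(f"fitted daily growth rate: {b:.3f}")
print(f"naive forecast for day {forecast_day}: {naive_forecast:,.0f}")
print(f"synthetic 'true' value: {actual:,.0f}")
print(f"overshoot factor: {overshoot:,.0f}x")
```

The fit to the early data looks excellent, yet the extrapolation overshoots by orders of magnitude, because nothing in those first 15 points reveals when or why growth slows. Knowing that takes domain knowledge, not a better plotting library.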

Domain knowledge of the problem at hand is an important prerequisite for solving a difficult modeling problem. If you are not at least familiar with the principles of infectious diseases, economics, public policy, and health policy, please stop drawing misleading and frightening conclusions, or giving people false comfort.

A few months ago I created an infographic called the Hippocratic Oath of a Data Scientist, which was inspired by a Hippocratic oath for mathematicians.