3 steps: AI best practice from experts

May 13, 2019

The right core: Causality and hypotheses instead of correlation and coincidence

Prof. Dr. Christoph Schlueter Langdon of the Telekom Data Intelligence Hub explains the three steps for success with AI in an expert interview in Automotive IT, highlighting the importance of causality over correlation. “Without a hypothesis about the relation between cause and effect, fishing expeditions have little use. […] Statistics only provide correlations, not causality. An example: Health and economic performance are positively correlated, but where should you invest the next Euro: in health or economic growth?” explains Schlueter Langdon. These lessons translate into three crucial steps: framing the problem, building causal models and hypotheses, and focusing on the right data.
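The point about correlation versus causality can be made concrete with a small simulation. The following is a minimal sketch on invented toy data, not a model of real health or economic figures: it assumes a hidden common cause (the variable `z` is hypothetical) that drives both "health" and "output", while health itself has no effect on output. The two still correlate strongly in observational data, and the correlation disappears once health is set independently.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, intervene=False, seed=0):
    """Return the health/output correlation in a toy world.

    Assumption for illustration only: a hidden factor z causes both
    variables, and health has no causal effect on output.
    """
    rng = random.Random(seed)
    health, output = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden common cause (hypothetical)
        if intervene:
            h = rng.gauss(0, 1)  # health set independently of z
        else:
            h = z + rng.gauss(0, 0.5)  # health follows z
        y = z + rng.gauss(0, 0.5)  # output depends on z only
        health.append(h)
        output.append(y)
    return pearson(health, output)


observed = simulate()
intervened = simulate(intervene=True)
print(f"observational correlation: {observed:.2f}")
print(f"correlation under intervention: {intervened:.2f}")
```

In this toy world, investing in health would not raise output at all, even though the observational correlation looks convincing. Only a causal hypothesis tells you which of the two runs describes the situation you are in.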

Step 1: Frame the problem as a question.

“At the start, it is important to condense a problem into a question, which you want to answer with the data analysis.” For this purpose, the data scientists need to gain insight into workflows before the start of the analysis in order to understand which step of a process should be optimized with AI. Only then can the desired result be defined, and the appropriate models developed. In most companies, this step fails because of silo thinking among departments and missing information for the data scientists.

Step 2: Answer the question with the causal model and hypotheses.

“Then it’s about further narrowing the focus by forming hypotheses grounded in theory, a so-called causal model. If this causal model cannot fit on a napkin, then you should not continue at all,” suggests Schlueter Langdon. “Only by defining the events relevant to the process step to be optimized can the correct data be extracted from the mass of data and the problems with data management overcome.” For more insights on data management issues, see “Data is broken” (link).

Step 3: Avoid Garbage In, Garbage Out with the right data.

“Only then can the right data be identified, refined, and finally analyzed.” Another core principle with AI: All information required to answer the question must be included in the data, otherwise there is the risk of Garbage In, Garbage Out (GIGO), an effect further described in “Creating data pools for AI” (link). “No raw iron without iron ore in the rock: Same with data – one has to ensure that it contains the information required to solve a problem,” Schlueter Langdon explains.

One example is deep learning: “Especially with Neural Networks, the quality of results depends almost entirely on the quality of the training data,” clarifies the data science expert. For example, in so-called Convolutional Neural Networks (CNNs), the labeling quality directly determines the accuracy of image recognition results. “The description of the training data has to be very granular for each object,” he notes.
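The effect of label quality can be illustrated without a neural network. The sketch below is a deliberately simplified stand-in, not a CNN: it assumes invented one-dimensional toy data and a 1-nearest-neighbor classifier, and all function names and parameters are hypothetical. It trains once on correct labels and once on a copy in which roughly 30% of the labels have been flipped, then compares accuracy on the same test set.

```python
import random


def make_data(n, rng):
    """Toy 1-D data: class 0 centered at 0.0, class 1 centered at 4.0."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(4.0 * label, 1.0), label))
    return data


def flip_labels(data, fraction, rng):
    """Return a copy in which roughly `fraction` of the labels are wrong."""
    return [(x, 1 - y) if rng.random() < fraction else (x, y) for x, y in data]


def accuracy_1nn(train, test):
    """Classify each test point by the label of its nearest training point."""
    correct = 0
    for x, y in test:
        _, predicted = min(train, key=lambda p: abs(p[0] - x))
        correct += predicted == y
    return correct / len(test)


rng = random.Random(0)
train = make_data(500, rng)
test = make_data(500, rng)

clean_acc = accuracy_1nn(train, test)
noisy_acc = accuracy_1nn(flip_labels(train, 0.3, rng), test)
print(f"accuracy with clean labels: {clean_acc:.2f}")
print(f"accuracy with 30% wrong labels: {noisy_acc:.2f}")
```

The features are identical in both runs; only the labels differ. That is the GIGO effect in miniature: the model can only be as good as the description of the data it learns from.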

This article is based on a longer piece in Automotive IT (2019-05): link