Master Data Maintenance as a Basis for Sound Data Analysis

Presumably, no one cleans up their master data just for its own sake. There is usually an underlying motive: master data contains important meta-information that is needed to generate meaningful business intelligence reports or, as we often see, to produce reliable forecasts, e.g., of the demand for items to be produced. Only when you actually want to use the master data do you notice that something is off, and how valuable a consolidated master dataset would be. But does it have to come to that?

It doesn’t: You can also maintain your master data proactively so that it is ready and immediately usable for upcoming applications. And this can be achieved with the help of data analytics and artificial intelligence.

How do you prepare master data for analysis?

High data quality is the basic prerequisite for generating value from data. As soon as machine learning comes into play, good data quality is also the key factor in learning meaningful patterns from the data and producing usable results. So how can inconsistencies in master data be identified and corrected?

Solution: detect and automatically correct anomalies

We advise you on consolidating your existing data inventory and highlight inconsistencies in the data. Using data analytics methods, anomalies are detected and suggested corrections are submitted to the data owners. The data model itself can also be reviewed and, where necessary, revised with regard to its suitability for analytical use cases. The result is a unified, complete, and consolidated dataset.

Benefits: data quality turns data into valuable assets

What can be achieved as a result?

  • Inconsistencies are corrected
  • Inactive items are removed
  • Newly appearing items are added
  • Implausible entries and outliers are identified
  • Irregularities are detected
  • Anomalies in the temporal structure are identified
  • Variations in spelling are unified
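The last point, unifying spelling variations, can be sketched with simple string similarity. The following is a minimal illustration using Python's standard-library `difflib`; the supplier names and the 0.8 similarity threshold are invented for demonstration and would need tuning on real data:

```python
from difflib import SequenceMatcher

# Toy master data with spelling variants of the same suppliers (assumed names).
names = ["Mueller GmbH", "Müller GmbH", "Schmidt AG", "Schmit AG"]

def similar(a: str, b: str) -> bool:
    # Ratio > 0.8 is an illustrative threshold, not a recommendation.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.8

# Map each name to the first sufficiently similar canonical form seen so far.
canonical: list[str] = []
mapping: dict[str, str] = {}
for name in names:
    match = next((c for c in canonical if similar(name, c)), None)
    if match is None:
        canonical.append(name)
        match = name
    mapping[name] = match

print(mapping)
# "Müller GmbH" maps to "Mueller GmbH", "Schmit AG" to "Schmidt AG"
```

In practice, such automatically derived groupings would only be suggested, with the final decision left to the data owners.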

Example: Data-driven support in identifying anomalies in the data

The following types of anomalies are identified and corrected if needed:

Outliers:

  • An unusually high value in a column: 100,123 instead of 100.123 (e.g., due to booking or input errors during data entry)
  • Missing values
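Both kinds of outlier can be flagged with a few lines of pandas. The table below is a toy example (the column names and the classic 1.5×IQR fence are illustrative assumptions, not part of any real dataset):

```python
import pandas as pd

# Toy item master table; "item" and "price" are assumed column names.
df = pd.DataFrame({
    "item":  ["A", "B", "C", "D", "E"],
    "price": [100.12, 101.30, 99.85, 100123.0, None],  # one separator error, one gap
})

# Missing values: a simple null mask.
missing_items = df.loc[df["price"].isna(), "item"].tolist()

# Outliers: values outside the classic 1.5 * IQR fences.
p = df["price"].dropna()
q1, q3 = p.quantile(0.25), p.quantile(0.75)
iqr = q3 - q1
mask = (p < q1 - 1.5 * iqr) | (p > q3 + 1.5 * iqr)
outlier_items = df.loc[p[mask].index, "item"].tolist()

print(missing_items)   # ['E']
print(outlier_items)   # ['D']
```

On real master data, the flagged rows would not be corrected blindly but submitted to the data owners as suggestions.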

Irregularities in otherwise valid correlations:

  • If there is an entry in column A, there is usually also an entry in column D, with only a few exceptions.
  • The values in column B are usually twice as high as those in column C, with only a few exceptions.
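Both rules above can be expressed as simple checks over the table. The sketch below encodes them directly in pandas; the data, the column names A–D, and the ±20% tolerance on the factor of two are assumptions made for illustration:

```python
import pandas as pd

# Toy data: row 1 breaks the "A implies D" rule, row 3 breaks the "B ≈ 2 * C" rule.
df = pd.DataFrame({
    "A": ["x", "y", None, "z"],
    "D": ["d1", None, None, "d2"],
    "B": [20.0, 41.0, 39.5, 10.0],
    "C": [10.0, 20.0, 20.0, 20.0],
})

# Rule 1: if column A is filled, column D should normally be filled too.
rule1_violations = df.index[df["A"].notna() & df["D"].isna()].tolist()

# Rule 2: B is expected to be roughly twice C (here: within ±20% of factor 2).
ratio = df["B"] / df["C"]
rule2_violations = df.index[(ratio / 2.0 - 1).abs() > 0.2].tolist()

print(rule1_violations)  # [1]
print(rule2_violations)  # [3]
```

In a real project, such rules would first be learned or confirmed from the data before being used to flag exceptions.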

Anomalies in the temporal structure:

  • For customer A, no entry is recorded in December 2018.
  • Unusually low demand from a customer in a specific month, e.g., caused by a sudden switch from monthly to weekly ordering.
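Gaps like the missing December 2018 entry can be found by comparing each customer's observed months against the full monthly range. A minimal pandas sketch, with an invented booking history for a single customer:

```python
import pandas as pd

# Toy demand history: customer A has no booking in 2018-12 (assumed data).
bookings = pd.DataFrame({
    "customer": ["A"] * 5,
    "month": pd.to_datetime(["2018-09", "2018-10", "2018-11", "2019-01", "2019-02"]),
})

# Full monthly range between the first and last observed booking.
full_range = pd.period_range(bookings["month"].min(),
                             bookings["month"].max(), freq="M")
observed = bookings["month"].dt.to_period("M")

# Months in the expected range with no recorded entry.
gaps = full_range.difference(observed)
print(list(gaps.astype(str)))  # ['2018-12']
```

With several customers, the same check would be applied per group, e.g., via `groupby("customer")`.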