
Data quality check (ML)

lab

Why?

Ensure that the data you are using is of sufficient quality to base further conclusions on.

How?

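Define test cases that capture what you expect of the data, for example no missing identifiers, values within plausible ranges, and no duplicate records. Preferably automate these test cases as test scripts so they can be rerun whenever the data changes. Keep extending the test cases as you discover new problems in the data.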
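A minimal sketch of what automated data test cases could look like, in plain Python. The field names (`id`, `age`, `label`), the valid age range and the allowed labels are hypothetical examples; replace them with expectations agreed with your domain expert. The `known_labels` check illustrates a test added after a (hypothetical) bug was found in the data. In practice the same checks could also be written as tests for a test runner such as pytest.

```python
# Sketch of automated data test cases. Field names, ranges and allowed
# labels below are hypothetical examples, not fixed requirements.

def check_no_missing(records, fields):
    """Return ids of records where any of the given fields is missing (None)."""
    return [r.get("id") for r in records if any(r.get(f) is None for f in fields)]


def check_range(records, field, low, high):
    """Return ids of records whose value for `field` lies outside [low, high]."""
    return [
        r.get("id")
        for r in records
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def check_unique(records, field):
    """Return the values of `field` that occur more than once."""
    seen, dupes = set(), set()
    for r in records:
        value = r.get(field)
        if value is None:
            continue  # missing values are reported by check_no_missing
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def run_data_tests(records):
    """Run every test case; return only failing tests as {name: offending items}."""
    tests = {
        "no_missing_values": check_no_missing(records, ["id", "age", "label"]),
        "age_in_range": check_range(records, "age", 0, 120),
        "unique_ids": check_unique(records, "id"),
        # Regression test added after a (hypothetical) data bug: labels must
        # come from the known set. Missing labels are covered above.
        "known_labels": [
            r.get("id") for r in records if r.get("label") not in {"yes", "no", None}
        ],
    }
    return {name: result for name, result in tests.items() if result}


records = [
    {"id": 1, "age": 34, "label": "yes"},
    {"id": 2, "age": 230, "label": "no"},     # implausible age
    {"id": 3, "age": None, "label": "maybe"},  # missing age, unknown label
    {"id": 3, "age": 51, "label": "no"},      # duplicate id
]

print(run_data_tests(records))
# {'no_missing_values': [3], 'age_in_range': [2], 'unique_ids': [3], 'known_labels': [3]}
```

An empty result means all test cases pass; each newly found problem in the data becomes one more entry in the `tests` dictionary.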

Ingredients

  • Understanding of the data (e.g. through Exploratory data analysis)
  • A domain expert to answer questions about the data
  • A disciplined mindset to cover all important cases
  • A critical eye on the validity of your data and your conclusions

In practice

Before you use a data set in further analyses, you need to detect incomplete, incorrect, inaccurate, or irrelevant parts of the data. Just as you test code, you also need to test data, and be aware that errors in your conclusions can also stem from errors in the data. The outcome may be that you need more data, that your research question cannot be answered with the available data, or that your intended software solution may not meet its requirements.
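To make the possible outcomes concrete, the sketch below counts how many records remain usable after removing incomplete and invalid ones, and compares that number to a minimum. The field names, validity rules and the threshold `MIN_USABLE_ROWS` are hypothetical; in a real project the minimum follows from your research question or model requirements.

```python
# Sketch: does enough usable data remain after quality checks?
# MIN_USABLE_ROWS and the validity rules are hypothetical examples.

MIN_USABLE_ROWS = 3


def usable_records(records, required, valid):
    """Keep records that have all required fields and pass every validity rule.

    `valid` maps a field name to a predicate returning True for valid values.
    """
    kept = []
    for r in records:
        if any(r.get(f) is None for f in required):
            continue  # incomplete record
        if not all(rule(r[f]) for f, rule in valid.items() if r.get(f) is not None):
            continue  # incorrect or implausible value
        kept.append(r)
    return kept


def assess(records, required, valid, min_rows=MIN_USABLE_ROWS):
    """Summarise how much of the data set is usable and whether it is enough."""
    kept = usable_records(records, required, valid)
    share = len(kept) / len(records) if records else 0.0
    return {
        "usable_rows": len(kept),
        "usable_share": share,
        "enough_data": len(kept) >= min_rows,
    }


REQUIRED = ["age", "income"]
VALID = {
    "age": lambda a: 0 <= a <= 120,
    "income": lambda x: x >= 0,
}

records = [
    {"age": 34, "income": 2800},
    {"age": 230, "income": 3100},   # implausible age
    {"age": None, "income": 1900},  # missing age
    {"age": 51, "income": -5},      # negative income
    {"age": 29, "income": 4200},
]

report = assess(records, REQUIRED, VALID)
print(report)
# {'usable_rows': 2, 'usable_share': 0.4, 'enough_data': False}
```

Here only two of five records survive the checks, so under the assumed threshold the conclusion would be that more (or better) data is needed before continuing.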

Phase(s) of use

In the following project phase(s), data quality check (ML) can be used:

  • Machine learning