direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Inhalt des Dokuments

REDS: Estimating the Performance of Error Detection Strategies Based on Dirtiness Profiles

Lupe


Datasets usually suffer from various data quality problems or data errors. At the same time, there are various error detection strategies to detect different kinds of data errors. To effectively detect the data errors, the user has to deploy and test multiple error detection strategies. However, evaluating each error detection strategy on the new dataset requires tedious human evaluation efforts. Therefore, estimating the performance of each strategy upfront is desirable for a more effective strategy selection.

In this project, we propose a new approach to estimate the performance of error detection strategies. The intuition is that error detection strategies will perform similarly on similarly dirty datasets. Therefore, we introduce the novel concept of dirtiness profiles, which make datasets comparable with respect to their dirtiness. Based on the similarity of dirtiness profiles, we estimate the expected performance of the available error detection strategies on the new dataset. 

Check out the project repository and contact the author.

Zusatzinformationen / Extras

Quick Access:

Schnellnavigation zur Seite über Nummerneingabe