TU Berlin

Big Data Management Group


Paper Accepted for SSDBM 2019

The paper "REDS: Estimating the Performance of Error Detection Strategies Based on Dirtiness Profiles" by Mohammad Mahdavi and Ziawasch Abedjan was accepted for SSDBM 2019.

Here is the abstract:

Datasets usually suffer from various data quality problems or data errors. At the same time, there are various error detection strategies to detect different kinds of data errors. To effectively detect the data errors, the user has to deploy and test multiple error detection strategies. However, evaluating each error detection strategy on a new dataset requires tedious manual evaluation efforts. Therefore, estimating the performance of each strategy upfront is desirable for a more effective strategy selection.

In this paper, we propose a new approach to estimate the performance of error detection strategies. Our intuition is that error detection strategies will perform similarly on similarly dirty datasets. We introduce the novel concept of dirtiness profiles, which make datasets comparable with respect to their dirtiness.  

Our experiments show that our system REDS accurately estimates the performance of error detection strategies and, solely based on automatically extracted features, outperforms the semi-supervised baseline. 
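
The intuition stated in the abstract, that strategies perform similarly on similarly dirty datasets, can be illustrated with a small nearest-neighbor sketch. This is not the REDS implementation. The two profile features, the use of F1 as the performance measure, the Euclidean distance, and the function names are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def profile_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_f1(new_profile, known, k=2):
    """Estimate one strategy's F1 on a new dataset.

    known: list of (profile, f1) pairs measured for the same strategy
    on previously evaluated datasets. The estimate is the mean F1 of
    the k datasets whose profiles are closest to new_profile.
    """
    if not known:
        raise ValueError("need at least one evaluated dataset")
    nearest = sorted(
        known, key=lambda item: profile_distance(new_profile, item[0])
    )[:k]
    return sum(f1 for _, f1 in nearest) / len(nearest)


# Hypothetical dirtiness profiles (assumed features, made-up values):
# (fraction of missing cells, fraction of non-numeric cells in numeric columns)
known = [
    ((0.10, 0.02), 0.80),
    ((0.12, 0.03), 0.70),
    ((0.60, 0.40), 0.20),
]

# The new dataset resembles the first two, so their scores drive the estimate.
estimate = estimate_f1((0.11, 0.02), known, k=2)
```

A real dirtiness profile would likely contain many more automatically extracted features, and the estimator could be any regression model; the sketch only shows how comparable profiles allow performance to be transferred from evaluated datasets to a new one.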

