direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Inhalt des Dokuments

Es gibt keine deutsche Übersetzung dieser Webseite.

Paper accepted for SSDBM 2018

Our research paper "Meta-data Driven Error Detection by Larysa Visengeriyeva and Ziawasch Abedjan was accepted for the International Conference on Scientific and Statistical Database Management 2018 in Bolzano Italy. 

Abstract:

Scientific data often originates from multiple sources and human agents. The integration of data from different sources must also resolve data quality problems that might occur because of inconsistency or different quality assurance levels of the sources. To identify various data quality problems in a dataset, it is necessary to use several error detection methods. Existing error detection solutions are usually tailored towards one specific type of data errors, such as rule violations or outliers, requiring the application of multiple strategies. Using all possible error detection methods is also not satisfying, as some systems might perform poorly on a particular dataset by producing a large number of false positives and missing some results. However, it is not trivial to assess the effectivity of each strategy upfront. We propose two new holistic approaches for effectively combining off-the-shelf error detection systems. Our approaches are learning-based and incorporate metadata extracted from the dataset at hand. We empirically show, using four real-world datasets, that our method to combine error-detecting strategies achieves an average F-1 score 15% higher than multiple heuristics-based baselines.

Zusatzinformationen / Extras

Direktzugang:

Schnellnavigation zur Seite über Nummerneingabe