Example-Driven Error Detection
Traditional error detection approaches require user-defined parameters and rules. Thus, the user has to know both the error detection system and the data. However, we can also formulate error detection as a semi-supervised classification problem that only requires domain expertise. The challenges for such an approach are twofold: (1) to represent the data in a way that enables a classification model to identify various kinds of data errors across different data types, and (2) to pick the most promising data values for learning.
We developed ED2, an active learning-based system that achieves state-of-the-art error detection accuracy without any configuration while requiring user labels for only a small fraction of the data. The code is available at github.com/BigDaMa/ExampleDrivenErrorDetection.
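To make the two challenges concrete, the following minimal sketch shows one way a label-driven error detector could be structured. It is not the ED2 implementation: the feature set, the logistic regression model, the uncertainty-based selection rule, and all names (featurize, detect_errors, ask_user) are assumptions chosen for illustration. Cell values are mapped to simple type-agnostic features (challenge 1), and in each round the user is asked to label the cell about which the current model is least certain (challenge 2).

```python
import math
from collections import Counter


def featurize(value, counts, total):
    """Turn one cell value into type-agnostic numeric features (illustrative choice)."""
    n = max(len(value), 1)
    return [
        sum(ch.isdigit() for ch in value) / n,   # share of digit characters
        sum(ch.isalpha() for ch in value) / n,   # share of letter characters
        min(len(value), 20) / 20.0,              # capped, normalized length
        counts[value] / total,                   # relative frequency in the column
    ]


def predict_proba(w, x):
    """Probability that a cell is erroneous under a logistic model; w[-1] is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=300, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            grad = predict_proba(w, x) - label
            for j, xj in enumerate(x):
                w[j] -= lr * grad * xj
            w[-1] -= lr * grad
    return w


def detect_errors(values, ask_user, seed, budget):
    """Active learning loop: repeatedly ask for a label on the least certain cell."""
    counts, total = Counter(values), len(values)
    X = [featurize(v, counts, total) for v in values]
    labeled = {i: ask_user(i) for i in seed}
    for _ in range(budget):
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        if not unlabeled:
            break
        idx = list(labeled)
        w = train([X[i] for i in idx], [labeled[i] for i in idx])
        # Uncertainty sampling: pick the cell whose predicted probability is closest to 0.5.
        pick = min(unlabeled, key=lambda i: abs(predict_proba(w, X[i]) - 0.5))
        labeled[pick] = ask_user(pick)
    idx = list(labeled)
    w = train([X[i] for i in idx], [labeled[i] for i in idx])
    return [predict_proba(w, x) >= 0.5 for x in X], labeled


# Toy column of postal codes; the ground truth stands in for a human annotator.
values = ["10115", "10117", "1O115", "10119", "abc", "10178", "", "10243", "N/A", "10245"]
truth = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # 1 = erroneous cell
flags, labeled = detect_errors(values, lambda i: truth[i], seed=[0, 4], budget=3)
print(sorted(labeled), flags)
```

Here the user supplies five labels in total (two seed examples plus a budget of three queries), and the model predicts an error flag for every cell in the column. A real system would likely need richer features and a stronger classifier; the sketch only illustrates how labeling effort can be directed at the most informative values.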