TU Berlin

Paper Accepted for VLDB 2020

The paper "Baran: Effective Error Correction via a Unified Context Representation and Transfer Learning" by Mohammad Mahdavi and Ziawasch Abedjan was accepted for VLDB 2020.

Here is the abstract:

"Traditional data correction solutions leverage handmade rules or master data to find the correct values. Both are often missing in real-world scenarios. Therefore, it is desirable to additionally learn corrections from a limited number of example repairs. To effectively generalize example repairs, it is necessary to capture the entire context of each erroneous value. A context comprises the value itself, the co-occurring values inside the same tuple, and all values that define the attribute type. Typically, a corrector based on any one of these kinds of context information follows its own sequence of operations, which is not always easy to integrate with other types of correctors.

In this paper, we present a new error correction system, Baran, which provides a unifying abstraction for integrating multiple corrector models that can be updated in the same way and can also be pretrained. Because of the holistic nature of our approach, we can generate more correction candidates than the state of the art, and because of the underlying context-aware data representation, we achieve high precision. We show that, by pretraining our models on Wikipedia revisions, our system can further improve its overall performance in terms of precision and recall.

In our experiments, Baran significantly outperforms state-of-the-art error correction systems in terms of effectiveness and human involvement, requiring only 20 labeled tuples."
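
The three kinds of context named in the abstract (the value itself, the co-occurring values in the same tuple, and the values of the attribute) can be illustrated with a minimal sketch. This is not Baran's implementation: the function `cell_context`, the dictionary keys, and the sample table are hypothetical, and reading "all values that define the attribute type" as "every value in the same column" is an assumption made only for this illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def cell_context(table: List[Row], row_index: int, attribute: str) -> dict:
    """Collect three kinds of context for one cell (hypothetical helper)."""
    row = table[row_index]
    return {
        # The (possibly erroneous) value itself.
        "value": row[attribute],
        # Co-occurring values inside the same tuple.
        "tuple": {name: v for name, v in row.items() if name != attribute},
        # Assumption: the attribute context is every value in the same column.
        "attribute": [r[attribute] for r in table],
    }


# Hypothetical sample data with one misspelled country.
table = [
    {"city": "Berlin", "country": "Germany"},
    {"city": "Paris", "country": "Frnace"},
    {"city": "Lyon", "country": "France"},
]

context = cell_context(table, 1, "country")
print(context)
```

In a setup like this, each kind of context could feed a separate corrector, which is the integration problem the abstract says a unified representation is meant to address.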
