Paper Accepted for VLDB 2020
A paper by Mohammad Mahdavi and Ziawasch Abedjan was accepted for VLDB 2020.
Here is the abstract:
In this paper, we present a new error correction system, Baran, which provides a unifying abstraction for integrating multiple corrector models that can be updated in the same way and can also be pretrained. Because of the holistic nature of our approach, we can generate more correction candidates than the state of the art, and because of the underlying context-aware data representation, we achieve high precision. We show that, by pretraining our models based on Wikipedia revisions, our system can further improve its overall performance in terms of precision and recall.
In our experiments, Baran significantly outperforms state-of-the-art error correction systems in terms of effectiveness and human involvement, requiring only 20 labeled tuples.