
Paper Accepted for LWDA 2019

The paper "Towards Automated Data Cleaning Workflows" by Mohammad Mahdavi, Felix Neutatz, Larysa Visengeriyeva, and Ziawasch Abedjan was accepted for LWDA 2019.

Here is the abstract:

"The success of AI-based technologies depends crucially on trustful and clean data. Research in data cleaning has provided a variety of approaches to address different data quality problems. Most of them require some prior knowledge about the dataset in order to select and configure the approach correctly. We argue that for unknown datasets, it is unrealistic to know the data quality problems upfront and to formulate all necessary quality constraints in one shot. Pragmatically, the user solves data quality problems by implementing an iterative cleaning process. This incremental approach poses the challenge of identifying the right sequence of cleaning routines and their configurations. In this paper, we highlight our work in progress towards building a cleaning workflow orchestrator that learns from cleaning tasks in the past and proposes promising cleaning workflows for a new dataset. To this end, we highlight new approaches for selecting the most promising error detection routines, aggregating their outputs, and explaining the final results."
