TU Berlin

Fachgebiet Big Data Management (Big Data Management Group)

CLRL: Feature Engineering for Cross-Language Record Linkage

Record linkage aims at identifying duplicate records across datasets. Most existing record linkage techniques have been designed for monolingual datasets.

In this project, we propose CLRL, a novel approach that links records in a cross-language setting, where each input dataset is in a different language. CLRL combines monolingual similarity measures with cross-language word embedding similarities to identify corresponding records across datasets. As our experiments show, CLRL outperforms baseline approaches in cross-language data integration settings.
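To make the idea of combining the two kinds of similarity more concrete, here is a minimal sketch in Python. It is not the CLRL implementation. The toy embedding table, the field names, and the helper functions (`string_similarity`, `embedding_similarity`, `pair_features`) are all assumptions made for illustration. It only shows how a string-based similarity and an embedding-based similarity might be placed side by side in one feature vector for a pair of records.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy vectors standing in for a shared cross-language
# embedding space; the values are invented for illustration only.
TOY_EMBEDDINGS = {
    "city": (0.90, 0.10, 0.00),
    "stadt": (0.88, 0.12, 0.02),
    "river": (0.10, 0.90, 0.00),
    "fluss": (0.12, 0.85, 0.05),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(text):
    """Average the vectors of all known tokens; None if no token is known."""
    vectors = [TOY_EMBEDDINGS[t] for t in text.lower().split()
               if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def string_similarity(a, b):
    """Character-level similarity in [0, 1], independent of language."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def embedding_similarity(a, b):
    """Cosine similarity of averaged token vectors; 0.0 if a side is unknown."""
    vec_a, vec_b = mean_vector(a), mean_vector(b)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


def pair_features(rec_a, rec_b, fields):
    """Build [string_sim, embedding_sim] per field for one record pair."""
    features = []
    for field in fields:
        features.append(string_similarity(rec_a[field], rec_b[field]))
        features.append(embedding_similarity(rec_a[field], rec_b[field]))
    return features


# Example pair: an English record and a German record (hypothetical data).
rec_en = {"name": "Berlin", "type": "city"}
rec_de = {"name": "Berlin", "type": "Stadt"}
features = pair_features(rec_en, rec_de, ["name", "type"])
```

In this sketch the string similarity captures the shared proper noun in the `name` field, while the embedding similarity captures that `city` and `Stadt` are translations of each other even though their spellings differ. A vector like `features` could then be passed to any classifier that decides whether the two records match.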

Check out the project repository and contact the author.
