TU Berlin

CLRL: Feature Engineering for Cross-Language Record Linkage


Record linkage aims to identify duplicate records, that is, records referring to the same entity, across datasets. Most existing record linkage techniques were designed for monolingual datasets.

In this project, we propose a novel approach, CLRL, that links records in a cross-language setting, where each input dataset is in a different language. CLRL combines monolingual similarity measures with similarities computed from cross-language word embeddings to identify corresponding records across datasets. As our experiments show, CLRL outperforms baseline approaches in cross-language data integration settings.
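
To illustrate the general idea of combining the two kinds of similarity, here is a minimal sketch that builds a feature vector for one candidate record pair. It is not CLRL's actual feature set or code: the function names, the use of `difflib.SequenceMatcher` as the monolingual measure, and the tiny hand-made embedding table `TOY_EMBEDDINGS` are all assumptions made for the example. A real setup would load pretrained cross-language embeddings and feed the features to a trained classifier.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy embeddings in a shared cross-language space.
# The values are made up for illustration, not taken from a real model.
TOY_EMBEDDINGS = {
    "technical": [0.10, 0.90, 0.10],
    "technische": [0.12, 0.88, 0.08],
    "university": [0.90, 0.10, 0.00],
    "universität": [0.88, 0.12, 0.02],
    "berlin": [0.00, 0.10, 0.90],
    "bakery": [0.50, 0.00, -0.70],
}


def string_similarity(a, b):
    """Monolingual, character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mean_vector(text):
    """Average the embeddings of all known tokens; None if none are known."""
    vectors = [TOY_EMBEDDINGS[t] for t in text.lower().split() if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def embedding_similarity(a, b):
    """Cross-language similarity from averaged word embeddings."""
    u, v = mean_vector(a), mean_vector(b)
    if u is None or v is None:
        return 0.0  # no known tokens on one side
    return cosine(u, v)


def pair_features(a, b):
    """Feature vector for a record pair: [string similarity, embedding similarity]."""
    return [string_similarity(a, b), embedding_similarity(a, b)]


match = pair_features("Technical University Berlin", "Technische Universität Berlin")
non_match = pair_features("Technical University Berlin", "Bakery")
print(match, non_match)
```

In this toy example the embedding feature separates the translated pair from the unrelated one even where the surface strings differ, which is the motivation for using both kinds of features together.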

Check out the project repository and contact the author.
