CLRL: Feature Engineering for Cross-Language Record Linkage
Record linkage aims to identify duplicate records across datasets. Most existing record linkage techniques were designed for monolingual datasets.
In this project, we propose a novel approach, CLRL, that links records in a cross-language setting, where each input dataset is in a different language. CLRL combines monolingual similarity measures with cross-language word embedding similarities to identify corresponding records across datasets. As our experiments show, CLRL outperforms baseline approaches in cross-language data integration settings.
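To make the idea of combining the two kinds of similarity concrete, here is a minimal sketch of a feature vector for one pair of field values. It is an illustration under stated assumptions, not CLRL's actual feature set: character-trigram Jaccard stands in for the monolingual measures, the two-dimensional word vectors are hand-made toy values that imitate embeddings aligned in a shared cross-language space, and all function and variable names are hypothetical.

```python
import math


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard_similarity(a, b, n=3):
    """String similarity (stand-in for a monolingual measure): n-gram Jaccard overlap."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def average_vector(text, embeddings):
    """Average the vectors of all known tokens; return None if no token is known."""
    vectors = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either is missing or all-zero."""
    if u is None or v is None:
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pair_features(value_a, value_b, emb_a, emb_b):
    """Feature vector for one value pair: [string similarity, embedding similarity]."""
    return [
        jaccard_similarity(value_a, value_b),
        cosine_similarity(average_vector(value_a, emb_a),
                          average_vector(value_b, emb_b)),
    ]


# Toy vectors (assumption): pretend both vocabularies live in one aligned space.
EN_VECTORS = {"city": [1.0, 0.0], "river": [0.0, 1.0]}
DE_VECTORS = {"stadt": [0.9, 0.1], "fluss": [0.1, 0.9]}

translation_pair = pair_features("city", "Stadt", EN_VECTORS, DE_VECTORS)
unrelated_pair = pair_features("city", "Fluss", EN_VECTORS, DE_VECTORS)
print(translation_pair, unrelated_pair)
```

In this toy setup the string feature is zero for "city" and "Stadt" because they share no character trigrams, while the embedding feature is high; that gap is the kind of signal a purely monolingual measure misses. In a full pipeline, vectors like these would presumably be computed per attribute and passed to a classifier or threshold rule that decides whether two records match.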