Unsupervised Named Entity Recognition and Disambiguation: An Application to Old French Journals

Abstract: In this paper we introduce our method of Unsupervised Named Entity Recognition and Disambiguation (UNERD), which we test on a recently digitized, unlabeled corpus of French journals comprising 260 issues from the 19th century. Our study focuses on detecting person, location, and organization names in text. Our original method combines a French entity knowledge base with a statistical contextual disambiguation approach. We show that our method outperforms supervised approaches that are trained on small amounts of annotated data. This matters because manual data annotation is very expensive and time-consuming, especially for foreign languages and specific domains.
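
The abstract names two ingredients, a knowledge-base lookup and a statistical contextual disambiguation step, without giving details. The sketch below is only an illustration of how such a combination could look and is not the authors' implementation: the toy knowledge base, the cue-word weights, the window size, and the function name `tag_entities` are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical toy knowledge base: surface form -> candidate entity types.
# A real system would load a large French entity resource instead.
KNOWLEDGE_BASE = {
    "paris": {"LOC", "PER"},
    "hugo": {"PER"},
    "sorbonne": {"ORG", "LOC"},
}

# Hypothetical context statistics: how strongly a neighbouring word
# suggests each entity type (invented weights, for illustration only).
CONTEXT_CUES = {
    "PER": Counter({"m.": 3, "monsieur": 3, "écrit": 1}),
    "LOC": Counter({"à": 3, "ville": 2, "vers": 1}),
    "ORG": Counter({"société": 3, "membres": 2}),
}


def tag_entities(tokens, window=2):
    """Return (token, type) pairs for tokens found in the knowledge base.

    Ambiguous mentions are resolved by summing the cue weights of the
    words within `window` positions on either side of the mention.
    """
    results = []
    for i, token in enumerate(tokens):
        candidates = KNOWLEDGE_BASE.get(token.lower())
        if not candidates:
            continue  # not a known entity surface form
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        context = [t.lower() for t in neighbours]
        scores = {
            etype: sum(CONTEXT_CUES[etype][word] for word in context)
            for etype in candidates
        }
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(scores), key=scores.get)
        results.append((token, best))
    return results
```

With these invented weights, the same surface form "Paris" is tagged as a location after the preposition "à" and as a person after the title "M.", which is the kind of ambiguity a contextual disambiguation step is meant to resolve.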
Document type:
Conference paper
ICDM 2014 - 14th Industrial Conference on Data Mining, Jul 2014, St. Petersburg, Russia. Springer, Advances in Data Mining. Applications and Theoretical Aspects, 8557, pp.12-23, 2014, Lecture Notes in Computer Science. 〈10.1007/978-3-319-08976-8_2〉

https://hal-auf.archives-ouvertes.fr/hal-01082963
Contributor: Alaa Abi Haidar
Submitted on: Friday, 14 November 2014 - 16:48:33
Last modified on: Wednesday, 21 March 2018 - 18:58:09

Citation

Yusra Mosallam, Alaa Abi Haidar, Jean-Gabriel Ganascia. Unsupervised Named Entity Recognition and Disambiguation: An Application to Old French Journals. ICDM 2014 - 14th Industrial Conference on Data Mining, Jul 2014, St. Petersburg, Russia. Springer, Advances in Data Mining. Applications and Theoretical Aspects, 8557, pp.12-23, 2014, Lecture Notes in Computer Science. 〈10.1007/978-3-319-08976-8_2〉. 〈hal-01082963〉
