Unsupervised Named Entity Recognition and Disambiguation: An Application to Old French Journals

Abstract : We introduce a method for Unsupervised Named Entity Recognition and Disambiguation (UNERD) and test it on a recently digitized, unlabeled corpus of 19th-century French journals comprising 260 issues. The study focuses on detecting person, location, and organization names in text. Our original method combines a French entity knowledge base with a statistical contextual disambiguation approach. We show that it outperforms supervised approaches trained on small amounts of annotated data. This matters because manual annotation is expensive and time consuming, especially for foreign languages and specific domains.
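
As a rough illustration of the general idea named in the abstract (knowledge-base lookup followed by context-based disambiguation), here is a minimal sketch. It is not the authors' implementation: the toy knowledge base, the cue words, the window size, and the function names `tokenize`, `disambiguate`, and `tag_entities` are all hypothetical, and simple cue-word counting stands in for whatever statistical model the paper actually uses.

```python
from collections import Counter

# Hypothetical toy knowledge base: surface form -> list of (entity type, cue words).
# Entries and cue words are illustrative assumptions, not data from the paper.
KNOWLEDGE_BASE = {
    "Paris": [
        ("LOC", {"ville", "rue", "capitale", "à"}),
        ("PER", {"monsieur", "m", "comte", "madame"}),
    ],
    "Hugo": [
        ("PER", {"monsieur", "m", "poète"}),
    ],
}


def tokenize(text):
    """Split on whitespace and strip punctuation from the edges of each token."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.split())
    return [tok for tok in tokens if tok]


def disambiguate(tokens, index, window=3):
    """Pick an entity type for tokens[index] from its neighbours, or None if unknown."""
    candidates = KNOWLEDGE_BASE.get(tokens[index])
    if candidates is None:
        return None
    start = max(0, index - window)
    neighbours = tokens[start:index] + tokens[index + 1:index + 1 + window]
    context = Counter(tok.lower() for tok in neighbours)
    # Score each candidate type by how many context tokens match its cue words.
    scored = [(sum(context[cue] for cue in cues), etype) for etype, cues in candidates]
    # On a tie, max() keeps the first candidate listed in the knowledge base.
    _, best_type = max(scored, key=lambda pair: pair[0])
    return best_type


def tag_entities(text):
    """Return (surface form, entity type) pairs for every knowledge-base match."""
    tokens = tokenize(text)
    return [
        (tok, disambiguate(tokens, i))
        for i, tok in enumerate(tokens)
        if tok in KNOWLEDGE_BASE
    ]
```

For example, `tag_entities("Il arrive à Paris, la capitale.")` resolves the ambiguous form "Paris" to a location because the surrounding words match the location cues, whereas in "Monsieur Paris écrit à M. Hugo." the honorific shifts it to a person. No annotated training data is needed, only the knowledge base and the raw text.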

https://hal-auf.archives-ouvertes.fr/hal-01082963
Contributor : Alaa Abi Haidar
Submitted on : Friday, November 14, 2014 - 4:48:33 PM
Last modification on : Thursday, March 21, 2019 - 2:41:45 PM

Citation

Yusra Mosallam, Alaa Abi Haidar, Jean-Gabriel Ganascia. Unsupervised Named Entity Recognition and Disambiguation: An Application to Old French Journals. ICDM 2014 - 14th Industrial Conference on Data Mining, Jul 2014, St. Petersburg, Russia. pp.12-23, ⟨10.1007/978-3-319-08976-8_2⟩. ⟨hal-01082963⟩
