Theses

Graph-based contributions to machine-learning

Abstract : A graph is a mathematical object that represents relationships (called edges) between entities (called nodes). Graphs have long been central to a number of problems, from the work of Euler to PageRank and shortest-path problems. More recently, graphs have been used for machine learning.

With the advent of social networks and the World Wide Web, more and more datasets can be represented as graphs. These graphs keep growing, sometimes reaching billions of nodes and billions of edges, so efficient algorithms are needed to analyze them. This thesis reviews the state of the art and introduces new algorithms for clustering and embedding the nodes of massive graphs. Furthermore, to facilitate the handling of large graphs and to apply the techniques under study, we introduce Scikit-network, a free and open-source Python library developed during the thesis. Scikit-network supports many tasks, such as classifying nodes or ranking them with centrality measures.

We also tackle the problem of labeling data. Supervised machine learning techniques require labeled data for training, and the quality of this labeled data heavily influences the quality of the predictions made by the trained models. However, such data cannot be built by machines alone and requires human intervention. We study the data labeling problem in a graph-based setting and aim to describe the solutions that require as little human intervention as possible. We characterize those solutions and illustrate how they can be applied in real use cases.
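As an illustration of the node ranking mentioned in the abstract, here is a minimal PageRank sketch in plain Python. It is not taken from the thesis and does not use the Scikit-network API. The function name `pagerank`, the damping factor 0.85, the fixed iteration count and the toy edge list are all assumptions chosen for the example.

```python
def pagerank(edges, damping=0.85, n_iter=100):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    nodes = sorted({u for edge in edges for u in edge})
    out_links = {u: [] for u in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    scores = {u: 1.0 / n for u in nodes}
    for _ in range(n_iter):
        # Score held by nodes without out-links is spread uniformly over all nodes.
        dangling = sum(scores[u] for u in nodes if not out_links[u])
        new_scores = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each node shares its damped score equally among its out-neighbors.
        for u in nodes:
            for v in out_links[u]:
                new_scores[v] += damping * scores[u] / len(out_links[u])
        scores = new_scores
    return scores


# Hypothetical toy graph: "a", "b" and "c" all link to "hub"; "hub" links back to "a".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
scores = pagerank(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy graph, "hub" ranks first because it collects links from every other node. The dictionary-based representation is only practical for small inputs. At the scale the abstract describes, sparse matrix representations would normally be used instead.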

https://tel.archives-ouvertes.fr/tel-03634148
Contributor : ABES STAR
Submitted on : Thursday, April 7, 2022 - 2:29:10 PM
Last modification on : Friday, April 8, 2022 - 3:06:04 AM

File

109284_LUTZ_2022_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03634148, version 1

Citation

Quentin Lutz. Graph-based contributions to machine-learning. Data Structures and Algorithms [cs.DS]. Institut Polytechnique de Paris, 2022. English. ⟨NNT : 2022IPPAT010⟩. ⟨tel-03634148⟩

Metrics

Record views : 95

Files downloads : 65