Label Hierarchy Inference in Property Graph Databases

Bachelor Thesis from the year 2020 in the subject Computer Science - Miscellaneous, grade: 1.1, University of Constance, language: English, abstract: A lot of data contains implicit hierarchical structures, e.g. type hierarchies. The property graph model, which is employed by some graph databases among other systems, provides no tools to capture them internally. In this thesis we derive such hierarchies automatically. First, a survey is conducted to find the most promising approaches for clustering a data set hierarchically. Next, various features and feature vectors are tested to extend the methodology to graphs, capturing the structure as well as possible. We found that no single feature vector works well for all data sets and forms of representation in a graph; instead, the vector needs to be constructed adaptively, depending on how the data is modelled. Finally, we discuss some extensions of Cobweb, an algorithm used during experimentation, as well as the use case of cardinality estimation in property graph databases, which leverages the hierarchy as an associative multi-level histogram.
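
To illustrate the idea of using a label hierarchy as a multi-level histogram for cardinality estimation, here is a minimal sketch. It is not the method from the thesis: the class `HierarchyNode`, the function `estimate_cardinality`, the example labels, and the counts are all hypothetical. The only assumption is that each hierarchy node stores how many graph nodes have that label as their most specific type.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HierarchyNode:
    """One bucket of the histogram (hypothetical structure)."""
    label: str
    own_count: int = 0  # graph nodes whose most specific label is this one
    children: list[HierarchyNode] = field(default_factory=list)

    def total_count(self) -> int:
        # A coarser level aggregates the counts of all finer levels below it.
        return self.own_count + sum(c.total_count() for c in self.children)


def find(node: HierarchyNode, label: str) -> Optional[HierarchyNode]:
    """Depth-first search for the hierarchy node with the given label."""
    if node.label == label:
        return node
    for child in node.children:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


def estimate_cardinality(root: HierarchyNode, label: str) -> int:
    """Estimate how many graph nodes match `label`, including its subtypes."""
    node = find(root, label)
    return node.total_count() if node is not None else 0


# Made-up example hierarchy with made-up counts.
root = HierarchyNode("Entity", 0, [
    HierarchyNode("Person", 10, [
        HierarchyNode("Student", 40),
        HierarchyNode("Professor", 5),
    ]),
    HierarchyNode("Course", 20),
])

print(estimate_cardinality(root, "Person"))   # 10 + 40 + 5 = 55
print(estimate_cardinality(root, "Student"))  # 40
```

In this sketch, a query on a general label such as "Person" is answered from a coarse level of the hierarchy, while a query on "Student" uses a finer bucket. That is the sense in which the hierarchy acts as a histogram with several levels of resolution.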