A reflection on the impact of model mining from GitHub
Date
2023
Publisher
Elsevier
Abstract
Context: Since 1998, the ACM/IEEE International Conference on Model Driven Engineering Languages
and Systems (MODELS) has been studying all aspects surrounding modeling in software engineering, from
languages and methods to tools and applications. In order to enable empirical studies, the MODELS community
developed a need for having examples of models, especially of models used in real software development
projects. Such models may be used for a range of purposes, but mostly related to domain analysis and software
design (at various levels of abstraction). However, finding such models was very difficult. The most used ones
had their origin in academic books or student projects, which addressed "artificial" applications, i.e., were
not based on real-case scenarios. To address this issue, the authors of this reflection paper, members of the
modeling and of the mining software repositories fields, came together with the aim of creating a dataset
with an abundance of modeling projects by mining GitHub. To scope our effort, we targeted models
represented using the UML notation because this is the lingua franca in practice for software modeling. As a
result, the Lindholmen dataset, comprising almost 100k models from 22k projects, was made publicly available.
Objective: In this paper, we analyze the impact of our research, and compare this to what we envisioned in
2016. We draw practical lessons from this effort, reflect on the perils and pitfalls of the dataset, and
point out promising avenues of research.
Method: We base our reflection on the systematic analysis of recent research literature, and especially those
papers citing our dataset and its associated publications.
Results: What we envisioned in the original research when making the dataset available has largely
not come true; however, fellow researchers have found alternative uses for the dataset.
Conclusions: By understanding the possibilities and shortcomings of the current dataset, we aim to i) offer the
research community future research avenues for how the data can be used; and ii) raise awareness of the
limitations, not only to point out threats to validity of research, but also to encourage fellow researchers to
find ideas to overcome them. Our reflections can also be helpful to researchers who want to perform similar
mining efforts.
Description
The work of G. Robles has been supported in part by the Spanish Ministry of Science and Innovation (PID2022-139551NB-I0).
Citation
Gregorio Robles, Michel R.V. Chaudron, Rodi Jolak, Regina Hebig, A reflection on the impact of model mining from GitHub, Information and Software Technology, Volume 164, 2023, 107317, ISSN 0950-5849, https://doi.org/10.1016/j.infsof.2023.107317
Except where otherwise noted, this item's license is described as Attribution 4.0 International