Abstract
Background: Licenses are a fundamental element of free, open source software (FOSS), since they express the permissions granted to those receiving the software. Identifying and analyzing licenses in source code is therefore crucial to understanding the legal implications of using and distributing FOSS. Many researchers have accordingly devoted attention to license identification and analysis.
Goal: To learn how researchers have identified and classified licenses in FOSS, including which techniques and tools they have used. We were also interested in the evolution of these techniques and tools over time, and the public datasets available in this realm.
Method: We conducted a Systematic Literature Review that yielded 50 scientific publications, which we analyzed.
Results: Most studies focus on the use or development of specific tools, yet there is a recurring concern about the need to improve these tools and the techniques they rely on. The studies, and therefore the tools and techniques they present, are usually empirically validated. With respect to techniques, the use of machine learning is still relatively scarce; most papers present studies based on pattern matching and similar techniques. Reuse of tools is relatively high, and many of these tools remain available. Benchmarking studies, however, highlight a few specific tools, which, perhaps for that reason, are becoming more common in publications. Datasets oriented towards license identification remain limited in availability, although very large datasets have been published in recent years.
Conclusions: Data scarcity and a reliance on existing tools pose significant challenges for this research area. The relatively low use of machine learning techniques and the scarcity of studies on the classification of license texts open interesting research opportunities, which the recent availability of large datasets facilitates. Researchers can also benefit from readily available tools for tasks such as comparison and benchmarking.
Publisher
Elsevier
Citation
Montes-Leon, S., Robles, G., & Gonzalez-Barahona, J. M. (2026). Identification and classification of free, open source software licenses: A systematic literature review. Journal of Systems and Software, 231