
Improving Medical Entity Recognition in Spanish by Means of Biomedical Language Models

dc.contributor.author: Villaplana, Aitana
dc.contributor.author: Martínez, Raquel
dc.contributor.author: Montalvo, Soto
dc.date.accessioned: 2024-07-11T07:06:08Z
dc.date.available: 2024-07-11T07:06:08Z
dc.date.issued: 2023-12-02
dc.identifier.citation: Villaplana A, Martínez R, Montalvo S. Improving Medical Entity Recognition in Spanish by Means of Biomedical Language Models. Electronics. 2023; 12(23):4872. [es]
dc.identifier.issn: 2079-9292
dc.identifier.uri: https://hdl.handle.net/10115/37570
dc.description.abstract: Named Entity Recognition (NER) is an important task used to extract relevant information from biomedical texts. Recently, pre-trained language models have made great progress in this task, particularly for English. However, the performance of pre-trained models in the Spanish biomedical domain has not been evaluated in an experimental framework designed specifically for the task. We present an approach for named entity recognition in Spanish medical texts that makes use of pre-trained models from the Spanish biomedical domain. We also use data augmentation techniques to improve the identification of less frequent entities in the dataset. The domain-specific models have improved the recognition of named entities in the domain, beating all the systems that were evaluated in the eHealth-KD challenge 2021. Language models from the biomedical domain seem to be more effective in characterizing the specific terminology involved in this named entity recognition task, where most entities are of the "concept" type and cover a great number of medical concepts. Regarding data augmentation, only back translation has slightly improved the results. Clearly, the most frequent types of entities in the dataset are better identified. Although the domain-specific language models have outperformed most of the other models, the multilingual generalist model mBERT obtained competitive results. [es]
dc.language.iso: eng [es]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: biomedical natural language processing [es]
dc.subject: Spanish biomedical entity recognition [es]
dc.subject: pre-trained language models [es]
dc.subject: data augmentation [es]
dc.title: Improving Medical Entity Recognition in Spanish by Means of Biomedical Language Models [es]
dc.type: info:eu-repo/semantics/article [es]
dc.identifier.doi: 10.3390/electronics12234872 [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]



Except where otherwise noted, this item's license is described as Attribution 4.0 International.