DEEP LEARNING ANALYSIS OF DIGITAL VIDEOSTROBOSCOPY FOR VOCAL FOLDS DYNAMICS CHARACTERIZATION

Date

2024-07-26

Publisher

Universidad Rey Juan Carlos

Abstract

This research project uses deep learning techniques, implemented in Python, to analyze digital videostroboscopy recordings and characterize vocal fold dynamics across various pathological conditions. In collaboration with the Otorhinolaryngology Department at the Hospital Universitario de Fuenlabrada, videos were obtained showing vocal fold atrophy or paralysis, cysts, sulcus, leukoplakia, and polyps, as well as normal cases. All recordings were acquired with the digital videostroboscopy technique. To address the challenge of automatically segmenting the vocal folds, the OpenCV library in Python is used for video preprocessing and initial segmentation of the vocal structures, covering frame extraction, noise reduction, and enhancement of the vocal fold regions. This preprocessing ensures that the later stages of the project receive high-quality inputs for accurate feature extraction. DeepLabCut, a tool for markerless pose estimation, is then used to predict the positions of key anatomical landmarks on the segmented vocal folds. These markers describe the motion and position of the vocal folds across the video frames, and their precise prediction is essential for extracting the features needed for the classification task. After marker extraction, custom Python code normalizes, rotates, and scales the marker coordinates so that they are comparable across patients and recording conditions. From the processed coordinates, features are extracted that describe the horizontal and vertical movements of the vocal folds, capturing the dynamic behavior indicative of various pathologies.
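The coordinate normalization and movement features described above can be illustrated with a minimal sketch. It is not the thesis code: the function names, the choice of two reference landmarks to define the alignment axis, and the use of per-landmark coordinate ranges as movement features are all assumptions made for illustration.

```python
import math


def normalize_landmarks(points, anterior_idx=0, posterior_idx=1):
    """Translate, rotate, and scale 2D landmarks into a common frame.

    Assumption (hypothetical, not from the thesis): two landmarks define a
    reference axis. The first is moved to the origin, the axis is rotated
    onto the positive y-axis, and its length is scaled to 1.
    """
    ax, ay = points[anterior_idx]
    px, py = points[posterior_idx]
    dx, dy = px - ax, py - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("reference landmarks coincide")
    # Rotation angle that maps the reference axis onto the positive y-axis.
    angle = math.pi / 2 - math.atan2(dy, dx)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    normalized = []
    for x, y in points:
        tx, ty = x - ax, y - ay            # translate
        rx = tx * cos_a - ty * sin_a       # rotate
        ry = tx * sin_a + ty * cos_a
        normalized.append((rx / length, ry / length))  # scale
    return normalized


def movement_ranges(frames):
    """Return (horizontal_range, vertical_range) for each landmark.

    `frames` is a list of per-frame landmark lists, all of equal length.
    The range (max minus min) is a simple stand-in for a movement feature.
    """
    n_landmarks = len(frames[0])
    features = []
    for i in range(n_landmarks):
        xs = [frame[i][0] for frame in frames]
        ys = [frame[i][1] for frame in frames]
        features.append((max(xs) - min(xs), max(ys) - min(ys)))
    return features
```

In this sketch, each frame would be passed through `normalize_landmarks` before `movement_ranges` is applied, so that the resulting features do not depend on camera position, rotation, or zoom.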
These features are then fed into several machine learning models to predict whether a given video represents a pathological condition. The models explored include K-Nearest Neighbors (KNN), Decision Trees, Random Forest, Boosting Trees, and Multilayer Perceptron (MLP). Hyperparameter tuning and cross-validation are used to optimize their performance. The models are evaluated using accuracy, sensitivity, specificity, positive predictive value, and F1 score, with and without data oversampling to address the class imbalance present in the dataset. The results indicate that simpler models such as KNN perform well on the small dataset available, whereas more complex models improve with oversampled data. Specifically, the Random Forest model trained on oversampled data emerges as the best performer, balancing high explainability and predictive accuracy. This research contributes to the field of medical image analysis and offers a potential diagnostic tool for clinicians in the otorhinolaryngology department. The outcomes of this study hold promise for enhancing the efficiency and accuracy of vocal fold pathology diagnosis through the application of state-of-the-art deep learning methodologies to digital videostroboscopy data. By automating the detection and characterization of vocal fold pathologies, this project aims to support clinicians in making faster and more accurate diagnoses, thereby improving patient care.
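The evaluation metrics and the oversampling step named above can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the abstract does not say which oversampling method was used, so plain random duplication of minority-class samples is assumed here, and the function names are hypothetical.

```python
import random


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the five metrics listed in the abstract for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),        # true positive rate
        "specificity": ratio(tn, tn + fp),        # true negative rate
        "ppv": ratio(tp, tp + fp),                # positive predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until classes are balanced.

    Assumption: simple random oversampling; the thesis may have used a
    different technique.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(members) for members in by_class.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, members in by_class.items():
        for _ in range(target - len(members)):
            out_samples.append(rng.choice(members))
            out_labels.append(label)
    return out_samples, out_labels
```

When oversampling is combined with cross-validation, it is normally applied to the training folds only, so that duplicated samples never appear in the data used for evaluation.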

Description

Bachelor's thesis (Trabajo Fin de Grado) defended at the Universidad Rey Juan Carlos in the 2023/2024 academic year. Supervisor: Antonio José Caamaño Fernández
