Explanation Sets: A framework for Machine Learning explicability
Abstract
The term Machine Learning (ML) was coined by Arthur Samuel in 1959. Since then, more than sixty years have passed, and ML has evolved enormously, especially in the last decade. From the early days of ML, when it was primarily a research topic, to today, when we interact with ML systems daily, often without even realizing it, we have come a long way. Although the explainability of these ML systems has been considered since their inception, it has become more important than ever due to their integration into our daily lives. Explainable ML addresses this issue, aiming to make predictive models and their decisions understandable to humans. There are several Explainable ML techniques, each with its own goals and scope. For example, the scope of a technique can be either global, addressing the entire model, or local, focusing on a specific region of interest. While the choice of technique depends on several factors, the main driving factor is the user, specifically their cognitive biases and what they expect from the system. These preferences and the different types of explanations have been extensively studied in the social sciences. Among these techniques, we emphasize counterfactuals and semifactuals, which have also been incorporated into Explainable ML. They are contrastive explanations in which the user reasons about the differences between the observation of interest and a hypothetical observation that leads to the same prediction (semifactual) or a different prediction (counterfactual). However, within the context of ML, they face some limitations. Both are mainly defined in a classification context and lack a standardized mechanism to enforce user preferences. Counterfactuals typically rely on a single observation, whereas semifactuals lack a general definition and are referred to by different terms.

This thesis introduces the Explanation Set framework to address these limitations. Explanation Sets unify counterfactuals and semifactuals through similarity measures and provide users with mechanisms to specify their preferences via a feasible set. Besides providing a unified framework, the definitions based on similarity measures enable the seamless extension of counterfactuals and semifactuals to other tasks, such as regression, by using appropriate similarities. We review how various techniques from the literature fit this framework. The proposed approach was successfully validated in regression and classification tasks, showing how different feasible sets and similarity measures produce different explanations.

We also introduce two methods to extract Explanation Sets: Anchor_ES and the Random Forest Optimal Counterfactual Set Extractor (RF-OCSE). Anchor_ES expands upon the Anchor method, allowing for user-defined similarity measures and including a feasible set. RF-OCSE, in turn, extracts counterfactual Explanation Sets from a Random Forest (RF) through a partial fusion of Decision Trees (DTs) from the RF into a single DT, using a modification of the Classification and Regression Trees (CART) algorithm. The proposed extraction methods were validated through several experiments against existing alternatives on several well-known datasets. The evaluation metrics measure aspects correlated with the quality of the explanations, including the percentage of valid counterfactuals, distance to the factual sample, method stability, and counterfactual set quality.
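The core idea, selecting explanations from a user-defined feasible set according to a user-chosen similarity measure, can be illustrated with a minimal sketch. The snippet below is purely illustrative and assumes a scikit-learn-style classifier; the similarity measure, feasibility constraint, and the simple candidate-filtering strategy are hypothetical choices, not the formulation or algorithms proposed in the thesis.

```python
# Minimal, illustrative sketch of the Explanation Set idea (hypothetical choices,
# not the thesis's formulation or code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and model standing in for the predictive model to be explained.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

factual = X[0]  # observation of interest

# User choices: a similarity measure and a feasible set (both purely illustrative).
def similarity(a, b):
    return -np.linalg.norm(a - b)   # higher value = more similar

def feasible(x):
    return x[0] >= 0                # e.g. only allow non-negative values of feature 0

mask = np.array([feasible(x) for x in X])
candidates = X[mask]
preds = model.predict(candidates)
factual_pred = model.predict(factual.reshape(1, -1))[0]

# Counterfactual-style Explanation Set: the most similar feasible observations
# whose prediction DIFFERS from the factual prediction.
cf = candidates[preds != factual_pred]
cf_set = cf[np.argsort([-similarity(factual, x) for x in cf])[:5]]

# Semifactual-style Explanation Set: the least similar feasible observations
# that still RECEIVE the same prediction as the factual observation.
sf = candidates[preds == factual_pred]
sf_set = sf[np.argsort([similarity(factual, x) for x in sf])[:5]]
```

Swapping the similarity measure or the feasibility constraint changes which observations qualify, which is how, under this reading, the same framework can express both counterfactual and semifactual preferences.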
RF-OCSE was the only method supporting set explanations that always yielded valid explanations, and it took, on average, significantly less time than the alternatives. Anchor_ES, in turn, achieved a good compromise between fidelity and coverage, emerging as a viable alternative, especially when full access to the model is not possible. In conclusion, we introduce a novel explainability framework that empowers users to tailor explanations to their preferences. Explanation Sets pave the way for incorporating, in a unified and standardized manner, new preferences not currently recognized in the literature, which simplifies their eventual incorporation into extraction methods. Regarding the extraction methods, we observed a significant disparity in quality between methods that exploit the internal structure of the model and those that treat the model as a black box, highlighting the benefits of the former approach when possible.
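For concreteness, the sketch below computes two of the evaluation criteria mentioned above, the percentage of valid counterfactuals and the distance to the factual sample; the function, its name, and the L1 distance choice are illustrative assumptions and do not reproduce the thesis's experimental code.

```python
import numpy as np

def evaluate_counterfactuals(model, factual, candidates):
    """Illustrative metrics: fraction of valid counterfactuals and their mean L1
    distance to the factual sample (hypothetical names and choices)."""
    factual_pred = model.predict(factual.reshape(1, -1))[0]
    preds = model.predict(candidates)
    valid = preds != factual_pred   # a counterfactual is valid only if it changes the prediction
    validity = valid.mean()
    if valid.any():
        mean_distance = np.abs(candidates[valid] - factual).sum(axis=1).mean()
    else:
        mean_distance = float("nan")
    return validity, mean_distance
```

Applied to the toy model and counterfactual set from the earlier sketch, this would report how many candidates actually flip the prediction and how far, on average, they lie from the observation being explained.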
Description
Doctoral thesis defended at the Universidad Rey Juan Carlos, Madrid, in 2023. Supervisors: Isaac Martín de Diego, Javier Martínez Moguerza