Abstract
Machine Learning (ML) models are widely used in decision-making
procedures in finance, medicine, education, and other domains. In these areas, ML outcomes can
directly affect humans, e.g., by deciding whether a person should be granted a loan or
released from prison. Therefore, we cannot blindly rely on black-box ML models
and need to explain the decisions they make. This has motivated the development
and need to explain the decisions made by them. This motivated the development
of a variety of ML-explainer systems, including LIME and its successor
ANCHOR. Due to the heuristic nature of explanations produced by existing tools,
it is necessary to validate them. We propose a SAT-based method to assess the
quality of explanations produced by ANCHOR. We encode a trained ML model
and an explanation for a given prediction as a propositional formula. Then,
using a state-of-the-art approximate model counter, we estimate the quality of the
provided explanation as the number of solutions that support it.
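To make the counting step concrete, the sketch below estimates the number of satisfying assignments of a small CNF formula using the pyapproxmc Python bindings to the ApproxMC counter. The toy clauses and the choice of ApproxMC are illustrative assumptions on our part; the abstract names neither a specific counter nor a concrete encoding.

```python
# A minimal sketch of the counting step, assuming the trained model and the
# explanation have already been compiled into a single CNF formula. The two
# clauses below are a toy placeholder for that encoding, and the use of
# ApproxMC (via pyapproxmc) is an assumption: the text only says
# "state-of-the-art approximate model counter".
import pyapproxmc

counter = pyapproxmc.Counter()

# Toy CNF in DIMACS-style integer literals: (x1 OR x2) AND (NOT x1 OR x3).
# In the real setting, these clauses would encode the ML model conjoined
# with the ANCHOR explanation for a given prediction.
counter.add_clause([1, 2])
counter.add_clause([-1, 3])

# ApproxMC reports its estimate as a pair (cell_count, hash_count),
# meaning the model count is approximately cell_count * 2**hash_count.
cells, hashes = counter.count()
print(f"Approximate model count: {cells} * 2**{hashes} = {cells * 2**hashes}")
```

Under this reading, a higher count of supporting solutions indicates that the explanation holds across more of the model's input space, i.e., that it is of higher quality.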