Abstract—Explanation methods analyze the features in backdoored
input data that contribute to model misclassification. However, current path-based attribution techniques struggle to detect
backdoor patterns in adversarial settings. They fail to capture
the hidden associations between backdoor features and other input
features that lead to misclassification. In addition, they suffer
from irrelevant feature attribution, imprecise feature interactions,
baseline dependence, and vulnerability to the "saturation effect".
To address these limitations, we propose Xplain. Our
method aims to uncover hidden backdoor trigger patterns and
the subtle relationships between backdoor features and other
input features, which are the main causes of model misclassification. Our algorithm improves existing path techniques by
integrating an additional baseline into the Integrated Gradients (IG) formulation. This ensures that features selected in
the baseline persist along the integration path, guaranteeing
baseline independence (sketched below). Additionally, we inject quantitative noise into the samples interpolated along the integration path,
which reduces feature dependency and captures non-linear
interactions. This approach effectively identifies the relevant
features that significantly influence model predictions.
Furthermore, Xplain introduces a sensitivity analysis to enhance the resilience of AI systems against backdoor attacks. This
uncovers clear connections between the backdoor trigger and other
input features, thus shedding light on the relevant interactions. We thoroughly evaluate the effectiveness of Xplain on the
ImageNet dataset and on the multimodal Visual Question
Answering (VQA) dataset, showing its superiority over current path
methods such as Integrated Gradients (IG), Left-IG, Guided IG,
and Adversarial Gradient Integration (AGI).
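For reference, the standard Integrated Gradients attribution of feature $i$ along the straight-line path from a baseline $x'$ to the input $x$ is

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha .$$

The following is a minimal, purely illustrative sketch of how an additional baseline $\tilde{x}$ and interpolation noise $\eta(\alpha)$ could enter this integral; the symbols $\tilde{x}$, $\eta(\alpha)$, $\sigma$, and this particular combination are assumptions for illustration, not the exact Xplain formulation:

$$\widehat{\mathrm{Attr}}_i(x) \approx (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x') + (1 - \alpha)\,(\tilde{x} - x') + \eta(\alpha)\big)}{\partial x_i}\, d\alpha , \qquad \eta(\alpha) \sim \mathcal{N}(0, \sigma^2 I),$$

so that features of the additional baseline $\tilde{x}$ are present at the start of the path ($\alpha = 0$) and the noisy interpolants perturb each integration step.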