TY - GEN
T1 - Improving LLM Attributions with Randomized Path-Integration
AU - Barkan, Oren
AU - Elisha, Yehonatan
AU - Toib, Yonatan
AU - Weill, Jonathan
AU - Koenigstein, Noam
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - We present Randomized Path-Integration (RPI), a path-integration method for explaining language models via randomization of the integration path over the attention information in the model. RPI employs integration on internal attention scores and their gradients along a randomized path, which is dynamically established between a baseline representation and the attention scores of the model. The inherent randomness in the integration path originates from modeling the baseline representation as a randomly drawn tensor from a Gaussian diffusion process. As a consequence, RPI generates diverse baselines, yielding a set of candidate attribution maps. This set facilitates the selection of the most effective attribution map based on the specific metric at hand. We present an extensive evaluation, encompassing 11 explanation methods and 5 language models, including the Llama2 and Mistral models. Our results demonstrate that RPI outperforms the latest state-of-the-art methods across 4 datasets and 5 evaluation metrics. Our code is available at: https://github.com/rpiconf/rpi.
AB - We present Randomized Path-Integration (RPI), a path-integration method for explaining language models via randomization of the integration path over the attention information in the model. RPI employs integration on internal attention scores and their gradients along a randomized path, which is dynamically established between a baseline representation and the attention scores of the model. The inherent randomness in the integration path originates from modeling the baseline representation as a randomly drawn tensor from a Gaussian diffusion process. As a consequence, RPI generates diverse baselines, yielding a set of candidate attribution maps. This set facilitates the selection of the most effective attribution map based on the specific metric at hand. We present an extensive evaluation, encompassing 11 explanation methods and 5 language models, including the Llama2 and Mistral models. Our results demonstrate that RPI outperforms the latest state-of-the-art methods across 4 datasets and 5 evaluation metrics. Our code is available at: https://github.com/rpiconf/rpi.
UR - http://www.scopus.com/inward/record.url?scp=85214723975&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85214723975
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
SP - 9430
EP - 9446
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
T2 - 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Y2 - 12 November 2024 through 16 November 2024
ER -