Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
arXiv:1906.04284, 7 June 2019

Papers citing "Analyzing the Structure of Attention in a Transformer Language Model"

50 / 225 papers shown
Title
VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers (CVPR 2022)
Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
30 Mar 2022

Measuring the Mixing of Contextual Information in the Transformer (EMNLP 2022)
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
08 Mar 2022

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code (ICSE 2022)
Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hairong Jin
14 Feb 2022

Investigating Explainability of Generative AI for Code through Scenario-based Design (IUI 2022)
Jiao Sun, Q. V. Liao, Michael J. Muller, Mayank Agarwal, Stephanie Houde, Kartik Talamadupula, Justin D. Weisz
10 Feb 2022

Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation
Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini
10 Dec 2021

Exploiting a Zoo of Checkpoints for Unseen Tasks
Jiaji Huang, Qiang Qiu, Kenneth Church
05 Nov 2021

Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li
20 Oct 2021

Improving Transformers with Probabilistic Attention Keys
Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
16 Oct 2021

MEDUSA: Multi-scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis (Frontiers in Medicine, 2021)
Hossein Aboutalebi, Maya Pavlova, Hayden Gunraj, M. Shafiee, A. Sabri, Amer Alaref, Alexander Wong
12 Oct 2021

How BPE Affects Memorization in Transformers
Eugene Kharitonov, Marco Baroni, Dieuwke Hupkes
06 Oct 2021

GradTS: A Gradient-Based Automatic Auxiliary Task Selection Method Based on Transformer Networks
Weicheng Ma, Renze Lou, Kai Zhang, Lili Wang, Soroush Vosoughi
13 Sep 2021

Attention-based Contrastive Learning for Winograd Schemas
T. Klein, Moin Nabi
10 Sep 2021

Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks
Weicheng Ma, Kai Zhang, Renze Lou, Lili Wang, Soroush Vosoughi
18 Aug 2021

FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention (NeurIPS 2021)
T. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang
05 Aug 2021

Multi-Stream Transformers
Andrey Kravchenko, Anna Rumshisky
21 Jul 2021

Transformer-F: A Transformer network with effective methods for learning universal sentence representation
Yu Shi
02 Jul 2021

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation
Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit
16 Jun 2021

Thinking Like Transformers (ICML 2021)
Gail Weiss, Yoav Goldberg, Eran Yahav
13 Jun 2021

FedNLP: An interpretable NLP System to Decode Federal Reserve Communications (SIGIR 2021)
Jean Lee, Hoyoul Luis Youn, Nicholas Stevens, Josiah Poon, S. Han
11 Jun 2021

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers (Findings 2021)
Tianchu Ji, Shraddhan Jain, M. Ferdman, Peter Milder, H. Andrew Schwartz, Niranjan Balasubramanian
02 Jun 2021

Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads? (Findings 2021)
Min Namgung, Laurent Besacier, Vassilina Nikoulina, D. Schwab
31 May 2021

On the Interplay Between Fine-tuning and Composition in Transformers (Findings 2021)
Lang-Chi Yu, Allyson Ettinger
31 May 2021

Effective Attention Sheds Light On Interpretability (Findings 2021)
Kaiser Sun, Ana Marasović
18 May 2021

FNet: Mixing Tokens with Fourier Transforms (NAACL 2021)
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon
09 May 2021

Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention (CMCL 2021)
S. Ryu, Richard L. Lewis
26 Apr 2021

Morph Call: Probing Morphosyntactic Content of Multilingual Transformers
Vladislav Mikhailov, O. Serikov, Ekaterina Artemova
26 Apr 2021

Knowledge Neurons in Pretrained Transformers (ACL 2021)
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
18 Apr 2021

Supervising Model Attention with Human Explanations for Robust Natural Language Inference (AAAI 2021)
Joe Stacey, Yonatan Belinkov, Marek Rei
16 Apr 2021

Pose Recognition with Cascade Transformers (CVPR 2021)
Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu
14 Apr 2021

Attention, please! A survey of Neural Attention Models in Deep Learning (Artificial Intelligence Review, 2021)
Alana de Santana Correia, Esther Luna Colombini
31 Mar 2021

Rethinking Spatial Dimensions of Vision Transformers (ICCV 2021)
Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh
30 Mar 2021

LazyFormer: Self Attention with Lazy Update
Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu
25 Feb 2021

To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph
Sufeng Duan, Hai Zhao
16 Jan 2021

On-the-Fly Attention Modulation for Neural Generation (Findings 2021)
Yue Dong, Chandra Bhagavatula, Ximing Lu, Jena D. Hwang, Antoine Bosselut, Jackie C.K. Cheung, Yejin Choi
02 Jan 2021

On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification
Zhengxuan Wu, Desmond C. Ong
01 Jan 2021

Transformer Feed-Forward Layers Are Key-Value Memories (EMNLP 2020)
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
29 Dec 2020

Understood in Translation, Transformers for Domain Understanding
Dimitrios Christofidellis, Matteo Manica, L. Georgopoulos, Hans Vandierendonck
18 Dec 2020

Mask-Align: Self-Supervised Neural Word Alignment (ACL 2020)
Chi Chen, Maosong Sun, Yang Liu
13 Dec 2020

Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help!
Wen Xiao, Patrick Huber, Giuseppe Carenini
03 Dec 2020

Self-Explaining Structures Improve NLP Models
Zijun Sun, Chun Fan, Qinghong Han, Xiaofei Sun, Yuxian Meng, Leilei Gan, Jiwei Li
03 Dec 2020

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks (LREC 2020)
Ileana Rugina, Rumen Dangovski, L. Jing, Preslav Nakov, Marin Soljacic
20 Nov 2020

Focus on the present: a regularization method for the ASR source-target attention layer (ICASSP 2020)
Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak
02 Nov 2020

Influence Patterns for Explaining Information Flow in BERT (NeurIPS 2020)
Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
02 Nov 2020

Improving BERT Performance for Aspect-Based Sentiment Analysis
Akbar Karimi, L. Rossi, Andrea Prati
22 Oct 2020

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang
21 Oct 2020

Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability
Yuxian Meng, Chun Fan, Zijun Sun, Eduard H. Hovy, Leilei Gan, Jiwei Li
14 Oct 2020

Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu
13 Oct 2020

Structured Self-Attention Weights Encode Semantics in Sentiment Analysis (BlackboxNLP 2020)
Zhengxuan Wu, Thanh-Son Nguyen, Desmond C. Ong
10 Oct 2020

Assessing Phrasal Representation and Composition in Transformers
Lang-Chi Yu, Allyson Ettinger
08 Oct 2020

Linguistic Profiling of a Neural Language Model (COLING 2020)
Alessio Miaschi, D. Brunato, F. Dell'Orletta, Giulia Venturi
05 Oct 2020