arXiv:1906.04284 (v2, latest)
Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
7 June 2019

Papers citing "Analyzing the Structure of Attention in a Transformer Language Model" (showing 50 of 226):
- Uncovering hidden geometry in Transformers via disentangling position and context. Jiajun Song, Yiqiao Zhong. 07 Oct 2023.
- One Wide Feedforward is All You Need. Telmo Pires, António V. Lopes, Yannick Assogba, Hendra Setiawan. Conference on Machine Translation (WMT), 2023. 04 Sep 2023.
- Transforming the Output of Generative Pre-trained Transformer: The Influence of the PGI Framework on Attention Dynamics. Aline Ioste. 25 Aug 2023.
- Robustifying Point Cloud Networks by Refocusing. Meir Yossef Levi, Guy Gilboa. International Conference on 3D Vision (3DV), 2023. 10 Aug 2023.
- ALens: An Adaptive Domain-Oriented Abstract Writing Training Tool for Novice Researchers. Chen Cheng, Ziang Li, Zhenhui Peng, Quan Li. 08 Aug 2023.
- AI for the Generation and Testing of Ideas Towards an AI Supported Knowledge Development Environment. T. Selker. 17 Jul 2023.
- Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models. Yuheng Huang, Zhijie Wang, Shengming Zhao, Huaming Chen, Felix Juefei-Xu, Lei Ma. IEEE Transactions on Software Engineering (TSE), 2023. 16 Jul 2023.
- Multi-modal Graph Learning over UMLS Knowledge Graphs. Manuel Burger, Gunnar Rätsch, Rita Kuznetsova. 10 Jul 2023.
- Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation. Massimiliano Patacchiola, Mingfei Sun, Katja Hofmann, Richard Turner. 23 Jun 2023.
- A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models. Ritwik Sinha, Zhao Song, Wanrong Zhu. 04 Jun 2023.
- Transforming ECG Diagnosis: An In-depth Review of Transformer-based Deep Learning Models in Cardiovascular Disease Detection. Zibin Zhao. 02 Jun 2023.
- Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization. Dongqi Pu, Yifa Wang, Vera Demberg. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 26 May 2023.
- End-to-End Simultaneous Speech Translation with Differentiable Segmentation. Shaolei Zhang, Yang Feng. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 25 May 2023.
- VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers. Shahar Katz, Yonatan Belinkov. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 22 May 2023.
- AttentionViz: A Global View of Transformer Attention. Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg. IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023. 04 May 2023.
- Towards autonomous system: flexible modular production system enhanced with large language model agents. Yuchen Xia, Manthan Shenoy, N. Jazdi, M. Weyrich. IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2023. 28 Apr 2023.
- The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. Shuai Li, Zhao Song, Yu Xia, Tong Yu, Wanrong Zhu. Neural Information Processing Systems (NeurIPS), 2023. 26 Apr 2023.
- State Spaces Aren't Enough: Machine Translation Needs Attention. Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz. European Association for Machine Translation Conferences/Workshops (EAMT), 2023. 25 Apr 2023.
- LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model. Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Tat-Seng Chua. Neural Information Processing Systems (NeurIPS), 2023. 13 Apr 2023.
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder. Z. Fu, W. Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier. 08 Apr 2023.
- PromptAid: Prompt Exploration, Perturbation, Testing and Iteration using Visual Analytics for Large Language Models. Aditi Mishra, Utkarsh Soni, Anjana Arunkumar, Jinbin Huang, Bum Chul Kwon, Chris Bryan. 04 Apr 2023.
- Language Model Behavior: A Comprehensive Survey. Tyler A. Chang, Benjamin Bergen. International Conference on Computational Logic (ICCL), 2023. 20 Mar 2023.
- Attention-likelihood relationship in transformers. Valeria Ruscio, Valentino Maiorca, Fabrizio Silvestri. 15 Mar 2023.
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding. Yuchen Li, Yuan-Fang Li, Andrej Risteski. International Conference on Machine Learning (ICML), 2023. 07 Mar 2023.
- Interpretability in Activation Space Analysis of Transformers: A Focused Survey. Soniya Vijayakumar. 22 Jan 2023.
- Dissociating language and thought in large language models. Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko. 16 Jan 2023.
- Skip-Attention: Improving Vision Transformers by Paying Less Attention. Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, A. Habibian. International Conference on Learning Representations (ICLR), 2023. 05 Jan 2023.
- On the Blind Spots of Model-Based Evaluation Metrics for Text Generation. Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Dong Wang, James R. Glass, Yulia Tsvetkov. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 20 Dec 2022.
- Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale. Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, S. Bodapati, Katrin Kirchhoff, Dan Roth. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 18 Dec 2022.
- Attention as a Guide for Simultaneous Speech Translation. Sara Papi, Matteo Negri, Marco Turchi. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 15 Dec 2022.
- DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. Conglong Li, Z. Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He. AAAI Conference on Artificial Intelligence (AAAI), 2022. 07 Dec 2022.
- Explanation on Pretraining Bias of Finetuned Vision Transformer. Bumjin Park, Jaesik Choi. 18 Nov 2022.
- Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers. Z. Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng-rong Li, Yuxiong He. 17 Nov 2022.
- The Architectural Bottleneck Principle. Tiago Pimentel, Josef Valvoda, Niklas Stoehr, Robert Bamler. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 11 Nov 2022.
- Improving word mover's distance by leveraging self-attention matrix. Hiroaki Yamagiwa, Sho Yokoi, Hidetoshi Shimodaira. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 11 Nov 2022.
- Parallel Attention Forcing for Machine Translation. Qingyun Dou, Mark Gales. 06 Nov 2022.
- On the Explainability of Natural Language Processing Deep Models. Julia El Zini, M. Awad. ACM Computing Surveys (ACM CSUR), 2022. 13 Oct 2022.
- CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure. Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 07 Oct 2022.
- Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment. Mustafa Shukor, Guillaume Couairon, Matthieu Cord. British Machine Vision Conference (BMVC), 2022. 29 Aug 2022.
- Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences. Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qingguo Chen, Bing Wang. International Conference on Information and Knowledge Management (CIKM), 2022. 08 Aug 2022.
- Beware the Rationalization Trap! When Language Model Explainability Diverges from our Mental Models of Language. Rita Sevastjanova, Mennatallah El-Assady. 14 Jul 2022.
- AnyMorph: Learning Transferable Policies By Inferring Agent Morphology. Brandon Trabucco, Mariano Phielipp, Glen Berseth. International Conference on Machine Learning (ICML), 2022. 17 Jun 2022.
- Transformer with Fourier Integral Attentions. T. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho. 01 Jun 2022.
- What Do Compressed Multilingual Machine Translation Models Forget? Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 22 May 2022.
- Learning from Bootstrapping and Stepwise Reinforcement Reward: A Semi-Supervised Framework for Text Style Transfer. Zhengyuan Liu, Nancy F. Chen. 19 May 2022.
- Are Prompt-based Models Clueless? Pride Kavumba, Ryo Takahashi, Yusuke Oda. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 19 May 2022.
- EigenNoise: A Contrastive Prior to Warm-Start Representations. H. Heidenreich, Jake Williams. 09 May 2022.
- Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information. Chiyu Feng, Po-Chun Hsu, Hung-yi Lee. 08 May 2022.
- LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models. Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 26 Apr 2022.
- A Review on Language Models as Knowledge Bases. Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona T. Diab, Marjan Ghazvininejad. 12 Apr 2022.