arXiv:1906.04284 (v2, latest)
Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
7 June 2019

Papers citing "Analyzing the Structure of Attention in a Transformer Language Model" (showing 50 of 226):
- Uncovering hidden geometry in Transformers via disentangling position and context. Jiajun Song, Yiqiao Zhong. 07 Oct 2023.
- One Wide Feedforward is All You Need. Telmo Pires, António V. Lopes, Yannick Assogba, Hendra Setiawan. Conference on Machine Translation (WMT), 2023. 04 Sep 2023.
- Transforming the Output of Generative Pre-trained Transformer: The Influence of the PGI Framework on Attention Dynamics. Aline Ioste. 25 Aug 2023.
- Robustifying Point Cloud Networks by Refocusing. Meir Yossef Levi, Guy Gilboa. International Conference on 3D Vision (3DV), 2023. 10 Aug 2023.
- ALens: An Adaptive Domain-Oriented Abstract Writing Training Tool for Novice Researchers. Chen Cheng, Ziang Li, Zhenhui Peng, Quan Li. 08 Aug 2023.
- AI for the Generation and Testing of Ideas Towards an AI Supported Knowledge Development Environment. T. Selker. 17 Jul 2023.
- Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models. Yuheng Huang, Zhijie Wang, Shengming Zhao, Huaming Chen, Felix Juefei-Xu, Lei Ma. IEEE Transactions on Software Engineering (TSE), 2023. 16 Jul 2023.
- Multi-modal Graph Learning over UMLS Knowledge Graphs. Manuel Burger, Gunnar Rätsch, Rita Kuznetsova. 10 Jul 2023.
- Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation. Massimiliano Patacchiola, Mingfei Sun, Katja Hofmann, Richard Turner. 23 Jun 2023.
- A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models. Ritwik Sinha, Zhao Song, Wanrong Zhu. 04 Jun 2023.
- Transforming ECG Diagnosis: An In-depth Review of Transformer-based Deep Learning Models in Cardiovascular Disease Detection. Zibin Zhao. 02 Jun 2023.
- Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization. Dongqi Pu, Yifa Wang, Vera Demberg. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 26 May 2023.
- End-to-End Simultaneous Speech Translation with Differentiable Segmentation. Shaolei Zhang, Yang Feng. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 25 May 2023.
- VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers. Shahar Katz, Yonatan Belinkov. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 22 May 2023.
- AttentionViz: A Global View of Transformer Attention. Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg. IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023. 04 May 2023.
- Towards autonomous system: flexible modular production system enhanced with large language model agents. Yuchen Xia, Manthan Shenoy, N. Jazdi, M. Weyrich. IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2023. 28 Apr 2023.
- The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. Shuai Li, Zhao Song, Yu Xia, Tong Yu, Wanrong Zhu. Neural Information Processing Systems (NeurIPS), 2023. 26 Apr 2023.
- State Spaces Aren't Enough: Machine Translation Needs Attention. Ali Vardasbi, Telmo Pires, Robin M. Schmidt, Stephan Peitz. European Association for Machine Translation Conferences/Workshops (EAMT), 2023. 25 Apr 2023.
- LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model. Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Tat-Seng Chua. Neural Information Processing Systems (NeurIPS), 2023. 13 Apr 2023.
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder. Z. Fu, W. Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier. 08 Apr 2023.
- PromptAid: Prompt Exploration, Perturbation, Testing and Iteration using Visual Analytics for Large Language Models. Aditi Mishra, Utkarsh Soni, Anjana Arunkumar, Jinbin Huang, Bum Chul Kwon, Chris Bryan. 04 Apr 2023.
- Language Model Behavior: A Comprehensive Survey. Tyler A. Chang, Benjamin Bergen. International Conference on Computational Logic (ICCL), 2023. 20 Mar 2023.
- Attention-likelihood relationship in transformers. Valeria Ruscio, Valentino Maiorca, Fabrizio Silvestri. 15 Mar 2023.
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding. Yuchen Li, Yuan-Fang Li, Andrej Risteski. International Conference on Machine Learning (ICML), 2023. 07 Mar 2023.
- Interpretability in Activation Space Analysis of Transformers: A Focused Survey. Soniya Vijayakumar. 22 Jan 2023.
- Dissociating language and thought in large language models. Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko. 16 Jan 2023.
- Skip-Attention: Improving Vision Transformers by Paying Less Attention. Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, A. Habibian. International Conference on Learning Representations (ICLR), 2023. 05 Jan 2023.
- On the Blind Spots of Model-Based Evaluation Metrics for Text Generation. Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Dong Wang, James R. Glass, Yulia Tsvetkov. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 20 Dec 2022.
- Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale. Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, S. Bodapati, Katrin Kirchhoff, Dan Roth. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 18 Dec 2022.
- Attention as a Guide for Simultaneous Speech Translation. Sara Papi, Matteo Negri, Marco Turchi. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 15 Dec 2022.
- DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. Conglong Li, Z. Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He. AAAI Conference on Artificial Intelligence (AAAI), 2022. 07 Dec 2022.
- Explanation on Pretraining Bias of Finetuned Vision Transformer. Bumjin Park, Jaesik Choi. 18 Nov 2022.
- Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers. Z. Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng-rong Li, Yuxiong He. 17 Nov 2022.
- The Architectural Bottleneck Principle. Tiago Pimentel, Josef Valvoda, Niklas Stoehr, Robert Bamler. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 11 Nov 2022.
- Improving word mover's distance by leveraging self-attention matrix. Hiroaki Yamagiwa, Sho Yokoi, Hidetoshi Shimodaira. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 11 Nov 2022.
- Parallel Attention Forcing for Machine Translation. Qingyun Dou, Mark Gales. 06 Nov 2022.
- On the Explainability of Natural Language Processing Deep Models. Julia El Zini, M. Awad. ACM Computing Surveys (ACM CSUR), 2022. 13 Oct 2022.
- CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure. Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 07 Oct 2022.
- Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment. Mustafa Shukor, Guillaume Couairon, Matthieu Cord. British Machine Vision Conference (BMVC), 2022. 29 Aug 2022.
- Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences. Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qingguo Chen, Bing Wang. International Conference on Information and Knowledge Management (CIKM), 2022. 08 Aug 2022.
- Beware the Rationalization Trap! When Language Model Explainability Diverges from our Mental Models of Language. Rita Sevastjanova, Mennatallah El-Assady. 14 Jul 2022.
- AnyMorph: Learning Transferable Policies By Inferring Agent Morphology. Brandon Trabucco, Mariano Phielipp, Glen Berseth. International Conference on Machine Learning (ICML), 2022. 17 Jun 2022.
- Transformer with Fourier Integral Attentions. T. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho. 01 Jun 2022.
- What Do Compressed Multilingual Machine Translation Models Forget? Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 22 May 2022.
- Learning from Bootstrapping and Stepwise Reinforcement Reward: A Semi-Supervised Framework for Text Style Transfer. Zhengyuan Liu, Nancy F. Chen. 19 May 2022.
- Are Prompt-based Models Clueless? Pride Kavumba, Ryo Takahashi, Yusuke Oda. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 19 May 2022.
- EigenNoise: A Contrastive Prior to Warm-Start Representations. H. Heidenreich, Jake Williams. 09 May 2022.
- Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information. Chiyu Feng, Po-Chun Hsu, Hung-yi Lee. 08 May 2022.
- LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models. Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 26 Apr 2022.
- A Review on Language Models as Knowledge Bases. Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona T. Diab, Marjan Ghazvininejad. 12 Apr 2022.