ResearchTrend.AI

Do Attention Heads in BERT Track Syntactic Dependencies?
27 November 2019
Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman

Papers citing "Do Attention Heads in BERT Track Syntactic Dependencies?"

18 papers shown.

1. On the Role of Attention Heads in Large Language Model Safety (17 Oct 2024)
   Z. Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li

2. Linguistically Grounded Analysis of Language Models using Shapley Head Values (17 Oct 2024)
   Marcell Richard Fekete, Johannes Bjerva

3. Racing Thoughts: Explaining Contextualization Errors in Large Language Models (02 Oct 2024)
   Michael A. Lepori, Michael Mozer, Asma Ghandeharioun

4. Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models (15 Jun 2024)
   Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Y. Lan, Chao Shen

5. Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings (09 Feb 2024)
   Yichen Jiang, Xiang Zhou, Mohit Bansal

6. Morphosyntactic probing of multilingual BERT models (09 Jun 2023)
   Judit Ács, Endre Hamerlik, Roy Schwartz, Noah A. Smith, András Kornai

7. Syntactic Substitutability as Unsupervised Dependency Syntax (29 Nov 2022)
   Jasper Jian, Siva Reddy

8. Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks (31 Oct 2022)
   Rochelle Choenni, Dan Garrette, Ekaterina Shutova

9. What does Transformer learn about source code? (18 Jul 2022)
   Kechi Zhang, Ge Li, Zhi Jin

10. Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps (23 Apr 2022)
    Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein

11. What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code (14 Feb 2022)
    Yao Wan, Wei-Ye Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hairong Jin

12. Interpreting Deep Learning Models in Natural Language Processing: A Review (20 Oct 2021)
    Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li

13. Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience (31 Aug 2021)
    G. Chrysostomou, Nikolaos Aletras

14. Pre-Trained Models: Past, Present and Future (14 Jun 2021)
    Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu

15. The Limitations of Limited Context for Constituency Parsing (03 Jun 2021)
    Yuchen Li, Andrej Risteski

16. Molecule Attention Transformer (19 Feb 2020)
    Lukasz Maziarka, Tomasz Danel, Slawomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanislaw Jastrzebski

17. What you can cram into a single vector: Probing sentence embeddings for linguistic properties (03 May 2018)
    Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni

18. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018)
    Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman