How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
arXiv:2011.00943

International Conference on Computational Linguistics (COLING), 2020
2 November 2020
Yue Guan
Jingwen Leng
Chao Li
Quan Chen
Minyi Guo

Papers citing "How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention"

11 / 11 papers shown
Analysis of Argument Structure Constructions in the Large Language Model BERT
Pegah Ramezani
Achim Schilling
Patrick Krauss
306
5
0
08 Aug 2024
Analyzing Semantic Change through Lexical Replacements
Francesco Periti
Pierluigi Cassotti
Haim Dubossarsky
Nina Tahmasebi
211
19
0
29 Apr 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
218
32
0
16 Jan 2024
Interpretability Illusions in the Generalization of Simplified Models
Dan Friedman
Andrew Kyle Lampinen
Lucas Dixon
Danqi Chen
Asma Ghandeharioun
365
21
0
06 Dec 2023
AttentionMix: Data augmentation method that relies on BERT attention mechanism
Dominik Lewy
Jacek Mańdziuk
285
4
0
20 Sep 2023
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Micro (MICRO), 2022
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
239
101
0
30 Aug 2022
Transkimmer: Transformer Learns to Layer-wise Skim
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yue Guan
Zhengyi Li
Jingwen Leng
Zhouhan Lin
Minyi Guo
174
42
0
15 May 2022
SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation
International Conference on Learning Representations (ICLR), 2022
Cong Guo
Yuxian Qiu
Jingwen Leng
Xiaotian Gao
Chen Zhang
Yunxin Liu
Fan Yang
Yuhao Zhu
Minyi Guo
MQ
271
86
0
14 Feb 2022
VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022
Zihan Liu
Jingwen Leng
Zhihui Zhang
Quan Chen
Chao Li
Minyi Guo
158
56
0
17 Jan 2022
Block-Skim: Efficient Question Answering for Transformer
Yue Guan
Zhengyi Li
Jingwen Leng
Zhouhan Lin
Minyi Guo
Yuhao Zhu
231
33
0
16 Dec 2021
Dual-side Sparse Tensor Core
International Symposium on Computer Architecture (ISCA), 2021
Yang-Feng Wang
Chen Zhang
Zhiqiang Xie
Cong Guo
Yunxin Liu
Jingwen Leng
261
92
0
20 May 2021