arXiv: 2011.00943

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention

International Conference on Computational Linguistics (COLING), 2020

2 November 2020

Jingwen Leng

Papers citing "How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention"

11 / 11 papers shown

Analysis of Argument Structure Constructions in the Large Language Model BERT

Achim Schilling

Patrick Krauss

350

7

0

08 Aug 2024

Analyzing Semantic Change through Lexical Replacements

Francesco Periti

Pierluigi Cassotti

Haim Dubossarsky

285

21

0

29 Apr 2024

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

Cong Guo

Jingwen Leng

Zihan Liu

...

244

33

0

16 Jan 2024

Interpretability Illusions in the Generalization of Simplified Models

Andrew Kyle Lampinen

Asma Ghandeharioun

403

20

0

06 Dec 2023

AttentionMix: Data augmentation method that relies on BERT attention mechanism

Jacek Mańdziuk

312

4

0

20 Sep 2023

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization

Micro (MICRO), 2022

Cong Guo

Jingwen Leng

Zihan Liu

Fan Yang

Yuhao Zhu

273

106

0

30 Aug 2022

Transkimmer: Transformer Learns to Layer-wise Skim

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Jingwen Leng

195

43

0

15 May 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

International Conference on Learning Representations (ICLR), 2022

Cong Guo

Jingwen Leng

Fan Yang

Yuhao Zhu

289

90

0

14 Feb 2022

VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022

Zihan Liu

Jingwen Leng

194

57

0

17 Jan 2022

Block-Skim: Efficient Question Answering for Transformer

Jingwen Leng

Yuhao Zhu

279

33

0

16 Dec 2021

Dual-side Sparse Tensor Core

International Symposium on Computer Architecture (ISCA), 2021

Cong Guo

Jingwen Leng

280

95

0

20 May 2021
