
Positional Artefacts Propagate Through Masked Language Model Embeddings
Ziyang Luo, Artur Kulmizev, Xiaoxi Mao
arXiv:2011.04393 · 9 November 2020

Papers citing "Positional Artefacts Propagate Through Masked Language Model Embeddings"

32 papers
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
01 May 2025
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang, Jiangming Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao
07 Mar 2025
Robust AI-Generated Text Detection by Restricted Embeddings (EMNLP 2024)
Kristian Kuznetsov, Eduard Tulchinskii, Laida Kushnareva, German Magai, Serguei Barannikov, Sergey I. Nikolenko, Irina Piontkovskaya
10 Oct 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao
27 Jun 2024
Improving Interpretability and Robustness for the Detection of AI-Generated Images
T. Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey I. Nikolenko, Ziquan Liu, S. Barannikov, Gregory Slabaugh
21 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet
16 Jun 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang, Hayun Kim, Younghoon Kim
23 May 2024
Unveiling Linguistic Regions in Large Language Models
Zhihao Zhang, Jun Zhao, Tao Gui, Xuanjing Huang
22 Feb 2024
A Simple and Effective Pruning Approach for Large Language Models (ICLR 2023)
Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter
20 Jun 2023
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity (ACL 2023)
Katharina Hämmerl, Alina Fastowski, Jindrich Libovický, Kangyang Luo
01 Jun 2023
The Impact of Positional Encoding on Length Generalization in Transformers (NeurIPS 2023)
Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan N. Ramamurthy, Payel Das, Siva Reddy
31 May 2023
Intriguing Properties of Quantization at Scale (NeurIPS 2023)
Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker
30 May 2023
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales (NeurIPS 2023)
Nikhil Vyas, Alexander B. Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan
28 May 2023
Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models (ACL 2023)
Zhong Zhang, Bang Liu, Junming Shao
27 May 2023
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings (ACL 2023)
Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge
23 May 2023
Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models (SIGIR 2023)
Na Li, Hanane Kteich, Zied Bouraoui, Steven Schockaert
16 May 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps (ICLR 2023)
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
01 Feb 2023
Representation biases in sentence transformers (EACL 2023)
Dmitry Nikolaev, Sebastian Padó
30 Jan 2023
The case for 4-bit precision: k-bit Inference Scaling Laws (ICML 2022)
Tim Dettmers, Luke Zettlemoyer
19 Dec 2022
The Curious Case of Absolute Position Embeddings (EMNLP 2022)
Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, J. Pineau, Dieuwke Hupkes, Adina Williams
23 Oct 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models (NeurIPS 2022)
Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Yazhe Niu, Shanghang Zhang, Tao Gui, F. Yu, Xianglong Liu
27 Sep 2022
Isotropic Representation Can Improve Dense Retrieval (PAKDD 2022)
Euna Jung, J. Park, Jaekeol Choi, Sungyoon Kim, Wonjong Rhee
01 Sep 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer
15 Aug 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency (EMNLP 2022)
Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell'Orletta
23 May 2022
GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers (NAACL 2022)
Ali Modarressi, Mohsen Fayyaz, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar
06 May 2022
DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
Ziyang Luo, Yadong Xi, Jing Ma, Zhiwei Yang, Xiaoxi Mao, Changjie Fan, Rongsheng Zhang
19 Apr 2022
Measuring the Mixing of Contextual Information in the Transformer (EMNLP 2022)
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussá
08 Mar 2022
An Isotropy Analysis in the Multilingual BERT Embedding Space (Findings 2021)
S. Rajaee, Mohammad Taher Pilehvar
09 Oct 2021
Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations
Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova
28 Sep 2021
On Isotropy Calibration of Transformers (Insights Workshop 2021)
Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer
27 Sep 2021
All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality (EMNLP 2021)
William Timkey, Marten van Schijndel
09 Sep 2021
BERT Busters: Outlier Dimensions that Disrupt Transformers (Findings 2021)
Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky
14 May 2021