
Positional Artefacts Propagate Through Masked Language Model Embeddings
Ziyang Luo, Artur Kulmizev, Xiaoxi Mao
arXiv:2011.04393 · 9 November 2020

Papers citing "Positional Artefacts Propagate Through Masked Language Model Embeddings"

32 papers
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
01 May 2025
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang, Jiangming Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao
07 Mar 2025
Robust AI-Generated Text Detection by Restricted Embeddings (EMNLP 2024)
Kristian Kuznetsov, Eduard Tulchinskii, Laida Kushnareva, German Magai, Serguei Barannikov, Sergey I. Nikolenko, Irina Piontkovskaya
10 Oct 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao
27 Jun 2024
Improving Interpretability and Robustness for the Detection of AI-Generated Images
T. Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey I. Nikolenko, Ziquan Liu, S. Barannikov, Gregory Slabaugh
21 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet
16 Jun 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang, Hayun Kim, Younghoon Kim
23 May 2024
Unveiling Linguistic Regions in Large Language Models
Zhihao Zhang, Jun Zhao, Tao Gui, Xuanjing Huang
22 Feb 2024
A Simple and Effective Pruning Approach for Large Language Models (ICLR 2023)
Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter
20 Jun 2023
Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity (ACL 2023)
Katharina Hämmerl, Alina Fastowski, Jindrich Libovický, Kangyang Luo
01 Jun 2023
The Impact of Positional Encoding on Length Generalization in Transformers (NeurIPS 2023)
Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan N. Ramamurthy, Payel Das, Siva Reddy
31 May 2023
Intriguing Properties of Quantization at Scale (NeurIPS 2023)
Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker
30 May 2023
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales (NeurIPS 2023)
Nikhil Vyas, Alexander B. Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan
28 May 2023
Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models (ACL 2023)
Zhong Zhang, Bang Liu, Junming Shao
27 May 2023
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings (ACL 2023)
Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge
23 May 2023
Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models (SIGIR 2023)
Na Li, Hanane Kteich, Zied Bouraoui, Steven Schockaert
16 May 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps (ICLR 2023)
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
01 Feb 2023
Representation biases in sentence transformers (EACL 2023)
Dmitry Nikolaev, Sebastian Padó
30 Jan 2023
The case for 4-bit precision: k-bit Inference Scaling Laws (ICML 2022)
Tim Dettmers, Luke Zettlemoyer
19 Dec 2022
The Curious Case of Absolute Position Embeddings (EMNLP 2022)
Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, J. Pineau, Dieuwke Hupkes, Adina Williams
23 Oct 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models (NeurIPS 2022)
Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Yazhe Niu, Shanghang Zhang, Tao Gui, F. Yu, Xianglong Liu
27 Sep 2022
Isotropic Representation Can Improve Dense Retrieval (PAKDD 2022)
Euna Jung, J. Park, Jaekeol Choi, Sungyoon Kim, Wonjong Rhee
01 Sep 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer
15 Aug 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency (EMNLP 2022)
Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell'Orletta
23 May 2022
GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers (NAACL 2022)
Ali Modarressi, Mohsen Fayyaz, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar
06 May 2022
DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks
Ziyang Luo, Yadong Xi, Jing Ma, Zhiwei Yang, Xiaoxi Mao, Changjie Fan, Rongsheng Zhang
19 Apr 2022
Measuring the Mixing of Contextual Information in the Transformer (EMNLP 2022)
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussá
08 Mar 2022
An Isotropy Analysis in the Multilingual BERT Embedding Space (Findings 2021)
S. Rajaee, Mohammad Taher Pilehvar
09 Oct 2021
Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations
Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova
28 Sep 2021
On Isotropy Calibration of Transformers (Insights Workshop 2021)
Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer
27 Sep 2021
All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality (EMNLP 2021)
William Timkey, Marten van Schijndel
09 Sep 2021
BERT Busters: Outlier Dimensions that Disrupt Transformers (Findings 2021)
Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky
14 May 2021