CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Transactions of the Association for Computational Linguistics (TACL), 2021

Papers citing "CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation"

50 / 166 papers shown
Zero-Shot Tokenizer Transfer
Neural Information Processing Systems (NeurIPS), 2024
13 May 2024
On the Effect of (Near) Duplicate Subwords in Language Modelling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
09 Apr 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
10 Mar 2024
Anisotropy Is Inherent to Self-Attention in Transformers
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
22 Jan 2024
Text Rendering Strategies for Pixel Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
01 Nov 2023
Learning to Abstract with Nonparametric Variational Information Bottleneck
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
26 Oct 2023
Learn Your Tokens: Word-Pooled Tokenization for Language Modeling
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
17 Oct 2023
Optimized Tokenization for Transcribed Error Correction
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
16 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
11 Oct 2023
Assessment of Pre-Trained Models Across Languages and Grammars
International Joint Conference on Natural Language Processing (IJCNLP), 2023
20 Sep 2023
Lightweight Adaptation of Neural Language Models via Subspace Embedding
International Conference on Information and Knowledge Management (CIKM), 2023
16 Aug 2023
Biomedical Language Models are Robust to Sub-optimal Tokenization
Workshop on Biomedical Natural Language Processing (BioNLP), 2023
Bernal Jiménez Gutiérrez
Huan Sun
Yu-Chuan Su
30 Jun 2023
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
30 May 2023
