arXiv:2103.06874 (v4, latest)
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Transactions of the Association for Computational Linguistics (TACL), 2021
11 March 2021
J. Clark
Dan Garrette
Iulia Turc
John Wieting
Papers citing
"CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation"
17 / 167 papers shown
Integrating Approaches to Word Representation
Yuval Pinter
NAI
144
5
0
10 Sep 2021
Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with Synthetic Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Massimo Nicosia
Zhongdi Qu
Yasemin Altun
120
26
0
09 Sep 2021
You should evaluate your language model on marginal likelihood over tokenisations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Kris Cao
Laura Rimell
177
30
0
06 Sep 2021
How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology?
Chantal Amrhein
Rico Sennrich
195
14
0
02 Sep 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM
LM&MA
175
303
0
12 Aug 2021
Learning to Look Inside: Augmenting Token-Based Encoders with Character-Level Information
Yuval Pinter
Amanda Stent
Mark Dredze
Jacob Eisenstein
102
7
0
01 Aug 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs
International Conference on Learning Representations (ICLR), 2021
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLM
VLM
GNN
313
691
0
30 Jul 2021
Local Structure Matters Most: Perturbation Study in NLU
Findings (Findings), 2021
Louis Clouâtre
Prasanna Parthasarathi
Payel Das
Sarath Chandar
133
16
0
29 Jul 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
258
182
0
23 Jun 2021
Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau
Noah A. Smith
278
28
0
16 Jun 2021
Sub-Character Tokenization for Chinese Pretrained Language Models
Transactions of the Association for Computational Linguistics (TACL), 2021
Chenglei Si
Zhengyan Zhang
Yingfa Chen
Fanchao Qi
Xiaozhi Wang
Zhiyuan Liu
Yasheng Wang
Qun Liu
Maosong Sun
152
16
0
01 Jun 2021
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Transactions of the Association for Computational Linguistics (TACL), 2021
Linting Xue
Aditya Barua
Noah Constant
Rami Al-Rfou
Sharan Narang
Mihir Kale
Adam Roberts
Colin Raffel
299
582
0
28 May 2021
Robust Open-Vocabulary Translation from Visual Text Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Elizabeth Salesky
David Etter
Matt Post
VLM
218
50
0
16 Apr 2021
XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Sebastian Ruder
Noah Constant
Jan A. Botha
Aditya Siddhant
Orhan Firat
...
Pengfei Liu
Junjie Hu
Dan Garrette
Graham Neubig
Melvin Johnson
ELM
AAML
LRM
180
208
0
15 Apr 2021
Inducing Meaningful Units from Character Sequences with Dynamic Capacity Slot Attention
Melika Behjati
James Henderson
OCL
112
1
0
01 Feb 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
350
295
0
31 Dec 2020
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
Transactions of the Association for Computational Linguistics (TACL), 2020
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
368
675
0
10 Mar 2020