ResearchTrend.AI
© 2026 ResearchTrend.AI, All rights reserved.
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 2,064 papers shown
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Benjamin Minixhofer
Jonas Pfeiffer
Ivan Vulić
234
24
0
30 May 2023
BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages
Wen Yang
Chong Li
Jiajun Zhang
Chengqing Zong
LRM
397
72
0
29 May 2023
External Language Model Integration for Factorized Neural Transducers
Michael Levit
S. Parthasarathy
Cem Aksoylar
Mohammad Sadegh Rasooli
Shuangyu Chang
242
2
0
26 May 2023
Im-Promptu: In-Context Composition from Image Prompts
Neural Information Processing Systems (NeurIPS), 2023
Bhishma Dedhia
Michael Chang
Jake C. Snell
Thomas Griffiths
N. Jha
LRM, MLLM
370
4
0
26 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
176
8
0
26 May 2023
Diable: Efficient Dialogue State Tracking as Operations on Tables
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pietro Lesci
Yoshinari Fujinuma
Momchil Hardalov
Chao Shang
Yassine Benajiba
Lluís Marquez
LMTD
299
8
0
26 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Neural Information Processing Systems (NeurIPS), 2023
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
208
18
0
26 May 2023
TranSFormer: Slow-Fast Transformer for Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bei Li
Yi Jing
Xu Tan
Zhen Xing
Tong Xiao
Jingbo Zhu
172
10
0
26 May 2023
Robustness of Multi-Source MT to Transcription Errors
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dominik Macháček
Peter Polák
Ondrej Bojar
Mary Dabre
170
4
0
26 May 2023
Domain Aligned Prefix Averaging for Domain Generalization in Abstractive Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pranav Ajit Nair
Sukomal Pal
Pradeepika Verma
MoMe
235
2
0
26 May 2023
End-to-End Simultaneous Speech Translation with Differentiable Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shaolei Zhang
Yang Feng
260
26
0
25 May 2023
MTCue: Learning Zero-Shot Control of Extra-Textual Attributes by Leveraging Unstructured Context in Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
S. Vincent
R. Flynn
Carolina Scarton
194
4
0
25 May 2023
Towards Higher Pareto Frontier in Multilingual Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yi-Chong Huang
Xiaocheng Feng
Xinwei Geng
Baohang Li
Bing Qin
187
14
0
25 May 2023
Revisiting non-English Text Simplification: A Unified Multilingual Benchmark
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Michael Joseph Ryan
Tarek Naous
Wei Xu
213
35
0
25 May 2023
Scaling Data-Constrained Language Models
Neural Information Processing Systems (NeurIPS), 2023
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
673
327
0
25 May 2023
RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models
David Qiu
David Rim
Shaojin Ding
Oleg Rybakov
Yanzhang He
MQ
192
4
0
24 May 2023
CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yan Zhou
Qingkai Fang
Yang Feng
OT
336
40
0
24 May 2023
From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Li Sun
F. Luisier
Kayhan Batmanghelich
D. Florêncio
Changrong Zhang
VLM
175
7
0
23 May 2023
Cascaded Beam Search: Plug-and-Play Terminology-Forcing For Neural Machine Translation
Frédéric Odermatt
Béni Egressy
Roger Wattenhofer
172
0
0
23 May 2023
How to Choose How to Choose Your Chatbot: A Massively Multi-System MultiReference Data Set for Dialog Metric Evaluation
Huda Khayrallah
Zuhaib Akhtar
Edward Cohen
João Sedoc
167
2
0
23 May 2023
NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Livio Baldini Soares
D. Gillick
Jeremy R. Cole
Tom Kwiatkowski
197
2
0
23 May 2023
Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Elizabeth Salesky
Neha Verma
Philipp Koehn
Matt Post
290
19
0
23 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Neural Information Processing Systems (NeurIPS), 2023
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
313
3
0
23 May 2023
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
International Conference on Language Resources and Evaluation (LREC), 2023
Neha Verma
Kenton W. Murray
Kevin Duh
230
0
0
23 May 2023
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Benjamin Minixhofer
Jonas Pfeiffer
Ivan Vulić
223
11
0
23 May 2023
μPLAN: Summarizing using a Content Plan as Cross-Lingual Bridge
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Fantine Huot
Joshua Maynez
Chris Alberti
Reinald Kim Amplayo
Priyanka Agrawal
Constanza Fierro
Shashi Narayan
Mirella Lapata
364
7
0
23 May 2023
Improving speech translation by fusing speech and text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenbiao Yin
Zhicheng Liu
Chengqi Zhao
Tao Wang
Jian-Fei Tong
Rong Ye
213
4
0
23 May 2023
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoran Xu
Weiting Tan
Shuyue Stella Li
Yunmo Chen
Benjamin Van Durme
Philipp Koehn
Kenton W. Murray
298
7
0
23 May 2023
Challenges in Context-Aware Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Linghao Jin
Jacqueline He
Jonathan May
Xuezhe Ma
209
12
0
23 May 2023
Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers
J. Silovský
Liuhui Deng
Arturo Argueta
Tresi Arvizo
Roger Hsiao
Sasha Kuznietsov
Yiu-Chang Lin
Xiaoqiang Xiao
Yuanyuan Zhang
199
3
0
23 May 2023
AxomiyaBERTa: A Phonologically-aware Transformer Model for Assamese
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Abhijnan Nath
Sheikh Mannan
Nikhil Krishnaswamy
189
7
0
23 May 2023
Neural Machine Translation for Code Generation
K. Dharma
Clayton T. Morrison
330
7
0
22 May 2023
Text Generation with Speech Synthesis for ASR Data Augmentation
Zhuangqun Huang
Gil Keren
Ziran Jiang
Shashank Jain
David Goss-Grubbs
...
Antony D'Avirro
Ethan Campbell-Taylor
Jessie Salas
Irina-Elena Veliche
Xi Chen
195
10
0
22 May 2023
Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil Demographic Biases in Languages at Scale
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Marta R. Costa-jussà
Pierre Yves Andrews
Eric Michael Smith
Prangthip Hansanti
C. Ropers
Elahe Kalbassi
Cynthia Gao
Daniel Licht
Carleigh Wood
184
25
0
22 May 2023
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shayne Longpre
Gregory Yauney
Emily Reif
Katherine Lee
Adam Roberts
...
Denny Zhou
Jason W. Wei
Kevin Robinson
David M. Mimno
Daphne Ippolito
364
209
0
22 May 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren
Amaru Cuba Gyllensten
Felix Stollenwerk
Joey Öhman
T. Isbister
Evangelia Gogoulou
F. Carlsson
Alice Heiman
Judit Casademont
Magnus Sahlgren
277
16
0
22 May 2023
SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly
IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2023
Jordi Armengol-Estapé
Jackson Woodruff
Chris Cummins
Michael F. P. O'Boyle
195
30
0
21 May 2023
Multi-Head State Space Model for Speech Recognition
Interspeech, 2023
Yassir Fathullah
Chunyang Wu
Yuan Shangguan
Junteng Jia
Wenhan Xiong
...
Chunxi Liu
Yangyang Shi
Ozlem Kalinli
M. Seltzer
Mark Gales
160
19
0
21 May 2023
Machine Translation by Projecting Text into the Same Phonetic-Orthographic Space Using a Common Encoding
Amit Kumar
Shantipriya Parida
A. Pratap
Anil Kumar Singh
208
2
0
21 May 2023
Lifelong Language Pretraining with Distribution-Specialized Experts
International Conference on Machine Learning (ICML), 2023
Wuyang Chen
Yan-Quan Zhou
Nan Du
Yanping Huang
James Laudon
Zhiwen Chen
Claire Cui
KELM
313
77
0
20 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
Marcely Zanon Boito
François Yvon
Hinrich Schütze
ALM, LRM
289
136
0
20 May 2023
Can Public Large Language Models Help Private Cross-device Federated Learning?
Wei Ping
Yibo Jacky Zhang
Yuan Cao
Yue Liu
H. B. McMahan
Sewoong Oh
Zheng Xu
Manzil Zaheer
FedML
376
46
0
20 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
International Conference on Learning Representations (ICLR), 2023
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
418
141
0
19 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
179
34
0
19 May 2023
Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chantal Amrhein
Florian Schottmann
Rico Sennrich
Samuel Läubli
246
21
0
18 May 2023
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
David C. Uthus
Santiago Ontañón
Joshua Ainslie
Mandy Guo
VLM
155
15
0
18 May 2023
On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Liang Chen
Shuming Ma
Dongdong Zhang
Furu Wei
Baobao Chang
249
6
0
18 May 2023
Massively Multi-Lingual Event Understanding: Extraction, Visualization, and Search
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chris Jenkins
Shantanu Agarwal
Joel Barry
Steven Fincke
Elizabeth Boschee
149
7
0
17 May 2023
Accelerating Transformer Inference for Translation via Parallel Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Andrea Santilli
Silvio Severino
Emilian Postolache
Valentino Maiorca
Michele Mancusi
R. Marin
Emanuele Rodolà
266
117
0
17 May 2023
Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Eleftheria Briakou
Colin Cherry
George F. Foster
173
77
0
17 May 2023