SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
ArXiv (abs) · PDF · HTML · GitHub (10925★)

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 2,064 papers shown
Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
Xinzhe Li
Ming Liu
Shang Gao
MU
222
8
0
02 Jul 2023
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
Knowledge Discovery and Data Mining (KDD), 2023
Vasilisa Bashlovkina
Riley Matthews
Zhaobin Kuang
Simon Baumgartner
Michael Bendersky
158
5
0
30 Jun 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Neural Information Processing Systems (NeurIPS), 2023
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming-Hsuan Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
360
69
0
30 Jun 2023
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
M. Moradshahi
Shangda Wu
Kalika Bali
Monojit Choudhury
Gaël de Chalendar
...
Michael Sun
Aditya Yadavalli
Chaobin You
Deyi Xiong
M. Lam
243
15
0
30 Jun 2023
A Formal Perspective on Byte-Pair Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Tim Vieira
Mrinmaya Sachan
Robert Bamler
205
45
0
29 Jun 2023
Accelerating Transducers through Adjacent Token Merging
Interspeech (Interspeech), 2023
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
171
6
0
28 Jun 2023
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
435
678
0
27 Jun 2023
CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data
International Conference on Language Resources and Evaluation (LREC), 2023
Rian Touchent
Laurent Romary
Eric Villemonte de la Clergerie
MedIm
212
7
0
27 Jun 2023
YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus
Neural Information Processing Systems (NeurIPS), 2023
David C. Uthus
Garrett Tanzer
Manfred Georg
SLR
273
71
0
27 Jun 2023
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou
Yanrong Ji
Weijian Li
Pratik Dutta
R. Davuluri
Han Liu
289
309
0
26 Jun 2023
MotionGPT: Human Motion as a Foreign Language
Neural Information Processing Systems (NeurIPS), 2023
Biao Jiang
Xin Chen
Wen Liu
Jingyi Yu
Gang Yu
Tao Chen
MLLM
292
450
0
26 Jun 2023
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
Chanjun Park
Seonmin Koo
Seolhwa Lee
Jaehyung Seo
Sugyeong Eo
Hyeonseok Moon
Heu-Jeoung Lim
167
0
0
26 Jun 2023
Resume Information Extraction via Post-OCR Text Processing
Selahattin Serdar Helli
Senem Tanberk
Sena Nur Cavsak
79
3
0
23 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
257
396
0
22 Jun 2023
Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yujin Baek
Ko-tik Lee
Dayeon Ki
Hyoung-Gyu Lee
Cheonbok Park
Jaegul Choo
278
5
0
21 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Interspeech (Interspeech), 2023
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
189
5
0
20 Jun 2023
Rehearsal-Free Online Continual Learning for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Steven Vander Eeckt
Hugo Van hamme
CLL
113
5
0
19 Jun 2023
Guiding Language Models of Code with Global Context using Monitors
Lakshya A Agrawal
Aditya Kanade
Navin Goyal
Shuvendu K. Lahiri
S. Rajamani
335
33
0
19 Jun 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Interspeech (Interspeech), 2023
Ziyang Ma
Zhisheng Zheng
Guanrou Yang
Yu Wang
Chuxu Zhang
Xie Chen
SSL
154
11
0
15 Jun 2023
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Kunal Dhawan
Dima Rekesh
Boris Ginsburg
247
16
0
14 Jun 2023
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
International Workshop on Spoken Language Translation (IWSLT), 2023
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
193
6
0
14 Jun 2023
CipherSniffer: Classifying Cipher Types
Brendan Artley
G. Mehdiyev
43
1
0
13 Jun 2023
Tokenization with Factorized Subword Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
David Samuel
Lilja Øvrelid
191
2
0
13 Jun 2023
Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yucheng Han
Chen Xu
Tong Xiao
Jingbo Zhu
205
6
0
13 Jun 2023
Measuring Sentiment Bias in Machine Translation
International Conference on Text, Speech and Dialogue (TSD), 2023
Kai Hartung
Aaricia Herygers
Shubham Kurlekar
Khabbab Zakaria
Taylan Volkan
Sören Gröttrup
Munir Georges
AI4CE
174
8
0
12 Jun 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
149
1
0
12 Jun 2023
Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Pengzhi Gao
Liwen Zhang
Zhongjun He
Hua Wu
Haifeng Wang
163
8
0
12 Jun 2023
AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Asaad Alghamdi
Xinyu Duan
Wei Jiang
Zhenhai Wang
Yimeng Wu
...
Yifei Zheng
Mehdi Rezagholizadeh
Baoxing Huai
Peilun Cheng
Abbas Ghaddar
VLM
140
10
0
11 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Neural Information Processing Systems (NeurIPS), 2023
Zhen-fei Yin
Zhenghao Hu
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Mengwei He
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
277
205
0
11 Jun 2023
Morphosyntactic probing of multilingual BERT models
Natural Language Engineering (NLE), 2023
Judit Ács
Endre Hamerlik
Roy Schwartz
Noah A. Smith
András Kornai
188
17
0
09 Jun 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
Interspeech (Interspeech), 2023
Xianzhao Chen
Yist Y. Lin
Kang Wang
Yi He
Zejun Ma
121
4
0
09 Jun 2023
KIT's Multilingual Speech Translation System for IWSLT 2023
International Workshop on Spoken Language Translation (IWSLT), 2023
Danni Liu
Thai-Binh Nguyen
Sai Koneru
Enes Yavuz Ugan
Ngoc-Quan Pham
Tuan-Nam Nguyen
Tu Anh Dinh
Carlos Mullov
A. Waibel
Jan Niehues
179
8
0
08 Jun 2023
Privately generating tabular data using language models
Alexandre Sablayrolles
Yue Wang
Brian Karrer
LMTD
163
5
0
07 Jun 2023
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Interspeech (Interspeech), 2023
Claytone Sikasote
Kalinda Siaminwe
Stanly Mwape
Bangiwe Zulu
Mofya Phiri
Martin Phiri
David Zulu
Mayumbo Nyirenda
Antonios Anastasopoulos
261
10
0
07 Jun 2023
Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation
Interspeech (Interspeech), 2023
Massa Baali
Ibrahim Almakky
Shady Shehata
Fakhri Karray
168
4
0
07 Jun 2023
LLMZip: Lossless Text Compression using Large Language Models
Chandra Shekhara Kaushik Valmeekam
Krishna R. Narayanan
D. Kalathil
J. Chamberland
S. Shakkottai
370
46
0
06 Jun 2023
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning
Zhishen Yang
Mary Dabre
Hideki Tanaka
Naoaki Okazaki
328
24
0
06 Jun 2023
Enhancing Language Representation with Constructional Information for Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lvxiaowei Xu
Jian Wu
Jiawei Peng
Zhilin Gong
Ming Cai
Tianxiang Wang
170
4
0
05 Jun 2023
End-to-End Word-Level Pronunciation Assessment with MASK Pre-training
Interspeech (Interspeech), 2023
Yukang Liang
Kaitao Song
Shaoguang Mao
Huiqiang Jiang
Luna Qiu
Yuqing Yang
Dongsheng Li
Linli Xu
Lili Qiu
CVBM
152
8
0
05 Jun 2023
Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model
Interspeech (Interspeech), 2023
Hoyeon Lee
Hyun-Wook Yoon
Jong-Hwan Kim
Jae-Min Kim
VLM
191
3
0
05 Jun 2023
DocFormerv2: Local Features for Document Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
248
57
0
02 Jun 2023
Data-Efficient French Language Modeling with CamemBERTa
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Wissam Antoun
Benoît Sagot
Djamé Seddah
152
9
0
02 Jun 2023
Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT
European Association for Machine Translation Conferences/Workshops (EAMT), 2023
Benoist Wolleb
Romain Silvestri
Giorgos Vernikos
Ljiljana Dolamic
Andrei Popescu-Belis
199
5
0
02 Jun 2023
Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation
Interspeech (Interspeech), 2023
Hanbyul Kim
S. Seo
Lukas Lee
Seolki Baek
114
3
0
02 Jun 2023
Hierarchical Attention Encoder Decoder
Asier Mujika
BDL
229
4
0
01 Jun 2023
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
Interspeech (Interspeech), 2023
Santosh Kesiraju
Marek Sarvaš
T. Pavlíček
Cécile Macaire
Alejandro Ciuba
162
8
0
31 May 2023
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Aaron Mueller
Tal Linzen
AI4CE
194
26
0
31 May 2023
Breeding Machine Translations: Evolutionary approach to survive and thrive in the world of automated evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Josef Jon
Ondrej Bojar
143
10
0
30 May 2023
Intriguing Properties of Quantization at Scale
Neural Information Processing Systems (NeurIPS), 2023
Arash Ahmadian
Saurabh Dash
Hongyu Chen
Bharat Venkitesh
Stephen Gou
Phil Blunsom
Ahmet Üstün
Sara Hooker
MQ
231
44
0
30 May 2023
Towards Selection of Text-to-speech Data to Augment ASR Training
Shuo Liu
Leda Sari
Chunyang Wu
Gil Keren
Yuan Shangguan
Jay Mahadeokar
Ozlem Kalinli
113
5
0
30 May 2023