1808.06226
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 2,064 papers shown
Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
Xinzhe Li
Ming Liu
Shang Gao
MU
222
8
0
02 Jul 2023
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
Knowledge Discovery and Data Mining (KDD), 2023
Vasilisa Bashlovkina
Riley Matthews
Zhaobin Kuang
Simon Baumgartner
Michael Bendersky
158
5
0
30 Jun 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Neural Information Processing Systems (NeurIPS), 2023
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming-Hsuan Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
360
69
0
30 Jun 2023
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
M. Moradshahi
Shangda Wu
Kalika Bali
Monojit Choudhury
Gaël de Chalendar
...
Michael Sun
Aditya Yadavalli
Chaobin You
Deyi Xiong
M. Lam
243
15
0
30 Jun 2023
A Formal Perspective on Byte-Pair Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Tim Vieira
Mrinmaya Sachan
Robert Bamler
205
45
0
29 Jun 2023
Accelerating Transducers through Adjacent Token Merging
Interspeech (Interspeech), 2023
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
171
6
0
28 Jun 2023
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
435
678
0
27 Jun 2023
CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data
International Conference on Language Resources and Evaluation (LREC), 2023
Rian Touchent
Laurent Romary
Eric Villemonte de la Clergerie
MedIm
212
7
0
27 Jun 2023
YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus
Neural Information Processing Systems (NeurIPS), 2023
David C. Uthus
Garrett Tanzer
Manfred Georg
SLR
273
71
0
27 Jun 2023
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Zhihan Zhou
Yanrong Ji
Weijian Li
Pratik Dutta
R. Davuluri
Han Liu
289
309
0
26 Jun 2023
MotionGPT: Human Motion as a Foreign Language
Neural Information Processing Systems (NeurIPS), 2023
Biao Jiang
Xin Chen
Wen Liu
Jingyi Yu
Gang Yu
Tao Chen
MLLM
292
450
0
26 Jun 2023
Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
Chanjun Park
Seonmin Koo
Seolhwa Lee
Jaehyung Seo
Sugyeong Eo
Hyeonseok Moon
Heu-Jeoung Lim
167
0
0
26 Jun 2023
Resume Information Extraction via Post-OCR Text Processing
Selahattin Serdar Helli
Senem Tanberk
Sena Nur Cavsak
79
3
0
23 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
257
396
0
22 Jun 2023
Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yujin Baek
Ko-tik Lee
Dayeon Ki
Hyoung-Gyu Lee
Cheonbok Park
Jaegul Choo
278
5
0
21 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Interspeech (Interspeech), 2023
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
189
5
0
20 Jun 2023
Rehearsal-Free Online Continual Learning for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Steven Vander Eeckt
Hugo Van hamme
CLL
113
5
0
19 Jun 2023
Guiding Language Models of Code with Global Context using Monitors
Lakshya A Agrawal
Aditya Kanade
Navin Goyal
Shuvendu K. Lahiri
S. Rajamani
335
33
0
19 Jun 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Interspeech (Interspeech), 2023
Ziyang Ma
Zhisheng Zheng
Guanrou Yang
Yu Wang
Chuxu Zhang
Xie Chen
SSL
154
11
0
15 Jun 2023
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Kunal Dhawan
Dima Rekesh
Boris Ginsburg
247
16
0
14 Jun 2023
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
International Workshop on Spoken Language Translation (IWSLT), 2023
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
193
6
0
14 Jun 2023
CipherSniffer: Classifying Cipher Types
Brendan Artley
G. Mehdiyev
43
1
0
13 Jun 2023
Tokenization with Factorized Subword Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
David Samuel
Lilja Øvrelid
191
2
0
13 Jun 2023
Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yucheng Han
Chen Xu
Tong Xiao
Jingbo Zhu
205
6
0
13 Jun 2023
Measuring Sentiment Bias in Machine Translation
International Conference on Text, Speech and Dialogue (TSD), 2023
Kai Hartung
Aaricia Herygers
Shubham Kurlekar
Khabbab Zakaria
Taylan Volkan
Sören Gröttrup
Munir Georges
AI4CE
174
8
0
12 Jun 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Belen Alastruey
Lukas Drude
Jahn Heymann
Simon Wiesler
149
1
0
12 Jun 2023
Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Pengzhi Gao
Liwen Zhang
Zhongjun He
Hua Wu
Haifeng Wang
163
8
0
12 Jun 2023
AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Asaad Alghamdi
Xinyu Duan
Wei Jiang
Zhenhai Wang
Yimeng Wu
...
Yifei Zheng
Mehdi Rezagholizadeh
Baoxing Huai
Peilun Cheng
Abbas Ghaddar
VLM
140
10
0
11 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Neural Information Processing Systems (NeurIPS), 2023
Zhen-fei Yin
Zhenghao Hu
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Mengwei He
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
277
205
0
11 Jun 2023
Morphosyntactic probing of multilingual BERT models
Natural Language Engineering (NLE), 2023
Judit Ács
Endre Hamerlik
Roy Schwartz
Noah A. Smith
András Kornai
188
17
0
09 Jun 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition
Interspeech (Interspeech), 2023
Xianzhao Chen
Yist Y. Lin
Kang Wang
Yi He
Zejun Ma
121
4
0
09 Jun 2023
KIT's Multilingual Speech Translation System for IWSLT 2023
International Workshop on Spoken Language Translation (IWSLT), 2023
Danni Liu
Thai-Binh Nguyen
Sai Koneru
Enes Yavuz Ugan
Ngoc-Quan Pham
Tuan-Nam Nguyen
Tu Anh Dinh
Carlos Mullov
A. Waibel
Jan Niehues
179
8
0
08 Jun 2023
Privately generating tabular data using language models
Alexandre Sablayrolles
Yue Wang
Brian Karrer
LMTD
163
5
0
07 Jun 2023
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Interspeech (Interspeech), 2023
Claytone Sikasote
Kalinda Siaminwe
Stanly Mwape
Bangiwe Zulu
Mofya Phiri
Martin Phiri
David Zulu
Mayumbo Nyirenda
Antonios Anastasopoulos
261
10
0
07 Jun 2023
Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation
Interspeech (Interspeech), 2023
Massa Baali
Ibrahim Almakky
Shady Shehata
Fakhri Karray
168
4
0
07 Jun 2023
LLMZip: Lossless Text Compression using Large Language Models
Chandra Shekhara Kaushik Valmeekam
Krishna R. Narayanan
D. Kalathil
J. Chamberland
S. Shakkottai
370
46
0
06 Jun 2023
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning
Zhishen Yang
Mary Dabre
Hideki Tanaka
Naoaki Okazaki
328
24
0
06 Jun 2023
Enhancing Language Representation with Constructional Information for Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lvxiaowei Xu
Jian Wu
Jiawei Peng
Zhilin Gong
Ming Cai
Tianxiang Wang
170
4
0
05 Jun 2023
End-to-End Word-Level Pronunciation Assessment with MASK Pre-training
Interspeech (Interspeech), 2023
Yukang Liang
Kaitao Song
Shaoguang Mao
Huiqiang Jiang
Luna Qiu
Yuqing Yang
Dongsheng Li
Linli Xu
Lili Qiu
CVBM
152
8
0
05 Jun 2023
Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model
Interspeech (Interspeech), 2023
Hoyeon Lee
Hyun-Wook Yoon
Jong-Hwan Kim
Jae-Min Kim
VLM
191
3
0
05 Jun 2023
DocFormerv2: Local Features for Document Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
248
57
0
02 Jun 2023
Data-Efficient French Language Modeling with CamemBERTa
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Wissam Antoun
Benoît Sagot
Djamé Seddah
152
9
0
02 Jun 2023
Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT
European Association for Machine Translation Conferences/Workshops (EAMT), 2023
Benoist Wolleb
Romain Silvestri
Giorgos Vernikos
Ljiljana Dolamic
Andrei Popescu-Belis
199
5
0
02 Jun 2023
Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation
Interspeech (Interspeech), 2023
Hanbyul Kim
S. Seo
Lukas Lee
Seolki Baek
114
3
0
02 Jun 2023
Hierarchical Attention Encoder Decoder
Asier Mujika
BDL
229
4
0
01 Jun 2023
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
Interspeech (Interspeech), 2023
Santosh Kesiraju
Marek Sarvaš
T. Pavlíček
Cécile Macaire
Alejandro Ciuba
162
8
0
31 May 2023
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Aaron Mueller
Tal Linzen
AI4CE
194
26
0
31 May 2023
Breeding Machine Translations: Evolutionary approach to survive and thrive in the world of automated evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Josef Jon
Ondrej Bojar
143
10
0
30 May 2023
Intriguing Properties of Quantization at Scale
Neural Information Processing Systems (NeurIPS), 2023
Arash Ahmadian
Saurabh Dash
Hongyu Chen
Bharat Venkitesh
Stephen Gou
Phil Blunsom
Ahmet Üstün
Sara Hooker
MQ
231
44
0
30 May 2023
Towards Selection of Text-to-speech Data to Augment ASR Training
Shuo Liu
Leda Sari
Chunyang Wu
Gil Keren
Yuan Shangguan
Jay Mahadeokar
Ozlem Kalinli
113
5
0
30 May 2023