1808.06226
Cited By
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
ArXiv (abs)
PDF
HTML
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 2,064 papers shown
No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation
Automatic Speech Recognition & Understanding (ASRU), 2023
Dennis Fucci
Marco Gaido
Matteo Negri
Mauro Cettolo
L. Bentivogli
189
8
0
10 Oct 2023
Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Siyang Liu
Naihao Deng
Sahand Sabour
Yilin Jia
Shiyu Huang
Amélie Reymond
386
26
0
09 Oct 2023
Neural Language Model Pruning for Automatic Speech Recognition
Leonardo Emili
Thiago Fraga-Silva
Ernest Pusateri
M. Nußbaum-Thom
Youssef Oualil
222
3
0
05 Oct 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
International Conference on Learning Representations (ICLR), 2023
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
551
97
0
04 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yiming Wang
Jinyu Li
195
11
0
03 Oct 2023
Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
International Conference on Learning Representations (ICLR), 2023
Brian DuSell
David Chiang
394
15
0
03 Oct 2023
One model to rule them all? Towards End-to-End Joint Speaker Diarization and Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
250
27
0
02 Oct 2023
Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Junjie Yang
Liang Ding
Li Shen
Matthieu Labeau
Yibing Zhan
Weifeng Liu
Dacheng Tao
VLM
266
5
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Albert Mohwald
249
26
0
28 Sep 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
253
8
0
27 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
288
5
0
27 Sep 2023
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Hussein
Brian Yan
Antonios Anastasopoulos
Shinji Watanabe
Sanjeev Khudanpur
177
8
0
27 Sep 2023
Speech collage: code-switched audio generation by collaging monolingual corpora
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Hussein
Dorsa Zeinali
Ondřej Klejch
Sanjeev Khudanpur
Brian Yan
Shammur A. Chowdhury
Ahmed M. Ali
Shinji Watanabe
Sanjeev Khudanpur
210
10
0
27 Sep 2023
Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023
International Workshop on Spoken Language Translation (IWSLT), 2023
Sara Papi
Marco Gaido
Matteo Negri
243
8
0
27 Sep 2023
Segmentation-Free Streaming Machine Translation
Transactions of the Association for Computational Linguistics (TACL), 2023
Javier Iranzo-Sánchez
Jorge Iranzo-Sánchez
Adria Giménez
Jorge Civera Saiz
Alfons Juan
VOS
242
3
0
26 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities
International Conference on Learning Representations (ICLR), 2023
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
319
135
0
25 Sep 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Automatic Speech Recognition & Understanding (ASRU), 2023
Yifan Peng
Jinchuan Tian
Brian Yan
Dan Berrebbi
Xuankai Chang
...
Yui Sudo
Muhammad Shakeel
Jee-weon Jung
Soumi Maiti
Shinji Watanabe
VLM
347
60
0
25 Sep 2023
Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR
Automatic Speech Recognition & Understanding (ASRU), 2023
Sheikh Shams Azam
Tatiana Likhomanenko
Martin Pelikan
Jan Honza Silovsky
185
7
0
22 Sep 2023
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts
Emad A. Alghamdi
Jezia Zakraoui
Fares A. Abanmy
281
3
0
22 Sep 2023
JCoLA: Japanese Corpus of Linguistic Acceptability
International Conference on Language Resources and Evaluation (LREC), 2023
Taiga Someya
Yushi Sugimoto
Yohei Oseki
212
13
0
22 Sep 2023
Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Bar Iluz
Tomasz Limisiewicz
Gabriel Stanovsky
David Mareček
195
7
0
21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
260
90
0
20 Sep 2023
Sequence-to-Sequence Spanish Pre-trained Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Vladimir Araujo
Maria Mihaela Trușcă
Rodrigo Tufino
Marie-Francine Moens
373
5
0
20 Sep 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
217
11
0
20 Sep 2023
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
International Conference on Learning Representations (ICLR), 2023
M. Finkelstein
Subhajit Naskar
Mehdi Mirzazadeh
Apurva Shah
Markus Freitag
413
36
0
19 Sep 2023
A Family of Pretrained Transformer Language Models for Russian
International Conference on Language Resources and Evaluation (LREC), 2023
Dmitry Zmitrovich
Alexander Abramov
Andrey Kalmykov
Maria Tikhonova
Ekaterina Taktasheva
...
Vitalii Kadulin
Sergey Markov
Tatiana Shavrina
Vladislav Mikhailov
Alena Fenogenova
318
50
0
19 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
137
26
0
19 Sep 2023
Language Modeling Is Compression
International Conference on Learning Representations (ICLR), 2023
Grégoire Delétang
Anian Ruoss
Paul-Ambroise Duquenne
Elliot Catt
Tim Genewein
...
Wenliang Kevin Li
Matthew Aitchison
Laurent Orseau
Marcus Hutter
J. Veness
AI4CE
418
201
0
19 Sep 2023
Nebula: Self-Attention for Dynamic Malware Analysis
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2023
Dmitrijs Trizna
Christian Scano
Battista Biggio
Fabio Roli
269
30
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM
LRM
803
927
0
19 Sep 2023
Adapting Large Language Models via Reading Comprehension
Daixuan Cheng
Shaohan Huang
Furu Wei
CLL
SyDa
AI4CE
348
64
0
18 Sep 2023
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Interspeech (Interspeech), 2023
Jing Liu
Jianwei Yu
Xie Chen
330
2
0
18 Sep 2023
How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Danni Liu
Jan Niehues
215
3
0
15 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
269
16
0
15 Sep 2023
CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shiyi Zhu
Jingting Ye
Wei Jiang
Siqiao Xue
Qi Zhang
Yifan Wu
Jianguo Li
141
6
0
15 Sep 2023
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yang Li
Liangzhen Lai
Shangguan Yuan
Forrest N. Iandola
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
162
9
0
14 Sep 2023
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
Interspeech (Interspeech), 2023
Peng Wang
Yifan Yang
Zheng Liang
Tian Tan
Shiliang Zhang
Xie Chen
206
1
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
360
86
0
14 Sep 2023
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models
Dimitris Spathis
F. Kawsar
AI4TS
191
42
0
12 Sep 2023
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Tuan Dung Nguyen
Yuan-Sen Ting
I. Ciucă
Charlie O'Neill
Ze-Chang Sun
...
Alberto Accomazzi
J. P. Naiman
Jesse Cranney
Kevin Schawinski
UniverseTBD
173
37
0
12 Sep 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Computer Speech and Language (CSL), 2023
Titouan Parcollet
H. Nguyen
Solène Evain
Marcely Zanon Boito
Adrien Pupier
...
François Portet
Solange Rossato
Fabien Ringeval
D. Schwab
Laurent Besacier
261
26
0
11 Sep 2023
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Neural Information Processing Systems (NeurIPS), 2023
Sneha Kudugunta
Isaac Caswell
Biao Zhang
Xavier Garcia
Christopher A. Choquette-Choo
...
Derrick Xin
Aditya Kusupati
Romi Stella
Ankur Bapna
Orhan Firat
285
200
0
09 Sep 2023
Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
European Signal Processing Conference (EUSIPCO), 2023
Huaibo Zhao
Yosuke Higuchi
Yusuke Kida
Tetsuji Ogawa
Tetsunori Kobayashi
195
1
0
09 Sep 2023
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Daoyuan Chen
Yilun Huang
Zhijian Ma
Hesen Chen
Xuchen Pan
...
Zhaoyang Liu
Jinyang Gao
Yaliang Li
Bolin Ding
Jingren Zhou
SyDa
VLM
297
59
0
05 Sep 2023
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shangguan Yuan
Haichuan Yang
Danni Li
Chunyang Wu
Yassir Fathullah
...
Junteng Jia
Jay Mahadeokar
Xin Lei
Michael Seltzer
Vikas Chandra
261
3
0
05 Sep 2023
One Wide Feedforward is All You Need
Conference on Machine Translation (WMT), 2023
Telmo Pires
António V. Lopes
Yannick Assogba
Hendra Setiawan
243
18
0
04 Sep 2023
Towards Foundational AI Models for Additive Manufacturing: Language Models for G-Code Debugging, Manipulation, and Comprehension
Anushrut Jignasu
Kelly O. Marshall
Baskar Ganapathysubramanian
Aditya Balu
Chinmay Hegde
A. Krishnamurthy
ELM
AI4CE
133
12
0
04 Sep 2023
Multilingual Text Representation
Fahim Faisal
203
0
0
02 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhichao Huang
Chutong Meng
Tom Ko
212
41
0
31 Aug 2023
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lucas Bandarkar
Davis Liang
Benjamin Muller
Mikel Artetxe
Satya Narayan Shukla
Don Husa
Naman Goyal
Abhinandan Krishnan
Luke Zettlemoyer
Madian Khabsa
360
237
0
31 Aug 2023