1808.06226
Cited By
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Github (10925★)
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 2,063 papers shown
Self-Attention Mechanism in Multimodal Context for Banking Transaction Flow
Cyrile Delestre
Yoann Sola
84
0
0
10 Oct 2024
Transducer Consistency Regularization for Speech to Text Applications
Spoken Language Technology Workshop (SLT), 2024
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
289
0
0
09 Oct 2024
Generative Model for Less-Resourced Language with 1 billion parameters
Domen Vreš
Martin Božič
Aljaž Potočnik
Tomaž Martinčič
Marko Robnik-Šikonja
171
3
0
09 Oct 2024
Inference over Unseen Entities, Relations and Literals on Knowledge Graphs
Caglar Demir
N'Dah Jean Kouagou
Arnab Sharma
Axel-Cyrille Ngonga Ngomo
177
0
0
09 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
International Conference on Learning Representations (ICLR), 2024
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
1.4K
2
0
07 Oct 2024
Language Model-Driven Data Pruning Enables Efficient Active Learning
Abdul Hameed Azeemi
I. Qazi
Agha Ali Raza
VLM
283
4
0
05 Oct 2024
Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Gunjan Balde
Soumyadeep Roy
Mainack Mondal
Niloy Ganguly
137
4
0
04 Oct 2024
Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Seonjeong Hwang
Yunsu Kim
Gary Geunbae Lee
214
3
0
04 Oct 2024
MELODI: Exploring Memory Compression for Long Contexts
International Conference on Learning Representations (ICLR), 2024
Yinpeng Chen
DeLesley Hutchins
Aren Jansen
Andrey Zhmoginov
David Racz
Jesper Andersen
194
3
0
04 Oct 2024
No Need to Talk: Asynchronous Mixture of Language Models
International Conference on Learning Representations (ICLR), 2024
Anastasiia Filippova
Angelos Katharopoulos
David Grangier
Ronan Collobert
MoE
369
3
0
04 Oct 2024
Morphological evaluation of subwords vocabulary used by BETO language model
Óscar García-Sierra
Ana Fernández-Pampillón Cesteros
Miguel Ortega-Martín
216
0
0
03 Oct 2024
Selective Attention Improves Transformer
International Conference on Learning Representations (ICLR), 2024
Yaniv Leviathan
Matan Kalman
Yossi Matias
346
20
0
03 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
International Conference on Learning Representations (ICLR), 2024
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
960
0
0
03 Oct 2024
Gold Panning in Vocabulary: An Adaptive Method for Vocabulary Expansion of Domain-Specific LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Chengyuan Liu
Shihang Wang
Lizhi Qing
Kun Kuang
Yangyang Kang
Changlong Sun
Fei Wu
160
9
0
02 Oct 2024
FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao
Yu Zhang
Zhenxiao Zhang
Yanmin Gong
Yuanxiong Guo
155
3
0
01 Oct 2024
Enhancing High-order Interaction Awareness in LLM-based Recommender Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xinfeng Wang
Jin Cui
Fumiyo Fukumoto
Yoshimi Suzuki
241
11
0
30 Sep 2024
Universal Medical Image Representation Learning with Compositional Decoders
Kaini Wang
Ling Yang
Siping Zhou
Guangquan Zhou
Wentao Zhang
Bin Cui
Shuo Li
SSL
MedIm
288
1
0
30 Sep 2024
AfriHuBERT: A self-supervised speech representation model for African languages
Jesujoba Oluwadara Alabi
Xuechen Liu
Dietrich Klakow
Junichi Yamagishi
VLM
437
11
0
30 Sep 2024
Exploring Language Model Generalization in Low-Resource Extractive QA
International Conference on Computational Linguistics (COLING), 2024
Saptarshi Sengupta
Wenpeng Yin
Preslav Nakov
Shreya Ghosh
Suhang Wang
285
5
0
27 Sep 2024
Convolutional Signal Propagation: A Simple Scalable Algorithm for Hypergraphs
Pavel Procházka
Marek Dědič
Lukáš Bajer
GNN
206
0
0
26 Sep 2024
LangSAMP: Language-Script Aware Multilingual Pretraining
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yihong Liu
Haotian Ye
Chunlan Ma
Mingyang Wang
Hinrich Schütze
VLM
511
2
0
26 Sep 2024
How Transliterations Improve Crosslingual Alignment
International Conference on Computational Linguistics (COLING), 2024
Yihong Liu
Mingyang Wang
Amir Hossein Kargaran
Ayyoob Imani
Orgest Xhelili
Haotian Ye
Chunlan Ma
François Yvon
Hinrich Schütze
206
4
0
25 Sep 2024
EuroLLM: Multilingual Language Models for Europe
Pedro Henrique Martins
Patrick Fernandes
Joao Alves
Nuno M. Guerreiro
Ricardo Rei
...
Pierre Colombo
Barry Haddow
José G. C. de Souza
Alexandra Birch
André F. T. Martins
228
79
0
24 Sep 2024
Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain
Conference on Machine Translation (WMT), 2024
Yuanchang Luo
Zhanglin Wu
Daimeng Wei
Hengchao Shang
Zongyao Li
...
Shaojun Li
Jinlong Yang
Yuhao Xie
Jiawei Zheng
Bin Wei
Hao Yang
112
1
0
24 Sep 2024
Machine Translation Advancements of Low-Resource Indian Languages by Transfer Learning
Conference on Machine Translation (WMT), 2024
Bin Wei
Jiawei Zhen
Zongyao Li
Zhanglin Wu
Daimeng Wei
...
Yuanchang Luo
Hengchao Shang
Jinlong Yang
Yuhao Xie
Hao Yang
VLM
127
8
0
24 Sep 2024
dnaGrinder: a lightweight and high-capacity genomic foundation model
Qihang Zhao
Chi Zhang
Weixiong Zhang
172
3
0
24 Sep 2024
HW-TSC's Submission to the CCMT 2024 Machine Translation Tasks
Zhanglin Wu
Yuanchang Luo
Daimeng Wei
Jiawei Zheng
Bin Wei
...
Jiaxin Guo
Shaojun Li
Mengli Zhu
Ning Xie
Hao Yang
206
2
0
23 Sep 2024
Choose the Final Translation from NMT and LLM hypotheses Using MBR Decoding: HW-TSC's Submission to the WMT24 General MT Shared Task
Conference on Machine Translation (WMT), 2024
Zhanglin Wu
Daimeng Wei
Zongyao Li
Hengchao Shang
Jiaxin Guo
Shaojun Li
Zhiqiang Rao
Yuanchang Luo
Ning Xie
Hao Yang
187
7
0
23 Sep 2024
Cross-Domain Content Generation with Domain-Specific Small Language Models
Ankit Maloo
Abhinav Garg
CLL
214
0
0
19 Sep 2024
An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems
International Conference on Machine Learning (ICML), 2024
Hitesh Tulsiani
David M. Chan
Shalini Ghosh
Garima Lalwani
Prabhat Pandey
Ankish Bansal
Sri Garimella
Ariya Rastrow
Björn Hoffmeister
175
0
0
16 Sep 2024
PixelBytes: Catching Unified Representation for Multimodal Generation
Fabien Furfaro
123
1
0
16 Sep 2024
DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification
Abdelkader El Mahdaouy
Salima Lamsiyah
Meryem Janati Idrissi
H. Alami
Zakaria Yartaoui
Ismail Berrada
142
8
0
13 Sep 2024
Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Siqi Li
Danni Liu
Jan Niehues
270
3
0
13 Sep 2024
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
European Conference on Artificial Intelligence (ECAI), 2024
Gentiana Rashiti
G. Karunaratne
Mrinmaya Sachan
Abu Sebastian
Abbas Rahimi
RALM
533
0
0
12 Sep 2024
TeXBLEU: Automatic Metric for Evaluate LaTeX Format
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Kyudan Jung
N. Kim
Hyongon Ryu
Sieun Hyeon
Seung-jun Lee
Hyeok-jae Lee
314
3
0
10 Sep 2024
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Pavel Chizhov
Catherine Arnett
Elizaveta Korotkova
Ivan P. Yamshchikov
236
14
0
06 Sep 2024
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak
Conference on Machine Translation (WMT), 2024
Mukhammadsaid Mamasaidov
Abror Shopulatov
VLM
109
7
0
06 Sep 2024
The AdEMAMix Optimizer: Better, Faster, Older
International Conference on Learning Representations (ICLR), 2024
Matteo Pagliardini
Pierre Ablin
David Grangier
ODL
322
23
0
05 Sep 2024
Multi-modal Situated Reasoning in 3D Scenes
Neural Information Processing Systems (NeurIPS), 2024
Xiongkun Linghu
Jiangyong Huang
Xuesong Niu
Xiaojian Ma
Baoxiong Jia
Siyuan Huang
358
42
0
04 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Spoken Language Technology Workshop (SLT), 2024
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
226
4
0
02 Sep 2024
Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts
International Conference on Computational Linguistics (COLING), 2024
Yingfa Chen
Chenlong Hu
Cong Feng
Chenyang Song
Shi Yu
Xu Han
Zhiyuan Liu
Maosong Sun
155
0
0
02 Sep 2024
Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation
European Association for Machine Translation Conferences/Workshops (EAMT), 2024
Esther Ploeger
Huiyuan Lai
Rik van Noord
Antonio Toral
183
4
0
30 Aug 2024
Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions
Sully F. Chen
Robert J. Steele
Glen M. Hocky
Beakal Lemeneh
S. Lad
Eric Oermann
AI4CE
344
0
0
29 Aug 2024
Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough
Konstantin Dobler
Gerard de Melo
204
4
0
28 Aug 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
Bing Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
247
2
0
28 Aug 2024
Positional Description for Numerical Normalization
Interspeech, 2024
Deepanshu Gupta
Javier Latorre
3DGS
159
0
0
22 Aug 2024
Distributional Properties of Subword Regularization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Marco Cognetta
Vilém Zouhar
Naoaki Okazaki
176
0
0
21 Aug 2024
Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies
Conference on Machine Translation (WMT), 2024
Sai Koneru
Matthias Huck
M. Exel
Jan Niehues
191
0
0
21 Aug 2024
Goldfish: Monolingual Language Models for 350 Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
271
18
0
19 Aug 2024
Language-Informed Beam Search Decoding for Multilingual Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yilin Yang
Stefan Lee
Prasad Tadepalli
166
2
0
11 Aug 2024