1808.06226
Cited By
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
19 August 2018
Taku Kudo
John Richardson
Papers citing
"SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"
50 / 1,923 papers shown
Title
M^3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Zhe Chen
Heyang Liu
Wenyi Yu
Guangzhi Sun
Hongcheng Liu
Ji Wu
Chao Zhang
Yu Wang
Yanfeng Wang
VGen
57
1
0
21 Mar 2024
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
Catherine Arnett
Pamela D. Rivière
Tyler A. Chang
Sean Trott
24
2
0
20 Mar 2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
37
6
0
20 Mar 2024
Self-generated Replay Memories for Continual Neural Machine Translation
Michele Resta
Davide Bacciu
CLL
28
2
0
19 Mar 2024
Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models
Zhixue Zhao
Nikolaos Aletras
37
3
0
19 Mar 2024
Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems
Bo-Han Lu
Yi-Hsuan Lin
En-Shiun Annie Lee
Richard Tzong-Han Tsai
40
0
0
18 Mar 2024
Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean
Changsu Choi
Yongbin Jeong
Seoyoon Park
Inho Won
HyeonSeok Lim
...
Yiseul Lee
HyeJin Lee
Younggyun Hahm
Hansaem Kim
Kyungtae Lim
37
11
0
16 Mar 2024
Exploring Chinese Humor Generation: A Study on Two-Part Allegorical Sayings
Rongwu Xu
40
2
0
16 Mar 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
30
13
0
15 Mar 2024
DiPaCo: Distributed Path Composition
Arthur Douillard
Qixuang Feng
Andrei A. Rusu
A. Kuncoro
Yani Donchev
Rachita Chhaparia
Ionel Gog
Marc'Aurelio Ranzato
Jiajun Shen
Arthur Szlam
MoE
53
2
0
15 Mar 2024
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär
N. Houlsby
Mostafa Dehghani
Manoj Kumar
VLM
36
4
0
15 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
51
8
0
14 Mar 2024
Token Alignment via Character Matching for Subword Completion
Ben Athiwaratkun
Shiqi Wang
Mingyue Shang
Yuchen Tian
Zijian Wang
Sujan Kumar Gonugondla
Sanjay Krishna Gouda
Rob Kwiatowski
Ramesh Nallapati
Bing Xiang
50
4
0
13 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
131
441
0
13 Mar 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
52
18
0
12 Mar 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu
Jiaxing Huang
Peng Gao
Lewei Lu
Xiaoqin Zhang
Shijian Lu
51
4
0
12 Mar 2024
MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki
Timothee Mickus
Stig-Arne Gronroos
Joseph Attieh
M. Boggia
Ona de Gibert
Shaoxiong Ji
Niki Andreas Lopi
Alessandro Raganato
Raúl Vázquez
Jörg Tiedemann
33
4
0
12 Mar 2024
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
Can Cui
Imran Ahmad Sheikh
Mostafa Sadeghi
Emmanuel Vincent
47
2
0
11 Mar 2024
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages
Michael Andersland
25
0
0
11 Mar 2024
Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT
Aisha Khatun
Anisur Rahman
Md. Saiful Islam
Hemayet Ahmed Chowdhury
A. Tasnim
31
3
0
08 Mar 2024
To Err Is Human, but Llamas Can Learn It Too
Agnes Luhtaru
Taido Purason
Martin Vainikko
Maksym Del
Mark Fishel
SyDa
ALM
46
2
0
08 Mar 2024
FFSTC: Fongbe to French Speech Translation Corpus
D. F. Kponou
F. Laleye
E. C. Ezin
29
0
0
08 Mar 2024
Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity
Shochro Hoshino
Akihiko Kato
Soichiro Murakami
Peinan Zhang
32
1
0
08 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
150
512
0
07 Mar 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim M. Alabdulmohsin
Xiao Wang
Andreas Steiner
Priya Goyal
Alexander D'Amour
Xiao-Qi Zhai
47
17
0
07 Mar 2024
gaHealth: An English-Irish Bilingual Corpus of Health Data
Séamus Lankford
Haithem Afli
Orla Ni Loinsigh
Andy Way
54
9
0
06 Mar 2024
BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation
Carinne Cherf
Yuval Pinter
19
1
0
06 Mar 2024
Towards Training A Chinese Large Language Model for Anesthesiology
Zhonghai Wang
Jie Jiang
Yibing Zhan
Bohao Zhou
Yanhong Li
...
Liang Ding
Hua Jin
Jun Peng
Xu Lin
Weifeng Liu
LM&MA
43
3
0
05 Mar 2024
adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
Séamus Lankford
Haithem Afli
Andy Way
37
28
0
04 Mar 2024
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron
Ahmet Iscen
Alireza Fathi
Cordelia Schmid
45
5
0
04 Mar 2024
Transformers for Low-Resource Languages: Is Féidir Linn!
Séamus Lankford
H. Alfi
Tamás Sarlós
40
17
0
04 Mar 2024
adaptNMT: an open-source, language-agnostic development environment for Neural Machine Translation
Séamus Lankford
Haithem Afli
Andy Way
34
3
0
04 Mar 2024
Human Evaluation of English-Irish Transformer-Based NMT
Séamus Lankford
Haithem Afli
Andy Way
45
10
0
04 Mar 2024
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
Amal Rannen-Triki
J. Bornschein
Razvan Pascanu
Marcus Hutter
Andras Gyorgy
Alexandre Galashov
Yee Whye Teh
Michalis K. Titsias
KELM
33
1
0
03 Mar 2024
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin
Seonil Son
Jemin Park
Youngseok Kim
Hyungjong Noh
Yeonsoo Lee
41
2
0
03 Mar 2024
VNLP: Turkish NLP Package
Meliksah Turker
Mehmet Erdi Ari
Aydin Han
45
1
0
02 Mar 2024
VBART: The Turkish LLM
Meliksah Turker
Mehmet Erdi Ari
Aydin Han
VLM
39
4
0
02 Mar 2024
Machine Translation in the Covid domain: an English-Irish case study for LoResMT 2021
Séamus Lankford
Haithem Afli
Andy Way
52
14
0
02 Mar 2024
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models
Jinbiao Yang
LLMAG
105
11
0
01 Mar 2024
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Heyang Liu
Yu Wang
Yanfeng Wang
46
0
0
01 Mar 2024
Compact Speech Translation Models via Discrete Speech Units Pretraining
Tsz Kin Lam
Alexandra Birch
Barry Haddow
61
2
0
29 Feb 2024
Robust Guidance for Unsupervised Data Selection: Capturing Perplexing Named Entities for Domain-Specific Machine Translation
Seunghyun Ji
H. R. Sinulingga
Darongsae Kwon
46
1
0
29 Feb 2024
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu
Xu Tan
Zili Wang
Rui Wang
Xiaobing Li
Maosong Sun
35
12
0
29 Feb 2024
Advancing Generative AI for Portuguese with Open Decoder Gervásio PT*
Rodrigo Santos
Joao Silva
Luís Gomes
João Rodrigues
António Branco
46
10
0
29 Feb 2024
Tokenization Is More Than Compression
Craig W. Schmidt
Varshini Reddy
Haoran Zhang
Alec Alameddine
Omri Uzan
Yuval Pinter
Chris Tanner
61
28
0
28 Feb 2024
A Language Model based Framework for New Concept Placement in Ontologies
Hang Dong
Jiaoyan Chen
Yuan He
Yongsheng Gao
Ian Horrocks
40
7
0
27 Feb 2024
BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning
Qizhi Pei
Lijun Wu
Kaiyuan Gao
Xiaozhuan Liang
Yin Fang
Jinhua Zhu
Shufang Xie
Tao Qin
Rui Yan
AI4CE
51
33
0
27 Feb 2024
Nemotron-4 15B Technical Report
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
M. Patwary
Sandeep Subramanian
...
Ashwath Aithal
Oleksii Kuchaiev
M. Shoeybi
Jonathan Cohen
Bryan Catanzaro
39
22
0
26 Feb 2024
Quantum linear algebra is all you need for Transformer architectures
Naixu Guo
Zhan Yu
Matthew Choi
Aman Agrawal
Kouhei Nakaji
Alán Aspuru-Guzik
Patrick Rebentrost
AI4CE
35
16
0
26 Feb 2024
Generative AI in Vision: A Survey on Models, Metrics and Applications
Gaurav Raut
Apoorv Singh
VLM
MedIm
43
6
0
26 Feb 2024