ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.12777
  4. Cited By
Improving Multilingual Models with Language-Clustered Vocabularies

Improving Multilingual Models with Language-Clustered Vocabularies

24 October 2020
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
    VLM
ArXivPDFHTML

Papers citing "Improving Multilingual Models with Language-Clustered Vocabularies"

15 / 15 papers shown
Title
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
24
2
0
12 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
36
0
0
07 Oct 2024
SUTRA: Scalable Multilingual Language Model Architecture
SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale
Michael Sapienza
Steven Ripplinger
Simon Gibbs
Jaewon Lee
Pranav Mistry
LRM
ELM
21
4
0
07 May 2024
A Systematic Analysis of Subwords and Cross-Lingual Transfer in
  Multilingual Translation
A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation
Francois Meyer
Jan Buys
16
1
0
29 Mar 2024
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
Fei Yuan
Shuai Yuan
Zhiyong Wu
Lei Li
12
9
0
15 Nov 2023
Analyzing Cognitive Plausibility of Subword Tokenization
Analyzing Cognitive Plausibility of Subword Tokenization
Lisa Beinborn
Yuval Pinter
11
17
0
20 Oct 2023
CodeBPE: Investigating Subtokenization Options for Large Language Model
  Pretraining on Source Code
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Nadezhda Chirkova
Sergey Troshin
8
8
0
01 Aug 2023
An Efficient Multilingual Language Model Compression through Vocabulary
  Trimming
An Efficient Multilingual Language Model Compression through Vocabulary Trimming
Asahi Ushio
Yi Zhou
Jose Camacho-Collados
17
7
0
24 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500
  Languages
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
18
95
0
20 May 2023
Language Model Tokenizers Introduce Unfairness Between Languages
Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov
Emanuele La Malfa
Philip H. S. Torr
Adel Bibi
14
96
0
17 May 2023
Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using
  Multilingual BERT
Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT
Beiduo Chen
Wu Guo
Quan Liu
Kun Tao
8
1
0
17 May 2022
Lifting the Curse of Multilinguality by Pre-training Modular
  Transformers
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Jonas Pfeiffer
Naman Goyal
Xi Victoria Lin
Xian Li
James Cross
Sebastian Riedel
Mikel Artetxe
LRM
16
138
0
12 May 2022
Between words and characters: A Brief History of Open-Vocabulary
  Modeling and Tokenization in NLP
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
23
137
0
20 Dec 2021
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,435
0
26 Sep 2016
1