ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.14268
  4. Cited By
How Robust is Neural Machine Translation to Language Imbalance in
  Multilingual Tokenizer Training?

How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

29 April 2022
Shiyue Zhang
Vishrav Chaudhary
Naman Goyal
James Cross
Guillaume Wenzek
Mohit Bansal
Francisco Guzman
ArXivPDFHTML

Papers citing "How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?"

16 / 16 papers shown
Title
Tokenization is Sensitive to Language Variation
Tokenization is Sensitive to Language Variation
Anna Wegmann
Dong Nguyen
David Jurgens
75
1
0
24 Feb 2025
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
Hyunji Lee
Danni Liu
Supriti Sinhamahapatra
Jan Niehues
103
0
0
21 Feb 2025
Investigating the translation capabilities of Large Language Models
  trained on parallel data only
Investigating the translation capabilities of Large Language Models trained on parallel data only
Javier García Gilabert
Carlos Escolano
Aleix Sant Savall
Francesca de Luca Fornaciari
Audrey Mash
Xixian Liao
Maite Melero
LRM
36
2
0
13 Jun 2024
Comparing Explanation Faithfulness between Multilingual and Monolingual
  Fine-tuned Language Models
Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models
Zhixue Zhao
Nikolaos Aletras
24
3
0
19 Mar 2024
Tokenizer Choice For LLM Training: Negligible or Crucial?
Tokenizer Choice For LLM Training: Negligible or Crucial?
Mehdi Ali
Michael Fromm
Klaudia Thellmann
Richard Rutmann
Max Lübbering
...
Malte Ostendorff
Samuel Weinbach
R. Sifa
Stefan Kesselheim
Nicolas Flores-Herr
13
47
0
12 Oct 2023
Do All Languages Cost the Same? Tokenization in the Era of Commercial
  Language Models
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Jungo Kasai
David R. Mortensen
Noah A. Smith
Yulia Tsvetkov
23
80
0
23 May 2023
Language Model Tokenizers Introduce Unfairness Between Languages
Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov
Emanuele La Malfa
Philip H. S. Torr
Adel Bibi
16
96
0
17 May 2023
MEGA: Multilingual Evaluation of Generative AI
MEGA: Multilingual Evaluation of Generative AI
Kabir Ahuja
Harshita Diddee
Rishav Hada
Millicent Ochieng
Krithika Ramesh
...
T. Ganu
Sameer Segal
Maxamed Axmed
Kalika Bali
Sunayana Sitaram
LM&MA
LRM
ELM
13
262
0
22 Mar 2023
Incorporating Context into Subword Vocabularies
Incorporating Context into Subword Vocabularies
Shaked Yehezkel
Yuval Pinter
27
8
0
13 Oct 2022
Multilingual Bidirectional Unsupervised Translation Through Multilingual
  Finetuning and Back-Translation
Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation
Bryan Li
Mohammad Sadegh Rasooli
Ajay Patel
Chris Callison-Burch
26
4
0
06 Sep 2022
Language Modelling with Pixels
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
20
46
0
14 Jul 2022
Masked Part-Of-Speech Model: Does Modeling Long Context Help
  Unsupervised POS-tagging?
Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?
Xiang Zhou
Shiyue Zhang
Mohit Bansal
9
0
0
30 Jun 2022
How Good is Your Tokenizer? On the Monolingual Performance of
  Multilingual Language Models
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020
Improving Multilingual Models with Language-Clustered Vocabularies
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
58
65
0
24 Oct 2020
Six Challenges for Neural Machine Translation
Six Challenges for Neural Machine Translation
Philipp Koehn
Rebecca Knowles
AAML
AIMat
208
1,202
0
12 Jun 2017
Multi-Way, Multilingual Neural Machine Translation with a Shared
  Attention Mechanism
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
Orhan Firat
Kyunghyun Cho
Yoshua Bengio
LRM
AIMat
206
622
0
06 Jan 2016
1