2311.08849
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
15 November 2023
Yihong Liu
Peiqin Lin
Mingyang Wang
Hinrich Schütze
Papers citing
"OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining"
10 / 10 papers shown
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
25
0
0
05 May 2025
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
Enes Özeren
Yihong Liu
Hinrich Schütze
28
0
0
21 Apr 2025
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
32
2
0
12 Oct 2024
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models
Nandini Mundra
Aditya Nanda Kishore
Raj Dabre
Ratish Puduppully
Anoop Kunchukuttan
Mitesh Khapra
25
3
0
08 Jul 2024
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
Peiqin Lin
André F. T. Martins
Hinrich Schütze
49
2
0
29 Jun 2024
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Peiqin Lin
André F. T. Martins
Hinrich Schütze
RALM
45
2
0
08 May 2024
A study of conceptual language similarity: comparison and evaluation
Haotian Ye
Yihong Liu
Hinrich Schütze
11
2
0
22 May 2023
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models
Terra Blevins
Hila Gonen
Luke Zettlemoyer
LRM
52
26
0
24 May 2022
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung
Thibault Févry
Henry Tsai
Melvin Johnson
Sebastian Ruder
93
142
0
24 Oct 2020
Multilingual BERT Post-Pretraining Alignment
Lin Pan
Chung-Wei Hang
Haode Qi
Abhishek Shah
Saloni Potdar
Mo Yu
94
44
0
23 Oct 2020