arXiv:2109.04607
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
10 September 2021
Fajri Koto
Jey Han Lau
Timothy Baldwin
VLM
Papers citing "IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization"
24 / 24 papers shown
Towards Data-Efficient Medical Imaging: A Generative and Semi-Supervised Framework
Mosong Ma
Tania Stathaki
Michalis Lazarou
MedIm
GAN
276
0
0
07 Oct 2025
A Gamified Evaluation and Recruitment Platform for Low Resource Language Machine Translation Systems
Carlos Rafael Catalan
ELM
165
0
0
13 Jun 2025
Token Distillation: Attention-aware Input Embeddings For New Tokens
Konstantin Dobler
Desmond Elliott
Gerard de Melo
VLM
487
1
0
26 May 2025
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Luca Moroni
Giovanni Puccetti
Pere-Lluís Huguet Cabot
Andrei Stefan Bejgu
Edoardo Barba
Alessio Miaschi
F. Dell’Orletta
Andrea Esuli
Roberto Navigli
317
6
0
23 Apr 2025
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations
Xingwei Tan
Chen Lyu
Hafiz Muhammad Umer
Sahrish Khan
Mahathi Parvatham
Lois Arthurs
Simon Cullen
Shelley Wilson
Arshad Jhumka
Gabriele Pergola
204
4
0
09 Mar 2025
Dual-Class Prompt Generation: Enhancing Indonesian Gender-Based Hate Speech Detection through Data Augmentation
Muhammad Amien Ibrahim
Faisal
Tora Sangputra Yopie Winarto
Zefanya Delvin Sulistiya
232
1
0
06 Mar 2025
Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting
Fajri Koto
Rituraj Joshi
Nurdaulet Mukhituly
Yanjie Wang
Zhuohan Xie
...
Sarath Chandran
Avraham Sheinin
Natalia Vassilieva
Neha Sengupta
Larry Murray
ALM
KELM
428
3
0
03 Mar 2025
A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Lucky Susanto
M. Wijanarko
Prasetia Anugrah Pratama
Zilu Tang
Fariz Akyas
Traci Hong
Ika Idris
Alham Fikri Aji
Derry Wijaya
232
0
0
01 Mar 2025
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Fajri Koto
ELM
345
4
0
13 Sep 2024
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Rifki Afina Putri
Emmanuel Dave
...
Bryan Wilie
Genta Indra Winata
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
420
25
0
09 Apr 2024
SambaLingo: Teaching Large Language Models New Languages
Zoltan Csaki
Bo Li
Jonathan Li
Qiantong Xu
Pian Pawakapan
Leon Zhang
Yun Du
Hengyu Zhao
Changran Hu
Urmish Thakker
255
14
0
08 Apr 2024
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
David Samuel Setiawan
Steven Limcorn
Ananto Joyoadikusumo
180
8
0
04 Mar 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
264
24
0
03 Feb 2024
ChipNeMo: Domain-Adapted LLMs for Chip Design
Mingjie Liu
Teodor-Dumitru Ene
Robert M. Kirby
Chris Cheng
N. Pinckney
...
Pratik P Suthar
Varun Tej
Walker J. Turner
Kaizhe Xu
Haoxin Ren
781
239
0
31 Oct 2023
ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Quoc-Nam Nguyen
Thang Chau Phan
Duc-Vu Nguyen
Kiet Van Nguyen
344
19
0
17 Oct 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Fajri Koto
Nurul Aisyah
Jinyan Su
Timothy Baldwin
AI4Ed
LRM
ELM
319
61
0
07 Oct 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Dea Adhista
Emmanuel Dave
...
Genta Indra Winata
David Moeljadi
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
322
16
0
19 Sep 2023
RoBERTweet: A BERT Language Model for Romanian Tweets
International Conference on Applications of Natural Language to Data Bases (NLDB), 2023
Iulian-Marius Tăiatu
Andrei-Marius Avram
Dumitru-Clementin Cercel
Florin-Catalin Pop
160
5
0
11 Jun 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
488
70
0
19 Dec 2022
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zheng-Xin Yong
Hailey Schoelkopf
Niklas Muennighoff
Alham Fikri Aji
David Ifeoluwa Adelani
...
Genta Indra Winata
Stella Biderman
Edward Raff
Dragomir R. Radev
Vassilina Nikoulina
CLL
VLM
AI4CE
LRM
390
106
0
19 Dec 2022
NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages
Samuel Cahyawijaya
Alham Fikri Aji
Holy Lovenia
Genta Indra Winata
Bryan Wilie
...
Fajri Koto
David Moeljadi
Karissa Vincentio
Ade Romadhony
Ayu Purwarianti
386
6
0
21 Jul 2022
Language Modelling with Pixels
International Conference on Learning Representations (ICLR), 2022
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
336
59
0
14 Jul 2022
Location-based Twitter Filtering for the Creation of Low-Resource Language Datasets in Indonesian Local Languages
Mukhlis Amien
Chong Feng
Heyan Huang
222
3
0
15 Jun 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
228
135
0
24 Mar 2022