IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
10 September 2021
Fajri Koto
Jey Han Lau
Timothy Baldwin
    VLM

Papers citing "IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization"

24 / 24 papers shown
Towards Data-Efficient Medical Imaging: A Generative and Semi-Supervised Framework
Mosong Ma
Tania Stathaki
Michalis Lazarou
MedImGAN
276
0
0
07 Oct 2025
A Gamified Evaluation and Recruitment Platform for Low Resource Language Machine Translation Systems
Carlos Rafael Catalan
ELM
165
0
0
13 Jun 2025
Token Distillation: Attention-aware Input Embeddings For New Tokens
Konstantin Dobler
Desmond Elliott
Gerard de Melo
VLM
487
1
0
26 May 2025
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Luca Moroni
Giovanni Puccetti
Pere-Lluís Huguet Cabot
Andrei Stefan Bejgu
Edoardo Barba
Alessio Miaschi
F. Dell’Orletta
Andrea Esuli
Roberto Navigli
317
6
0
23 Apr 2025
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations
Xingwei Tan
Chen Lyu
Hafiz Muhammad Umer
Sahrish Khan
Mahathi Parvatham
Lois Arthurs
Simon Cullen
Shelley Wilson
Arshad Jhumka
Gabriele Pergola
204
4
0
09 Mar 2025
Dual-Class Prompt Generation: Enhancing Indonesian Gender-Based Hate Speech Detection through Data Augmentation
Muhammad Amien Ibrahim
Faisal
Tora Sangputra Yopie Winarto
Zefanya Delvin Sulistiya
232
1
0
06 Mar 2025
Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting
Fajri Koto
Rituraj Joshi
Nurdaulet Mukhituly
Yanjie Wang
Zhuohan Xie
...
Sarath Chandran
Avraham Sheinin
Natalia Vassilieva
Neha Sengupta
Larry Murray
ALM, KELM
428
3
0
03 Mar 2025
A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Lucky Susanto
M. Wijanarko
Prasetia Anugrah Pratama
Zilu Tang
Fariz Akyas
Traci Hong
Ika Idris
Alham Fikri Aji
Derry Wijaya
232
0
0
01 Mar 2025
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Fajri Koto
ELM
345
4
0
13 Sep 2024
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Rifki Afina Putri
Emmanuel Dave
...
Bryan Wilie
Genta Indra Winata
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
420
25
0
09 Apr 2024
SambaLingo: Teaching Large Language Models New Languages
Zoltan Csaki
Bo Li
Jonathan Li
Qiantong Xu
Pian Pawakapan
Leon Zhang
Yun Du
Hengyu Zhao
Changran Hu
Urmish Thakker
255
14
0
08 Apr 2024
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
David Samuel Setiawan
Steven Limcorn
Ananto Joyoadikusumo
180
8
0
04 Mar 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
264
24
0
03 Feb 2024
ChipNeMo: Domain-Adapted LLMs for Chip Design
Mingjie Liu
Teodor-Dumitru Ene
Robert M. Kirby
Chris Cheng
N. Pinckney
...
Pratik P Suthar
Varun Tej
Walker J. Turner
Kaizhe Xu
Haoxin Ren
781
239
0
31 Oct 2023
ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Quoc-Nam Nguyen
Thang Chau Phan
Duc-Vu Nguyen
Kiet Van Nguyen
344
19
0
17 Oct 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Fajri Koto
Nurul Aisyah
Jinyan Su
Timothy Baldwin
AI4Ed, LRM, ELM
319
61
0
07 Oct 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Dea Adhista
Emmanuel Dave
...
Genta Indra Winata
David Moeljadi
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
322
16
0
19 Sep 2023
RoBERTweet: A BERT Language Model for Romanian Tweets
International Conference on Applications of Natural Language to Data Bases (NLDB), 2023
Iulian-Marius Tuaiatu
Andrei-Marius Avram
Dumitru-Clementin Cercel
Florin-Catalin Pop
160
5
0
11 Jun 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
488
70
0
19 Dec 2022
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zheng-Xin Yong
Hailey Schoelkopf
Niklas Muennighoff
Alham Fikri Aji
David Ifeoluwa Adelani
...
Genta Indra Winata
Stella Biderman
Edward Raff
Dragomir R. Radev
Vassilina Nikoulina
CLL, VLM, AI4CE, LRM
390
106
0
19 Dec 2022
NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages
Samuel Cahyawijaya
Alham Fikri Aji
Holy Lovenia
Genta Indra Winata
Bryan Wilie
...
Fajri Koto
David Moeljadi
Karissa Vincentio
Ade Romadhony
Ayu Purwarianti
386
6
0
21 Jul 2022
Language Modelling with Pixels
International Conference on Learning Representations (ICLR), 2022
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
336
59
0
14 Jul 2022
Location-based Twitter Filtering for the Creation of Low-Resource Language Datasets in Indonesian Local Languages
Mukhlis Amien
Chong Feng
Heyan Huang
222
3
0
15 Jun 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
228
135
0
24 Mar 2022