Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2109.04607
Cited By
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization
10 September 2021
Fajri Koto
Jey Han Lau
Timothy Baldwin
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization"
22 / 22 papers shown
Title
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
Luca Moroni
Giovanni Puccetti
Pere-Lluís Huguet Cabot
Andrei Stefan Bejgu
Edoardo Barba
Alessio Miaschi
F. Dell’Orletta
Andrea Esuli
Roberto Navigli
20
0
0
23 Apr 2025
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations
Xingwei Tan
Chen Lyu
Hafiz Muhammad Umer
Sahrish Khan
Mahathi Parvatham
Lois Arthurs
Simon Cullen
Shelley Wilson
Arshad Jhumka
Gabriele Pergola
37
0
0
09 Mar 2025
Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh
Fajri Koto
Rituraj Joshi
Nurdaulet Mukhituly
Y. Wang
Zhuohan Xie
...
Avraham Sheinin
Natalia Vassilieva
Neha Sengupta
Larry Murray
Preslav Nakov
ALM
KELM
36
0
0
03 Mar 2025
A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information
Lucky Susanto
M. Wijanarko
Prasetia Anugrah Pratama
Zilu Tang
Fariz Akyas
Traci Hong
Ika Idris
Alham Fikri Aji
Derry Wijaya
34
0
0
01 Mar 2025
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia
Fajri Koto
ELM
31
2
0
13 Sep 2024
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Rifki Afina Putri
Emmanuel Dave
...
Bryan Wilie
Genta Indra Winata
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
42
15
0
09 Apr 2024
SambaLingo: Teaching Large Language Models New Languages
Zoltan Csaki
Bo Li
Jonathan Li
Qiantong Xu
Pian Pawakapan
Leon Zhang
Yun Du
Hengyu Zhao
Changran Hu
Urmish Thakker
19
6
0
08 Apr 2024
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
David Samuel Setiawan
Steven Limcorn
Ananto Joyoadikusumo
19
1
0
04 Mar 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
38
7
0
03 Feb 2024
ChipNeMo: Domain-Adapted LLMs for Chip Design
Mingjie Liu
Teodor-Dumitru Ene
Robert M. Kirby
Chris Cheng
N. Pinckney
...
Pratik P Suthar
Varun Tej
Walker J. Turner
Kaizhe Xu
Haoxin Ren
37
73
0
31 Oct 2023
ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing
Quoc-Nam Nguyen
Thang Chau Phan
Duc-Vu Nguyen
Kiet Van Nguyen
14
2
0
17 Oct 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Fajri Koto
Nurul Aisyah
Haonan Li
Timothy Baldwin
AI4Ed
LRM
ELM
17
37
0
07 Oct 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Dea Adhista
Emmanuel Dave
...
Genta Indra Winata
David Moeljadi
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
34
7
0
19 Sep 2023
RoBERTweet: A BERT Language Model for Romanian Tweets
Iulian-Marius Tuaiatu
Andrei-Marius Avram
Dumitru-Clementin Cercel
Florin-Catalin Pop
11
1
0
11 Jun 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
11
47
0
19 Dec 2022
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Zheng-Xin Yong
Hailey Schoelkopf
Niklas Muennighoff
Alham Fikri Aji
David Ifeoluwa Adelani
...
Genta Indra Winata
Stella Biderman
Edward Raff
Dragomir R. Radev
Vassilina Nikoulina
CLL
VLM
AI4CE
LRM
24
81
0
19 Dec 2022
NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages
Samuel Cahyawijaya
Alham Fikri Aji
Holy Lovenia
Genta Indra Winata
Bryan Wilie
...
Fajri Koto
David Moeljadi
Karissa Vincentio
Ade Romadhony
Ayu Purwarianti
24
5
0
21 Jul 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
12
46
0
14 Jul 2022
Location-based Twitter Filtering for the Creation of Low-Resource Language Datasets in Indonesian Local Languages
Mukhlis Amien
Chong Feng
Heyan Huang
8
3
0
15 Jun 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
32
98
0
24 Mar 2022
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,435
0
26 Sep 2016
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
228
29,632
0
16 Jan 2013
1