arXiv:2509.08824
Cited By
Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora
10 September 2025
Thales Sales Almeida
Rodrigo Nogueira
Hélio Pedrini
Papers citing "Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora" (2 papers)
PoETa v2: Toward More Robust Evaluation of Large Language Models in Portuguese
IEEE Access, 2025
Thales Sales Almeida
Ramon Pires
Hugo Queiroz Abonizio
Rodrigo Nogueira
Hélio Pedrini
21 November 2025
BRoverbs -- Measuring how much LLMs understand Portuguese proverbs
Thales Sales Almeida
Giovana K. Bonás
João Guilherme Alves Santos
10 September 2025