ResearchTrend.AI

BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling

14 July 2022
Javier de la Rosa, E. G. Ponferrada, Paulo Villegas, Pablo González de Prado Salas, Manu Romero, María Grandury

Papers citing "BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling" (7 papers)
Data Quality Control in Federated Instruction-tuning of Large Language Models
Yaxin Du, Rui Ye, Fengting Yuchi, W. Zhao, Jingjing Qu, Y. Wang, Siheng Chen
Communities: ALM, FedML
15 Oct 2024
Comparing Styles across Languages: A Cross-Cultural Exploration of Politeness
Shreya Havaldar, Matthew Pressimone, Eric Wong, Lyle Ungar
11 Oct 2023
Generative AI Text Classification using Ensemble LLM Approaches
Harika Abburi, Michael Suesserman, Nirmala Pudota, Balaji Veeramani, Edward Bowen, Sanmitra Bhattacharya
Communities: DeLMO
14 Sep 2023
Findings of the VarDial Evaluation Campaign 2023
Noëmi Aepli, Çagri Çöltekin, Rob van der Goot, T. Jauhiainen, Mourhaf Kazzaz, Nikola Ljubesic, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri
31 May 2023
A Spanish dataset for Targeted Sentiment Analysis of political headlines
Tomás Alves Salgueiro, E. Zapata, D. Furman, Juan Manuel Pérez, Pablo Nicolás Fernández Larrosa
30 Aug 2022
Deduplicating Training Data Makes Language Models Better
Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
Communities: SyDa
14 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
Communities: AIMat
31 Dec 2020