A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics

19 December 2018
Martin Gerlach
Francesc Font-Clos

Papers citing "A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics"

50 / 54 papers shown
When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection
Lang Gao
Xuhui Li
Chenxi Wang
Mingzhe Li
Wei Liu
Zirui Song
J. Zhang
Rui Yan
Preslav Nakov
Xiuying Chen
DeLMO
315
1
0
10 Apr 2026
Decoding the Past: Explainable Machine Learning Models for Dating Historical Texts
Paulo J. N. Pinto
A. Pinho
Diogo Pratas
AI4CE
287
0
0
28 Nov 2025
Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation
Nan Bao
Yifan Zhao
Lin Zhu
Jia Li
164
0
0
11 Nov 2025
Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements
Patrick Haller
Jonas Golde
Alan Akbik
128
1
0
04 Nov 2025
LLM one-shot style transfer for Authorship Attribution and Verification
Pablo Miralles-González
Javier Huertas-Tato
Alejandro Martín
David Camacho
DeLMO
287
1
0
15 Oct 2025
Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling
Bianca-Mihaela Ganescu
Suchir Salhan
Andrew Caines
P. Buttery
VLM
175
2
0
09 Oct 2025
LongTail-Swap: benchmarking language models' abilities on rare words
Robin Algayres
Charles-Éric Saint-James
Mahi Luthra
Jiayi Shen
Dongyan Lin
Youssef Benchekroun
Rashel Moritz
Juan Pino
Emmanuel Dupoux
149
1
0
05 Oct 2025
Scale-free Characteristics of Multilingual Legal Texts and the Limitations of LLMs
International Conference on Text, Speech and Dialogue (TSD), 2025
Haoyang Chen
Kumiko Tanaka-Ishii
AILaw
124
0
0
22 Sep 2025
Once Upon a Time: Interactive Learning for Storytelling with Small Language Models
Jonas Mayer Martins
Ali Hamza Bashir
Muhammad Rehan Khalid
Lisa Beinborn
192
0
0
19 Sep 2025
Influence-driven Curriculum Learning for Pre-training on Limited Data
Loris Schoenegger
Lukas Thoma
Terra Blevins
Benjamin Roth
285
1
0
21 Aug 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Robert Bamler
727
202
0
10 Apr 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Nikitas Theodoropoulos
Giorgos Filandrianos
Vassilis Lyberatos
Maria Lymperaiou
Giorgos Stamou
SyDa
591
3
0
24 Feb 2025
BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop
Lucas Charpentier
Leshem Choshen
Robert Bamler
Mustafa Omer Gul
Michael Y. Hu
...
Candace Ross
Raj Sanjay Shah
Alex Warstadt
Ethan Gotlieb Wilcox
Adina Williams
418
31
0
15 Feb 2025
Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Seokwon Song
Taehyun Lee
Jaewoo Ahn
Jae Hyuk Sung
Gunhee Kim
CoGe
719
1
0
10 Feb 2025
A Distributional Perspective on Word Learning in Neural Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Filippo Ficarra
Robert Bamler
Alex Warstadt
282
2
0
09 Feb 2025
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
Omnilingual MT Team
Pierre Yves Andrews
Mikel Artetxe
Mariano Coria Meglioli
Marta R. Costa-jussá
...
Eduardo Sánchez
Ioannis Tsiamas
Arina Turkatenko
Albert Ventayol-Boada
Shireen Yates
537
4
0
06 Feb 2025
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu
Aaron Mueller
Candace Ross
Adina Williams
Tal Linzen
Chengxu Zhuang
Robert Bamler
Leshem Choshen
Alex Warstadt
Ethan Gotlieb Wilcox
530
53
0
06 Dec 2024
AntLM: Bridging Causal and Masked Language Models
Xinru Yu
Bin Guo
Shiwei Luo
Jiadong Wang
Changzhi Sun
Man Lan
CLL
378
6
0
04 Dec 2024
When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets?
Srikrishna Iyer
FedML
451
0
0
25 Nov 2024
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance
Hong Meng Yam
Nathan J Paek
278
2
0
11 Nov 2024
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
International Conference on Learning Representations (ICLR), 2024
Shuchen Wu
Mirko Thalmann
Peter Dayan
Zeynep Akata
Eric Schulz
VLM
327
2
0
27 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
International Conference on Learning Representations (ICLR), 2024
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
597
40
0
08 Oct 2024
Customizing Large Language Model Generation Style using Parameter-Efficient Finetuning
International Conference on Natural Language Generation (INLG), 2024
Xinyue Liu
Harshita Diddee
Daphne Ippolito
ALM
209
12
0
06 Sep 2024
Capturing Style in Author and Document Representation
Enzo Terreau
Antoine Gourru
Julien Velcin
313
2
0
18 Jul 2024
M2QA: Multi-domain Multilingual Question Answering
Leon Arne Engländer
Hannah Sterz
Clifton A. Poth
Jonas Pfeiffer
Ilia Kuznetsov
Iryna Gurevych
VLM
380
6
0
01 Jul 2024
YuLan: An Open-source Large Language Model
Yutao Zhu
Kun Zhou
Kelong Mao
Wentong Chen
Yiding Sun
...
Wenbing Huang
Ze-Feng Gao
Yueguo Chen
Weizheng Lu
Ji-Rong Wen
ALM
ELM
201
3
0
28 Jun 2024
BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM
Zhewen Shen
Aditya Joshi
Ruey-Cheng Chen
CLL
298
5
0
17 Jun 2024
From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models
Harsh Nishant Lalai
Aashish Anantha Ramakrishnan
Raj Sanjay Shah
Dongwon Lee
WaLM
VLM
308
5
0
17 Jun 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
275
14
0
14 May 2024
[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
Leshem Choshen
Robert Bamler
Michael Y. Hu
Tal Linzen
Aaron Mueller
Candace Ross
Alex Warstadt
Ethan Gotlieb Wilcox
Adina Williams
Chengxu Zhuang
383
39
0
09 Apr 2024
Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs
Kanishka Misra
Kyle Mahowald
601
51
0
28 Mar 2024
Not all layers are equally as important: Every Layer Counts BERT
Lucas Georges Gabriel Charpentier
David Samuel
311
32
0
03 Nov 2023
Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings
David Samuel
234
4
0
30 Oct 2023
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?
Xingmeng Zhao
Tongnian Wang
Sheri Osborn
Anthony Rios
291
11
0
25 Oct 2023
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Tianyi Chen
Tianyu Ding
Badal Yadav
Ilya Zharkov
Luming Liang
390
42
0
24 Oct 2023
ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation
Jaap Jumelet
Michael Hanna
Marianne de Heer Kloots
Anna Langedijk
Charlotte Pouw
Oskar van der Wal
255
4
0
17 Oct 2023
Understanding writing style in social media with a supervised contrastively pre-trained transformer
Knowledge-Based Systems (KBS), 2023
Javier Huertas-Tato
Alejandro Martín
David Camacho
420
15
0
17 Oct 2023
A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages
Findings (Findings), 2023
Nikita Martynov
Mark Baushenko
Anastasia Kozlova
Katerina Kolomeytseva
Aleksandr Abramov
Alena Fenogenova
315
10
0
18 Aug 2023
Quantifying the Dissimilarity of Texts
Benjamin Shade
E. Altmann
183
4
0
03 May 2023
Extension of Dictionary-Based Compression Algorithms for the Quantitative Visualization of Patterns from Log Files
Igor Cherepanov
Jonathan Geraldi Joewono
Arjan Kuijper
Jörn Kohlhammer
249
0
0
10 Apr 2023
Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
Alex Warstadt
Leshem Choshen
Aaron Mueller
Adina Williams
Ethan Gotlieb Wilcox
Chengxu Zhuang
318
77
0
27 Jan 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
192
34
0
20 Jan 2023
PART: Pre-trained Authorship Representation Transformer
Javier Huertas-Tato
Álvaro Huertas-García
Alejandro Martín
463
16
0
30 Sep 2022
On the State of the Art in Authorship Attribution and Authorship Verification
Jacob Tyo
Bhuwan Dhingra
Zachary Chase Lipton
329
37
0
14 Sep 2022
A decomposition of book structure through ousiometric fluctuations in cumulative word-time
Humanities and Social Sciences Communications (HSSC), 2022
M. Fudolig
Thayer Alshaabi
Kathryn Cramer
C. Danforth
P. Dodds
584
5
0
19 Aug 2022
Controllable Data Generation by Deep Learning: A Review
ACM Computing Surveys (ACM CSUR), 2022
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
810
43
0
19 Jul 2022
Text characterization based on recurrence networks
Information Sciences (Inf. Sci.), 2022
Bárbara C. e Souza
F. N. Silva
Henrique F. de Arruda
Giovana D. da Silva
L. D. F. Costa
D. R. Amancio
AI4CE
182
10
0
17 Jan 2022
Risks of AI Foundation Models in Education
Su Lin Blodgett
Michael A. Madaio
UQCV
173
18
0
19 Oct 2021
Joint prediction of truecasing and punctuation for conversational speech in low-resource scenarios
R. Pappagari
Piotr Żelasko
Agnieszka Mikołajczyk
Piotr Pęzik
Najim Dehak
169
12
0
13 Sep 2021
A Statistical Model of Word Rank Evolution
Alex John Quijano
Rick Dale
Suzanne S. Sindi
410
0
0
21 Jul 2021