Large Language Models are not Models of Natural Language: they are Corpus Models

13 December 2021

Papers citing "Large Language Models are not Models of Natural Language: they are Corpus Models"


Title | Authors | Tags | Counts | Date
A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training | Michał Perełkiewicz, Rafał Poświata | | 48 1 0 | 10 Jul 2024
Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education | Vahid Ashrafimoghari, Necdet Gurkan, Jordan W. Suchow | ELM | 37 6 0 | 02 Jan 2024
The Quo Vadis of the Relationship between Language and Large Language Models | Evelina Leivada, Vittoria Dentella, Elliot Murphy | | 38 3 0 | 17 Oct 2023
Position: Key Claims in LLM Research Have a Long Tail of Footnotes | Anna Rogers, A. Luccioni | | 53 19 0 | 14 Aug 2023
The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial | Travis LaCroix | | 33 3 0 | 02 Jul 2022
Deduplicating Training Data Makes Language Models Better | Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini | SyDa | 242 599 0 | 14 Jul 2021
Measuring Coding Challenge Competence With APPS | Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, ..., Collin Burns, Samir Puranik, Horace He, D. Song, Jacob Steinhardt | ELM, AIMat, ALM | 208 631 0 | 20 May 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 282 2,007 0 | 31 Dec 2020