What Kind of Language Is Hard to Language-Model?

11 June 2019

Papers citing "What Kind of Language Is Hard to Language-Model?"

Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs Xiulin Yang Tatsuya Aoyama Yuekun Yao Ethan Wilcox 48 1 0 26 Feb 2025
Towards Typologically Aware Rescoring to Mitigate Unfaithfulness in Lower-Resource Languages Tsan Tsai Chan Xin Tong Thi Thu Uyen Hoang Barbare Tepnadze Wojciech Stempniak 36 0 0 24 Feb 2025
Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5 Thao Anh Dang Limor Raviv Lukas Galke 25 1 0 15 Oct 2024
Emergent Word Order Universals from Cognitively-Motivated Language Models Tatsuki Kuribayashi Ryo Ueda Ryosuke Yoshida Yohei Oseki Ted Briscoe Timothy Baldwin 36 2 0 19 Feb 2024
CreoleVal: Multilingual Multitask Benchmarks for Creoles Heather Lent Kushal Tatariya Raj Dabre Yiyi Chen Marcell Richard Fekete ... Miryam de Lhoneux Daniel Hershcovich Michel DeGraff Anders Søgaard Johannes Bjerva SLR 41 9 0 30 Oct 2023
Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution Jaap Jumelet Willem H. Zuidema 32 4 0 23 Oct 2023
Curricular Transfer Learning for Sentence Encoded Tasks Jader Martins Camboim de Sá Matheus Ferraroni Sanches R. R. Souza Júlio Cesar dos Reis Leandro A. Villas 21 0 0 03 Aug 2023
Testing the Predictions of Surprisal Theory in 11 Languages Ethan Gotlieb Wilcox Tiago Pimentel Clara Meister Ryan Cotterell R. Levy LRM 44 63 0 07 Jul 2023
Answering Unanswered Questions through Semantic Reformulations in Spoken QA Pedro Faustini Zhiyu Zoey Chen B. Fetahu Oleg Rokhlenko S. Malmasi KELM 34 2 0 27 May 2023
Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training Miriam Anschütz Joshua Oehms Thomas Wimmer Bartlomiej Jezierski Georg Groh 19 21 0 22 May 2023
Still no evidence for an effect of the proportion of non-native speakers on language complexity -- A response to Kauhanen, Einhaus & Walkden (2023) Alexander Koplenig 22 1 0 29 Apr 2023
Dissociating language and thought in large language models Kyle Mahowald Anna A. Ivanova I. Blank Nancy Kanwisher J. Tenenbaum Evelina Fedorenko ELM ReLM 29 209 0 16 Jan 2023
Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation Arturo Oncevay Kervy Rivas Rojas Liz Karen Chavez Sanchez Roberto Zariquiey 19 0 0 05 Oct 2022
Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages Paul Soulos Sudha Rao Caitlin Smith Eric Rosen Asli Celikyilmaz ... Coleman Haley Roland Fernandez Hamid Palangi Jianfeng Gao P. Smolensky 24 6 0 11 Aug 2022
Lost in Space Marking Cassandra L. Jacobs Yuval Pinter 14 1 0 02 Aug 2022
What do tokens know about their characters and how do they know it? Ayush Kaushal Kyle Mahowald 14 28 0 06 Jun 2022
DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages Gabriele Sarti Arianna Bisazza Ana Guerberof Arenas Antonio Toral 36 7 0 24 May 2022
Disentangling Uncertainty in Machine Translation Evaluation Chrysoula Zerva T. Glushkova Ricardo Rei André F.T. Martins UD UQCV 38 9 0 13 Apr 2022
How Conservative are Language Models? Adapting to the Introduction of Gender-Neutral Pronouns Stephanie Brandl Ruixiang Cui Anders Søgaard 25 20 0 11 Apr 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP Sabrina J. Mielke Zaid Alyafeai Elizabeth Salesky Colin Raffel Manan Dey ... Arun Raja Chenglei Si Wilson Y. Lee Benoît Sagot Samson Tan 30 140 0 20 Dec 2021
Balancing Average and Worst-case Accuracy in Multitask Learning Paul Michel Sebastian Ruder Dani Yogatama 13 11 0 12 Oct 2021
Comparing Text Representations: A Theory-Driven Approach Gregory Yauney David M. Mimno 26 6 0 15 Sep 2021
You should evaluate your language model on marginal likelihood over tokenisations Kris Cao Laura Rimell 28 23 0 06 Sep 2021
Towards Zero-shot Language Modeling E. Ponti Ivan Vulić Ryan Cotterell Roi Reichart Anna Korhonen 27 19 0 06 Aug 2021
On the Difficulty of Translating Free-Order Case-Marking Languages Arianna Bisazza A. Üstün Stephan Sportel 36 9 0 13 Jul 2021
Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction Maria Ryskina Eduard H. Hovy Taylor Berg-Kirkpatrick Matthew R. Gormley 24 2 0 24 Jun 2021
Examining the Inductive Bias of Neural Language Models with Artificial Languages Jennifer C. White Ryan Cotterell 17 43 0 02 Jun 2021
Effective Batching for Recurrent Neural Network Grammars Hiroshi Noji Yohei Oseki GNN 13 16 0 31 May 2021
The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus Samia Touileb Jeremy Barnes 16 11 0 16 May 2021
A Cognitive Regularizer for Language Modeling Jason W. Wei Clara Meister Ryan Cotterell 16 21 0 15 May 2021
Convex Aggregation for Opinion Summarization Hayate Iso Xiaolan Wang Yoshihiko Suhara Stefanos Angelidis W. Tan 20 32 0 03 Apr 2021
Mind the Gap: Assessing Temporal Generalization in Neural Language Models Angeliki Lazaridou A. Kuncoro E. Gribovskaya Devang Agrawal Adam Liska ... Sebastian Ruder Dani Yogatama Kris Cao Susannah Young Phil Blunsom VLM 30 207 0 03 Feb 2021
Morphology Matters: A Multilingual Language Modeling Analysis Hyunji Hayley Park Katherine J. Zhang Coleman Haley K. Steimel Han Liu Lane Schwartz 45 47 0 11 Dec 2020
Revisiting Neural Language Modelling with Syllables Arturo Oncevay Kervy Rivas Rojas 16 2 0 24 Oct 2020
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset Brian Roark Lawrence Wolf-Sonkin Christo Kirov Sabrina J. Mielke Cibu Johny Isin Demirsahin Keith B. Hall 12 71 0 02 Jul 2020
Surprisal-Triggered Conditional Computation with Neural Networks Loren Lugosch Derek Nowrouzezahrai B. Meyer 19 6 0 02 Jun 2020
Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages Tyler A. Chang Anna N. Rafferty 17 2 0 17 May 2020
Neural Polysynthetic Language Modelling Lane Schwartz Francis M. Tyers Lori S. Levin Christo Kirov Patrick Littell ... Vasilisa Andriyanets Aldrian Obaja Muis Naoki Otani J. Park Zhisong Zhang 16 24 0 11 May 2020
Phonotactic Complexity and its Trade-offs Tiago Pimentel Brian Roark Ryan Cotterell 20 37 0 07 May 2020
2kenize: Tying Subword Sequences for Chinese Script Conversion Pranav A Isabelle Augenstein 22 1 0 07 May 2020
It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information Emanuele Bugliarello Sabrina J. Mielke Antonios Anastasopoulos Ryan Cotterell Naoaki Okazaki 26 23 0 05 May 2020
Evaluating Transformer-Based Multilingual Text Classification Sophie Groenwold Samhita Honnavalli Li-hsueh Ou Aesha Parekh Sharon Levy Diba Mirza William Yang Wang 17 2 0 29 Apr 2020
Inherent Dependency Displacement Bias of Transition-Based Algorithms Mark Anderson Carlos Gómez-Rodríguez 11 4 0 31 Mar 2020