Do Transformer Modifications Transfer Across Implementations and Applications?
23 February 2021
Sharan Narang, Hyung Won Chung, Yi Tay, W. Fedus, Thibault Févry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam M. Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
arXiv: 2102.11972
Papers citing "Do Transformer Modifications Transfer Across Implementations and Applications?" (20 of 20 papers shown)

Each entry below gives the title, the authors, topic tags where the source page assigned them, the page's three counters (the middle figure is the paper's citation count), and the publication date.
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya, Rahmatollah Beheshti
120 / 0 / 0 · 23 Apr 2025

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
15 / 41 / 0 · 12 Jul 2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf; tags: VLM
92 / 2,306 / 0 · 09 Nov 2022

Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong
30 / 3 / 0 · 27 Oct 2022

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, W. Fedus, J. Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler; tags: AI4CE
32 / 100 / 0 · 21 Jul 2022

UL2: Unifying Language Learning Paradigms
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason W. Wei, ..., Tal Schuster, H. Zheng, Denny Zhou, N. Houlsby, Donald Metzler; tags: AI4CE
57 / 294 / 0 · 10 May 2022

To Know by the Company Words Keep and What Else Lies in the Vicinity
Jake Williams, H. Heidenreich
16 / 0 / 0 · 30 Apr 2022

deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks
Dennis Ulmer, Christian Hardmeier, J. Frellsen
40 / 42 / 0 · 14 Apr 2022

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul N. Bennett, Xia Song, Jianfeng Gao
42 / 32 / 0 · 13 Apr 2022

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval
Jun Rao, Fei-Yue Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, Dacheng Tao; tags: OOD
29 / 28 / 0 · 08 Mar 2022

Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
71 / 222 / 0 · 21 Feb 2022

cosFormer: Rethinking Softmax in Attention
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
24 / 211 / 0 · 17 Feb 2022

Improving Neural Machine Translation by Denoising Training
Liang Ding, Keqin Peng, Dacheng Tao; tags: VLM, AI4CE
25 / 6 / 0 · 19 Jan 2022

The Efficiency Misnomer
Daoyuan Chen, Liuyi Yao, Dawei Gao, Ashish Vaswani, Yaliang Li
32 / 98 / 0 · 25 Oct 2021

HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO
Katharina Eggensperger, Philip Muller, Neeratyoy Mallik, Matthias Feurer, René Sass, Aaron Klein, Noor H. Awad, Marius Lindauer, Frank Hutter
35 / 100 / 0 · 14 Sep 2021

SHAPE: Shifted Absolute Position Embedding for Transformers
Shun Kiyono, Sosuke Kobayashi, Jun Suzuki, Kentaro Inui
231 / 45 / 0 · 13 Sep 2021

Revisiting Deep Learning Models for Tabular Data
Yu. V. Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko; tags: LMTD
19 / 696 / 0 · 22 Jun 2021

Do Transformers Really Perform Bad for Graph Representation?
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu; tags: GNN
23 / 433 / 0 · 09 Jun 2021

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
24 / 119 / 0 · 30 Nov 2020

Rethinking embedding coupling in pre-trained language models
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
93 / 142 / 0 · 24 Oct 2020