Title |
---|
Monotonic Location Attention for Length Generalization Jishnu Ray Chowdhury Cornelia Caragea |
Faith and Fate: Limits of Transformers on Compositionality Nouha Dziri Ximing Lu Melanie Sclar Xiang Lorraine Li Liwei Jian ...Sean Welleck Xiang Ren Allyson Ettinger Zaïd Harchaoui Yejin Choi |
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes Lokesh Nagalapatti Chun-Liang Li Chih-Kuan Yeh Hootan Nakhost Yasuhisa Fujii Alexander Ratner Ranjay Krishna Chen-Yu Lee Tomas Pfister |