arXiv:1911.11641
PIQA: Reasoning about Physical Commonsense in Natural Language
26 November 2019
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD · LRM
Papers citing "PIQA: Reasoning about Physical Commonsense in Natural Language" (50 of 202 shown)
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
  Hao Kang, Zichun Yu, Chenyan Xiong · MoE · 26 May 2025

RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
  Yilang Zhang, Bingcong Li, G. Giannakis · 24 May 2025

Training with Pseudo-Code for Instruction Following
  Praveen Venkateswaran, Rudra Murthy, Riyaz Ahmad Bhat, Danish Contractor · ALM, LRM · 23 May 2025

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
  Xinran Gu, Kaifeng Lyu, Jiazheng Li, Jingzhao Zhang · 23 May 2025

COUNTDOWN: Contextually Sparse Activation Filtering Out Unnecessary Weights in Down Projection
  Jaewon Cheon, Pilsung Kang · 23 May 2025

DataRater: Meta-Learned Dataset Curation
  Dan A. Calian, Gregory Farquhar, Iurii Kemaev, Luisa M. Zintgraf, Matteo Hessel, ..., András Gyorgy, Tom Schaul, Jeffrey Dean, Hado van Hasselt, David Silver · 23 May 2025

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
  Wafa Alghallabi, Ritesh Thawkar, Sara Ghaboura, Ketan More, Omkar Thawakar, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer · 23 May 2025

CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning
  Jinyuan Feng, Chaopeng Wei, Tenghai Qiu, Tianyi Hu, Zhiqiang Pu · MoE · 23 May 2025

LongMagpie: A Self-synthesis Method for Generating Large-scale Long-context Instructions
  Chaochen Gao, Xing Wu, Zijia Lin, Debing Zhang, Songlin Hu · SyDa · 22 May 2025

NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics
  Zhihang Cai, Xingjun Zhang, Zhendong Tan, Zheng Wei · MQ · 22 May 2025

Zebra-Llama: Towards Extremely Efficient Hybrid Models
  Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum · 22 May 2025

Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
  Zeping Yu, Sophia Ananiadou · MoMe, KELM, CLL · 22 May 2025

Social Bias in Popular Question-Answering Benchmarks
  Angelie Kraft, Judith Simon, Sonja Schimmler · 21 May 2025

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
  Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ..., Bingxin Qu, Bolin Ni, Boyu Wu, Chen Li, Cheng-peng Jiang · MoE, LRM, AI4CE · 21 May 2025

SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
  Philipp D. Siedler · 21 May 2025

ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models
  Raghav Singhal, Kaustubh Ponkshe, Rohit Vartak, Praneeth Vepakomma · 20 May 2025

Safety Alignment Can Be Not Superficial With Explicit Safety Signals
  Jianwei Li, Jung-Eng Kim · AAML · 19 May 2025

Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs
  Shmulik Markovich-Golan, Daniel Ohayon, Itay Niv, Yair Hanani · MQ · 19 May 2025

SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
  Wenqiao Zhu, Ji Liu, Lulu Wang, Jun Wu, Yulun Zhang · 18 May 2025

Rethinking Prompt Optimizers: From Prompt Merits to Optimization
  Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Tianjiao Li, Chua Jia Jim Deryl, Mak Lee Onn, Gee Wah Ng, Kezhi Mao · LRM · 15 May 2025

ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
  Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, N. Komodakis, Sergey Zagoruyko · VLM · 05 May 2025

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
  Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber · MoE, VLM · 01 May 2025

Efficient Pretraining Length Scaling
  Bohong Wu, Shen Yan, Sijun Zhang, Jianqiao Lu, Yutao Zeng, Ya Wang, Xun Zhou · 21 Apr 2025

Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
  Patrick Haller, Jonas Golde, Alan Akbik · 19 Apr 2025

FLIP Reasoning Challenge
  Andreas Plesner, Turlan Kuzhagaliyev, Roger Wattenhofer · AAML, VLM, LRM · 16 Apr 2025

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
  Yamato Arai, Yuma Ichikawa · MQ · 13 Apr 2025

Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
  Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Siddhant Gupta, Drishti Sharma, Jebish Purbey, Kanwal Mehreen, Muhammad Arham, Hamza Farooq · 13 Apr 2025

Achieving binary weight and activation for LLMs using Post-Training Quantization
  Siqing Song, Chuang Wang, Ruiqi Wang, Yi Yang, Xuyao Zhang · MQ · 07 Apr 2025

STEP: Staged Parameter-Efficient Pre-training for Large Language Models
  Kazuki Yano, Takumi Ito, Jun Suzuki · LRM · 05 Apr 2025

Efficient Evaluation of Large Language Models via Collaborative Filtering
  Xu-Xiang Zhong, Chao Yi, Han-Jia Ye · 05 Apr 2025

MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
  Beichen Huang, Yueming Yuan, Zelei Shao, Minjia Zhang · MQ, MoE · 03 Apr 2025

Large (Vision) Language Models are Unsupervised In-Context Learners
  Artyom Gadetsky, Andrei Atanov, Yulun Jiang, Zhitong Gao, Ghazal Hosseini Mighan, Amir Zamir, Maria Brbić · VLM, MLLM, LRM · 03 Apr 2025

Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
  Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen · MoE · 02 Apr 2025

ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
  Xiaoxuan Zhu, Zhouhong Gu, Baiqian Wu, Suhang Zheng, Tao Wang, Tianyu Li, Hongwei Feng, Yanghua Xiao · 01 Apr 2025

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
  Hung-Yueh Chiang, Chi-chih Chang, N. Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu · MQ · 28 Mar 2025

Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
  Minsu Kim, Seongmin Hong, RyeoWook Ko, S. Choi, Hunjong Lee, Junsoo Kim, Joo-Young Kim, Jongse Park · 24 Mar 2025

Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
  Roberto Garcia, Jerry Liu, Daniel Sorvisto, Sabri Eyuboglu · 23 Mar 2025

Mixture of Lookup Experts
  Shibo Jie, Yehui Tang, Kai Han, Yongqian Li, Duyu Tang, Zhi-Hong Deng, Yunhe Wang · MoE · 20 Mar 2025

SuperBPE: Space Travel for Language Models
  Alisa Liu, J. Hayase, Valentin Hofmann, Sewoong Oh, Noah A. Smith, Yejin Choi · 17 Mar 2025

Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
  Neusha Javidnia, B. Rouhani, F. Koushanfar · 14 Mar 2025

Collaborative Speculative Inference for Efficient LLM Inference Serving
  Luyao Gao, Jianchun Liu, Hongli Xu, Xichong Zhang, Yunming Liao, Liusheng Huang · 13 Mar 2025

MetaXCR: Reinforcement-Based Meta-Transfer Learning for Cross-Lingual Commonsense Reasoning
  Jie He, Yu Fu · OffRL, LRM · 09 Mar 2025

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
  Ling Team, B. Zeng, Chenyu Huang, Chao Zhang, Changxin Tian, ..., Zhaoxin Huan, Zujie Wen, Zhenhang Sun, Zhuoxuan Du, Z. He · MoE, ALM · 07 Mar 2025

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
  Shwai He, Weilin Cai, Jiayi Huang, Ang Li · MoE · 07 Mar 2025

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
  Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng · MoE · 07 Mar 2025

Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
  Benyamin Jamialahmadi, Parsa Kavehzadeh, Mehdi Rezagholizadeh, Parsa Farinneya, Hossein Rajabzadeh, A. Jafari, Boxing Chen, Marzieh S. Tahaei · 06 Mar 2025

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
  Zhijian Zhuo, Yutao Zeng, Ya Wang, Sijun Zhang, Jian Yang, Xiaoqing Li, Xun Zhou, Jinwen Ma · 06 Mar 2025

The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation
  Jie He, Tao Wang, Deyi Xiong, Qun Liu · ELM, LRM · 05 Mar 2025

Liger: Linearizing Large Language Models to Gated Recurrent Structures
  Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu Cheng · 03 Mar 2025

Predictive Data Selection: The Data That Predicts Is the Data That Teaches
  Kashun Shum, Yuanmin Huang, Hongjian Zou, Qi Ding, Yixuan Liao, Xiao Chen, Qian Liu, Junxian He · 02 Mar 2025