First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

21 August 2024

Yujie Wang

Papers citing "First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models"

Title
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis
MoE | 49 | 15 | 0 | 06 May 2024