InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion

6 January 2025

Papers cited by "InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion"

Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration

04 Jun 2025

Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs

28 Jan 2025

Cautious Optimizers: Improving Training with One Line of Code

25 Nov 2024

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data. International Conference on Learning Representations (ICLR), 2024

02 Oct 2024

FuseChat: Knowledge Fusion of Chat Models

Xiaojun Quan

15 Aug 2024

Qwen2 Technical Report

Bowen Yu, ..., Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhi-Wei Fan

15 Jul 2024

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

..., Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang

17 Jun 2024

Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

I. Timiryasov, J. Tastet

03 Aug 2023

Editing Models with Task Arithmetic. International Conference on Learning Representations (ICLR), 2022

08 Dec 2022

Training Verifiers to Solve Math Word Problems

...

27 Oct 2021

Evaluating Large Language Models Trained on Code

...

07 Jul 2021

Measuring Mathematical Problem Solving With the MATH Dataset

05 Mar 2021

Language Models are Few-Shot Learners. Neural Information Processing Systems (NeurIPS), 2020

...

28 May 2020

Sequence-Level Knowledge Distillation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016

Yoon Kim, Alexander M. Rush

25 Jun 2016

Distilling the Knowledge in a Neural Network

09 Mar 2015