mHC: Manifold-Constrained Hyper-Connections

Zhenda Xie
Yixuan Wei
Huanqi Cao
Chenggang Zhao
Chengqi Deng
Jiashi Li
Damai Dai
Huazuo Gao
Jiang Chang
Kuai Yu
Liang Zhao
Shangyan Zhou
Zhean Xu
Zhengyan Zhang
Wangding Zeng
Shengding Hu
Yuqing Wang
Jingyang Yuan
Lean Wang
Wenfeng Liang
Main: 14 pages
Figures: 8
Bibliography: 4 pages
Tables: 4
Appendix: 1 page
Abstract

Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
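The abstract does not say which manifold the residual connection space is projected onto. As an illustration only, the sketch below assumes the constraint set is the doubly stochastic matrices (non-negative entries, every row and column summing to 1) and reaches it with Sinkhorn-style alternating normalization. The identity matrix lies in this set, which is one way a constrained mixing matrix can stay compatible with an identity mapping. The names `sinkhorn_project` and `mix_streams` are hypothetical and are not taken from the paper.

```python
import math


def sinkhorn_project(logits, n_iters=100):
    """Map a square matrix of real scores to an approximately doubly stochastic matrix.

    Assumption for illustration: the target manifold is the set of doubly
    stochastic matrices. The abstract itself does not specify this.
    """
    n = len(logits)
    # Exponentiate so that every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(n_iters):
        # Normalize each row to sum to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Normalize each column to sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(mixing, streams):
    """Apply an n x n mixing matrix to n parallel residual streams."""
    n = len(streams)
    d = len(streams[0])
    return [
        [sum(mixing[i][k] * streams[k][t] for k in range(n)) for t in range(d)]
        for i in range(n)
    ]


# Unconstrained scores for a 3-stream residual mixing matrix (made-up values).
logits = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5], [-2.0, 1.0, 0.3]]
mixing = sinkhorn_project(logits)

# Three residual streams, each with two features (made-up values).
streams = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
mixed = mix_streams(mixing, streams)
```

Because every column of `mixing` sums to 1, the per-feature total across streams is the same before and after mixing, so under this assumption the constrained map redistributes the residual signal without amplifying or attenuating its sum.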
