Self Identity Mapping. Neural Networks (NN), 2025
Weight decay induces low-rank attention layers. Neural Information Processing Systems (NeurIPS), 2024
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models. International Conference on Learning Representations (ICLR), 2024
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, ..., Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks. International Conference on Machine Learning (ICML), 2023
On the Overlooked Structure of Stochastic Gradients. Neural Information Processing Systems (NeurIPS), 2022
Residual-Concatenate Neural Network with Deep Regularization Layers for Binary Classification. International Conference on Intelligent Computing and Control Systems (ICICCS), 2022