arXiv:2103.00823
M6: A Chinese Multimodal Pretrainer
1 March 2021
Junyang Lin
Rui Men
An Yang
Chang Zhou
Ming Ding
Yichang Zhang
Peng Wang
Ang Wang
Le Jiang
Chencan Wu
Jie Zhang
Jianwei Zhang
Xu Zou
Zhikang Li
X. Deng
Jie Liu
Jinbao Xue
Huiling Zhou
Jianxin Ma
Jin Yu
Yong Li
Jialin Li
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
Papers citing "M6: A Chinese Multimodal Pretrainer"
50 / 92 papers shown
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Ji Ao
Dawei Leng
Yuhui Yin
VLM
340
5
0
13 Oct 2025
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting
Yuyang Liu
Qiuhe Hong
Linlan Huang
Alexandra Gomez-Villa
Dipam Goswami
Xialei Liu
Joost van de Weijer
Yonghong Tian
CLL
KELM
VLM
259
9
0
06 Aug 2025
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
Hailong Ning
Siying Wang
Tao Lei
Xiaopeng Cao
Huanmin Dou
Bin Zhao
Asoke K. Nandi
Petia Radeva
198
3
0
22 May 2025
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Neural Information Processing Systems (NeurIPS), 2024
Xiaonan Nie
Qibin Liu
Fangcheng Fu
Shenhan Zhu
Xupeng Miao
Xiaochen Li
Yanzhe Zhang
Shouda Liu
Tengjiao Wang
MoE
252
4
0
13 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
564
45
0
08 Nov 2024
TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model
Yihao Zhao
Enhao Zhong
Cuiyun Yuan
Yang Li
Man Zhao
Chunxia Li
Jun Hu
Chenbin Liu
VLM
MedIm
359
1
0
05 Sep 2024
LARR: Large Language Model Aided Real-time Scene Recommendation with Semantic Understanding
ACM Conference on Recommender Systems (RecSys), 2024
Zhizhong Wan
Bin Yin
Junjie Xie
Fei Jiang
Xiang Li
Jialin Li
3DV
249
11
0
21 Aug 2024
SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yuze Zhao
Jintao Huang
Jinghan Hu
Xingjun Wang
Yunlin Mao
...
Zhikai Wu
Baole Ai
Ang Wang
Wenmeng Zhou
Yingda Chen
598
262
0
10 Aug 2024
Astra: Efficient Transformer Architecture and Contrastive Dynamics Learning for Embodied Instruction Following
Yueen Ma
Dafeng Chi
Shiguang Wu
Yuecheng Liu
Yuzheng Zhuang
Irwin King
289
9
0
02 Aug 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
199
2
0
19 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
189
8
0
01 Jul 2024
OmniControlNet: Dual-stage Integration for Conditional Image Generation
Yilin Wang
Haiyang Xu
Xiang Zhang
Zeyuan Chen
Zhizhou Sha
Zirui Wang
Zhuowen Tu
VLM
391
28
0
09 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
312
11
0
01 Jun 2024
HetHub: A Heterogeneous distributed hybrid training system for large-scale models
Si Xu
Zixiao Huang
Yan Zeng
Shengen Yan
Xuefei Ning
...
Zhezheng Lin
Hao Zhang
Sheng Wang
Guohao Dai
Yu Wang
GNN
104
0
0
25 May 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Damai Dai
Chengqi Deng
Chenggang Zhao
R. X. Xu
Huazuo Gao
...
Panpan Huang
Fuli Luo
Chong Ruan
Zhifang Sui
W. Liang
MoE
483
776
0
11 Jan 2024
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
Ruoyu Zhao
Mingrui Zhu
Shiyin Dong
Nannan Wang
Xinbo Gao
DiffM
330
22
0
24 Nov 2023
LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation
Kai Mei
Zelong Li
VLM
518
18
0
26 Oct 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
254
2
0
24 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
778
1,891
0
24 Aug 2023
Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification
International Conference on Information and Knowledge Management (CIKM), 2023
Chenyu Zhao
Yunjiang Jiang
Yiming Qiu
Han Zhang
Wen-Yun Yang
RALM
361
9
0
18 Aug 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
IEEE International Conference on Computer Vision (ICCV), 2023
Runhu Huang
Jianhua Han
Guansong Lu
Xiaodan Liang
Yihan Zeng
Wei Zhang
Hang Xu
DiffM
216
10
0
18 Aug 2023
Exploring Data Redundancy in Real-world Image Classification through Data Selection
Zhenyu Tang
Shaoting Zhang
Xiaosong Wang
198
3
0
25 Jun 2023
M3PT: A Multi-Modal Model for POI Tagging
Knowledge Discovery and Data Mining (KDD), 2023
Jingsong Yang
Guanzhou Han
Deqing Yang
Jingping Liu
Yanghua Xiao
Xiang Xu
Baohua Wu
Shenghua Ni
228
4
0
16 Jun 2023
UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning
Xiao Dong
Runhu Huang
Xiaoyong Wei
Zequn Jie
Jianxing Yu
Jian Yin
Xiaodan Liang
VLM
DiffM
171
2
0
01 Jun 2023
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Shihao Zhao
Dongdong Chen
Yen-Chun Chen
Jianmin Bao
Shaozhe Hao
Lu Yuan
Kwan-Yee K. Wong
493
435
0
25 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
695
159
0
18 May 2023
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Youhe Jiang
Fangcheng Fu
Xupeng Miao
Xiaonan Nie
Tengjiao Wang
335
17
0
17 May 2023
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
Zheng Yuan
HU Xue
Kun Wang
Yongming Liu
Kun Wang
VLM
MLLM
484
14
0
12 May 2023
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Wang-Cheng Kang
Jianmo Ni
Nikhil Mehta
M. Sathiamoorthy
Lichan Hong
Ed H. Chi
D. Cheng
288
170
0
10 May 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yunxin Li
Baotian Hu
Xinyu Chen
Yuxin Ding
Lin Ma
Min Zhang
LRM
223
19
0
08 May 2023
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Xiaonan Nie
Xupeng Miao
Zilong Wang
Zichao Yang
Jilong Xue
Lingxiao Ma
Gang-Ming Cao
Tengjiao Wang
MoE
269
75
0
08 Apr 2023
OCCL: a Deadlock-free Library for GPU Collective Communication
Lichen Pan
Juncheng Liu
Jinhui Yuan
Rongkai Zhang
Pengze Li
Zhen Xiao
175
2
0
11 Mar 2023
Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches
Siyu Wang
Zongyan Cao
Chang Si
Lansong Diao
Jiamang Wang
W. Lin
154
0
0
03 Mar 2023
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
VGen
163
3
0
22 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
640
292
0
20 Feb 2023
Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
Shiwei Zhang
Lansong Diao
Siyu Wang
Zongyan Cao
Yiliang Gu
Chang Si
Ziji Shi
Zhen Zheng
Chuan Wu
W. Lin
AI4CE
212
4
0
16 Feb 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
Vanessa Mehlin
Sigurd Schacht
Carsten Lanquillon
HAI
MedIm
286
28
0
05 Feb 2023
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Ming Tao
Bingkun Bao
Hao Tang
Changsheng Xu
DiffM
VLM
290
148
0
30 Jan 2023
BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Haowen Hou
Xiaopeng Yan
Yigeng Zhang
Fengzong Lian
Zhanhui Kang
BDL
245
3
0
29 Dec 2022
Transferring General Multimodal Pretrained Models to Text Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
243
5
0
19 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Interspeech (Interspeech), 2022
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
289
13
0
29 Nov 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Computer Vision and Pattern Recognition (CVPR), 2022
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
266
52
0
21 Nov 2022
Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models
Zhihong Pan
Xiaoxia Zhou
Hao Tian
DiffM
290
33
0
14 Nov 2022
Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zhihong Pan
Xiaoxia Zhou
Hao Tian
DiffM
267
18
0
14 Nov 2022
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Yanxin Long
Jianhua Han
Runhu Huang
Hang Xu
Yi Zhu
Chunjing Xu
Xiaodan Liang
VLM
ObjD
313
30
0
02 Nov 2022
Masked Vision-Language Transformer in Fashion
Machine Intelligence Research (MIR), 2022
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
286
27
0
27 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLM
VLM
368
80
0
14 Oct 2022
Progressive Text-to-Image Generation
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
407
4
0
05 Oct 2022
Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
European Conference on Computer Vision (ECCV), 2022
Quan Liu
Youpeng Wen
Jianhua Han
Chunjing Xu
Hang Xu
Xiaodan Liang
VLM
320
93
0
18 Jul 2022
Knowledge Distillation of Transformer-based Language Models Revisited
Chengqiang Lu
Jianwei Zhang
Yunfei Chu
Ruihao Zhang
Jingren Zhou
Leilei Gan
Haiqing Chen
Hongxia Yang
VLM
367
14
0
29 Jun 2022