M6: A Chinese Multimodal Pretrainer
arXiv: 2103.00823

1 March 2021
Junyang Lin
Rui Men
An Yang
Chang Zhou
Ming Ding
Yichang Zhang
Peng Wang
Ang Wang
Le Jiang
Chencan Wu
Jie Zhang
Jianwei Zhang
Xu Zou
Zhikang Li
X. Deng
Jie Liu
Jinbao Xue
Huiling Zhou
Jianxin Ma
Jin Yu
Yong Li
Jialin Li
Jingren Zhou
J. Tang
Hongxia Yang
    VLM, MoE

Papers citing "M6: A Chinese Multimodal Pretrainer"

50 / 92 papers shown
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Ji Ao
Dawei Leng
Yuhui Yin
VLM
340
5
0
13 Oct 2025
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting
Yuyang Liu
Qiuhe Hong
Linlan Huang
Alexandra Gomez-Villa
Dipam Goswami
Xialei Liu
Joost van de Weijer
Yonghong Tian
CLL, KELM, VLM
259
9
0
06 Aug 2025
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
Hailong Ning
Siying Wang
Tao Lei
Xiaopeng Cao
Huanmin Dou
Bin Zhao
Asoke K. Nandi
Petia Radeva
198
3
0
22 May 2025
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Neural Information Processing Systems (NeurIPS), 2024
Xiaonan Nie
Qibin Liu
Fangcheng Fu
Shenhan Zhu
Xupeng Miao
Xiaochen Li
Yanzhe Zhang
Shouda Liu
Tengjiao Wang
MoE
252
4
0
13 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
564
45
0
08 Nov 2024
TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model
Yihao Zhao
Enhao Zhong
Cuiyun Yuan
Yang Li
Man Zhao
Chunxia Li
Jun Hu
Chenbin Liu
VLM, MedIm
359
1
0
05 Sep 2024
LARR: Large Language Model Aided Real-time Scene Recommendation with Semantic Understanding
ACM Conference on Recommender Systems (RecSys), 2024
Zhizhong Wan
Bin Yin
Junjie Xie
Fei Jiang
Xiang Li
Jialin Li
3DV
249
11
0
21 Aug 2024
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yuze Zhao
Jintao Huang
Jinghan Hu
Xingjun Wang
Yunlin Mao
...
Zhikai Wu
Baole Ai
Ang Wang
Wenmeng Zhou
Yingda Chen
598
262
0
10 Aug 2024
Astra: Efficient Transformer Architecture and Contrastive Dynamics Learning for Embodied Instruction Following
Yueen Ma
Dafeng Chi
Shiguang Wu
Yuecheng Liu
Yuzheng Zhuang
Irwin King
289
9
0
02 Aug 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
199
2
0
19 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM, CoGe
189
8
0
01 Jul 2024
OmniControlNet: Dual-stage Integration for Conditional Image Generation
Yilin Wang
Haiyang Xu
Xiang Zhang
Zeyuan Chen
Zhizhou Sha
Zirui Wang
Zhuowen Tu
VLM
391
28
0
09 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
312
11
0
01 Jun 2024
HetHub: A Heterogeneous distributed hybrid training system for large-scale models
Si Xu
Zixiao Huang
Yan Zeng
Shengen Yan
Xuefei Ning
...
Zhezheng Lin
Hao Zhang
Sheng Wang
Guohao Dai
Yu Wang
GNN
104
0
0
25 May 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Damai Dai
Chengqi Deng
Chenggang Zhao
R. X. Xu
Huazuo Gao
...
Panpan Huang
Fuli Luo
Chong Ruan
Zhifang Sui
W. Liang
MoE
483
776
0
11 Jan 2024
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
Ruoyu Zhao
Mingrui Zhu
Shiyin Dong
Nannan Wang
Xinbo Gao
DiffM
330
22
0
24 Nov 2023
LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation
Kai Mei
Zelong Li
VLM
518
18
0
26 Oct 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
254
2
0
24 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM, VLM, ObjD
778
1,891
0
24 Aug 2023
Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification
International Conference on Information and Knowledge Management (CIKM), 2023
Chenyu Zhao
Yunjiang Jiang
Yiming Qiu
Han Zhang
Wen-Yun Yang
RALM
361
9
0
18 Aug 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
IEEE International Conference on Computer Vision (ICCV), 2023
Runhu Huang
Jianhua Han
Guansong Lu
Xiaodan Liang
Yihan Zeng
Wei Zhang
Hang Xu
DiffM
216
10
0
18 Aug 2023
Exploring Data Redundancy in Real-world Image Classification through Data Selection
Zhenyu Tang
Shaoting Zhang
Xiaosong Wang
198
3
0
25 Jun 2023
M3PT: A Multi-Modal Model for POI Tagging
Knowledge Discovery and Data Mining (KDD), 2023
Jingsong Yang
Guanzhou Han
Deqing Yang
Jingping Liu
Yanghua Xiao
Xiang Xu
Baohua Wu
Shenghua Ni
228
4
0
16 Jun 2023
UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning
Xiao Dong
Runhu Huang
Xiaoyong Wei
Zequn Jie
Jianxing Yu
Jian Yin
Xiaodan Liang
VLM, DiffM
171
2
0
01 Jun 2023
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Shihao Zhao
Dongdong Chen
Yen-Chun Chen
Jianmin Bao
Shaozhe Hao
Lu Yuan
Kwan-Yee K. Wong
493
435
0
25 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM, MLLM, ObjD
695
159
0
18 May 2023
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Youhe Jiang
Fangcheng Fu
Xupeng Miao
Xiaonan Nie
Tengjiao Wang
335
17
0
17 May 2023
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
Zheng Yuan
HU Xue
Kun Wang
Yongming Liu
Kun Wang
VLM, MLLM
484
14
0
12 May 2023
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Wang-Cheng Kang
Jianmo Ni
Nikhil Mehta
M. Sathiamoorthy
Lichan Hong
Ed H. Chi
D. Cheng
288
170
0
10 May 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yunxin Li
Baotian Hu
Xinyu Chen
Yuxin Ding
Lin Ma
Min Zhang
LRM
223
19
0
08 May 2023
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Xiaonan Nie
Xupeng Miao
Zilong Wang
Zichao Yang
Jilong Xue
Lingxiao Ma
Gang-Ming Cao
Tengjiao Wang
MoE
269
75
0
08 Apr 2023
OCCL: a Deadlock-free Library for GPU Collective Communication
Lichen Pan
Juncheng Liu
Jinhui Yuan
Rongkai Zhang
Pengze Li
Zhen Xiao
175
2
0
11 Mar 2023
Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches
Siyu Wang
Zongyan Cao
Chang Si
Lansong Diao
Jiamang Wang
W. Lin
154
0
0
03 Mar 2023
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
VGen
163
3
0
22 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE, VLM
640
292
0
20 Feb 2023
Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
Shiwei Zhang
Lansong Diao
Siyu Wang
Zongyan Cao
Yiliang Gu
Chang Si
Ziji Shi
Zhen Zheng
Chuan Wu
W. Lin
AI4CE
212
4
0
16 Feb 2023
Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle
Vanessa Mehlin
Sigurd Schacht
Carsten Lanquillon
HAI, MedIm
286
28
0
05 Feb 2023
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Ming Tao
Bingkun Bao
Hao Tang
Changsheng Xu
DiffM, VLM
290
148
0
30 Jan 2023
BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Haowen Hou
Xiaopeng Yan
Yigeng Zhang
Fengzong Lian
Zhanhui Kang
BDL
245
3
0
29 Dec 2022
Transferring General Multimodal Pretrained Models to Text Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
243
5
0
19 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Interspeech, 2022
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
289
13
0
29 Nov 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Computer Vision and Pattern Recognition (CVPR), 2022
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
266
52
0
21 Nov 2022
Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models
Zhihong Pan
Xiaoxia Zhou
Hao Tian
DiffM
290
33
0
14 Nov 2022
Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zhihong Pan
Xiaoxia Zhou
Hao Tian
DiffM
267
18
0
14 Nov 2022
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Yanxin Long
Jianhua Han
Runhu Huang
Xu Hang
Yi Zhu
Chunjing Xu
Xiaodan Liang
VLM, ObjD
313
30
0
02 Nov 2022
Masked Vision-Language Transformer in Fashion
Machine Intelligence Research (MIR), 2022
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
286
27
0
27 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLM, VLM
368
80
0
14 Oct 2022
Progressive Text-to-Image Generation
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
407
4
0
05 Oct 2022
Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
European Conference on Computer Vision (ECCV), 2022
Quan Liu
Youpeng Wen
Jianhua Han
Chunjing Xu
Hang Xu
Xiaodan Liang
VLM
320
93
0
18 Jul 2022
Knowledge Distillation of Transformer-based Language Models Revisited
Chengqiang Lu
Jianwei Zhang
Yunfei Chu
Ruihao Zhang
Jingren Zhou
Leilei Gan
Haiqing Chen
Hongxia Yang
VLM
367
14
0
29 Jun 2022