ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

14 February 2022
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
Xiaodan Liang
Lewei Yao
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
    VLM

Papers citing "Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark"

50 / 64 papers shown
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
42
0
0
08 May 2025
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan
Y. Bai
Xu Duan
M. Li
Lei Sun
X. Chu
DiffM
79
0
0
06 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
76
1
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
62
0
0
05 May 2025
Point-Driven Interactive Text and Image Layer Editing Using Diffusion Models
Zhenyu Yu
Mohd Yamani Idna Idris
Pei Wang
Yuelong Xia
DiffM
24
0
0
18 Apr 2025
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
X. Zhang
Yarong Zeng
Xinting Huang
Hu Hu
Runquan Xie
Han Hu
Zhanhui Kang
MLLM
VLM
45
0
0
17 Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Z. Wu
Y. Zhang
...
Bohan Zeng
W. Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGen
VLM
67
0
0
14 Apr 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
Runhui Huang
Chunwei Wang
Junwei Yang
Guansong Lu
Yunlong Yuan
...
Lu Hou
Wei Zhang
Lanqing Hong
Hengshuang Zhao
Hang Xu
MLLM
81
1
0
02 Apr 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Shitian Zhao
Qilong Wu
Xinyue Li
Bo Zhang
Ming-xing Li
...
H. Li
Yu Qiao
Peng Gao
Bin Fu
Zhen Li
EGVM
43
0
0
27 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
43
0
0
04 Mar 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
53
0
0
23 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Z. Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Y. Wang
44
0
0
19 Feb 2025
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Pandeng Li
Yun Zheng
Hongtao Xie
VLM
CoGe
96
0
0
19 Feb 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
102
1
0
20 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
170
0
0
18 Dec 2024
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye
Yongkun Du
Yunbo Tao
Z. Chen
DiffM
96
0
0
02 Dec 2024
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
Yueqian Wang
Xiaojun Meng
Y. Wang
Jianxin Liang
Jiansheng Wei
Huishuai Zhang
Dongyan Zhao
VGen
75
8
0
27 Nov 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
62
24
0
21 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
62
25
0
10 Oct 2024
POINTS: Improving Your Vision-language Model with Affordable Strategies
Yuan Liu
Zhongyin Zhao
Ziyuan Zhuang
Le Tian
Xiao Zhou
Jie Zhou
VLM
35
5
0
07 Sep 2024
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Haoran Wei
Chenglong Liu
Jinyue Chen
Jia Wang
Lingyu Kong
...
Liang Zhao
Jianjian Sun
Yuang Peng
Chunrui Han
Xiangyu Zhang
VLM
44
41
0
03 Sep 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Ran He
Rongrong Ji
Yunsheng Wu
Caifeng Shan
Xing Sun
MLLM
34
79
0
09 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
47
94
0
09 Aug 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
28
0
0
19 Jul 2024
How Control Information Influences Multilingual Text Image Generation and Editing?
Boqiang Zhang
Zuan Gao
Yadong Qu
Hongtao Xie
DiffM
37
5
0
16 Jul 2024
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma
Yonglin Deng
Chen Chen
H. Lu
Zhenyu Yang
VLM
DiffM
82
6
0
02 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
32
5
0
01 Jul 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
42
9
0
11 Jun 2024
RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection
Zhiyuan He
Pin-Yu Chen
Tsung-Yi Ho
31
12
0
30 May 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
41
14
0
28 May 2024
Multilingual Diversity Improves Vision-Language Representations
Thao Nguyen
Matthew Wallingford
Sebastin Santy
Wei-Chiu Ma
Sewoong Oh
Ludwig Schmidt
Pang Wei Koh
Ranjay Krishna
VLM
32
5
0
27 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
36
0
26 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
49
522
0
25 Apr 2024
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene
Wenbo Zhang
Yifan Zhang
Jianfeng Lin
Binqiang Huang
Jinlu Zhang
Wenhao Yu
VLM
36
1
0
17 Apr 2024
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
Sanyam Lakhanpal
Shivang Chopra
Vinija Jain
Aman Chadha
Man Luo
27
9
0
25 Mar 2024
See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI
Yulong Liu
Yongqiang Ma
Guibo Zhu
Haodong Jing
Nanning Zheng
14
4
0
11 Mar 2024
WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection
Yan Hong
Jianfu Zhang
67
9
0
19 Feb 2024
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Qingpei Guo
Furong Xu
Hanxiao Zhang
Wang Ren
Ziping Ma
Lin Ju
Jian Wang
Jingdong Chen
Ming Yang
VLM
MLLM
25
2
0
29 Jan 2024
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Xiaojun Wu
Di Zhang
Ruyi Gan
Junyu Lu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
VLM
21
6
0
26 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
37
175
0
24 Jan 2024
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Xiangshuo Qiao
Xianxin Li
Xiaozhe Qu
Jie M. Zhang
Yang Liu
Yu Luo
Cihang Jin
Jin Ma
VLM
18
0
0
19 Jan 2024
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Guansong Lu
Yuanfan Guo
Jianhua Han
Minzhe Niu
Yihan Zeng
Songcen Xu
Zeyi Huang
Zhao Zhong
Wei Zhang
Hang Xu
26
4
0
27 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
156
918
0
21 Dec 2023
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base
Zhiwei Zha
Jiaan Wang
Zhixu Li
Xiangru Zhu
Wei Song
Yanghua Xiao
VLM
23
2
0
16 Dec 2023
Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey
Haotian Zhang
S. D. Semujju
Zhicheng Wang
Xianwei Lv
Kang Xu
...
Jing Wu
Zhuo Long
Wensheng Liang
Xiaoguang Ma
Ruiyan Zhuang
UQCV
AI4TS
AI4CE
27
4
0
11 Dec 2023
iDesigner: A High-Resolution and Complex-Prompt Following Text-to-Image Diffusion Model for Interior Design
Ruyi Gan
Xiaojun Wu
Junyu Lu
Yuanhe Tian
Di Zhang
...
Renliang Sun
Chang Liu
Jiaxing Zhang
Pingjian Zhang
Yan Song
53
4
0
07 Dec 2023
Large Language Models Meet Computer Vision: A Brief Survey
Raby Hamadi
LM&MA
21
4
0
28 Nov 2023
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Jiancang Ma
Chen Chen
Qingsong Xie
H. Lu
DiffM
VLM
20
3
0
28 Nov 2023
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Junyu Lu
Di Zhang
Xiaojun Wu
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
Yan Song
Pingjian Zhang
VLM
MLLM
15
7
0
12 Oct 2023
Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study
Myeongseob Ko
Ming Jin
Chenguang Wang
Ruoxi Jia
31
27
0
29 Sep 2023