ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.12714
  4. Cited By
VIGC: Visual Instruction Generation and Correction

VIGC: Visual Instruction Generation and Correction

24 August 2023
Bin Wang
Fan Wu
Xiao Han
Jiahui Peng
Huaping Zhong
Pan Zhang
Xiao-wen Dong
Weijia Li
Wei Li
Jiaqi Wang
Conghui He
    MLLM
ArXivPDFHTML

Papers citing "VIGC: Visual Instruction Generation and Correction"

14 / 14 papers shown
Title
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models
Sangmin Woo
Kang Zhou
Yun Zhou
Shuai Wang
Sheng Guan
Haibo Ding
Lin Lee Cheong
VPVLM
74
0
0
30 Apr 2025
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
Jiaqi Wang
Yifei Gao
Jitao Sang
MLLM
96
2
0
24 Nov 2024
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
Jin Chen
Kaijing Ma
Haojian Huang
Jiayu Shen
Han Fang
Xianghao Zang
Chao Ban
73
2
0
17 Sep 2024
Generating Faithful and Salient Text from Multimodal Data
Generating Faithful and Salient Text from Multimodal Data
Tahsina Hashem
Weiqing Wang
Derry Tanti Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
11
0
0
06 Sep 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
27
9
0
22 Jul 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
68
136
0
29 Apr 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
38
2
0
07 Mar 2024
InternLM-XComposer: A Vision-Language Large Model for Advanced
  Text-image Comprehension and Composition
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
27
222
0
26 Sep 2023
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and
  Chinese Large Models
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models
Conghui He
Zhenjiang Jin
Chaoxi Xu
Jiantao Qiu
Bin Wang
Wei Li
Hang Yan
Jiaqi Wang
Da Lin
56
32
0
21 Aug 2023
Instruction Tuning with GPT-4
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
154
576
0
06 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Guiding Visual Question Generation
Guiding Visual Question Generation
Nihir Vedd
Zixu Wang
Marek Rei
Yishu Miao
Lucia Specia
58
23
0
15 Oct 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
845
0
17 Feb 2021
1