ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.00716
  4. Cited By
JourneyDB: A Benchmark for Generative Image Understanding

JourneyDB: A Benchmark for Generative Image Understanding

3 July 2023
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
Xiaoshi Wu
Renrui Zhang
Aojun Zhou
Zipeng Qin
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
ArXivPDFHTML

Papers citing "JourneyDB: A Benchmark for Generative Image Understanding"

38 / 88 papers shown
Title
Show-o: One Single Transformer to Unify Multimodal Understanding and
  Generation
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
35
159
0
22 Aug 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware
  Open-domain Visual Storytelling
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Zilyu Ye
Jinxiu Liu
Ruotian Peng
Jinjin Cao
Zhiyang Chen
...
Mingyuan Zhou
Xiaoqian Shen
Mohamed Elhoseiny
Qi Liu
Guo-Jun Qi
VGen
VLM
21
1
0
07 Aug 2024
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning
  using Instruct Prompts
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
Ciara Rowles
Shimon Vainer
Dante De Nigris
Slava Elizarov
Konstantin Kutsy
Simon Donné
DiffM
24
9
0
06 Aug 2024
Stretching Each Dollar: Diffusion Training from Scratch on a
  Micro-Budget
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
26
8
0
22 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
30
9
0
22 Jul 2024
StyleShot: A Snapshot on Any Style
StyleShot: A Snapshot on Any Style
Junyao Gao
Yanchen Liu
Yanan Sun
Yinhao Tang
Yanhong Zeng
Kai Chen
Cairong Zhao
TTA
3DH
VLM
58
14
0
01 Jul 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of
  Text-to-Time-lapse Video Generation
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan
Jinfa Huang
Yongqi Xu
Yaoyang Liu
Shaofeng Zhang
Yujun Shi
Ruijie Zhu
Xinhua Cheng
Jiebo Luo
Li Yuan
EGVM
VGen
66
1
0
26 Jun 2024
EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned
  Data for Evaluating Text-to-Image Models
EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models
Zhiyu Tan
Xiaomeng Yang
Luozheng Qin
Mengping Yang
Cheng Zhang
Hao Li
42
7
0
24 Jun 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic
  Representations in Naturalistic Images
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen
CoGe
25
1
0
19 Jun 2024
Neural Residual Diffusion Models for Deep Scalable Vision Generation
Neural Residual Diffusion Models for Deep Scalable Vision Generation
Zhiyuan Ma
Liangliang Zhao
Biqing Qi
Bowen Zhou
DiffM
53
2
0
19 Jun 2024
MotionClone: Training-Free Motion Cloning for Controllable Video
  Generation
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Pengyang Ling
Jiazi Bu
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Tong Wu
H. Chen
Jiaqi Wang
Yi Jin
VGen
DiffM
18
34
0
08 Jun 2024
DiffusionPID: Interpreting Diffusion via Partial Information
  Decomposition
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
Shaurya Dewan
Rushikesh Zawar
Prakanshul Saxena
Yingshan Chang
Andrew F. Luo
Yonatan Bisk
DiffM
29
3
0
07 Jun 2024
VideoPhy: Evaluating Physical Commonsense for Video Generation
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal
Zongyu Lin
Tianyi Xie
Zeshun Zong
Michal Yarom
Yonatan Bitton
Chenfanfu Jiang
Yizhou Sun
Kai-Wei Chang
Aditya Grover
EGVM
VGen
26
36
0
05 Jun 2024
EasyAnimate: A High-Performance Long Video Generation Method based on
  Transformer Architecture
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Jiaqi Xu
Xinyi Zou
Kunzhe Huang
Yunkuo Chen
Bo Liu
Mengli Cheng
Xing Shi
Jun Huang
VGen
22
34
0
29 May 2024
Ensembling Diffusion Models via Adaptive Feature Aggregation
Ensembling Diffusion Models via Adaptive Feature Aggregation
Cong Wang
Kuan Tian
Yonghang Guan
Jun Zhang
Zhiwei Jiang
Fei Shen
Xiao Han
24
5
0
27 May 2024
An Empirical Study and Analysis of Text-to-Image Generation Using Large
  Language Model-Powered Textual Representation
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
Zhiyu Tan
Mengping Yang
Luozheng Qin
Hao Yang
Ye Qian
Qiang-feng Zhou
Cheng Zhang
Hao Li
60
3
0
21 May 2024
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
Hanshu Yan
Xingchao Liu
Jiachun Pan
Jun Hao Liew
Qiang Liu
Jiashi Feng
19
40
0
13 May 2024
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional
  Image Editing
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing
Yuying Ge
Sijie Zhao
Chen Li
Yixiao Ge
Ying Shan
22
18
0
07 May 2024
Auto-Encoding Morph-Tokens for Multimodal LLM
Auto-Encoding Morph-Tokens for Multimodal LLM
Kaihang Pan
Siliang Tang
Juncheng Li
Zhaoyu Fan
Wei Chow
Shuicheng Yan
Tat-Seng Chua
Yueting Zhuang
Hanwang Zhang
MLLM
22
16
0
03 May 2024
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with
  Reward Feedback Learning
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
Weifeng Chen
Jiacheng Zhang
Jie Wu
Hefeng Wu
Xuefeng Xiao
Liang Lin
18
12
0
23 Apr 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
52
103
0
22 Apr 2024
UniFL: Improve Stable Diffusion via Unified Feedback Learning
UniFL: Improve Stable Diffusion via Unified Feedback Learning
Jiacheng Zhang
Jie Wu
Yuxi Ren
Xin Xia
Huafeng Kuang
...
Jiashi Li
Xuefeng Xiao
Min Zheng
Lean Fu
Guanbin Li
29
2
0
08 Apr 2024
Diffusion Deepfake
Diffusion Deepfake
Chaitali Bhattacharyya
Hanxiao Wang
Feng Zhang
Sung-Ha Kim
Xiatian Zhu
19
5
0
02 Apr 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual
  Math Problems?
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Peng Gao
Hongsheng Li
24
165
0
21 Mar 2024
Bridging Different Language Models and Generative Vision Models for
  Text-to-Image Generation
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Shihao Zhao
Shaozhe Hao
Bojia Zi
Huaizhe Xu
Kwan-Yee Kenneth Wong
DiffM
VLM
52
7
0
12 Mar 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
VLM
54
39
0
08 Mar 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
19
114
0
19 Feb 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video
  Diffusion Models
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen
Yong Zhang
Xiaodong Cun
Menghan Xia
Xintao Wang
Chao-Liang Weng
Ying Shan
VGen
DiffM
115
269
0
17 Jan 2024
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational
  Score Distillation
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Hoang Nguyen
Anh Tran
DiffM
10
55
0
08 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
6
6
0
06 Dec 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
38
102
0
09 Nov 2023
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing
Menghan Xia
Yong Zhang
Haoxin Chen
Wangbo Yu
Hanyuan Liu
Xintao Wang
Tien-Tsin Wong
Ying Shan
VGen
11
198
0
18 Oct 2023
AI-Generated Images as Data Source: The Dawn of Synthetic Era
AI-Generated Images as Data Source: The Dawn of Synthetic Era
Zuhao Yang
Fangneng Zhan
Kunhao Liu
Muyu Xu
Shijian Lu
EGVM
25
18
0
03 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
Making LLaMA SEE and Draw with SEED Tokenizer
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
17
127
0
02 Oct 2023
PixArt-$α$: Fast Training of Diffusion Transformer for
  Photorealistic Text-to-Image Synthesis
PixArt-ααα: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Junsong Chen
Jincheng Yu
Chongjian Ge
Lewei Yao
Enze Xie
...
Zhongdao Wang
James T. Kwok
Ping Luo
Huchuan Lu
Zhenguo Li
DiffM
17
300
0
30 Sep 2023
Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image
  Diffusion Models
Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
Pablo Pernias
Dominic Rampas
Mats L. Richter
Christopher Pal
Marc Aubreville
DiffM
VLM
11
42
0
01 Jun 2023
Leveraging Large Language Models for Multiple Choice Question Answering
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
121
181
0
22 Oct 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
Previous
12