ResearchTrend.AI
Evaluating Text-to-Visual Generation with Image-to-Text Generation

1 April 2024
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
    EGVM

Papers citing "Evaluating Text-to-Visual Generation with Image-to-Text Generation"

50 / 100 papers shown
Large Language Models for Computer-Aided Design: A Survey
Licheng Zhang
Bach Le
Naveed Akhtar
Siew-Kei Lam
Tuan Ngo
3DV
AI4CE
20
0
0
13 May 2025
InstanceGen: Image Generation with Instance-level Instructions
Etai Sella
Yanir Kleiman
Hadar Averbuch-Elor
16
0
0
08 May 2025
Distribution-Conditional Generation: From Class Distribution to Creative Generation
Fu Feng
Yucheng Xie
Xu Yang
Jing Wang
Xin Geng
DiffM
29
0
0
06 May 2025
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Lu Ling
C. Lin
Tsung-Yi Lin
Yifan Ding
Y. Zeng
Yichen Sheng
Yunhao Ge
Ming-Yu Liu
Aniket Bera
Zhaoshuo Li
VGen
3DV
42
0
0
05 May 2025
Improving Physical Object State Representation in Text-to-Image Generative Systems
Tianle Chen
Chaitanya Chakka
Deepti Ghadiyaram
25
0
0
04 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
60
0
0
01 May 2025
CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback
Chenhan Jiang
Yihan Zeng
Hang Xu
Dit-Yan Yeung
44
0
0
28 Apr 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Yonatan Bitton
Brian Gordon
Michal Sokolik
...
Almog Gueta
Royi Rassin
Itay Laish
Dani Lischinski
Idan Szpektor
EGVM
VGen
30
0
0
24 Apr 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
65
1
0
24 Apr 2025
Detecting and Understanding Hateful Contents in Memes Through Captioning and Visual Question-Answering
Ali Anaissi
Junaid Akram
Kunal Chaturvedi
Ali Braytee
22
0
0
23 Apr 2025
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin
Siyuan Cen
Daniel Jiang
Jay Karhade
Hewei Wang
...
Rushikesh Zawar
Xue Bai
Yilun Du
Chuang Gan
Deva Ramanan
VGen
23
0
0
21 Apr 2025
InstructEngine: Instruction-driven Text-to-Image Alignment
Xingyu Lu
Y. Hu
Y. Zhang
Kaiyu Jiang
Changyi Liu
...
Bin Wen
C. Yuan
Fan Yang
Tingting Gao
Di Zhang
31
0
0
14 Apr 2025
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos
Rui Chen
Lei Sun
Jing Tang
Geng Li
Xiangxiang Chu
LRM
24
0
0
14 Apr 2025
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Ning Li
Jingran Zhang
Justin Cui
MLLM
70
1
0
09 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Y. Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
72
0
0
09 Apr 2025
Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling
Jaskirat Singh
Junshen Kevin Chen
Jonas Kohler
Michael Cohen
DiffM
VGen
33
0
0
08 Apr 2025
Let it Snow! Animating Static Gaussian Scenes With Dynamic Weather Effects
Gal Fiebelman
Hadar Averbuch-Elor
Sagie Benaim
3DGS
26
1
0
07 Apr 2025
Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
Samarth Mishra
Kate Saenko
Venkatesh Saligrama
CoGe
LRM
32
0
0
07 Apr 2025
Imperative vs. Declarative Programming Paradigms for Open-Universe Scene Generation
Maxim Gumin
Do Heon Han
Seung Jean Yoo
Aditya Ganeshan
R. K. Jones
Rio Aguina-Kang
Stewart Morris
Daniel E. Ritchie
23
0
0
07 Apr 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
Runhui Huang
Chunwei Wang
Junwei Yang
Guansong Lu
Yunlong Yuan
...
Lu Hou
Wei Zhang
Lanqing Hong
Hengshuang Zhao
Hang Xu
MLLM
76
1
0
02 Apr 2025
A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang
Zhimin Sun
Yuzhen Du
Ran Yi
Guangben Lu
T. Hu
Luying Li
Lizhuang Ma
Fangyuan Zou
DiffM
35
0
0
02 Apr 2025
AI Judges in Design: Statistical Perspectives on Achieving Human Expert Equivalence With Vision-Language Models
Kristen M. Edwards
Farnaz Tehranchi
Scarlett R. Miller
Faez Ahmed
61
0
0
01 Apr 2025
Prompting Forgetting: Unlearning in GANs via Textual Guidance
Piyush Nagasubramaniam
Neeraj Karamchandani
Chen Wu
Sencun Zhu
DiffM
AILaw
MU
49
0
0
01 Apr 2025
A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models
Leander Girrbach
Stephan Alaniz
Genevieve Smith
Zeynep Akata
40
0
0
30 Mar 2025
On Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation
H. Seo
Junseo Bang
Haechang Lee
Joohoon Lee
Byung Hyun Lee
Se Young Chun
46
0
0
29 Mar 2025
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Yunhong Min
Daehyeon Choi
Kyeongmin Yeo
Jihyun Lee
Minhyuk Sung
46
0
0
28 Mar 2025
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Beak
Hyeonwoo Kim
Hanbyul Joo
41
0
0
25 Mar 2025
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim
Taehoon Yoon
Jisung Hwang
Minhyuk Sung
DiffM
48
1
0
25 Mar 2025
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella
Massimiliano Mancini
Willi Menapace
Sergey Tulyakov
Yiming Wang
Elisa Ricci
DiffM
VGen
55
0
0
24 Mar 2025
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models
Ketan Suhaas Saichandran
Xavier Thomas
Prakhar Kaushik
Deepti Ghadiyaram
DiffM
73
0
0
22 Mar 2025
HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation
Hou In Derek Pun
Hou In Ivan Tam
Austin T. Wang
Xiaoliang Huo
Angel X. Chang
Manolis Savva
3DV
46
0
0
21 Mar 2025
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
Tiange Xiang
Kai Li
Chengjiang Long
Christian Hane
Peihong Guo
Scott Delp
Ehsan Adeli
L. Fei-Fei
DiffM
3DGS
47
0
0
20 Mar 2025
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
SeungJu Cha
Kwanyoung Lee
Ye-Chan Kim
Hyunwoo Oh
Dong-Jin Kim
41
0
0
20 Mar 2025
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam
Soowon Son
Zhan Xu
Jing Shi
Difan Liu
Feng Liu
Aashish Misraa
Seungryong Kim
Yang Zhou
DiffM
37
0
0
19 Mar 2025
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li
Mi Zhang
Yiming Sun
Min Yang
DiffM
45
1
0
19 Mar 2025
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Hou In Ivan Tam
Hou In Derek Pun
Austin T. Wang
Angel X. Chang
Manolis Savva
54
1
0
18 Mar 2025
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Yuanxin Liu
Rui Zhu
Shuhuai Ren
Jiacong Wang
Haoyuan Guo
Xu Sun
Lu Jiang
64
1
0
13 Mar 2025
Investigating and Improving Counter-Stereotypical Action Relation in Text-to-Image Diffusion Models
Sina Malakouti
Adriana Kovashka
EGVM
59
0
0
13 Mar 2025
V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes
Yanming Zhang
Jun-Kun Chen
Jipeng Lyu
Yu-Xiong Wang
DiffM
VGen
44
0
0
13 Mar 2025
LuciBot: Automated Robot Policy Learning from Generated Videos
Xiaowen Qiu
Yian Wang
Jiting Cai
Zhehuan Chen
Chunru Lin
Tsun-Hsuan Wang
Chuang Gan
LM&Ro
VGen
67
0
0
12 Mar 2025
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Lixue Gong
Xiaoxia Hou
Fanshi Li
Liang Li
Xiaochen Lian
...
Qi Zhang
Yuwei Zhang
Shijia Zhao
Jianchao Yang
Weilin Huang
DiffM
VLM
52
5
0
10 Mar 2025
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation
Xiaoliang Ju
Hongsheng Li
3DGS
36
0
0
10 Mar 2025
SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Juhyeon Park
P. Y. Kim
Jiook Cha
Shinjae Yoo
Taesup Moon
45
0
0
09 Mar 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma
Qirong Peng
Xu Guo
Chen Chen
H. Lu
Zhenyu Yang
VLM
64
1
0
08 Mar 2025
GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning
Zhun Mou
Bin Xia
Zhengchao Huang
Wenming Yang
Jiaya Jia
VGen
ELM
LRM
58
0
0
04 Mar 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
76
0
0
27 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
47
1
0
22 Feb 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li
Zhelun Shi
Xuhao Hu
Bowen Dong
Yiran Qin
Xihui Liu
Lu Sheng
Jing Shao
109
1
0
21 Feb 2025
Multi-Agent Multimodal Models for Multicultural Text to Image Generation
Parth Bhalerao
Mounika Yalamarty
Brian Trinh
Oana Ignat
32
0
0
21 Feb 2025
Can Hallucination Correction Improve Video-Language Alignment?
Lingjun Zhao
Mingyang Xie
Paola Cascante-Bonilla
Hal Daumé III
Kwonjoon Lee
HILM
VLM
57
0
0
20 Feb 2025