© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2404.01291
Evaluating Text-to-Visual Generation with Image-to-Text Generation

1 April 2024
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
    EGVM

Papers citing "Evaluating Text-to-Visual Generation with Image-to-Text Generation"

50 / 100 papers shown
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights
Shubham Parashar
Blake Olson
Sambhav Khurana
Eric Li
Hongyi Ling
James Caverlee
Shuiwang Ji
LRM
ReLM
81
8
0
18 Feb 2025
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Zahra Bayramli
Ayhan Suleymanzade
Na Min An
Huzama Ahmad
Eunsu Kim
Junyeong Park
James Thorne
Alice H. Oh
89
0
0
13 Feb 2025
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Xiaowen Qiu
Jincheng Yang
Yian Wang
Zhehuan Chen
Yufei Wang
Tsun-Hsuan Wang
Zhou Xian
Chuang Gan
81
4
0
04 Feb 2025
FFA Sora, video generation as fundus fluorescein angiography simulator
Xinyuan Wu
Lili Wang
Ruoyu Chen
Bowen Liu
Weiyi Zhang
Xi Yang
Yifan Feng
M. He
Danli Shi
VGen
36
1
0
23 Dec 2024
CAP: Evaluation of Persuasive and Creative Image Generation
Aysan Aghazadeh
Adriana Kovashka
EGVM
85
1
0
10 Dec 2024
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
Guangben Lu
Yuzhen Du
Zhimin Sun
Ran Yi
Yifan Qi
Yizhe Tang
Tianyi Wang
Lizhuang Ma
Fangyuan Zou
DiffM
70
1
0
05 Dec 2024
Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Siyoon Jin
Jisu Nam
Jiyoung Kim
Dahyun Chung
Yeong-Seok Kim
Joonhyung Park
Heonjeong Chu
Seungryong Kim
DiffM
73
0
0
04 Dec 2024
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers
Chancharik Mitra
Brandon Huang
Tianning Chai
Zhiqiu Lin
Assaf Arbelle
Rogerio Feris
Leonid Karlinsky
Trevor Darrell
Deva Ramanan
Roei Herzig
VLM
116
4
0
28 Nov 2024
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao
C. Li
X. U. Wang
Andi Zhang
Rui Sun
Zizhe Wang
Yao Zhu
DiffM
57
0
0
25 Nov 2024
Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
Rong-Cheng Tu
Zi-Ao Ma
Tian Lan
Yuehao Zhao
Heyan Huang
Xian-Ling Mao
MLLM
VLM
EGVM
98
3
0
23 Nov 2024
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting
Yian Wang
Xiaowen Qiu
Jiageng Liu
Zhehuan Chen
Jiting Cai
Yufei Wang
Tsun-Hsuan Wang
Zhou Xian
Chuang Gan
VGen
AI4CE
43
5
0
14 Nov 2024
Evaluating the Generation of Spatial Relations in Text and Image Generative Models
Shang Hong Sim
Clarence Lee
A. Tan
Cheston Tan
EGVM
23
2
0
12 Nov 2024
ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing
Jun-Kun Chen
Yu-Xiong Wang
DiffM
35
4
0
07 Nov 2024
Natural Language Inference Improves Compositionality in Vision-Language Models
Paola Cascante-Bonilla
Yu Hou
Yang Trista Cao
Hal Daumé III
Rachel Rudinger
ReLM
CoGe
VLM
33
3
0
29 Oct 2024
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik
Huseyin Coskun
Zeynep Akata
Sergey Tulyakov
J. Ren
Anil Kag
EGVM
52
4
0
23 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
51
20
0
18 Oct 2024
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Luping Liu
Chao Du
Tianyu Pang
Zehan Wang
Chongxuan Li
Dong Xu
VLM
48
5
0
15 Oct 2024
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Fanqing Meng
Jiaqi Liao
Xinyu Tan
Wenqi Shao
Quanfeng Lu
Kaipeng Zhang
Yu Cheng
Dianqi Li
Yu Qiao
Ping Luo
VGen
EGVM
29
23
0
07 Oct 2024
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Deqing Fu
Tong Xiao
Rui Wang
Wang Zhu
Pengchuan Zhang
Guan Pang
Robin Jia
Lawrence Chen
55
5
0
07 Oct 2024
Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through f-divergence Minimization
Haoyuan Sun
Bo Xia
Yongzhe Chang
Xueqian Wang
EGVM
29
2
0
15 Sep 2024
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process
Yang Luo
Y. Zhang
Zhaofan Qiu
Ting Yao
Zhineng Chen
Yu-Gang Jiang
Tao Mei
DiffM
19
4
0
11 Sep 2024
ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
Xindi Wu
Dingli Yu
Yangsibo Huang
Olga Russakovsky
Sanjeev Arora
CoGe
EGVM
37
0
0
26 Aug 2024
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li
Xuewen Liu
Dongrong Fu
Jianquan Li
Qingyi Gu
Kurt Keutzer
Zhen Dong
EGVM
VGen
DiffM
72
1
0
26 Aug 2024
Diffusion-Based Visual Art Creation: A Survey and New Perspectives
Bingyuan Wang
Qifeng Chen
Zeyu Wang
39
7
0
22 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
32
10
0
17 Aug 2024
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu
Zheyuan Yang
Guile Wu
Y. Ren
Kejian Lin
Bingbing Liu
Yang Liu
Jinjun Shan
20
5
0
09 Jul 2024
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He
Yangsibo Huang
Weijia Shi
Tinghao Xie
Haotian Liu
Yue Wang
Luke Zettlemoyer
Chiyuan Zhang
Danqi Chen
Peter Henderson
39
9
0
20 Jun 2024
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
Baiqi Li
Zhiqiu Lin
Deepak Pathak
Jiayao Li
Yixin Fei
...
Tiffany Ling
Xide Xia
Pengchuan Zhang
Graham Neubig
Deva Ramanan
EGVM
42
24
0
19 Jun 2024
Consistency-diversity-realism Pareto fronts of conditional image generative models
Pietro Astolfi
Marlene Careil
Melissa Hall
Oscar Manas
Matthew Muckley
Jakob Verbeek
Adriana Romero Soriano
M. Drozdzal
39
4
0
14 Jun 2024
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
Imanol Miranda
Ander Salaberria
Eneko Agirre
Gorka Azkune
CoGe
28
0
0
14 Jun 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang
Wei Lin
M. Jehanzeb Mirza
Jacob A. Hansen
Sivan Doveh
...
Trevor Darrell
Chuang Gan
Aude Oliva
Rogerio Feris
Leonid Karlinsky
CoGe
LRM
30
1
0
12 Jun 2024
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
Shaurya Dewan
Rushikesh Zawar
Prakanshul Saxena
Yingshan Chang
Andrew F. Luo
Yonatan Bisk
DiffM
32
3
0
07 Jun 2024
STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
Zenghao Chai
Chen Tang
Yongkang Wong
Mohan Kankanhalli
DiffM
24
7
0
07 Jun 2024
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
L. Eyring
Shyamgopal Karthik
Karsten Roth
Alexey Dosovitskiy
Zeynep Akata
68
16
0
06 Jun 2024
Semantic Similarity Score for Measuring Visual Similarity at Semantic Level
Senran Fan
Zhicheng Bao
Chen Dong
Haotai Liang
Xiaodong Xu
Ping Zhang
21
1
0
06 Jun 2024
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal
Zongyu Lin
Tianyi Xie
Zeshun Zong
Michal Yarom
Yonatan Bitton
Chenfanfu Jiang
Yizhou Sun
Kai-Wei Chang
Aditya Grover
EGVM
VGen
29
36
0
05 Jun 2024
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang
H. Wu
Chunyi Li
Yingjie Zhou
Wei Sun
Xiongkuo Min
Zijian Chen
Xiaohong Liu
Weisi Lin
Guangtao Zhai
EGVM
41
14
0
05 Jun 2024
Query2CAD: Generating CAD models using natural language queries
Akshay Badagabettu
Sai Sravan Yarlagadda
A. Farimani
23
13
0
31 May 2024
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman
Noam Rotstein
Roy Ganz
Ron Kimmel
DiffM
28
14
0
28 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
EGVM
113
13
0
25 Apr 2024
Holistic Evaluation of Text-To-Image Models
Tony Lee
Michihiro Yasunaga
Chenlin Meng
Yifan Mai
Joon Sung Park
...
Jun-Yan Zhu
Fei-Fei Li
Jiajun Wu
Stefano Ermon
Percy Liang
136
124
0
07 Nov 2023
Language Models as Black-Box Optimizers for Vision-Language Models
Shihong Liu
Zhiqiu Lin
Samuel Yu
Ryan Lee
Tiffany Ling
Deepak Pathak
Deva Ramanan
VLM
22
28
0
12 Sep 2023
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
12
7
0
02 Jun 2023
Shap-E: Generating Conditional 3D Implicit Functions
Heewoo Jun
Alex Nichol
DiffM
184
300
0
03 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
160
345
0
02 May 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
30
44
0
25 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
515
0
02 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho
Abhaysinh Zala
Mohit Bansal
ViT
132
167
0
08 Feb 2022