ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.06242
  4. Cited By
Florence-2: Advancing a Unified Representation for a Variety of Vision
  Tasks

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

10 November 2023
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
    VLM
ArXivPDFHTML

Papers citing "Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks"

50 / 105 papers shown
Title
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
Benjamin Raphael Ernhofer
Daniil Prokhorov
Jannica Langner
Dominik Bollmann
23
0
0
09 May 2025
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
Karthik Reddy Kanjula
Surya Guthikonda
Nahid Alam
Shayekh Bin Islam
14
0
0
09 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
50
0
0
07 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
31
0
0
04 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
50
0
0
03 May 2025
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
Gabriel Sarch
Balasaravanan Thoravi Kumaravel
Sahithya Ravi
Vibhav Vineet
A. D. Wilson
42
0
0
02 May 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang
Xinyi Chen
Y. Chen
H. Li
Xiaoshen Han
Z. Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
75
0
0
30 Apr 2025
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
Modesto Castrillón-Santana
Oliverio J. Santana
David Freire-Obregón
Daniel Hernández-Sosa
J. Lorenzo-Navarro
48
0
0
30 Apr 2025
Neural network task specialization via domain constraining
Neural network task specialization via domain constraining
Roman Malashin
Daniil Ilyukhin
49
0
0
28 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou
Z. Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Mohit Bansal
Huaxiu Yao
54
0
0
27 Apr 2025
Step1X-Edit: A Practical Framework for General Image Editing
Step1X-Edit: A Practical Framework for General Image Editing
S. Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
X. Zhang
Gang Yu
Daxin Jiang
DiffM
93
2
0
24 Apr 2025
Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models
Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models
Ilyass Taouil
Haizhou Zhao
Angela Dai
Majid Khadiv
DiffM
43
0
0
23 Apr 2025
UFO2: The Desktop AgentOS
UFO2: The Desktop AgentOS
Chaoyun Zhang
He Huang
Chiming Ni
J. Mu
Si Qin
...
Minghua Ma
Jian-Guang Lou
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
LLMAG
34
0
0
20 Apr 2025
Logits DeConfusion with CLIP for Few-Shot Learning
Logits DeConfusion with CLIP for Few-Shot Learning
Shuo Li
F. Liu
Zehua Hao
X. Wang
Lingling Li
X. Liu
Puhua Chen
Wenping Ma
VLM
47
0
0
16 Apr 2025
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model
Kaiyu Li
Zepeng Xin
Li Pang
Chao Pang
Yupeng Deng
Jing Yao
Guisong Xia
Deyu Meng
Zhi Wang
Xiangyong Cao
VLM
LRM
37
0
0
13 Apr 2025
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena
Tommaso Apicella
Stefano Rosa
Pietro Morerio
Alessio Del Bue
Lorenzo Natale
32
0
0
11 Apr 2025
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu
Qizhi Chen
Z. Wang
Yiwen Tang
Yiting Zhang
Chi Yan
Dong Wang
X. Li
Bin Zhao
CoGe
47
0
0
10 Apr 2025
Ternarization of Vision Language Models for use on edge devices
Ternarization of Vision Language Models for use on edge devices
Ben Crulis
Cyril de Runz
Barthélémy Serres
Gilles Venturini
VLM
50
0
0
07 Apr 2025
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Sofian Chaybouti
Walid Bousselham
Moritz Wolter
Hilde Kuehne
38
0
0
07 Apr 2025
VISTA-OCR: Towards generative and interactive end to end OCR models
VISTA-OCR: Towards generative and interactive end to end OCR models
Laziz Hamdi
Amine Tamasna
Pascal Boisson
Thierry Paquet
33
0
0
04 Apr 2025
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan
A. Ahmed
Mohamad Alansari
Neha Gour
Abderaouf Behouch
...
Muzammal Naseer
Juergen Gall
Mohammed Bennamoun
Ernesto Damiani
N. Werghi
37
0
0
03 Apr 2025
A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
AT^\text{T}TA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang
Zhimin Sun
Yuzhen Du
Ran Yi
Guangben Lu
T. Hu
Luying Li
Lizhuang Ma
Fangyuan Zou
DiffM
35
0
0
02 Apr 2025
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie
D. Pakhomov
Zhonghao Wang
Zongze Wu
Ziyan Chen
...
Haitian Zheng
Zhifei Zhang
Zhe Lin
Jiantao Zhou
Chao Dong
35
0
0
01 Apr 2025
PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks
PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks
Abdelrahman Elskhawy
Mengze Li
Nassir Navab
Benjamin Busam
VLM
46
0
0
01 Apr 2025
IntrinsiX: High-Quality PBR Generation using Image Priors
IntrinsiX: High-Quality PBR Generation using Image Priors
Peter Kocsis
Lukas Höllein
Matthias Nießner
33
0
0
01 Apr 2025
EAP4EMSIG -- Enhancing Event-Driven Microscopy for Microfluidic Single-Cell Analysis
EAP4EMSIG -- Enhancing Event-Driven Microscopy for Microfluidic Single-Cell Analysis
Nils Friederich
Angelo Jovin Yamachui Sitcheu
Annika Nassal
Erenus Yildiz
Matthias Pesch
...
D. Kohlheyer
Hanno Scharr
Johannes Seiffarth
K. Nöh
Ralf Mikut
29
0
0
30 Mar 2025
From Panels to Prose: Generating Literary Narratives from Comics
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva
Andrew Zisserman
39
0
0
30 Mar 2025
Efficient Adaptation For Remote Sensing Visual Grounding
Efficient Adaptation For Remote Sensing Visual Grounding
Hasan Moughnieh
Mohamad Chalhoub
Hasan Nasrallah
Cristiano Nattero
Paolo Campanella
Giovanni Nico
A. Ghandour
42
0
0
29 Mar 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Shitian Zhao
Qilong Wu
Xinyue Li
Bo Zhang
Ming-xing Li
...
H. Li
Yu Qiao
Peng Gao
Bin Fu
Zhen Li
EGVM
43
0
0
27 Mar 2025
Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation
Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation
Niccolo Avogaro
Thomas Frick
Mattia Rigotti
A. Bartezzaghi
Filip Janicki
C. Malossi
Konrad Schindler
Roy Assaf
MLLM
VLM
55
0
0
25 Mar 2025
M3: 3D-Spatial MultiModal Memory
M3: 3D-Spatial MultiModal Memory
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
54
0
0
20 Mar 2025
A Recipe for Generating 3D Worlds From a Single Image
A Recipe for Generating 3D Worlds From a Single Image
Katja Schwarz
Denys Rozumnyi
Samuel Rota Buló
Lorenzo Porzi
Peter Kontschieder
VGen
74
1
0
20 Mar 2025
VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making
VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making
Mohamed Salim Aissi
Clemence Grislain
Mohamed Chetouani
Olivier Sigaud
Laure Soulier
Nicolas Thome
LRM
39
0
0
19 Mar 2025
MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling
MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling
Damian Boborzi
Phillip Mueller
Jonas Emrich
Dominik Schmid
Sebastian Mueller
Lars Mikelsons
DiffM
65
0
0
18 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
56
0
0
13 Mar 2025
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models
Julian Spravil
Sebastian Houben
Sven Behnke
VLM
68
0
0
12 Mar 2025
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
LRM
MLLM
47
0
0
10 Mar 2025
Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias
Mingxiao Li
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
EGVM
36
0
0
09 Mar 2025
Task-Agnostic Attacks Against Vision Foundation Models
Brian Pulfer
Yury Belousov
Vitaliy Kinakh
Teddy Furon
S. Voloshynovskiy
AAML
65
0
0
05 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
48
0
0
05 Mar 2025
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions
Jun Yu Li
Che Liu
Wenjia Bai
Rossella Arcucci
Cosmin I. Bercea
Julia A. Schnabel
39
0
0
05 Mar 2025
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
Wenqi Guo
Yiyang Du
Shan Du
67
1
0
04 Mar 2025
Teaching Metric Distance to Autoregressive Multimodal Foundational Models
Jiwan Chung
Saejin Kim
Yongrae Jo
J. Park
Dongjun Min
Youngjae Yu
69
0
0
04 Mar 2025
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin
Xin Yang
Meixi Chen
Yingjie Xu
D. Yan
Leyi Wu
Xinli Xu
Lie Xu
Shunsi Zhang
Ying-Cong Chen
52
1
0
03 Mar 2025
ClipGrader: Leveraging Vision-Language Models for Robust Label Quality Assessment in Object Detection
Hong Lu
Yali Bian
Rahul C. Shah
ObjD
VLM
76
0
0
03 Mar 2025
EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models
EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models
Nastaran Darabi
Devashri Naik
Sina Tayebati
Dinithi Jayasuriya
Ranganath Krishnan
A. R. Trivedi
AAML
39
0
0
24 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
Chitrarth: Bridging Vision and Language for a Billion People
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
110
1
0
21 Feb 2025
Contrastive Localized Language-Image Pre-Training
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
H. Zhang
X. Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Y. Yang
Zhe Gan
CLIP
VLM
59
7
0
20 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
128
3
0
18 Feb 2025
123
Next