ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1405.0312
  4. Cited By
Microsoft COCO: Common Objects in Context

Microsoft COCO: Common Objects in Context

1 May 2014
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
    ObjD
ArXivPDFHTML

Papers citing "Microsoft COCO: Common Objects in Context"

50 / 652 papers shown
Title
CLIC: Contrastive Learning Framework for Unsupervised Image Complexity Representation
CLIC: Contrastive Learning Framework for Unsupervised Image Complexity Representation
Shipeng Liu
Liang Zhao
Dengfeng Chen
SSL
144
1
0
19 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
Young-Bum Kim
LRM
400
1
0
19 Nov 2024
SL-YOLO: A Stronger and Lighter Drone Target Detection Model
Defan Chen
Luchan Zhang
ObjD
116
2
0
18 Nov 2024
Conceptwm: A Diffusion Model Watermark for Concept Protection
Liangqi Lei
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
WIGM
122
2
0
18 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
177
1
0
18 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
370
1
0
17 Nov 2024
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
Vipula Rawte
Sarthak Jain
Aarush Sinha
Garv Kaushik
Aman Bansal
...
Aishwarya N. Reganti
Vinija Jain
Aman Chadha
A. Sheth
A. Das
VLM
MLLM
123
1
0
16 Nov 2024
Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
Shitong Shao
Zikai Zhou
Tian Ye
Lichen Bai
Zhiqiang Xu
Zeke Xie
DiffM
67
0
0
16 Nov 2024
Spider: Any-to-Many Multimodal LLM
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
106
2
0
14 Nov 2024
Bayesian Comparisons Between Representations
Bayesian Comparisons Between Representations
Heiko H. Schütt
FAtt
411
0
0
13 Nov 2024
MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data
MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data
Chika Maduabuchi
Ericmoore Jossou
Matteo Bucci
52
0
0
12 Nov 2024
Diffusion Sampling Correction via Approximately 10 Parameters
Diffusion Sampling Correction via Approximately 10 Parameters
Guangyi Wang
Wei Peng
Lijiang Li
Wenyu Chen
Yuren Cai
Songzhi Su
DiffM
54
0
0
10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
63
0
0
09 Nov 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
68
17
0
07 Nov 2024
On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data
On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data
Aitor Martinez-Seras
Javier Del Ser
Alain Andres
Pablo García Bringas
Pablo Garcia-Bringas
OODD
63
0
0
07 Nov 2024
VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models
VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models
Ming Cheng
Jiaying Gong
Chenhan Yuan
William A. Ingram
Edward A. Fox
Hoda Eldardiry
136
1
0
07 Nov 2024
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization
Huayang Huang
Yu Wu
Qian Wang
DiffM
WIGM
69
7
0
06 Nov 2024
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
D. Song
Sicheng Lai
Shunian Chen
Lichao Sun
Benyou Wang
360
0
0
06 Nov 2024
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi
Pietro Astolfi
Melissa Hall
Reyhane Askari Hemmat
Yohann Benchetrit
...
Matthew Muckley
Karteek Alahari
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
AI4CE
VLM
99
3
0
05 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin
Chankyu Lee
Mohammad Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Ming-Yu Liu
150
15
0
04 Nov 2024
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
Sejoon Oh
Yiqiao Jin
Megha Sharma
Donghyun Kim
Eric Ma
Gaurav Verma
Srijan Kumar
83
6
0
03 Nov 2024
A Geometric Framework for Understanding Memorization in Generative Models
A Geometric Framework for Understanding Memorization in Generative Models
Brendan Leigh Ross
Hamidreza Kamkari
Tongzi Wu
Rasa Hosseinzadeh
Zhaoyan Liu
George Stein
Jesse C. Cresswell
Gabriel Loaiza-Ganem
88
7
0
31 Oct 2024
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
Maciej K. Wozniak
Hariprasath Govindarajan
Marvin Klingner
Camille Maurice
B Ravi Kiran
S. Yogamani
3DPC
107
1
0
30 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization
Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
381
0
0
29 Oct 2024
PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices
PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices
Ming Kang
F. F. Ting
Raphaël C.-W. Phan
C. Ting
ViT
MedIm
133
1
0
29 Oct 2024
IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
Hang Guo
Yawei Li
Tao Dai
Shu-Tao Xia
Luca Benini
MQ
51
2
0
29 Oct 2024
TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors
TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors
Adonisz Dimitriu
Tamás Michaletzky
Viktor Remeli
AAML
367
0
0
28 Oct 2024
Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models!
Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models!
Arash Marioriyad
Mohammadali Banayeeanzade
Reza Abbasi
M. Rohban
M. Baghshah
DiffM
94
3
0
28 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
81
1
0
26 Oct 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
Devendra Singh Dhami
Constantin Rothkopf
Kristian Kersting
CoGe
ReLM
VLM
LRM
115
1
0
25 Oct 2024
Probabilistic Language-Image Pre-Training
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
362
4
2
24 Oct 2024
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Shilin Lu
Zihan Zhou
Jiayou Lu
Yuanzhi Zhu
A. Kong
WIGM
111
13
0
24 Oct 2024
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning
Yaxiong Wang
Yijiao Wang
Lianwei Wu
Lechao Cheng
Zhun Zhong
Meng Wang
VLM
50
0
0
23 Oct 2024
CLEAR: Character Unlearning in Textual and Visual Modalities
CLEAR: Character Unlearning in Textual and Visual Modalities
Alexey Dontsov
Dmitrii Korzh
Alexey Zhavoronkin
Boris Mikheev
Denis Bobkov
Aibek Alanov
Oleg Y. Rogov
Ivan Oseledets
Elena Tutubalina
MU
AILaw
VLM
93
5
0
23 Oct 2024
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
Hao-Tang Tsui
Chien-Yao Wang
H. Liao
ObjD
VLM
87
0
0
20 Oct 2024
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion
Chaodong Xiao
Minghan Li
Zhengqiang Zhang
Deyu Meng
Lei Zhang
Mamba
113
5
0
19 Oct 2024
Truncated Consistency Models
Truncated Consistency Models
Sangyun Lee
Yilun Xu
Tomas Geffner
Giulia Fanti
Karsten Kreis
Arash Vahdat
Weili Nie
102
3
0
18 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
123
27
0
18 Oct 2024
An Online Learning Approach to Prompt-based Selection of Generative Models
An Online Learning Approach to Prompt-based Selection of Generative Models
Xiaoyan Hu
Ho-fung Leung
Farzan Farnia
139
3
0
17 Oct 2024
Artificial Kuramoto Oscillatory Neurons
Artificial Kuramoto Oscillatory Neurons
Takeru Miyato
Sindy Löwe
Andreas Geiger
Max Welling
AI4CE
138
7
0
17 Oct 2024
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Jaehong Yoon
Shoubin Yu
Vaidehi Patil
Huaxiu Yao
Joey Tianyi Zhou
97
21
0
16 Oct 2024
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu
Liang Pang
Yunchang Zhu
Huawei Shen
Xueqi Cheng
MLLM
64
1
0
16 Oct 2024
LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty
LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty
Joey Wilson
Ruihan Xu
Yile Sun
Parker Ewen
Minghan Zhu
Kira Barton
Maani Ghaffari
64
0
0
15 Oct 2024
InvSeg: Test-Time Prompt Inversion for Semantic Segmentation
InvSeg: Test-Time Prompt Inversion for Semantic Segmentation
Jiayi Lin
Jiabo Huang
Jian Hu
S. Gong
DiffM
VLM
76
0
0
15 Oct 2024
A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations
A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations
Naveen Gupta
Medha Sawhney
Arka Daw
Youzuo Lin
Anuj Karpatne
MedIm
AI4CE
63
3
0
15 Oct 2024
Fractal Calibration for long-tailed object detection
Fractal Calibration for long-tailed object detection
Konstantinos Panagiotis Alexandridis
Ismail Elezi
Jiankang Deng
Anh H. Nguyen
Shan Luo
367
0
0
15 Oct 2024
Browsing without Third-Party Cookies: What Do You See?
Browsing without Third-Party Cookies: What Do You See?
Maxwell Lin
Shihan Lin
Helen Wu
Karen Wang
Xiaowei Yang
BDL
140
10
0
14 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent
  Approach
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
82
0
0
14 Oct 2024
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization
Jiawei Li
Fanrui Zhang
Jiaying Zhu
Esther Sun
Qiang Zhang
Zheng-jun Zha
MLLM
86
12
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
145
5
0
14 Oct 2024
Previous
123...678...121314
Next