ResearchTrend.AI
Microsoft COCO: Common Objects in Context

1 May 2014
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
    ObjD

Papers citing "Microsoft COCO: Common Objects in Context"

50 / 652 papers shown
Self-Supervised Data Generation for Precision Agriculture: Blending Simulated Environments with Real Imagery
Leonardo Saraceni
I. M. Motoi
Daniele Nardi
Thomas Alessandro Ciarfuglia
75
1
0
25 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
129
8
0
24 Feb 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
54
10
0
24 Feb 2025
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving
Zeyu Yang
Nan Song
Wei Li
Xiatian Zhu
Lefei Zhang
Philip H. S. Torr
107
4
0
24 Feb 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
Jan-Jakob Sonke
E. Gavves
96
0
0
24 Feb 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
99
0
0
24 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
366
0
0
21 Feb 2025
Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines
Xinyi Ying
Chao Xiao
Ruojing Li
Xu He
Boyang Li
...
Miao Li
Shilin Zhou
Wei An
Weidong Sheng
Li Liu
174
7
0
21 Feb 2025
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
Yuming Chen
Xinbin Yuan
Ruiqi Wu
Jiabao Wang
Qibin Hou
Ming-Ming Cheng
ObjD
215
52
0
21 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
147
2
0
21 Feb 2025
VaLID: Verification as Late Integration of Detections for LiDAR-Camera Fusion
Vanshika Vats
Marzia Binta Nizam
James Davis
3DPC
100
0
0
21 Feb 2025
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
107
7
0
20 Feb 2025
Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
Kyungmin Min
Minbeom Kim
Kang-il Lee
Dongryeol Lee
Kyomin Jung
MLLM
108
5
0
20 Feb 2025
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou
Xuyang Liu
Ting Liu
Siteng Huang
Linfeng Zhang
95
15
0
20 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
Hao Zhang
Xiang Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Yue Yang
Zhe Gan
CLIP
VLM
73
9
0
20 Feb 2025
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
Kang Liao
Zongsheng Yue
Zhouxia Wang
Chen Change Loy
129
3
0
20 Feb 2025
On the Statistical Complexity of Estimating Vendi Scores from Empirical Data
Azim Ospanov
Farzan Farnia
86
1
0
17 Feb 2025
Simplifying DINO via Coding Rate Regularization
Ziyang Wu
Jingyuan Zhang
Druv Pai
Xinze Wang
Chandan Singh
Jianwei Yang
Jianfeng Gao
Yi-An Ma
380
1
0
17 Feb 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
124
0
0
17 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
164
6
0
12 Feb 2025
Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions
Prajwal Gatti
Kshitij Parikh
Dhriti Prasanna Paul
Manish Gupta
Anand Mishra
162
2
0
12 Feb 2025
MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks
Lotfi Abdelkrim Mecharbat
Alberto Marchisio
Mohamed Bennai
M. Ghassemi
Tuka Alhanai
132
0
0
11 Feb 2025
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer
Wenxi Li
Yuchen Guo
Jilai Zheng
Haozhe Lin
Chao Ma
Lu Fang
Xiaokang Yang
ViT
76
3
0
11 Feb 2025
A Large-Scale Benchmark for Vietnamese Sentence Paraphrases
Sang Quang Nguyen
Kiet Van Nguyen
98
0
0
11 Feb 2025
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park
Doohyuk Jang
Sungyub Kim
Souvik Kundu
Eunho Yang
92
0
0
10 Feb 2025
Deciphering Functions of Neurons in Vision-Language Models
Jiaqi Xu
Cuiling Lan
Xuejin Chen
Yan Lu
VLM
173
0
0
10 Feb 2025
Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector
Qirui Wu
Shizhou Zhang
De Cheng
Yinghui Xing
Di Xu
Peng Wang
Yanning Zhang
ObjD
127
0
0
08 Feb 2025
Coarse-to-Fine Structure-Aware Artistic Style Transfer
Kunxiao Liu
Guowu Yuan
Hao Wu
Wenhua Qian
86
0
0
08 Feb 2025
Augmented Conditioning Is Enough For Effective Training Image Generation
Jiahui Chen
Amy Zhang
Adriana Romero-Soriano
DiffM
VLM
120
0
0
06 Feb 2025
Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
Nathan Louis
Mahzad Khoshlessan
Jason J. Corso
104
0
0
06 Feb 2025
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Andrew D. Bagdanov
CLIP
VLM
128
4
0
06 Feb 2025
Quantifying Correlations of Machine Learning Models
Yuanyuan Li
Neeraj Sarna
Yang Lin
109
0
0
06 Feb 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
Mennatullah Siam
VLM
101
1
0
06 Feb 2025
CTR-Driven Advertising Image Generation with Multimodal Large Language Models
Xingye Chen
Wei Feng
Zhenbang Du
Weizhen Wang
Yuxiao Chen
...
Jingping Shao
Yuanjie Shao
Xinge You
Changxin Gao
Nong Sang
OffRL
85
2
0
05 Feb 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
Di Zhang
Chunhong Pan
129
1
0
04 Feb 2025
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
Tongkun Liu
Bing Li
Xiao Jin
Yupeng Shi
Qiuying Li
Xiang Wei
84
0
0
03 Feb 2025
Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting
Keyi Zhu
Jiajia Li
Kaixiang Zhang
Chaaran Arunachalam
Siddhartha Bhattacharya
R. Lu
Zhaojian Li
103
0
0
03 Feb 2025
Grokking Explained: A Statistical Phenomenon
B. W. Carvalho
Artur Garcez
Luís C. Lamb
Emílio Vital Brazil
78
0
0
03 Feb 2025
A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches
Luca Ciampi
Ali Azmoudeh
Elif Ecem Akbaba
Erdi Sarıtaş
Ziya Ata Yazıcı
H. K. Ekenel
Giuseppe Amato
Fabrizio Falchi
140
0
0
31 Jan 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
130
0
0
28 Jan 2025
CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors
Mingyuan Li
Tong Jia
Hui Lu
Bowen Ma
Hao Wang
Dongyue Chen
93
0
0
28 Jan 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shivalika Singh
Nakul Sharma
Manish Gupta
Anand Mishra
78
1
0
28 Jan 2025
Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition
Jielong Tang
Zhenxing Wang
Ziyang Gong
Jianxing Yu
Shuang Wang
Jian Yin
79
0
0
28 Jan 2025
Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets?
Utku Ozbulak
Esla Timothy Anzaku
Solha Kang
W. D. Neve
J. Vankerschaver
71
0
0
28 Jan 2025
Visual Generation Without Guidance
Huayu Chen
Kai Jiang
Kaiwen Zheng
Jianfei Chen
Hang Su
Jun Zhu
92
1
0
28 Jan 2025
BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity
Zahra Gharaee
Scott C. Lowe
ZeMing Gong
Pablo Millán Arias
Nicholas Pellegrino
...
Lila Kari
Dirk Steinke
Graham W. Taylor
Paul Fieguth
Angel X. Chang
80
9
0
28 Jan 2025
Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room
S. Cepeda
Olga Esteban-Sinovas
Roberto Romero
Vikas Singh
Prakash Shetty
...
Timothy R. West
Brian V. Nahed
I. Arrese
Roberto Hornero
R. Sarabia
73
0
0
28 Jan 2025
Scaling laws for decoding images from brain activity
Hubert J. Banville
Yohann Benchetrit
Stéphane d'Ascoli
Jérémy Rapin
J. King
MedIm
77
0
0
25 Jan 2025
Towards Robust Unsupervised Attention Prediction in Autonomous Driving
Mengshi Qi
Xiaoyang Bi
Pengfei Zhu
Huadong Ma
104
0
0
25 Jan 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
353
2
0
25 Jan 2025