ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1405.0312
  4. Cited By
Microsoft COCO: Common Objects in Context

Microsoft COCO: Common Objects in Context

1 May 2014
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
    ObjD
ArXivPDFHTML

Papers citing "Microsoft COCO: Common Objects in Context"

50 / 652 papers shown
Title
Enhanced Generative Data Augmentation for Semantic Segmentation via Stronger Guidance
Enhanced Generative Data Augmentation for Semantic Segmentation via Stronger Guidance
Quang-Huy Che
Duc-Tri Le
Vinh-Tiep Nguyen
D. Lam
Vinh-Tiep Nguyen
DiffM
128
1
0
09 Sep 2024
Replay Consolidation with Label Propagation for Continual Object Detection
Replay Consolidation with Label Propagation for Continual Object Detection
Riccardo De Monte
Davide Dalle Pezze
Marina Ceccon
Francesco Pasti
Francesco Paissan
Elisabetta Farella
Gian Antonio Susto
Nicola Bellotto
80
2
0
09 Sep 2024
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers
Lorenza Prospero
Abdullah Hamdi
João F. Henriques
Christian Rupprecht
3DGS
68
2
0
06 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
90
54
0
06 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
337
1
0
04 Sep 2024
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation
iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation
Hayeon Jo
Hyesong Choi
Minhee Cho
Dongbo Min
73
1
0
04 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
Mustansar Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
141
4
0
30 Aug 2024
Law of Vision Representation in MLLMs
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
68
10
0
29 Aug 2024
The Benefits of Balance: From Information Projections to Variance Reduction
The Benefits of Balance: From Information Projections to Variance Reduction
Lang Liu
Ronak R. Mehta
Soumik Pal
Zaïd Harchaoui
45
0
0
27 Aug 2024
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Junyao Ge
Xu Zhang
Yang Zheng
Kaitai Guo
Jimin Liang
73
2
0
27 Aug 2024
DIAGen: Semantically Diverse Image Augmentation with Generative Models for Few-Shot Learning
DIAGen: Semantically Diverse Image Augmentation with Generative Models for Few-Shot Learning
Tobias Lingenberg
Markus Reuter
Gopika Sudhakaran
Dominik Gojny
Stefan Roth
Simone Schaub-Meyer
DiffM
74
3
0
26 Aug 2024
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li
Xuewen Liu
Dongrong Fu
Jianquan Li
Qingyi Gu
Kurt Keutzer
Zhen Dong
EGVM
VGen
DiffM
108
2
0
26 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
91
1
0
23 Aug 2024
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting
Liyao Jiang
Negar Hassanpour
Mohammad Salameh
Mohan Sai Singamsetti
Fengyu Sun
Wei Lu
Di Niu
DiffM
95
2
0
21 Aug 2024
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang
Christopher Hoang
Yuwen Xiong
Yann LeCun
Mengye Ren
123
0
0
20 Aug 2024
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding
Youjun Zhao
Jiaying Lin
Shuquan Ye
Qianshi Pang
Rynson W. H. Lau
101
1
0
20 Aug 2024
Masked Image Modeling: A Survey
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
104
8
0
13 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
117
453
0
12 Aug 2024
Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model
Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model
Jaehyuk Heo
Pilsung Kang
VLM
45
1
0
09 Aug 2024
EasyInv: Toward Fast and Better DDIM Inversion
EasyInv: Toward Fast and Better DDIM Inversion
Ziyue Zhang
Mingbao Lin
Shuicheng Yan
Rongrong Ji
58
2
0
09 Aug 2024
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Yanqi Dai
Huanran Hu
Lei Wang
Shengjie Jin
X. Chen
Zhiwu Lu
LLMAG
78
8
0
08 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
107
54
0
05 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
105
25
0
04 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
77
2
0
01 Aug 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi
Hyeyoon Lee
Dain Kwon
Sunjong Park
Kyuyeun Kim
Noseong Park
Jinho Lee
Jinho Lee
MQ
84
1
0
29 Jul 2024
BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments
BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments
Yu-Yun Tseng
Tanusree Sharma
Lotus Zhang
Abigale Stangl
Leah Findlater
Yang Wang
Danna Gurari
102
0
0
25 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
125
1
0
23 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
82
7
0
22 Jul 2024
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
Junseo Park
Hyeryung Jang
162
1
0
17 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
97
142
0
16 Jul 2024
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Phillip Mueller
Lars Mikelsons
AI4CE
69
2
0
15 Jul 2024
Enrich the content of the image Using Context-Aware Copy Paste
Enrich the content of the image Using Context-Aware Copy Paste
Qiushi Guo
VLM
110
0
0
11 Jul 2024
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila Dahdul
Anuj Karpatne
Wasila M Dahdul
Anuj Karpatne
126
4
0
10 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
Isah Bello
68
4
0
10 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
86
66
0
10 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
99
5
0
09 Jul 2024
OneDiff: A Generalist Model for Image Difference Captioning
OneDiff: A Generalist Model for Image Difference Captioning
Erdong Hu
Longteng Guo
Tongtian Yue
Zijia Zhao
Shuning Xue
Jing Liu
VLM
55
2
0
08 Jul 2024
M^3:Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask
M^3:Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask
Xinyu Yang
Xiaochen Ma
Xuekang Zhu
Bo Du
Lei Su
Bingkui Tong
Zeyu Lei
Jizhe Zhou
80
0
0
04 Jul 2024
MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications
MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications
Irene Siragusa
Salvatore Contino
Massimo La Ciura
Rosario Alicata
Roberto Pirrone
119
3
0
03 Jul 2024
OpenSlot: Mixed Open-Set Recognition with Object-Centric Learning
OpenSlot: Mixed Open-Set Recognition with Object-Centric Learning
Xu Yin
Fei Pan
G. An
Yuchi Huo
Zixuan Xie
Sung-eui Yoon
BDL
VLM
100
1
0
02 Jul 2024
A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling
A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling
Minghao Zhou
Hong Wang
Yefeng Zheng
Deyu Meng
132
1
0
02 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
132
16
0
01 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
91
27
0
28 Jun 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Tingting Gao
Xi Li
MoE
87
2
0
28 Jun 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
106
27
0
28 Jun 2024
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen
Xiangtai Li
Yining Li
Yanhong Zeng
Jianzong Wu
Xiangyu Zhao
Kai Chen
VLM
DiffM
88
3
0
28 Jun 2024
A Sanity Check for AI-generated Image Detection
A Sanity Check for AI-generated Image Detection
Shilin Yan
Ouxiang Li
Jiayin Cai
Y. Hao
Xiaolong Jiang
Feng-Long Xie
Weidi Xie
VLM
87
29
0
27 Jun 2024
ColPali: Efficient Document Retrieval with Vision Language Models
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
139
23
0
27 Jun 2024
Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods
Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods
Filipe Gama
Matej Misar
Lukas Navara
S. T. Popescu
Matej Hoffmann
3DH
70
2
0
25 Jun 2024
GMT: Guided Mask Transformer for Leaf Instance Segmentation
GMT: Guided Mask Transformer for Leaf Instance Segmentation
Feng Chen
Sotirios A. Tsaftaris
M. Giuffrida
49
1
0
24 Jun 2024
Previous
123...8910...121314
Next