ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1405.0312
  4. Cited By
Microsoft COCO: Common Objects in Context

Microsoft COCO: Common Objects in Context

1 May 2014
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
    ObjD
ArXivPDFHTML

Papers citing "Microsoft COCO: Common Objects in Context"

50 / 652 papers shown
Title
CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery
CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery
Oluwatosin O. Alabi
K. Toe
Zijian Zhou
Charlie Budd
Nicholas Raison
Miaojing Shi
Tom Vercauteren
ISeg
76
1
0
23 Jun 2024
Blind Baselines Beat Membership Inference Attacks for Foundation Models
Blind Baselines Beat Membership Inference Attacks for Foundation Models
Debeshee Das
Jie Zhang
Florian Tramèr
MIALM
116
32
1
23 Jun 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
109
1
0
21 Jun 2024
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?
HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?
Ivan Karpukhin
F. Shipilov
Andrey Savchenko
AI4TS
81
4
0
20 Jun 2024
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
Hengyi Wang
Haizhou Shi
Shiwei Tan
Weiyi Qin
Wenyuan Wang
Tunyu Zhang
A. Nambi
T. Ganu
Hao Wang
78
18
0
17 Jun 2024
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Alireza Ganjdanesh
Reza Shirkavand
Shangqian Gao
Heng Huang
DiffM
VLM
106
4
0
17 Jun 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM
VGen
67
7
0
15 Jun 2024
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Xuannan Liu
Zekun Li
Peipei Li
Shuhan Xia
Xing Cui
Linzhi Huang
Huaibo Huang
Weihong Deng
Zhaofeng He
84
19
0
13 Jun 2024
Towards Domain Adaptive Neural Contextual Bandits
Towards Domain Adaptive Neural Contextual Bandits
Ziyan Wang
Hao Wang
Hao Wang
107
0
0
13 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
91
3
0
13 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
83
2
0
12 Jun 2024
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang
Qifan Zhang
Yu-Wei Chao
Bowen Wen
Xiaohu Guo
Yu Xiang
3DH
88
2
0
10 Jun 2024
A DeNoising FPN With Transformer R-CNN for Tiny Object Detection
A DeNoising FPN With Transformer R-CNN for Tiny Object Detection
Hou-I Liu
Yu-Wen Tseng
Kai-Cheng Chang
Pin-Jyun Wang
Hong-Han Shuai
Wen-Huang Cheng
ViT
ObjD
94
25
0
09 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
78
13
0
08 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
97
34
0
07 Jun 2024
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina
Adriana Valentina Costache
Adrian-Gabriel Chifu
Josiane Mothe
Radu Tudor Ionescu
VLM
100
1
0
07 Jun 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
98
11
0
07 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Yu Guo
VGen
167
16
0
06 Jun 2024
Tiny models from tiny data: Textual and null-text inversion for few-shot distillation
Tiny models from tiny data: Textual and null-text inversion for few-shot distillation
Erik Landolsi
Fredrik Kahl
DiffM
76
1
0
05 Jun 2024
Scaling White-Box Transformers for Vision
Scaling White-Box Transformers for Vision
Jinrui Yang
Xianhang Li
Druv Pai
Yuyin Zhou
Yi-An Ma
Yaodong Yu
Cihang Xie
ViT
60
9
0
30 May 2024
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour
Victor Besnier
Vicky Kalogeiton
David Picard
DiffM
94
2
0
30 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
94
5
0
29 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
94
4
0
28 May 2024
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang
Yukun Zhou
Mingjia Shi
Zhihang Yuan
Yuzhang Shang
Yuzhang Shang
Hanwang Zhang
Hanwang Zhang
Yang You
92
13
0
27 May 2024
Ensembling Diffusion Models via Adaptive Feature Aggregation
Ensembling Diffusion Models via Adaptive Feature Aggregation
Cong Wang
Kuan Tian
Yonghang Guan
Jun Zhang
Zhiwei Jiang
Fei Shen
Xiao Han
80
5
0
27 May 2024
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang
Jun Hao Liew
Hanshu Yan
Yuyang Yin
Yao Zhao
Yunchao Wei
Yunchao Wei
DiffM
116
7
0
27 May 2024
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
Harry Zhang
Luca Carlone
3DH
149
1
0
27 May 2024
Unsupervised Meta-Learning via In-Context Learning
Unsupervised Meta-Learning via In-Context Learning
Anna Vettoruzzo
Lorenzo Braccaioli
Joaquin Vanschoren
M. Nowaczyk
SSL
87
0
0
25 May 2024
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Weize Li
Zhicheng Zhao
Haochen Bai
Fei Su
66
0
0
24 May 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLM
DiffM
89
19
0
24 May 2024
Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Hengtao Shen
MLLM
65
10
0
24 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Dinesh Manocha
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
104
35
0
24 May 2024
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
Katherine Xu
Lingzhi Zhang
Jianbo Shi
84
13
0
23 May 2024
PerSense: Personalized Instance Segmentation in Dense Images
PerSense: Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui
Muhammad Umer Sheikh
Hassan Abid
Muhammad Haris Khan
VLM
74
0
0
22 May 2024
ColorFoil: Investigating Color Blindness in Large Vision and Language Models
ColorFoil: Investigating Color Blindness in Large Vision and Language Models
Ahnaf Mozib Samin
M. F. Ahmed
Md. Mushtaq Shahriyar Rafee
VLM
72
2
0
19 May 2024
SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network
SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network
Zhaoxu Li
Wei An
Gaowei Guo
Longguang Wang
Yingqian Wang
Zaiping Lin
ViT
95
0
0
16 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
113
290
0
16 May 2024
UDA4Inst: Unsupervised Domain Adaptation for Instance Segmentation
UDA4Inst: Unsupervised Domain Adaptation for Instance Segmentation
Yachan Guo
Yi Xiao
Danna Xue
Jose Luis Gomez Zurita
Antonio M. López
106
0
0
15 May 2024
Wild Berry image dataset collected in Finnish forests and peatlands using drones
Wild Berry image dataset collected in Finnish forests and peatlands using drones
Luigi Riz
Sergio Povoli
Andrea Caraffa
Davide Boscaini
M. L. Mekhalfi
...
Elisa Castelli
Giacomo Piccinini
L. Marchesotti
Micael S. Couceiro
Fabio Poiesi
65
1
0
13 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
101
16
0
08 May 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP
VLM
90
23
0
30 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
125
167
0
29 Apr 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
102
4
0
29 Apr 2024
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman
Noam Rotstein
Roy Ganz
Ron Kimmel
DiffM
69
15
0
28 Apr 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
Kevin Qinghong Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
52
12
0
25 Apr 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
Jordi Pont-Tuset
Aida Nematzadeh
EGVM
171
18
0
25 Apr 2024
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Abhishek Aich
Yumin Suh
S. Schulter
Manmohan Chandraker
98
0
0
23 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
123
50
0
23 Apr 2024
CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
Wencheng Zhu
Xin Zhou
Pengfei Zhu
Yu Wang
Qinghua Hu
VLM
92
1
0
22 Apr 2024
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
Jintao Sun
Zhedong Zheng
Gangyi Ding
Gangyi Ding
67
8
0
16 Apr 2024
Previous
123...10111213149
Next