Microsoft COCO: Common Objects in Context

1 May 2014

Piotr Dollár

Papers citing "Microsoft COCO: Common Objects in Context"

50 / 652 papers shown

CLIC: Contrastive Learning Framework for Unsupervised Image Complexity Representation Shipeng Liu Liang Zhao Dengfeng Chen SSL 144 1 0 19 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model Dongyoung Go Taesun Whang Chanhee Lee Hwayeon Kim Sunghoon Park Seunghwan Ji Dongchan Kim Young-Bum Kim LRM 400 1 0 19 Nov 2024
SL-YOLO: A Stronger and Lighter Drone Target Detection Model Defan Chen Luchan Zhang ObjD 116 2 0 18 Nov 2024
Conceptwm: A Diffusion Model Watermark for Concept Protection Liangqi Lei Keke Gai Jing Yu Liehuang Zhu Qi Wu WIGM 122 2 0 18 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements M. Arda Aydın Efe Mert Çırpar Elvin Abdinli Gözde B. Ünal Y. Sahin VLM 177 1 0 18 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Zeping Yu Sophia Ananiadou 370 1 0 17 Nov 2024
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models Vipula Rawte Sarthak Jain Aarush Sinha Garv Kaushik Aman Bansal ... Aishwarya N. Reganti Vinija Jain Aman Chadha A. Sheth A. Das VLM MLLM 123 1 0 16 Nov 2024
Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer Shitong Shao Zikai Zhou Tian Ye Lichen Bai Zhiqiang Xu Zeke Xie DiffM 67 0 0 16 Nov 2024
Spider: Any-to-Many Multimodal LLM Jinxiang Lai Jie Zhang Jun Liu Jian Li Xiaocheng Lu Song Guo MLLM 106 2 0 14 Nov 2024
Bayesian Comparisons Between Representations Heiko H. Schütt FAtt 411 0 0 13 Nov 2024
MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data Chika Maduabuchi Ericmoore Jossou Matteo Bucci 52 0 0 12 Nov 2024
Diffusion Sampling Correction via Approximately 10 Parameters Guangyi Wang Wei Peng Lijiang Li Wenyu Chen Yuren Cai Songzhi Su DiffM 54 0 0 10 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner Feiyang Huang 63 0 0 09 Nov 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities Zhaofeng Wu Xinyan Velocity Yu Dani Yogatama Jiasen Lu Yoon Kim AIFin 68 17 0 07 Nov 2024
On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data Aitor Martinez-Seras Javier Del Ser Alain Andres Pablo Garcia-Bringas OODD 63 0 0 07 Nov 2024
VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models Ming Cheng Jiaying Gong Chenhan Yuan William A. Ingram Edward A. Fox Hoda Eldardiry 136 1 0 07 Nov 2024
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization Huayang Huang Yu Wu Qian Wang DiffM WIGM 69 7 0 06 Nov 2024
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination D. Song Sicheng Lai Shunian Chen Lichao Sun Benyou Wang 360 0 0 06 Nov 2024
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models Tariq Berrada Ifriqi Pietro Astolfi Melissa Hall Reyhane Askari Hemmat Yohann Benchetrit ... Matthew Muckley Karteek Alahari Adriana Romero Soriano Jakob Verbeek M. Drozdzal AI4CE VLM 99 3 0 05 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs Sheng-Chieh Lin Chankyu Lee Mohammad Shoeybi Jimmy J. Lin Bryan Catanzaro Ming-Yu Liu 150 15 0 04 Nov 2024
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models Sejoon Oh Yiqiao Jin Megha Sharma Donghyun Kim Eric Ma Gaurav Verma Srijan Kumar 83 6 0 03 Nov 2024
A Geometric Framework for Understanding Memorization in Generative Models Brendan Leigh Ross Hamidreza Kamkari Tongzi Wu Rasa Hosseinzadeh Zhaoyan Liu George Stein Jesse C. Cresswell Gabriel Loaiza-Ganem 88 7 0 31 Oct 2024
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving Maciej K. Wozniak Hariprasath Govindarajan Marvin Klingner Camille Maurice B Ravi Kiran S. Yogamani 3DPC 107 1 0 30 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization Lior Dikstein Ariel Lapid Arnon Netzer H. Habi MQ 381 0 0 29 Oct 2024
PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices Ming Kang F. F. Ting Raphaël C.-W. Phan C. Ting ViT MedIm 133 1 0 29 Oct 2024
IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models Hang Guo Yawei Li Tao Dai Shu-Tao Xia Luca Benini MQ 51 2 0 29 Oct 2024
TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors Adonisz Dimitriu Tamás Michaletzky Viktor Remeli AAML 367 0 0 28 Oct 2024
Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models! Arash Marioriyad Mohammadali Banayeeanzade Reza Abbasi M. Rohban M. Baghshah DiffM 94 3 0 28 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information Junjie Li Jianghong Ma Xiaofeng Zhang Yuhang Li Jianyang Shi 81 1 0 26 Oct 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? Antonia Wüst Tim Nelson Tobiasch Lukas Helff Inga Ibs Wolfgang Stammer Devendra Singh Dhami Constantin Rothkopf Kristian Kersting CoGe ReLM VLM LRM 115 1 0 25 Oct 2024
Probabilistic Language-Image Pre-Training Sanghyuk Chun Wonjae Kim Song Park Sangdoo Yun MLLM VLM CLIP 362 4 2 24 Oct 2024
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances Shilin Lu Zihan Zhou Jiayou Lu Yuanzhi Zhu A. Kong WIGM 111 13 0 24 Oct 2024
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning Yaxiong Wang Yijiao Wang Lianwei Wu Lechao Cheng Zhun Zhong Meng Wang VLM 50 0 0 23 Oct 2024
CLEAR: Character Unlearning in Textual and Visual Modalities Alexey Dontsov Dmitrii Korzh Alexey Zhavoronkin Boris Mikheev Denis Bobkov Aibek Alanov Oleg Y. Rogov Ivan Oseledets Elena Tutubalina MU AILaw VLM 93 5 0 23 Oct 2024
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary Hao-Tang Tsui Chien-Yao Wang H. Liao ObjD VLM 87 0 0 20 Oct 2024
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion Chaodong Xiao Minghan Li Zhengqiang Zhang Deyu Meng Lei Zhang Mamba 113 5 0 19 Oct 2024
Truncated Consistency Models Sangyun Lee Yilun Xu Tomas Geffner Giulia Fanti Karsten Kreis Arash Vahdat Weili Nie 102 3 0 18 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Baiqi Li Zhiqiu Lin Wenxuan Peng Jean de Dieu Nyandwi Daniel Jiang Zixian Ma Simran Khanuja Ranjay Krishna Graham Neubig Deva Ramanan AAML CoGe VLM 123 27 0 18 Oct 2024
An Online Learning Approach to Prompt-based Selection of Generative Models Xiaoyan Hu Ho-fung Leung Farzan Farnia 139 3 0 17 Oct 2024
Artificial Kuramoto Oscillatory Neurons Takeru Miyato Sindy Löwe Andreas Geiger Max Welling AI4CE 138 7 0 17 Oct 2024
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation Jaehong Yoon Shoubin Yu Vaidehi Patil Huaxiu Yao Joey Tianyi Zhou 97 21 0 16 Oct 2024
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models Shicheng Xu Liang Pang Yunchang Zhu Huawei Shen Xueqi Cheng MLLM 64 1 0 16 Oct 2024
LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty Joey Wilson Ruihan Xu Yile Sun Parker Ewen Minghan Zhu Kira Barton Maani Ghaffari 64 0 0 15 Oct 2024
InvSeg: Test-Time Prompt Inversion for Semantic Segmentation Jiayi Lin Jiabo Huang Jian Hu S. Gong DiffM VLM 76 0 0 15 Oct 2024
A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations Naveen Gupta Medha Sawhney Arka Daw Youzuo Lin Anuj Karpatne MedIm AI4CE 63 3 0 15 Oct 2024
Fractal Calibration for long-tailed object detection Konstantinos Panagiotis Alexandridis Ismail Elezi Jiankang Deng Anh H. Nguyen Shan Luo 367 0 0 15 Oct 2024
Browsing without Third-Party Cookies: What Do You See? Maxwell Lin Shihan Lin Helen Wu Karen Wang Xiaowei Yang BDL 140 10 0 14 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach Rory Young Nicolas Pugeault AAML 82 0 0 14 Oct 2024
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization Jiawei Li Fanrui Zhang Jiaying Zhu Esther Sun Qiang Zhang Zheng-jun Zha MLLM 86 12 0 14 Oct 2024
Locality Alignment Improves Vision-Language Models Ian Covert Tony Sun James Zou Tatsunori Hashimoto VLM 145 5 0 14 Oct 2024