Title
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings Florian Vahl Jörn Griepenburg Jan Gutsche Jasper Güldenstein Jianwei Zhang VGen 39 0 0 29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models Sicheng Mo Thao Nguyen Xun Huang Siddharth Srinivasan Iyer Yijun Li ... Eli Shechtman Krishna Kumar Singh Yong Jae Lee Bolei Zhou Yuheng Li 71 0 0 29 Apr 2025
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation Jingfeng Guo J. Chen Weikai Chen Zhenyu Sun Lanjiong Li Baozhu Zhao Lingting Zhu X. Wang Qi Liu 3DH 80 0 0 29 Apr 2025
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Zechuan Zhang Ji Xie Yu Lu Zongxin Yang Y. Yang DiffM 89 1 0 29 Apr 2025
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting Hanxi Liu Yifang Men Zhouhui Lian 3DGS 33 0 0 29 Apr 2025
Do You Know the Way? Human-in-the-Loop Understanding for Fast Traversability Estimation in Mobile Robotics Andre Schreiber Katherine Rose Driggs-Campbell 63 0 0 28 Apr 2025
CompleteMe: Reference-based Human Image Completion Yu-Ju Tsai Brian L. Price Qing Liu Luis Figueroa D. Pakhomov Zhihong Ding Scott D. Cohen Ming Yang 3DH 47 0 0 28 Apr 2025
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Chia-Yu Hung Qi Sun Pengfei Hong Amir Zadeh Chuan Li U-Xuan Tan Navonil Majumder Soujanya Poria LM&Ro 37 1 0 28 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video Sonia Joseph Praneet Suresh Lorenz Hufe Edward Stevinson Robert Graham Yash Vadi Danilo Bzdok Sebastian Lapuschkin Lee Sharkey Blake A. Richards 72 0 0 28 Apr 2025
PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping Feng Chen Ilias Stogiannidis Andrew Wood Danilo Bueno Dominic Williams ... Stephen A. Rolfe Tracy Lawson Tony Pridmore M. Giuffrida Sotirios A. Tsaftaris 62 0 0 28 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation Zhe Dong Yuzhe Sun Tianzhu Liu Wangmeng Zuo Yanfeng Gu 48 0 0 28 Apr 2025
Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation Victoria Yue Chen Daoye Wang Stephan Garbin Jan Bednarík Sebastian Winberg Timo Bolkart Thabo Beeler 3DH 3DPC 34 0 0 28 Apr 2025
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation Xueqi Ma Y. Liu Tianlong Gao Q. Huang Hui Huang 3DV AI4CE 37 0 0 27 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis Alexander Baumann Leonardo Ayala S. Jan Sellner Alexander Studier-Fischer Berkin Özdemir Lena Maier-Hein Slobodan Ilic 51 0 0 27 Apr 2025
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System Xiaofeng Jin Matteo Frosi Matteo Matteucci 62 0 0 27 Apr 2025
Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos Rezowan Shuvo M S Mekala Eyad Elyan MedIm 48 0 0 26 Apr 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation Shivam Duggal Yushi Hu Oscar Michel Aniruddha Kembhavi William T. Freeman Noah A. Smith Ranjay Krishna Antonio Torralba Ali Farhadi Wei-Chiu Ma EGVM ELM 70 0 0 25 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning Jiahao Zhang Bowen Wang Hong Liu Liangzhi Li Yuta Nakashima Hajime Nagahara VLM 99 0 0 25 Apr 2025
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology Elena Plekhanova Damien Robert Johannes Dollinger Emilia Arens Philipp Brun Jan Dirk Wegner Niklaus Zimmermann 19 0 0 25 Apr 2025
What is the Added Value of UDA in the VFM Era? B. B. Englert Tommie Kerssies Gijs Dubbelman 37 0 0 25 Apr 2025
Improving Open-World Object Localization by Discovering Background Ashish Singh Michael J. Jones Kuan-Chuan Peng A. Cherian Moitreya Chatterjee Erik Learned-Miller ObjD OCL VLM 64 0 0 24 Apr 2025
Lessons from Deploying Learning-based CSI Localization on a Large-Scale ISAC Platform Tianyu Zhang Dongheng Zhang Ruixu Geng Xuecheng Xie Shuai Yang Yan Chen 34 0 0 24 Apr 2025
The Fourth Monocular Depth Estimation Challenge Anton Obukhov Matteo Poggi Fabio Tosi Ripudaman Singh Arora Jaime Spencer ... Tuan-Anh Yang Minh-Quang Nguyen T. Tran Albert Luginov Muhammad Shahzad MDE 46 0 0 24 Apr 2025
Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding Mingxuan Wu Huang Huang Justin Kerr C. Kim Anthony Zhang Brent Yi Angjoo Kanazawa 48 0 0 24 Apr 2025
Dargana: fine-tuning EarthPT for dynamic tree canopy mapping from space Michael J. Smith Luke Fleming James E. Geach Ryan J. Roberts Freddie Kalaitzis James Banister 24 0 0 24 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Z. Wang Senthil Purushwalkam Caiming Xiong S. Heng Ji R. Xu 38 0 0 23 Apr 2025
V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations Zhiyuan Fan Yumeng Wang Sandeep Polisetty Yi Ren Fung 43 0 0 23 Apr 2025
ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance Ying Li Xiaobao Wei Xiaowei Chi Y. K. Li Zhongyu Zhao Hao Wang Ningning Ma Ming Lu Shanghang Zhang VGen 39 0 0 23 Apr 2025
SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos Yuxin Yao Yan Zhang Zhening Huang Joan Lasenby 3DGS 19 0 0 22 Apr 2025
DINOv2-powered Few-Shot Semantic Segmentation: A Unified Framework via Cross-Model Distillation and 4D Correlation Mining Wei Zhuo Zhiyue Tang Wufeng Xue Hao Ding Linlin Shen 25 0 0 22 Apr 2025
AffordanceSAM: Segment Anything Once More in Affordance Grounding D. Jiang Mengmeng Wang Teli Ma H. Li Y. Liu Guang Dai L. Zhang 32 0 0 22 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis Theodoros Kouzelis Efstathios Karypidis Ioannis Kakogeorgiou Spyros Gidaris N. Komodakis DiffM 26 0 0 22 Apr 2025
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation Zebin Yao Lei Ren Huixing Jiang Chen Wei Xiaojie Wang Ruifan Li Fangxiang Feng DiffM 69 0 0 22 Apr 2025
MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation Xingxing Zuo Nikhil Ranganathan Connor T. Lee Georgia Gkioxari Soon-Jo Chung VLM 51 1 0 21 Apr 2025
Insert Anything: Image Insertion via In-Context Editing in DiT Wensong Song Hong Jiang Zongxing Yang Ruijie Quan Yi Yang DiffM 40 0 0 21 Apr 2025
Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation X. Zhang Lu Zou Tao Lu Yuan Yao Zhangjin Huang Guoping Wang 3DPC 28 0 0 21 Apr 2025
Context Aware Grounded Teacher for Source Free Object Detection Tajamul Ashraf Rajes Manna Partha Sarathi Purkayastha Tavaheed Tariq Janibul Bashir 25 0 0 21 Apr 2025
NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results Zheng Chen J. Wang Kai Liu Jue Gong Lei Sun ... J. Lee C. Lee Chih-Chung Hsu Hu Peng Chunming He 56 2 0 20 Apr 2025
Vision-Centric Representation-Efficient Fine-Tuning for Robust Universal Foreground Segmentation Guoyi Zhang Siyang Chen Guangsheng Xu Han Wang Xiaohu Zhang 29 0 0 20 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark Enxin Song Wenhao Chai Weili Xu Jianwen Xie Yuxuan Liu Gaoang Wang 57 0 0 20 Apr 2025
Seurat: From Moving Points to Depth Seokju Cho Jiahui Huang S. Kim Joon-Young Lee 3DPC MDE 29 0 0 20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D Sergio Arnaud Paul Mcvay Ada Martin Arjun Majumdar Krishna Murthy Jatavallabhula ... Nicolas Ballas Mido Assran Oleksandr Maksymets Aravind Rajeswaran Franziska Meier 3DPC 41 0 0 19 Apr 2025
Exploring Modality Guidance to Enhance VFM-based Feature Fusion for UDA in 3D Semantic Segmentation Johannes Spoecklberger W. Lin Pedro Hermosilla Sivan Doveh Horst Possegger M. Jehanzeb Mirza 17 0 0 19 Apr 2025
Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis Zichuan Liu Liming Jiang Qing Yan Yumin Jia Hao Kang Xin Lu DiffM 29 0 0 19 Apr 2025
Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction Wenyu Li Sidun Liu Peng Qiao Yong Dou 25 0 0 18 Apr 2025
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models Haiwen Huang Anpei Chen Volodymyr Havrylov Andreas Geiger Dan Zhang 27 1 0 18 Apr 2025
Entropic Time Schedulers for Generative Diffusion Models Dejan Stancevic Luca Ambrogioni DiffM OOD 38 0 0 18 Apr 2025
BeetleVerse: A study on taxonomic classification of ground beetles S M Rayeed Alyson East Samuel Stevens Sydne Record Charles V. Stewart 21 0 0 18 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Yang Yue Yulin Wang Chenxin Tao Pan Liu Shiji Song Gao Huang MedIm 24 0 0 18 Apr 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance Yang Yue Yulin Wang Haojun Jiang Pan Liu S. Song Gao Huang VGen 27 0 0 17 Apr 2025