Interactive Segmentation and Report Generation for CT Images

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

Yannian Gu

Wenhui Lei

Hanyu Chen

Xiaofan Zhang

Shanghang Zhang

210

05 Mar 2025

ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

296

04 Mar 2025

One Patient's Annotation is Another One's Initialization: Towards Zero-Shot Surgical Video Segmentation with Cross-Patient Initialization

198

04 Mar 2025

Boltzmann Attention Sampling for Image Analysis with Small Objects

Computer Vision and Pattern Recognition (CVPR), 2025

447

04 Mar 2025

Tracking-Aware Deformation Field Estimation for Non-rigid 3D Reconstruction in Robotic Surgeries

292

04 Mar 2025

Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

362

04 Mar 2025

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat

Computer Vision and Pattern Recognition (CVPR), 2025

413

03 Mar 2025

Autonomous Dissection in Robotic Cholecystectomy

174

01 Mar 2025

Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups

352

01 Mar 2025

The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition

Computer Vision and Pattern Recognition (CVPR), 2025

...

317

28 Feb 2025

Revisiting the Evaluation Bias Introduced by Frame Sampling Strategies in Surgical Video Segmentation Using SAM2

220

28 Feb 2025

MITracker: Multi-View Integration for Visual Object Tracking

Computer Vision and Pattern Recognition (CVPR), 2025

...

278

27 Feb 2025

Best Foot Forward: Robust Foot Reconstruction in-the-wild

327

27 Feb 2025

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids

387

27 Feb 2025

Vector-Quantized Vision Foundation Models for Object-Centric Learning

1.2K

27 Feb 2025

Deep learning approaches to surgical video segmentation and object detection: A Scoping Review

174

23 Feb 2025

CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

ACM Transactions on Graphics (TOG), 2025

421

18 Feb 2025

SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking

IEEE International Conference on Robotics and Automation (ICRA), 2025

Septimiu E. Salcudean

287

17 Feb 2025

Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review

474

16 Feb 2025

Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos

401

14 Feb 2025

Bilevel Learning for Bilevel Planning

627

12 Feb 2025

ImitDiff: Transferring Foundation-Model Priors for Distraction Robust Visuomotor Policy

IEEE Robotics and Automation Letters (IEEE RA-L), 2025

...

323

11 Feb 2025

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

427

10 Feb 2025

Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform

527

09 Feb 2025

PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?

Mennatullah Siam

VLM

773

06 Feb 2025

No Free Lunch in Annotation either: An objective evaluation of foundation models for streamlining annotation in animal tracking

IEEE International Symposium on Biomedical Imaging (ISBI), 2025

328

06 Feb 2025

MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation

589

06 Feb 2025

DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

304

06 Feb 2025

Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach

403

05 Feb 2025

Particle Trajectory Representation Learning with Masked Point Modeling

344

04 Feb 2025

Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models

421

03 Feb 2025

Not Every Patch is Needed: Towards a More Efficient and Effective Backbone for Video-based Person Re-identification

IEEE Transactions on Image Processing (IEEE TIP), 2025

419

28 Jan 2025

Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors

Zhiyuan Lu

Hao Lu

Hua Huang

937

28 Jan 2025

MADation: Face Morphing Attack Detection with Foundation Models

320

28 Jan 2025

Objects matter: object-centric world models improve reinforcement learning in visually complex environments

157

27 Jan 2025

MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation

615

23 Jan 2025

Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos

1.3K

23 Jan 2025

DynamicEarth: How Far are We from Open-Vocabulary Change Detection?

322

22 Jan 2025

Adapting OpenAI's CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples

L. Allison Jones-Farmer

322

22 Jan 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

...

558

121

21 Jan 2025

Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation

298

17 Jan 2025

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Computer Vision and Pattern Recognition (CVPR), 2025

248

15 Jan 2025

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Computer Vision and Pattern Recognition (CVPR), 2025

Subhashree Radhakrishnan

530

14 Jan 2025

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Computer Vision and Pattern Recognition (CVPR), 2025

207

13 Jan 2025

EdgeTAM: On-Device Track Anything Model

Computer Vision and Pattern Recognition (CVPR), 2025

...

Raghuraman Krishnamoorthi

313

13 Jan 2025

Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning

IEEE International Conference on Robotics and Automation (ICRA), 2025

310

13 Jan 2025

Static Segmentation by Tracking: A Label-Efficient Approach for Fine-Grained Specimen Image Segmentation

...

292

12 Jan 2025

Zero-shot Shark Tracking and Biometrics from Aerial Imagery

Methods in Ecology and Evolution (MEE), 2025

129

10 Jan 2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

...

612

07 Jan 2025

Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT

Wen-Dong Jiang

Chih-Yung Chang

Diptendu Sinha Roy

532

07 Jan 2025