ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding


14 May 2023
Le Xue
Ning Yu
Shu Zhen Zhang
Artemis Panagopoulou
Junnan Li
Roberto Martín-Martín
Jiajun Wu
Caiming Xiong
Ran Xu
Juan Carlos Niebles
Silvio Savarese

Papers citing "ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding"

50 / 99 papers shown
Title
TeDA: Boosting Vision-Lanuage Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment
Z. Wang
Yang Zhou
Jinhai Xiang
Y. Wang
Xinwei He
VLM
37
0
0
05 May 2025
Digital Twin Generation from Visual Data: A Survey
Andrew Melnik
Benjamin Alt
Giang Hoang Nguyen
Artur Wilkowski
Maciej Stefańczyk
Qirui Wu
Sinan Harms
Helge Rhodin
Manolis Savva
Michael Beetz
3DGS
VGen
41
0
0
17 Apr 2025
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting
Wei Sun
Yanzhao Zhou
Jianbin Jiao
Yuan Li
3DGS
41
0
0
16 Apr 2025
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena
Tommaso Apicella
Stefano Rosa
Pietro Morerio
Alessio Del Bue
Lorenzo Natale
32
0
0
11 Apr 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Y. Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
75
3
0
28 Mar 2025
GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions
Xiaomeng Chu
Jiajun Deng
Guoliang You
Wei Liu
X. Li
Jianmin Ji
Y. Zhang
77
0
0
20 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
36
0
0
17 Mar 2025
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
Hongyu Sun
Qiuhong Ke
Ming Cheng
Y. Wang
Deying Li
Chenhui Gou
Jianfei Cai
3DPC
87
0
0
15 Mar 2025
Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning
Yanjun Chen
Yirong Sun
Xinghao Chen
Jian Wang
Xiaoyu Shen
W. Li
Wei Zhang
3DV
LRM
59
1
0
08 Mar 2025
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces
Souhail Hadgi
Luca Moschella
Andrea Santilli
Diego Gomez
Qixing Huang
Emanuele Rodolà
Simone Melzi
M. Ovsjanikov
40
0
0
07 Mar 2025
X2CT-CLIP: Enable Multi-Abnormality Detection in Computed Tomography from Chest Radiography via Tri-Modal Contrastive Learning
Jianzhong You
Yuan Gao
Sangwook Kim
Chris McIntosh
64
1
0
04 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu
Wentong Li
Song Wang
J. Chen
Jianke Zhu
3DV
LRM
71
3
0
01 Mar 2025
MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image
Shaoming Li
Qing Cai
Songqi Kong
Runqing Tan
Heng Tong
Shiji Qiu
Yongguo Jiang
Z. Liu
3DV
3DPC
47
0
0
28 Feb 2025
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
Hengshuo Chu
Xiang Deng
Qi Lv
Xiaoyang Chen
Yinchuan Li
Jianye Hao
Liqiang Nie
64
2
0
27 Feb 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
Haoyuan Li
Yanpeng Zhou
Tao Tang
Jifei Song
Yihan Zeng
Michael C. Kampffmeyer
Hang Xu
Xiaodan Liang
3DGS
57
1
0
25 Feb 2025
CrossOver: 3D Scene Cross-Modal Alignment
S. Sarkar
O. Mikšík
Marc Pollefeys
Daniel Barath
Iro Armeni
3DPC
71
0
0
20 Feb 2025
Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection
Jiaxiang Wang
Haote Xu
Xiaolu Chen
Haodi Xu
Yue Huang
Xinghao Ding
Xiaotong Tu
48
0
0
16 Feb 2025
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
Khanh Nguyen
Ghulam Mubashar Hassan
Ajmal Saeed Mian
3DPC
42
0
0
15 Feb 2025
Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding
Kohei Torimi
Ryosuke Yamada
Daichi Otsuka
Kensho Hara
Yuki M. Asano
Hirokatsu Kataoka
Y. Aoki
3DV
31
0
0
20 Jan 2025
Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis
Hongyu Sun
Qiuhong Ke
Y. Wang
Wang Chen
Kang Yang
Deying Li
Jianfei Cai
3DPC
67
3
0
17 Jan 2025
How Panel Layouts Define Manga: Insights from Visual Ablation Experiments
Siyuan Feng
Teruya Yoshinaga
Katsuhiko Hayashi
Koki Washio
Hidetaka Kamigaito
28
0
0
26 Dec 2024
Coordinate In and Value Out: Training Flow Transformers in Ambient Space
Yuyang Wang
Anurag Ranjan
J. Susskind
Miguel Angel Bautista
3DPC
63
0
0
05 Dec 2024
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
Haijie Li
Y. Wu
Jiarui Meng
Qiankun Gao
Zhiyao Zhang
Ronggang Wang
Jian Andrew Zhang
ISeg
89
2
0
28 Nov 2024
Training-Free Point Cloud Recognition Based on Geometric and Semantic Information Fusion
Yan Chen
Di Huang
Zhichao Liao
Xi Cheng
Xinghui Li
Lone Zeng
3DPC
32
1
0
07 Sep 2024
ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images
Xiaoshuai Zhang
Zhicheng Wang
Howard Zhou
Soham Ghosh
Danushen Gnanapragasam
Varun Jampani
Hao Su
Leonidas J. Guibas
DD
51
5
0
30 Aug 2024
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Chengzhuo Tong
Peng Gao
Chunyuan Li
Pheng-Ann Heng
VGen
3DPC
42
13
0
29 Aug 2024
Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Wei Chen
Zhiyuan Li
Shuo Xin
Yihao Wang
21
4
0
28 Aug 2024
DC3DO: Diffusion Classifier for 3D Objects
Nursena Koprucu
Meher Shashwat Nigam
Shicheng Xu
Biruk Abere
Gabriele Dominici
Andrew Rodriguez
Sharvaree Vadgam
Berfin Inal
Alberto Tono
DiffM
23
0
0
13 Aug 2024
Multi-modal Relation Distillation for Unified 3D Representation Learning
Huiqun Wang
Yiping Bao
Panwang Pan
Zeming Li
Xiao Liu
Ruijie Yang
Di Huang
45
0
0
19 Jul 2024
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Zehan Wang
Ziang Zhang
Hang Zhang
Luping Liu
Rongjie Huang
Xize Cheng
Hengshuang Zhao
Zhou Zhao
30
7
0
16 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
32
111
0
16 Jul 2024
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li
Zhengqiang Zhang
Chenhang He
Zhiyuan Ma
Vishal M. Patel
Lei Zhang
3DV
VLM
34
5
0
13 Jul 2024
Learning Robust 3D Representation from CLIP via Dual Denoising
Shuqing Luo
Bowen Qu
Wei-Nan Gao
37
1
0
01 Jul 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
36
3
0
17 Jun 2024
OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
Y. Wu
Jiarui Meng
Haijie Li
Chenming Wu
Yahao Shi
...
Chen Zhao
Haocheng Feng
Errui Ding
Jingdong Wang
Jian Andrew Zhang
3DGS
3DPC
31
28
0
04 Jun 2024
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
Sijin Chen
Xin Chen
Anqi Pang
Xianfang Zeng
Wei Cheng
...
C. Zhang
Jingyi Yu
Gang Yu
Bin-Bin Fu
Tao Chen
AI4CE
50
35
0
31 May 2024
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
Longwen Zhang
Ziyu Wang
Qixuan Zhang
Qiwei Qiu
Anqi Pang
Haoran Jiang
Wei Yang
Lan Xu
Jingyi Yu
DiffM
AI4CE
VGen
26
113
0
30 May 2024
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding
Junjie Fei
Mahmoud Ahmed
Jian Ding
Eslam Mohamed Bakr
Mohamed Elhoseiny
23
3
0
29 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
36
0
26 May 2024
LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image
Ruikai Cui
Xibin Song
Weixuan Sun
Senbo Wang
Weizhe Liu
...
Taizhang Shang
Yang Li
Nick Barnes
Hongdong Li
Pan Ji
3DV
43
5
0
24 May 2024
A Survey On Text-to-3D Contents Generation In The Wild
Chenhan Jiang
37
5
0
15 May 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Zehan Wang
Ziang Zhang
Xize Cheng
Rongjie Huang
Luping Liu
...
Haifeng Huang
Yang Zhao
Tao Jin
Peng Gao
Zhou Zhao
18
8
0
08 May 2024
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
Yuan Tang
Xu Han
Xianzhi Li
Qiao Yu
Yixue Hao
Long Hu
Min Chen
24
14
0
02 May 2024
ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud
Jiayi Han
Zidi Cao
Weibo Zheng
Xiangguo Zhou
Xiangjian He
Yuanfang Zhang
Daisen Wei
3DPC
39
0
0
30 Apr 2024
PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition
Dongyun Lin
Yi Cheng
Shangbo Mao
Aiyuan Guo
Yiqun Li
24
2
0
30 Apr 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Fuchun Sun
Bin Fang
AI4CE
LM&Ro
64
12
0
28 Apr 2024
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model
Xu Han
Yuan Tang
Zhaoxuan Wang
Xianzhi Li
29
22
0
23 Apr 2024
3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset
Junjie Zhang
Tianci Hu
Xiaoshui Huang
Yongshun Gong
Dan Zeng
25
1
0
23 Apr 2024
DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation
Junkai Yan
Yipeng Gao
Q. Yang
Xihan Wei
Xuansong Xie
Ancong Wu
Wei-Shi Zheng
30
1
0
09 Apr 2024
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
Kailin Li
Jingbo Wang
Lixin Yang
Cewu Lu
Bo Dai
38
15
0
04 Apr 2024