ResearchTrend.AI
arXiv: 2305.11172
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

18 May 2023
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
    VLM
    MLLM
    ObjD

Papers citing "ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities"

50 / 95 papers shown
Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise
Moseli Motsóehli
Hope Mogale
Kyungim Baek
35
0
0
07 May 2025
Harmony: A Unified Framework for Modality Incremental Learning
Y. Song
Xiaoshan Yang
D. Jiang
Yaowei Wang
Changsheng Xu
CLL
43
0
0
17 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
63
6
1
14 Apr 2025
Multimodal Reference Visual Grounding
Yangxiao Lu
Ruosen Li
Liqiang Jing
Jikai Wang
Xinya Du
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
ObjD
76
0
0
02 Apr 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Y. Wang
42
0
0
28 Mar 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
Mamba
MLLM
74
1
0
11 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
83
2
0
08 Mar 2025
A Shared Encoder Approach to Multimodal Representation Learning
Shuvendu Roy
Franklin Ogidi
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
36
0
0
03 Mar 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
91
3
0
12 Feb 2025
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala
Vasileios Tsouvalas
N. Meratnia
MoE
39
0
0
10 Feb 2025
Multi-Modality Transformer for E-Commerce: Inferring User Purchase Intention to Bridge the Query-Product Gap
Srivatsa Mallapragada
Ying Xie
Varsha Rani Chawan
Zeyad Hailat
Yuanbo Wang
36
0
0
28 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
70
2
0
10 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
44
3
0
31 Dec 2024
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
M. Zhang
105
7
0
22 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
71
2
0
14 Dec 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
95
6
0
27 Nov 2024
Chanel-Orderer: A Channel-Ordering Predictor for Tri-Channel Natural Images
Shen Li
Lei Jiang
Wei Wang
Hongwei Hu
Liang Li
65
0
0
20 Nov 2024
TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models
Jonathan Fhima
Elad Ben Avraham
Oren Nuriel
Yair Kittenplon
Roy Ganz
Aviad Aberdam
Ron Litman
VLM
26
1
0
07 Nov 2024
SynthSet: Generative Diffusion Model for Semantic Segmentation in Precision Agriculture
Andrew Heschl
Mauricio Murillo
Keyhan Najafian
F. Maleki
DiffM
20
0
0
05 Nov 2024
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
Maximilian Augustin
Syed Shakib Sarwar
Mostafa Elhoushi
Sai Qian Zhang
Yuecheng Li
B. D. Salvo
15
0
0
23 Oct 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
23
3
0
16 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
18
5
0
10 Oct 2024
Segmenting Wood Rot using Computer Vision Models
Roland Kammerbauer
Thomas H. Schmitt
Tobias Bocklet
16
1
0
30 Sep 2024
Neural Contrast: Leveraging Generative Editing for Graphic Design Recommendations
Marian Lupascu
Ionut Mironica
Mihai-Sorin Stupariu
DiffM
23
0
0
26 Sep 2024
BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices
Yongqi Xu
Yujian Lee
Gao Yi
Bosheng Liu
Yucong Chen
Peng Liu
Jigang Wu
Xiaoming Chen
Yinhe Han
MQ
26
0
0
25 Sep 2024
KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data
Grace Tang
Swetha Rajkumar
Yifei Zhou
Homer Walke
Sergey Levine
Kuan Fang
LM&Ro
VLM
16
6
0
21 Sep 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
20
0
0
18 Sep 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
40
0
0
12 Sep 2024
Dissecting Temporal Understanding in Text-to-Audio Retrieval
Andreea-Maria Oncescu
João F. Henriques
A. Sophia Koepke
17
2
0
01 Sep 2024
A Survey and Evaluation of Adversarial Attacks for Object Detection
Khoi Nguyen Tiet Nguyen
Wenyu Zhang
Kangkang Lu
Yuhuan Wu
Xingjian Zheng
Hui Li Tan
Liangli Zhen
AAML
24
0
0
04 Aug 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
31
2
0
10 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
51
23
0
28 Jun 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
33
8
0
24 Jun 2024
Enhancing Domain Adaptation through Prompt Gradient Alignment
Hoang Phan
Lam C. Tran
Quyen Tran
Trung Le
45
0
0
13 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
35
1
0
11 Jun 2024
Bridging Language Gaps in Audio-Text Retrieval
Zhiyong Yan
Heinrich Dinkel
Yongqing Wang
Jizhong Liu
Junbo Zhang
Yujun Wang
Bin Wang
VLM
27
4
0
11 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Y. Guo
VGen
97
16
0
06 Jun 2024
Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024
Biao Wu
Diankai Zhang
Sihan Gao
Cheng-yong Zheng
Shaoli Liu
Ning Wang
30
0
0
02 Jun 2024
Influence of Water Droplet Contamination for Transparency Segmentation
Volker Knauthe
Paul Weitz
Thomas Pollabauer
Tristan Wirth
Arne Rak
Arjan Kuijper
Dieter W. Fellner
22
1
0
21 May 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Zehan Wang
Ziang Zhang
Xize Cheng
Rongjie Huang
Luping Liu
...
Haifeng Huang
Yang Zhao
Tao Jin
Peng Gao
Zhou Zhao
18
8
0
08 May 2024
A Generalization Theory of Cross-Modality Distillation with Contrastive Learning
Hangyu Lin
Chen Liu
Chengming Xu
Zhengqi Gao
Yanwei Fu
Yuan Yao
VLM
33
0
0
06 May 2024
Connecting NeRFs, Images, and Text
Francesco Ballerini
Pierluigi Zama Ramirez
Roberto Mirabella
Samuele Salti
Luigi Di Stefano
29
4
0
11 Apr 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
19
2
0
04 Apr 2024
Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community
Casey Kennington
Malihe Alikhani
Heather Pon-Barry
Katherine Atwell
Yonatan Bisk
...
Jivko Sinapov
Angela Stewart
Matthew Stone
Stefanie Tellex
Tom Williams
36
0
0
01 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
27
5
0
28 Mar 2024
LocCa: Visual Pretraining with Location-aware Captioners
Bo Wan
Michael Tschannen
Yongqin Xian
Filip Pavetić
Ibrahim M. Alabdulmohsin
Xiao Wang
André Susano Pinto
Andreas Steiner
Lucas Beyer
Xiao-Qi Zhai
VLM
35
5
0
28 Mar 2024
Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering
Bowen Jiang
Zhijun Zhuang
Shreyas S. Shivakumar
Dan Roth
Camillo J. Taylor
LLMAG
26
2
0
21 Mar 2024
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
57
2
0
21 Mar 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue
Jie Cheng
Longteng Guo
Xingyuan Dai
Zijia Zhao
Xingjian He
Gang Xiong
Yisheng Lv
Jing Liu
36
9
0
20 Mar 2024
Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval
Qian Wang
Jia-Chen Gu
Zhen-Hua Ling
17
2
0
15 Mar 2024