AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

12 November 2022
Zhongzhi Chen
Guangyi Liu
Bo-Wen Zhang
Fulong Ye
Qinghong Yang
Ledell Yu Wu
    VLM

Papers citing "AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities"

50 / 61 papers shown
ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
Wanjiang Weng
Xiaofeng Tan
Hongsong Wang
Pan Zhou
VGen
44
0
0
08 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
Utsav Nareti
S. Chattopadhyay
Prolay Mallick
Suraj Kumar
Ayush Vikas Daga
Chandranath Adak
Adarsh Wase
Arjab Roy
15
0
0
05 May 2025
Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction
Junlong Ren
Hao Wang
36
0
0
02 Apr 2025
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation
Thomas Pickard
Aline Villavicencio
Maggie Mi
Wei He
Dylan Phelps
Carolina Scarton
75
1
0
19 Mar 2025
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Lixue Gong
Xiaoxia Hou
Fanshi Li
Liang Li
Xiaochen Lian
...
Qi Zhang
Yuwei Zhang
Shijia Zhao
Jianchao Yang
Weilin Huang
DiffM
VLM
55
6
0
10 Mar 2025
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
Sen Xing
Muyan Zhong
Zeqiang Lai
Liangchen Li
J. Liu
Yaohui Wang
Jifeng Dai
Wenhai Wang
70
1
0
02 Dec 2024
What If the Input is Expanded in OOD Detection?
Boxuan Zhang
Jianing Zhu
Zengmao Wang
Tongliang Liu
Bo Du
Bo Han
AAML
OODD
19
0
0
24 Oct 2024
TriplePlay: Enhancing Federated Learning with CLIP for Non-IID Data and Resource Efficiency
Ahmed Imteaj
Md Zarif Hossain
Saika Zaman
Abdur R. Shahid
VLM
19
1
0
09 Sep 2024
Navigating Text-to-Image Generative Bias across Indic Languages
S. Mittal
Arnav Sudan
Mayank Vatsa
Richa Singh
Tamar Glaser
Tal Hassner
EGVM
31
1
0
01 Aug 2024
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
Ali Abdollahi
Mahdi Ghaznavi
Mohammad Reza Karimi Nejad
Arash Mari Oriyad
Reza Abbasi
Ali Salesi
Melika Behjati
M. Rohban
M. Baghshah
CoGe
26
1
0
30 Jul 2024
Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis
Cristian-Alexandru Botocan
Raphael Meier
Ljiljana Dolamic
AAML
19
0
0
25 Jul 2024
Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
38
0
0
21 Jul 2024
Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
Shaeke Salman
M. Shams
Xiuwen Liu
27
1
0
01 Jul 2024
SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
CoGe
30
8
0
17 Jun 2024
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li
Haopeng Li
S. Erfani
Lei Feng
James Bailey
Feng Liu
VLM
27
3
0
05 Jun 2024
Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
Chentao Cao
Zhun Zhong
Zhanke Zhou
Yang Liu
Tongliang Liu
Bo Han
OODD
19
10
0
02 Jun 2024
Multilingual Diversity Improves Vision-Language Representations
Thao Nguyen
Matthew Wallingford
Sebastin Santy
Wei-Chiu Ma
Sewoong Oh
Ludwig Schmidt
Pang Wei Koh
Ranjay Krishna
VLM
23
5
0
27 May 2024
ColorFoil: Investigating Color Blindness in Large Vision and Language Models
Ahnaf Mozib Samin
M. F. Ahmed
Md. Mushtaq Shahriyar Rafee
VLM
22
2
0
19 May 2024
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
VLM
CoGe
35
0
0
25 Apr 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
35
1
0
19 Apr 2024
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene
Wenbo Zhang
Yifan Zhang
Jianfeng Lin
Binqiang Huang
Jinlu Zhang
Wenhao Yu
VLM
30
1
0
17 Apr 2024
Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
Hao Yan
Yuhong Guo
VLM
FedML
24
2
0
17 Apr 2024
Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng
Baoyu Jing
Zihao Li
Hanghang Tong
Jingrui He
VLM
24
18
0
30 Mar 2024
Negative Label Guided OOD Detection with Pretrained Vision-Language Models
Xue Jiang
Feng Liu
Zhengfeng Fang
Hong Chen
Tongliang Liu
Feng Zheng
Bo Han
VLM
42
28
0
29 Mar 2024
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
Michael Stephen Saxon
Yiran Luo
Sharon Levy
Chitta Baral
Yezhou Yang
William Yang Wang
EGVM
25
3
0
17 Mar 2024
OVEL: Large Language Model as Memory Manager for Online Video Entity Linking
Haiquan Zhao
Xuwu Wang
Shisong Chen
Zhixu Li
Xin Zheng
Yanghua Xiao
KELM
VLM
18
1
0
03 Mar 2024
Clarify: Improving Model Robustness With Natural Language Corrections
Yoonho Lee
Michelle S. Lam
Helena Vasconcelos
Michael S. Bernstein
Chelsea Finn
18
6
0
06 Feb 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval
Xingning Dong
Zipeng Feng
Chunluan Zhou
Xuzheng Yu
Ming Yang
Qingpei Guo
VLM
25
2
0
31 Jan 2024
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Qingpei Guo
Furong Xu
Hanxiao Zhang
Wang Ren
Ziping Ma
Lin Ju
Jian Wang
Jingdong Chen
Ming Yang
VLM
MLLM
25
2
0
29 Jan 2024
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Xiaojun Wu
Di Zhang
Ruyi Gan
Junyu Lu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
VLM
21
5
0
26 Jan 2024
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Xiangshuo Qiao
Xianxin Li
Xiaozhe Qu
Jie M. Zhang
Yang Liu
Yu Luo
Cihang Jin
Jin Ma
VLM
18
0
0
19 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
156
895
0
21 Dec 2023
iDesigner: A High-Resolution and Complex-Prompt Following Text-to-Image Diffusion Model for Interior Design
Ruyi Gan
Xiaojun Wu
Junyu Lu
Yuanhe Tian
Di Zhang
...
Renliang Sun
Chang Liu
Jiaxing Zhang
Pingjian Zhang
Yan Song
44
4
0
07 Dec 2023
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Jiancang Ma
Chen Chen
Qingsong Xie
H. Lu
DiffM
VLM
20
3
0
28 Nov 2023
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
46
3
0
22 Oct 2023
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
G. O. D. Santos
Diego A. B. Moreira
Alef Iury Ferreira
Jhessica Silva
Luiz Pereira
...
H. Maia
Nádia Da Silva
Esther Colombini
Hélio Pedrini
Sandra Avila
VLM
CLIP
27
4
0
20 Oct 2023
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models
Mor Ventura
Eyal Ben-David
Anna Korhonen
Roi Reichart
21
12
0
03 Oct 2023
GeRA: Label-Efficient Geometrically Regularized Alignment
Dustin Klebe
Tal Shnitzer
Mikhail Yurochkin
Leonid Karlinsky
Justin Solomon
11
1
0
01 Oct 2023
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Danae Sánchez Villegas
Daniel Preoţiuc-Pietro
Nikolaos Aletras
31
2
0
14 Sep 2023
PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud
Chengyu Wang
Zhongjie Duan
Bingyan Liu
Xinyi Zou
Cen Chen
Kui Jia
Jun Huang
DiffM
9
3
0
11 Sep 2023
Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels
Bo Wan
Tinne Tuytelaars
VLM
19
3
0
10 Sep 2023
Zero-Shot Robustification of Zero-Shot Models
Dyah Adila
Changho Shin
Lin Cai
Frederic Sala
29
18
0
08 Sep 2023
NLLB-CLIP -- train performant multilingual image retrieval model on a budget
Alexander Visheratin
VLM
19
17
0
04 Sep 2023
Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities
Shanyuan Liu
Dawei Leng
Yuhui Yin
DiffM
13
7
0
02 Sep 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
24
48
0
23 Aug 2023
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye
Guangyi Liu
Xinya Wu
Ledell Yu Wu
VLM
27
25
0
19 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
22
13
0
11 Aug 2023
On the Cultural Gap in Text-to-Image Generation
Bingshuai Liu
Longyue Wang
Chenyang Lyu
Yong Zhang
Jinsong Su
Shuming Shi
Zhaopeng Tu
VLM
EGVM
18
6
0
06 Jul 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavas
VLM
MLLM
13
5
0
14 Jun 2023
Multilingual Conceptual Coverage in Text-to-Image Models
Michael Stephen Saxon
William Yang Wang
EGVM
21
8
0
02 Jun 2023