Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words

10 May 2022

Dan Jurafsky

Papers citing "Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words"

Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers Alice Rueda Mohammed S. Hassan Argyrios Perivolaris Bazen G. Teferra Reza Samavi ... Y. Wu Y. Zhang Bo Cao Divya Sharma Sridhar Krishnan Venkat Bhat ELM LRM 48 0 0 02 May 2025
Semantics at an Angle: When Cosine Similarity Works Until It Doesn't Kisung You 23 0 0 22 Apr 2025
Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation Fahao Chen Peng Li Zicong Hong Zhou Su Song Guo MoMe MoE 67 0 0 23 Nov 2024
A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why? QiHong Chen Jiawei Li Jiecheng Deng Jiachen Yu Justin Tian Jin Chen Iftekhar Ahmed 48 0 0 03 Nov 2024
Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling Nirav Bhan Shival Gupta Sai Manaswini Ritik Baba Narun Yadav Hillori Desai Yash Choudhary Aman Pawar Sarthak Shrivastava Sudipta Biswas LLMAG 21 0 0 23 Oct 2024
Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare Pardis Sadat Zahraei Zahra Shakeri LM&MA 21 0 0 09 Oct 2024
IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios Hai Lin Shaoxiong Zhan Junyou Su Haitao Zheng Hui Wang RALM 21 1 0 24 Sep 2024
Norm of Mean Contextualized Embeddings Determines their Variance Hiroaki Yamagiwa Hidetoshi Shimodaira 25 0 0 17 Sep 2024
Leveraging a Cognitive Model to Measure Subjective Similarity of Human and GPT-4 Written Content Tyler Malloy Maria José Ferreira Fei Fang Cleotilde Gonzalez 16 1 0 30 Aug 2024
Pragmatic inference of scalar implicature by LLMs Ye-eun Cho Seong mook Kim 19 0 0 13 Aug 2024
Semantics or spelling? Probing contextual word embeddings with orthographic noise Jacob A. Matthews John R. Starr Marten van Schijndel 32 2 0 08 Aug 2024
Who's asking? User personas and the mechanics of latent misalignment Asma Ghandeharioun Ann Yuan Marius Guerard Emily Reif Michael A. Lepori Lucas Dixon LLMSV 41 7 0 17 Jun 2024
Blowfish: Topological and statistical signatures for quantifying ambiguity in semantic search T. R. Barillot Alex De Castro 21 0 0 12 Jun 2024
PowerPeeler: A Precise and General Dynamic Deobfuscation Method for PowerShell Scripts Ruijie Li Chenyang Zhang Huajun Chai Lingyun Ying Haixin Duan Jun Tao 27 0 0 06 Jun 2024
AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization Jiawei Chen Xiao Yang Zhengwei Fang Yu Tian Yinpeng Dong Zhaoxia Yin Hang Su 24 1 0 30 May 2024
Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM Michelle S. Lam Janice Teoh James A. Landay Jeffrey Heer Michael S. Bernstein 19 40 0 18 Apr 2024
Is Cosine-Similarity of Embeddings Really About Similarity? Harald Steck Chaitanya Ekanadham Nathan Kallus DML 25 66 0 08 Mar 2024
When Only Time Will Tell: Interpreting How Transformers Process Local Ambiguities Through the Lens of Restart-Incrementality Brielen Madureira Patrick Kahardipraja David Schlangen 31 2 0 20 Feb 2024
Injecting Wiktionary to improve token-level contextual representations using contrastive learning Anna Mosolova Marie Candito Carlos Ramisch 16 0 0 12 Feb 2024
A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models Vansh Sharma Venkat Raman 21 7 0 31 Dec 2023
Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks Bo Li Wei Ye Quan-ding Wang Wen Zhao Shikun Zhang VLM 30 1 0 14 Dec 2023
What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations Raphael Tang Xinyu Crystina Zhang Jimmy J. Lin Ferhan Ture 30 6 0 30 Nov 2023
Two-Stage Classifier for Campaign Negativity Detection using Axis Embeddings: A Case Study on Tweets of Political Users during 2021 Presidential Election in Iran Fatemeh Rajabi Ali Mohades 11 0 0 31 Oct 2023
The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation E. Tokarchuk Vlad Niculae 13 2 0 31 Oct 2023
Translating away Translationese without Parallel Data Rricha Jalota Koel Dutta Chowdhury C. España-Bonet Josef van Genabith 22 6 0 28 Oct 2023
Rethinking the Construction of Effective Metrics for Understanding the Mechanisms of Pretrained Language Models You Li Jinhui Yin Yuming Lin 23 0 0 19 Oct 2023
Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance Molly R. Petersen Lonneke van der Plas LRM 13 6 0 09 Oct 2023
Using Artificial Populations to Study Psychological Phenomena in Neural Models Jesse Roberts Kyle Moore Drew Wilenzick Doug Fisher 17 6 0 15 Aug 2023
Solving Cosine Similarity Underestimation between High Frequency Words by L2 Norm Discounting Saeth Wannasuphoprasit Yi Zhou Danushka Bollegala 24 4 0 17 May 2023
Unsupervised Sentence Representation Learning with Frequency-induced Adversarial Tuning and Incomplete Sentence Filtering Bing Wang Ximing Li Zhiyao Yang Yuanyuan Guan Jiayin Li Sheng-sheng Wang 27 6 0 15 May 2023
Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings Taichi Aida Danushka Bollegala 16 8 0 15 May 2023
Dictionary-Assisted Supervised Contrastive Learning Patrick Y. Wu Richard Bonneau Joshua A. Tucker Jonathan Nagler CLIP 17 0 0 27 Oct 2022
Word Embedding for Social Sciences: An Interdisciplinary Survey Akira Matsui Emilio Ferrara 11 5 0 07 Jul 2022
Richer Countries and Richer Representations Kaitlyn Zhou Kawin Ethayarajh Dan Jurafsky 35 9 0 10 May 2022
BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology Luke Gessler Nathan Schneider 13 7 0 20 Sep 2021
All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality William Timkey Marten van Schijndel 213 110 0 09 Sep 2021