
Semantic Search At LinkedIn

Fedor Borisyuk
Sriram Vasudevan
Muchen Wu
Guoyao Li
Benjamin Le
Shaobo Zhang
Qianqi Kay Shen
Yuchin Juan
Kayhan Behdin
Liming Dong
Kaixu Yang
Shusen Jing
Ravi Pothamsetty
Rajat Arora
Sophie Yanying Sheng
Vitaly Abdrashitov
Yang Zhao
Lin Su
Xiaoqing Wang
Chujie Zheng
Sarang Metkar
Rupesh Gupta
Igor Lapchuk
David N. Racca
Madhumitha Mohan
Yanbo Li
Haojun Li
Saloni Gandhi
Xueying Lu
Chetan Bhole
Ali Hooshmand
Xin Yang
Raghavan Muthuregunathan
Jiajun Zhang
Mathew Teoh
Adam Coler
Abhinav Gupta
Xiaojing Ma
Sundara Raman Ramachandran
Morteza Ramezani
Yubo Wang
Lijuan Zhang
Richard Li
Jian Sheng
Chanh Nguyen
Yen-Chi Chen
Chuanrui Zhu
Claire Zhang
Jiahao Xu
Deepti Kulkarni
Qing Lan
Arvind Subramaniam
Ata Fatahibaarzi
Steven Shimizu
Yanning Chen
Zhipeng Wang
Ran He
Zhengze Zhou
Qingquan Song
Yun Dai
Caleb Johnson
Ping Liu
Shaghayegh Gharghabi
Gokulraj Mohanasundaram
Juan Bottaro
Santhosh Sachindran
Qi Guo
Yunxiang Ren
Chengming Jiang
Di Mo
Luke Simon
Jianqiang Shen
Jingwei Wu
Wenjing Zhang
Main: 7 pages, 5 figures, 13 tables
Bibliography: 3 pages
Appendix: 1 page
Abstract

Semantic search with large language models (LLMs) enables retrieval by meaning rather than keyword overlap, but scaling it requires major advances in inference efficiency. We present LinkedIn's LLM-based semantic search framework for AI Job Search and AI People Search, which combines an LLM relevance judge, embedding-based retrieval, and a compact small language model trained via multi-teacher distillation to jointly optimize relevance and engagement. A prefill-oriented inference architecture, co-designed with model pruning, context compression, and text-embedding hybrid interactions, boosts ranking throughput by over 75x under a fixed latency constraint while preserving near-teacher-level NDCG. The result is one of the first production LLM-based ranking systems with efficiency comparable to traditional approaches, delivering significant gains in quality and user engagement.
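To make the multi-teacher distillation idea concrete, here is a minimal sketch, not the paper's actual objective: a student's scores are fit against a weighted blend of two teacher signals (one for relevance, one for engagement). The function name, the blending weight `alpha`, and the use of a plain MSE loss are all illustrative assumptions; the production system likely uses a richer ranking loss.

```python
import numpy as np

def multi_teacher_distill_loss(student_scores, relevance_teacher,
                               engagement_teacher, alpha=0.5):
    """Illustrative multi-teacher distillation loss (assumed form).

    Blends two teacher score vectors into a single target, where
    `alpha` weights the relevance teacher against the engagement
    teacher, then penalizes the student's deviation with MSE.
    """
    student = np.asarray(student_scores, dtype=float)
    target = (alpha * np.asarray(relevance_teacher, dtype=float)
              + (1.0 - alpha) * np.asarray(engagement_teacher, dtype=float))
    return float(np.mean((student - target) ** 2))

# Example: with alpha = 0.5 the target is the average of the teachers.
loss = multi_teacher_distill_loss(
    student_scores=[1.0, 0.0],
    relevance_teacher=[1.0, 1.0],
    engagement_teacher=[1.0, -1.0],
)
print(loss)  # student already matches the blended target, so loss is 0.0
```

In practice the student would be a small language model and the blend would be tuned (or learned) to trade off relevance quality against engagement outcomes.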
