KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

8 January 2026

Tingyu Wu

Zhisheng Chen

Ziyan Weng

Shuhe Wang

Chenglong Li

Shuo Zhang

Sen Hu

Silin Wu

Qizhen Lan

Huacan Wang

Ronghao Chen


Main: 9 pages; Bibliography: 2 pages; Appendix: 7 pages; 2 figures; 4 tables

Abstract

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, in which actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at this https URL.
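The abstract does not specify a data schema, so the following is only a minimal sketch of what a "flashback-aware, time-anchored stream" with "evidence-linked questions" might look like. Everything here is a hypothetical assumption for illustration, not the benchmark's actual format: the names `Event`, `narrative_pos`, `story_time`, `Question`, `QuestionType`, `mark_flashbacks`, `time_anchored_stream`, and `evidence_for`, as well as the simple rule that an event counts as a flashback when it is narrated after an event that happened later in story time. Only the three question categories come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class QuestionType(Enum):
    # The three categories named in the abstract; the enum itself is hypothetical.
    FACTUAL_RECALL = "factual_recall"
    SUBJECTIVE_STATE = "subjective_state_attribution"
    PRINCIPLE_REASONING = "principle_level_reasoning"


@dataclass
class Event:
    # Hypothetical record for one narrative passage.
    event_id: str
    narrative_pos: int  # order in which the passage appears in the text
    story_time: int     # assumed anchor for when the event happened (e.g., a year)
    text: str


@dataclass
class Question:
    # Hypothetical evidence-linked question: it points back to event ids.
    question: str
    qtype: QuestionType
    evidence_ids: list[str]


def mark_flashbacks(events: list[Event]) -> set[str]:
    """Flag events narrated after some event with a later story time (assumed rule)."""
    flagged: set[str] = set()
    latest = None  # latest story time seen so far in reading order
    for ev in sorted(events, key=lambda e: e.narrative_pos):
        if latest is not None and ev.story_time < latest:
            flagged.add(ev.event_id)
        latest = ev.story_time if latest is None else max(latest, ev.story_time)
    return flagged


def time_anchored_stream(events: list[Event]) -> list[Event]:
    """Reorder events by story time, breaking ties by narrative position."""
    return sorted(events, key=lambda e: (e.story_time, e.narrative_pos))


def evidence_for(question: Question, events: list[Event]) -> list[Event]:
    """Look up the events a question cites as its supporting evidence."""
    by_id = {e.event_id: e for e in events}
    return [by_id[i] for i in question.evidence_ids]


if __name__ == "__main__":
    # Invented toy narrative, not taken from the benchmark.
    events = [
        Event("e1", 0, 2010, "Moves to the city for a new job."),
        Event("e2", 1, 1995, "Recalls a childhood promise to a grandparent."),
        Event("e3", 2, 2012, "Turns down a promotion to care for family."),
    ]
    print(mark_flashbacks(events))  # {'e2'}
    print([e.event_id for e in time_anchored_stream(events)])  # ['e2', 'e1', 'e3']
    q = Question(
        "Why does the narrator turn down the promotion?",
        QuestionType.PRINCIPLE_REASONING,
        ["e2", "e3"],
    )
    print([e.event_id for e in evidence_for(q, events)])  # ['e2', 'e3']
```

In this toy example the flashback (`e2`) is moved to the front of the time-anchored stream, and the principle-level question cites both the early memory and the later decision. That pairing illustrates why, as the abstract argues, answering such questions requires temporally grounded reasoning rather than retrieval of a single passage.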

