Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification

2 June 2025

Zehao Wu

Yanjie Zhao

Haoyu Wang


Main: 2 pages · 6 figures · 1 table · Appendix: 10 pages

Abstract

As Large Language Models (LLMs) become integral software components in modern applications, unauthorized model derivations through fine-tuning, merging, and redistribution have emerged as critical software engineering challenges. Unlike traditional software, where clone detection and license compliance checking are well established, the LLM ecosystem lacks effective mechanisms to detect model lineage and enforce licensing agreements. This gap is particularly problematic when open-source model creators, such as Meta with LLaMA, require derivative works to follow naming conventions for attribution, yet no technical means exist to verify compliance.
