Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.19465
Cited By
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
29 February 2024
Chao Qian
Jie M. Zhang
Wei Yao
Dongrui Liu
Zhen-fei Yin
Yu Qiao
Yong Liu
Jing Shao
LLMSV
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models"
9 / 9 papers shown
Title
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
59
1
0
07 Feb 2025
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu
Y. Wang
Hao Wang
66
1
0
23 Dec 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
130
349
0
01 Feb 2024
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
91
164
0
10 Oct 2023
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning
Haowen Chen
Yiming Zhang
Qi Zhang
Hantao Yang
Xiaomeng Hu
Xuetao Ma
Yifan YangGong
J. Zhao
ALM
61
46
0
16 May 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
213
297
0
26 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
221
402
0
24 Feb 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1