Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game


2 November 2023
Sam Toyer
Olivia Watkins
Ethan Mendes
Justin Svegliato
Luke Bailey
Tiffany Wang
Isaac Ong
Karim Elmaaroufi
Pieter Abbeel
Trevor Darrell
Alan Ritter
Stuart J. Russell
arXiv: 2311.01011

Papers citing "Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game"

11 / 61 papers shown
SPML: A DSL for Defending Language Models Against Prompt Attacks
Reshabh K Sharma
Vinayak Gupta
Dan Grossman
AAML
49
14
0
19 Feb 2024
Stealthy Attack on Large Language Model based Recommendation
Jinghao Zhang
Yuting Liu
Qiang Liu
Shu Wu
Guibing Guo
Liang Wang
27
13
0
18 Feb 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo
Zeyi Liao
Boyuan Zheng
Yu-Chuan Su
Chaowei Xiao
Huan Sun
AAML
LLMAG
41
14
0
15 Feb 2024
Rethinking Machine Unlearning for Large Language Models
Sijia Liu
Yuanshun Yao
Jinghan Jia
Stephen Casper
Nathalie Baracaldo
...
Hang Li
Kush R. Varshney
Mohit Bansal
Sanmi Koyejo
Yang Liu
AILaw
MU
65
81
0
13 Feb 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Ye Wang
Jing Jiang
Min-Bin Lin
LLMAG
LM&Ro
35
47
0
13 Feb 2024
StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen
Julien Piet
Chawin Sitawarin
David A. Wagner
SILM
AAML
22
65
0
09 Feb 2024
Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo
AAML
SILM
15
26
0
15 Jan 2024
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet
Maha Alrashed
Chawin Sitawarin
Sizhe Chen
Zeming Wei
Elizabeth Sun
Basel Alomair
David A. Wagner
AAML
SyDa
75
51
0
29 Dec 2023
Can LLMs Follow Simple Rules?
Norman Mu
Sarah Chen
Zifan Wang
Sizhe Chen
David Karamardian
Lulwa Aljeraisy
Basel Alomair
Dan Hendrycks
David A. Wagner
ALM
18
26
0
06 Nov 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
38
40
0
16 Oct 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022