
RogueGPT: Unleashing Jailbreak Prompts on LLMs

  • Arpitha Shivaswaroopa
  • Vanshika Sood
  • H. L. Gururaj
  • J. Shreyas
  • Fadi Al-Turjman*

*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

Abstract

Large Language Models (LLMs) have surged in popularity since late 2022 and have become vital tools for people across many professions. While some users leverage LLMs for academic or informational purposes, others exploit them for illicit activities. Methods of exploitation include Adversarial Attacks, Instruction Tuning Attacks, Inference Attacks, and Extraction Attacks. This paper investigates a specific Instruction Tuning Attack known as jailbreaking, in which crafted prompts manipulate LLMs into generating harmful responses to forbidden instructions. This study presents compelling evidence that widely used LLMs, such as OpenAI's ChatGPT, Google's Gemini, Meta's LLaMa, LMSYS's Vicuna, and Alibaba Cloud's Qwen, can be manipulated into generating content ranging from mildly illegal to potentially criminal. Jailbreak prompts were created for each LLM, covering inquiries across various categories. Each response was categorized by the level of response it elicited, and the Attack-to-Success Rate (ASR) was computed. These findings show how effective our prompts were on each LLM and how the models compare with one another. Vicuna was the most vulnerable, with an ASR of 0.93 and an FT of 0.842, followed by LLaMa with an ASR of 0.71 and an FT of 0.709. The False Information category had the highest overall average, with an ASR of 0.864 and an FT of 0.96. Our conclusions were reached through a combination of human assessment and quantitative analysis, detailed in subsequent sections. By disseminating this research, we aim to encourage organizations to prioritize their security measures and to raise awareness among individuals about the responsible and ethical use of LLMs, given their potential for harm.
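The abstract reports ASR per model and per category but does not spell out the formula. A minimal sketch, assuming the conventional definition (ASR = successful jailbreak attempts / total attempts) and a binary success label per attempt; the `Trial` record, its fields, the model names, and the `other_category` label are hypothetical, the toy data is illustrative and not the paper's results, and neither the paper's graded response levels nor its FT metric is modeled here.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Trial(NamedTuple):
    """One jailbreak attempt (hypothetical schema, not from the paper)."""
    model: str
    category: str
    succeeded: bool  # assumed binary label: did the model produce the forbidden content?


def attack_success_rate(trials: Iterable[Trial], key: str = "model") -> dict[str, float]:
    """Return ASR = successful attempts / total attempts, grouped by `key`.

    `key` must be a Trial field name, e.g. "model" or "category".
    """
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for t in trials:
        group = getattr(t, key)
        totals[group] += 1
        successes[group] += int(t.succeeded)
    # Every group in `totals` has at least one attempt, so no division by zero.
    return {g: successes[g] / totals[g] for g in totals}


# Toy data for illustration only; these are NOT the paper's results.
trials = [
    Trial("model_a", "False Information", True),
    Trial("model_a", "False Information", True),
    Trial("model_a", "other_category", False),
    Trial("model_b", "False Information", True),
    Trial("model_b", "other_category", False),
]

print(attack_success_rate(trials, key="model"))
print(attack_success_rate(trials, key="category"))
```

Grouping by `"model"` gives the per-LLM comparison, and grouping by `"category"` gives the per-category averages; a graded scheme could be folded in by deriving `succeeded` from a threshold on the response level.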

Original language: English
Article number: e70069
Journal: Engineering Reports
Volume: 8
Issue number: 4
Publication status: Published, April 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 16 - Peace, Justice and Strong Institutions

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Engineering

