Europol warns of malicious use of ChatGPT


Prompt engineering and jailbreaking: Europol warns of ChatGPT exploitation

The concern arises from the growing number of cybercriminals attempting to exploit the AI-based chatbot for developing malware and other malicious tools.

by Habiba Rashid

March 28, 2023


Europol has expressed concerns about the possibility of cybercriminals exploiting ChatGPT through various techniques to bypass the safety features implemented by OpenAI to prevent harmful content generation.

OpenAI’s ChatGPT has one of the fastest-growing user bases, with over 100 million active users. While that is a success for its users and investors, it also makes the platform a lucrative target for cybercriminals.

Large language models have revolutionized the field of Natural Language Processing (NLP), allowing computers to generate human-like text with increasing accuracy.

However, the potential for criminal exploitation of LLMs has raised concerns for law enforcement agencies worldwide. Recently, the Europol Innovation Lab organized workshops to explore how criminals could exploit LLMs and how that would affect law enforcement.

The key findings of these workshops were released to the public on 27 March in a report, which focused primarily on ChatGPT because of its wide availability and growing popularity.

ChatGPT is a large language model that was developed by OpenAI, an artificial intelligence research laboratory. The model is part of the GPT (Generative Pre-trained Transformer) series and is one of the most advanced and sophisticated language models in the world.

Released to the public in November 2022, it quickly caught the public eye due to its ability to provide ready-to-use answers. However, as its use increased, its limitations became evident as well.

To prevent malicious use of the model, OpenAI has implemented several safety features, including a moderation endpoint that evaluates text inputs for potentially harmful content and restricts ChatGPT’s ability to respond to such prompts.
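The idea of screening a prompt before the model is allowed to answer can be illustrated with a minimal sketch. This is not OpenAI's moderation endpoint or its API: the names `toy_harm_score`, `moderate`, and `answer`, the tiny blocklist, and the threshold are all hypothetical, and the word-matching scorer is only a stand-in for whatever trained classifier a real service would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModerationResult:
    flagged: bool
    score: float


def toy_harm_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier.

    Returns the fraction of words that appear in a tiny blocklist.
    """
    blocklist = {"malware", "ransomware", "keylogger"}  # illustrative only
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in blocklist)
    return hits / len(words)


def moderate(
    text: str,
    scorer: Callable[[str], float] = toy_harm_score,
    threshold: float = 0.1,  # assumed cut-off, chosen for the example
) -> ModerationResult:
    """Score the input and flag it when the score reaches the threshold."""
    score = scorer(text)
    return ModerationResult(flagged=score >= threshold, score=score)


def answer(
    prompt: str,
    generate: Callable[[str], str],
    scorer: Callable[[str], float] = toy_harm_score,
    threshold: float = 0.1,
) -> str:
    """Call the text generator only if the prompt passes moderation."""
    if moderate(prompt, scorer, threshold).flagged:
        return "Request declined by moderation."
    return generate(prompt)
```

A scorer that matches on surface wording, like this toy one, only sees the words actually present in the prompt, which is one reason filters of this kind are sensitive to how a request is phrased.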

However, the report highlights that despite these safeguards, criminals may employ prompt engineering to circumvent content moderation limitations. Prompt engineering is the practice of refining the way a question is asked to influence the output generated by an AI system. While prompt engineering can maximize the usefulness of AI tools, it can also be abused to produce harmful content.

One of the most common workarounds to bypass content moderation limitations is prompt creation. This involves providing an answer and asking ChatGPT to provide the corresponding prompt. Other workarounds include asking ChatGPT to give its answer as a piece of code or to respond as a fictional character discussing the topic.

Additionally, replacing trigger words and changing the context later, style/opinion transfers, and creating fictitious examples that are easily transferable to real events are all methods that can be used to circumvent ChatGPT’s safety features.

The most advanced and powerful workarounds involve jailbreaking the model, such as the ‘Do Anything Now’ (DAN) jailbreak prompt. This prompt is designed to bypass OpenAI’s safeguards and lead ChatGPT to respond to any input, regardless of its potentially harmful nature.

Although OpenAI has quickly closed such loopholes, new and more complex versions of DAN have emerged subsequently.

“As the capabilities of LLMs (large language models) such as ChatGPT are actively being improved, the potential exploitation of these types of AI systems by criminals provides a grim outlook,” Europol said.

Another specific concern raised by Europol is the potential for criminals to use LLMs to impersonate others in online conversations. Attackers could use a language model to create text that appears to be generated by a certain trusted individual or entity, such as a bank representative or a government official.

Europol also warns that LLMs could be used to generate highly convincing phishing emails, which can trick victims into handing over their login credentials or other sensitive information.

With ChatGPT, it has become increasingly simple for individuals with a minimal grasp of the English language to input prompts that quickly generate formal and grammatically correct texts.

While online scams were previously easy to recognize by their poor use of language, criminals may now use language models to generate highly convincing texts for nefarious purposes.

Similarly, language models such as ChatGPT can also generate code that may be used maliciously. The newer model, GPT-4, is especially effective at understanding the context of code and correcting the errors it may contain. This would