ChatGPT can be made to generate sexualised and violent images, researchers find

Chris Vallance, Technology reporter

Image: A synthetic image of a naked and injured man lying on the floor, surrounded by men carrying guns, their faces covered. (Credit: Mindgard)

The latest public version of ChatGPT can be made to generate sexualised images or depict scenes of graphic violence with a simple prompt, researchers have told the BBC.

British AI security startup Mindgard figured out how to make ChatGPT create graphic pictures by slightly altering a widely-shared instruction, or prompt, which was originally designed to produce humorous results.

After being contacted by the BBC, ChatGPT's maker OpenAI said it had taken action to stop the chatbot responding with those types of images.

"After investigating this trend, we've introduced additional safeguards against this type of prompt," it said in a statement.

It also said it has multiple layers of protection to prevent users making content which breaches its terms and conditions.

However, the AI security researchers said that with further small changes, the problematic prompt still produced concerning content.

The BBC is not disclosing what the researchers typed into ChatGPT.

But we have seen how the chatbot, OpenAI's GPT-5.4 model, was prompted to create graphic material.

Even without detailed instructions, it would generate images that Mindgard's founder, Peter Garraghan, described as "very gruesome, sometimes sexualised, sometimes both together".

He added that he was particularly concerned that, although the prompt did not specify the subject matter of the images, the AI produced a range of gory and sexualised images of "its own volition".

Garraghan - also a professor in the computing department of Lancaster University - said that was troubling.

"This is a perfectly innocent-looking instruction to an AI, but the consequence is it generates very, very bad imagery and content," he said.

Image: A synthetic image of a woman. She is sitting on the floor in a dirty, grey-walled room. A black rectangle, for redaction, covers her head, body and arms. (Credit: Mindgard)

Mindgard's business is red-teaming - finding ways to persuade a model to break its own rules so AI companies can close the gaps.

Jim Nightingale, the firm's AI safety and security researcher who uncovered the issues, said he was left "shaken, and in tears" by the images the chatbot could be made to generate.

The BBC has seen some of them.

One showed a man with a large head injury - while another showed a dead young woman in a crop top and shorts, with her face and other areas of her body covered in blood.

Features of the image suggest sexual violence, Mindgard said. ChatGPT gave it the title "Grim crime scene aftermath".

A further image showed a young woman in a tight-fitting college logo t-shirt and shorts, tied up and gagged in a bare and dirty room, and looking frightened. ChatGPT called it "abandoned in fear and restraint".

Other generated images showed sexual posing and nudity.

The images depicted adults who were AI-generated, but Mindgard noted that its previous research showed ChatGPT could be fooled into creating nude deepfakes of real people by swapping in their faces.

While OpenAI said it had fixed that, the researchers said an alternative approach still succeeded, and showed the BBC a new image created using the method.

Garraghan feared it could be possible to generate worse images had they continued exploring the vulnerability. "Other topics, I'm sure, would also come out if we spent more time doing so," he said.

The BBC understands that, as well as introducing new safeguards, the firm continues to monitor the issue and roll out additional protections that encourage the model not to generate images in response to the prompt.

Large language models such as ChatGPT are trained on millions of images often taken from existing content on the internet.

Nightingale believes ChatGPT's output reflects the data which has been used to develop and train it.

"I'm struck that while what I saw was generated, an artificial image, it has ties to real images, and the real world," he wrote in his report.

Image: A redacted version of a synthetic image. A woman lies on dirty ground. Her head and body are covered by a black rectangle used for redaction, and only her arms and legs are visible. (Credit: Mindgard)

The researchers first alerted OpenAI in May and shared their findings, but received only an automated response from the tech company. They believe an effort was made to block the prompt but it was easily circumvented.

OpenAI took more action after being contacted by the BBC.

"We also combine automated systems and human review to identify and block harmful material", it added in a statement. It said it also has systems that attempt to block violating material that users upload.

Its policies prohibit sexual violence, non-consensual intimate content, child sexual abuse material and attempts to bypass its safeguards.

In its latest document outlining how ChatGPT should behave, OpenAI said: "The assistant should not generate erotica, depictions of illegal or non-consensual sexual activities, or extreme gore, except in scientific, historical, news, artistic or other contexts where sensitive content is appropriate."

But it is notoriously difficult to fully prevent AI models from crossing sometimes quite nuanced rules and guardrails.

The task companies face is "mountainous", according to Dr Rumman Chowdhury, an expert in evaluating AI models and chief executive of Humane Intelligence.

Chowdhury, who was not involved in the Mindgard research, said it was "a game of cat and mouse" - as protections get better, methods to get round them become more sophisticated.

One of the key issues is that models don't understand, as humans do, what they are producing or what they are being asked not to do.

"Models do not understand intent. They do not understand context. They do not understand propriety or right or wrong," she told BBC News.

Last year, researchers at the UK's AI Security Institute found jailbreaks that overrode safeguards across a range of harmful requests in every AI system they tested.

The Department for Science, Innovation and Technology said in a statement that "safeguards in AI models are improving, but there is more to do".

The AI Security Institute would continue to work with developers to quickly strengthen security before models are released, it added.