Images generated by artificial intelligence (AI) are often almost indistinguishable from real images to the human eye. Watermarks—visible or invisible markers embedded in image files—may be the key to verifying whether an image was generated by AI. So-called semantic watermarks, which are embedded deep within the image generation process itself, are considered to be especially robust and hard to remove.
However, cybersecurity researchers from Ruhr University Bochum, Germany, showed that this assumption is wrong. In a talk at the Conference on Computer Vision and Pattern Recognition (CVPR 2025) on June 15 in Nashville, Tennessee, U.S., the team revealed fundamental security flaws in the supposedly resilient watermarking techniques.
“We demonstrated that attackers could forge or entirely remove semantic watermarks using surprisingly simple methods,” says Andreas Müller from Ruhr University Bochum’s Faculty of Computer Science, who co-authored the study alongside Dr. Denis Lukovnikov, Jonas Thietke, Professor Asja Fischer, and Dr. Erwin Quiring. The paper is available on the arXiv preprint server.
Two novel attack strategies
Their research introduces two novel attack strategies. The first method, known as the imprinting attack, works at the level of latent representations, i.e., the compressed internal encoding of an image that AI image generators operate on. The latent representation of a real image is deliberately modified so that it resembles the latent of an image carrying the target watermark.
This makes it possible to transfer the watermark onto any real image, even though the watermarked reference image was purely AI-generated. An attacker can therefore deceive an AI provider by making any image appear watermarked, and thus seemingly AI-generated, effectively making real images look fake.
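To give a rough idea of what such an imprinting-style attack could look like, here is a minimal PyTorch sketch. It is illustrative only, not the authors' implementation: the `SurrogateEncoder`, the function names, and the loss weights are all assumptions standing in for the latent encoder of a diffusion model.

```python
# Illustrative sketch of an imprinting-style attack (not the authors' code).
# `SurrogateEncoder` is a toy stand-in for the VAE-style encoder that maps
# images into the latent space of a latent diffusion model.
import torch
import torch.nn.functional as F

class SurrogateEncoder(torch.nn.Module):
    """Placeholder for an image-to-latent encoder (assumption, not a real API)."""
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)  # toy encoder

    def forward(self, x):
        return self.conv(x)

def imprint_watermark(real_img, watermarked_ref, encoder,
                      steps=200, lr=1e-2, pixel_weight=1.0):
    """Nudge `real_img` so its latent matches that of `watermarked_ref`,
    transferring the watermark while keeping the image visually close."""
    with torch.no_grad():
        target_latent = encoder(watermarked_ref)  # latent carrying the watermark

    delta = torch.zeros_like(real_img, requires_grad=True)  # small perturbation
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        adv = (real_img + delta).clamp(0, 1)
        latent_loss = F.mse_loss(encoder(adv), target_latent)  # match watermarked latent
        pixel_loss = F.mse_loss(adv, real_img)                 # stay close to the real image
        loss = latent_loss + pixel_weight * pixel_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    return (real_img + delta.detach()).clamp(0, 1)

# Usage with random tensors as stand-ins for actual images:
encoder = SurrogateEncoder()
real = torch.rand(1, 3, 256, 256)       # genuine photograph
ref = torch.rand(1, 3, 256, 256)        # AI-generated, watermarked reference
forged = imprint_watermark(real, ref, encoder)
```

The key point the sketch captures is that only the latent representation is matched; the pixel-space term keeps the forged image looking like the original real photo.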
“The second method, the reprompting attack, exploits the ability to return a watermarked image to the latent space and then regenerate it with a new prompt. This results in arbitrary newly generated images that carry the same watermark,” explains co-author Dr. Quiring from Bochum’s Faculty of Computer Science.
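Conceptually, the reprompting attack amounts to inverting a watermarked image back to its initial noise and reusing that noise for a fresh generation. The sketch below assumes hypothetical `ddim_invert` and `generate` helpers wrapping a diffusion model; the names are placeholders, not an actual library API or the authors' code.

```python
# Conceptual sketch of a reprompting-style attack (illustrative assumptions only).
import torch

def ddim_invert(model, image: torch.Tensor) -> torch.Tensor:
    """Run the diffusion process in reverse (DDIM-style inversion) to recover
    the initial latent noise behind `image`. Placeholder, not a real API."""
    raise NotImplementedError

def generate(model, init_noise: torch.Tensor, prompt: str) -> torch.Tensor:
    """Standard text-to-image sampling, but started from `init_noise`
    instead of fresh random noise. Placeholder, not a real API."""
    raise NotImplementedError

def reprompt_attack(model, watermarked_image: torch.Tensor, new_prompt: str) -> torch.Tensor:
    # 1. Map the watermarked image back into the latent/noise space.
    #    For semantic watermarks, the watermark pattern lives in this noise.
    init_noise = ddim_invert(model, watermarked_image)

    # 2. Regenerate with an arbitrary new prompt; because the watermarked
    #    noise is reused, the new image still verifies as watermarked.
    return generate(model, init_noise, new_prompt)
```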
Attacks work independently of AI architecture
Alarmingly, both attacks require just a single reference image containing the target watermark and work across different model architectures, from legacy UNet-based systems to newer diffusion transformers. This cross-model flexibility makes the vulnerabilities especially concerning.
According to the researchers, the implications are far-reaching: Currently, there are no effective defenses against these types of attacks. “This calls into question how we can securely label and authenticate AI-generated content moving forward,” Müller warns. The researchers argue that the current approach to semantic watermarking must be fundamentally rethought to ensure long-term trust and resilience.
More information:
Andreas Müller et al., Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models, arXiv (2024). DOI: 10.48550/arXiv.2412.03283