Exploring the ‘Jekyll-and-Hyde tipping point’ in AI

An attention head ("AI"), shown in basic form, generates a response to a user's prompt. Credit: arXiv (2025). DOI: 10.48550/arxiv.2504.20980

Large language models, such as ChatGPT, have become proficient at solving complex mathematical problems, passing difficult exams, and even offering advice on interpersonal conflicts. However, at what point does a helpful tool become a threat?

Trust in AI is undermined by the lack of a science that predicts when its output shifts from informative and factual to misleading, wrong, irrelevant or even dangerous.

In a new study, George Washington University researchers have explored when and why the output of large language models goes awry. The study is published on the arXiv preprint server.


Neil Johnson, a professor of physics at George Washington University, and GW graduate student Frank Yingjie Huo developed a mathematical formula to pinpoint the moment at which the "Jekyll-and-Hyde tipping point" occurs. At the tipping point, the AI's attention has been stretched too thin and it starts pushing out misinformation and other negative content, Johnson says.
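The intuition behind attention being "stretched thin" can be sketched with a toy calculation. This is not the authors' formula from the paper, only a hypothetical illustration: as the number of tokens an attention head must weigh grows, the softmax weight it can place on any single relevant token is diluted.

```python
import numpy as np

def softmax(x):
    """Standard softmax over a vector of attention scores."""
    e = np.exp(x - x.max())
    return e / e.sum()

def max_attention_weight(n, seed=0):
    """Largest softmax weight a head assigns among n tokens
    with randomly drawn (hypothetical) attention scores."""
    scores = np.random.default_rng(seed).normal(size=n)
    return softmax(scores).max()

# As context length n grows, the strongest single attention weight
# shrinks -- a cartoon of attention being spread over more content.
for n in (8, 64, 512):
    print(n, round(max_attention_weight(n), 3))
```

The decreasing trend is the only point of the sketch; where such dilution would tip an actual model's output from useful to harmful is precisely what the study's formula aims to predict.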


In the future, Johnson says, the model may pave the way toward solutions that would keep AI trustworthy and prevent this tipping point.

This paper provides a unique and concrete platform for discussions between the public, policymakers and companies about what might go wrong with AI in future personal, medical, or societal settings—and what steps should be taken to mitigate the risks, Johnson says.

More information:
Neil F. Johnson et al, Jekyll-and-Hyde Tipping Point in an AI’s Behavior, arXiv (2025). DOI: 10.48550/arxiv.2504.20980

Journal information:
arXiv


Provided by
George Washington University

