Most major large language models (LLMs) can quickly tell when they are being given a personality test and will tweak their responses to provide more socially desirable results—a finding with implications for any study using LLMs as a stand-in for humans.
Aadesh Salecha and colleagues gave LLMs from OpenAI, Anthropic, Google, and Meta the classic Big 5 personality test, a survey that measures Extraversion, Openness to Experience, Conscientiousness, Agreeableness, and Neuroticism. Researchers have given the Big 5 test to LLMs before, but have not typically considered that the models, like humans, may skew their responses to seem likable, a tendency known as “social desirability bias.” The work is published in the journal PNAS Nexus.
People generally prefer those who score low on neuroticism and high on the other four traits, such as extraversion. The authors varied the number of questions given to the models. When asked only a small number of questions, LLMs changed their responses less than when asked five or more questions, which allowed the models to conclude that their personality was being measured.
For GPT-4, scores for positively perceived traits increased by more than 1 standard deviation, and neuroticism scores decreased by a similar amount, as the authors increased the number of questions or told the models that their personality was being measured. This is a large effect, the equivalent of speaking to an average human who suddenly pretends to have a personality that’s more desirable than about 85% of the population.
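The link between a 1 standard deviation shift and the 85% figure can be checked with a short calculation. The sketch below assumes trait scores are roughly normally distributed across the population, which the article does not state; the function name `normal_percentile` is illustrative and not taken from the study.

```python
import math


def normal_percentile(z: float) -> float:
    """Fraction of a standard normal population scoring below z.

    Assumption: trait scores follow an approximately normal distribution.
    """
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# An average person sits at z = 0 (the 50th percentile).
# Shifting that score up by 1 standard deviation moves it to z = 1.
shift_sd = 1.0
print(f"{normal_percentile(shift_sd):.1%}")  # prints 84.1%
```

Under that assumption, a shift of exactly 1 standard deviation lands at about the 84th percentile, so a shift of "more than 1 standard deviation" is consistent with the rounded 85% figure.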
The authors think this effect is likely the result of the final LLM training step, in which humans choose their preferred response from among those the LLM generates. According to the authors, LLMs “catch on” to which personalities are socially desirable at a deep level, which allows them to emulate those personalities when asked.
More information:
Aadesh Salecha et al, Large language models display human-like social desirability biases in Big Five personality surveys, PNAS Nexus (2024). DOI: 10.1093/pnasnexus/pgae533. academic.oup.com/pnasnexus/art … 3/12/pgae533/7919163
Provided by
PNAS Nexus
Citation:
AI models adjust personality test answers to appear more likable, study finds (2024, December 17)