English lit grad’s AI tool deciphers Twitter bios, aiding text analysis

An English literature graduate turned data scientist has developed a new method that lets large language models (LLMs), the technology behind AI chatbots, understand and analyze small chunks of text, such as social media profiles, online customer responses or posts responding to disaster events.

Short text of this kind has become central to online communication. However, these snippets are challenging to analyze because they often share few words and carry little context, which makes it difficult for AI to find patterns or group similar texts.

The new research addresses the problem by using LLMs to group large datasets of short text into clusters. These clusters condense potentially millions of tweets or comments into easy-to-understand groups generated by the model.

Ph.D. student Justin Miller developed the method, which produced coherent categories after analyzing nearly 40,000 Twitter (X) user biographies from accounts that tweeted about U.S. President Donald Trump over two days in September 2020.

The approach developed by Miller, an English literature graduate, clustered the biographies into 10 categories and allocated scores within each category to help analyze the tweeters' likely occupations, their political leanings and even their use of emojis.

The study is published in the Royal Society Open Science journal.

Miller said, “What makes this study stand out is its focus on human-centered design. The clusters created by the large language models are not only computationally effective but also make sense to people.

“For instance, texts about family, work, or politics are grouped in ways that humans can intuitively name and understand. Furthermore, the research shows that generative AI, such as ChatGPT, can mimic how humans interpret these clusters.

“In some cases, the AI provided clearer and more consistent cluster names than human reviewers, particularly when distinguishing meaningful patterns from background noise.”

Miller, a doctoral candidate in the School of Physics and a member of the Computational Social Sciences lab, said the tool could be used to simplify large datasets, gain insights for decision-making and improve search and organization.

The authors combined LLMs with a methodology known as “Gaussian mixture modeling” to create clusters that capture the essence of the text and are easier for humans to understand. They validated these clusters by comparing human interpretations with those from a generative LLM, which closely matched the human reviews.
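
As a rough illustration of the clustering step, the sketch below fits a small Gaussian mixture model to toy vectors using expectation-maximization. It is not the study's code. The 2-D points are made-up stand-ins for the numerical representations of text that an LLM would supply, and the function name `fit_diagonal_gmm`, the diagonal-covariance simplification and the farthest-point initialization are all assumptions chosen to keep the example short.

```python
import math
import random


def fit_diagonal_gmm(points, k, iterations=50):
    """Fit a k-component Gaussian mixture (diagonal covariances) with EM.

    Hypothetical helper for illustration only.
    Returns (weights, means, variances, responsibilities).
    """
    n, dim = len(points), len(points[0])

    # Farthest-point initialization: start from the first point, then keep
    # adding the point that lies farthest from the means chosen so far.
    means = [list(points[0])]
    while len(means) < k:
        def gap(p):
            return min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means)
        means.append(list(max(points, key=gap)))

    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []

    for _ in range(iterations):
        # E-step: soft membership of every point in every cluster.
        resp = []
        for x in points:
            log_p = []
            for j in range(k):
                ll = math.log(weights[j])
                for d in range(dim):
                    v = variances[j][d]
                    ll -= 0.5 * (math.log(2 * math.pi * v)
                                 + (x[d] - means[j][d]) ** 2 / v)
                log_p.append(ll)
            top = max(log_p)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: re-estimate each cluster from its weighted members.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty clusters
            weights[j] = nj / n
            for d in range(dim):
                mu = sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[d] - mu) ** 2
                          for r, x in zip(resp, points)) / nj
                means[j][d] = mu
                variances[j][d] = max(var, 1e-6)  # variance floor

    return weights, means, variances, resp


# Toy data: two well-separated groups standing in for two themes of short text.
rng = random.Random(42)
group_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(30)]
group_b = [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(30)]
embeddings = group_a + group_b

weights, means, variances, resp = fit_diagonal_gmm(embeddings, k=2)

# Hard label = the cluster with the highest membership score for each text.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print(labels)
```

Each text receives a membership score for every cluster rather than a single hard label, so borderline texts remain visible. In practice a library implementation such as scikit-learn's `GaussianMixture` would normally replace the hand-written loop.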

This approach improved clustering quality and also suggests that human reviews, while valuable, might not be the only standard for cluster validation.

Miller said, “Large datasets, which would be impossible to manually read, can be reduced into meaningful, manageable groups.”

Applications include:

  • Simplifying Large Datasets: Text collections too large to read manually can be reduced to meaningful, manageable groups. For example, Miller applied the methods from this paper to another project on the Russia-Ukraine war. By clustering over 1 million social media posts, he identified 10 distinct topics, including Russian disinformation campaigns, the use of animals as symbols in humanitarian relief, and Azerbaijan’s attempts to showcase its support for Ukraine.
  • Gaining Insights for Decision-Making: Clusters provide actionable insights for organizations, governments and businesses. A business might use clustering to identify what customers like or dislike about its product, while governments could use it to condense wide-ranging public sentiment into a few topics.
  • Improving Search and Organization: For platforms handling large volumes of user-generated content, clustering makes it easier to organize, filter and retrieve relevant information. This method can help users quickly find what they’re looking for and improve overall content management.

Miller said, “This dual use of AI for clustering and interpretation opens up significant possibilities. By reducing reliance on costly and subjective human reviews, it offers a scalable way to make sense of massive amounts of text data. From social media trend analysis to crisis monitoring or customer insights, this approach combines machine efficiency with human understanding to organize and explain data effectively.”

More information:
Justin K. Miller et al, Human-interpretable clustering of short-text using large language models, Royal Society Open Science (2025). On arXiv: DOI: 10.48550/arxiv.2405.07278

Provided by
University of Sydney


Citation:
English lit grad’s AI tool deciphers Twitter bios, aiding text analysis (2025, January 21), retrieved 21 January 2025

