Computer scientists develop solutions for making AI models more efficient and customizable

Anshumali Shrivastava is an associate professor of computer science, electrical and computer engineering and statistics, and a member of the Ken Kennedy Institute at Rice University. Credit: Jeff Fitlow / Rice University

Artificial intelligence (AI) is everywhere—from the chatbots we consult for customer support to tools predicting how diseases might spread. But the computing power and energy required to run modern AI models—such as large language models (LLMs)—can make them expensive, inaccessible and environmentally taxing. A team of researchers at Rice University is working on solutions to change that.

“Generative artificial intelligence is still in its infancy when it comes to broader integration,” said Anshumali Shrivastava, associate professor of computer science, electrical and computer engineering, and statistics, and a member of Rice’s Ken Kennedy Institute. “We have a long way to go till we see the full potential of this technology in play.”

Shrivastava explained that successful AI integration entails companies and organizations having access to expert AI systems that can tap their data infrastructure securely to perform highly specialized tasks.

“For an AI to solve physics problems well, it needs to be built by physicists, and AI that is solving a medical problem has to be built by medical experts,” Shrivastava said.

Easier said than done: Building LLMs from scratch is a major lift in terms of labor, energy and data. In most cases, the only available option for deploying LLMs in context-specific settings while preserving data security is to customize existing models.

Shrivastava and several members of his research group presented three of their most recent advancements in tweaking LLMs to better suit users’ needs at the latest convening of the AI conference Neural Information Processing Systems (NeurIPS) in Vancouver, British Columbia, in December 2024.


The three papers develop superior alternatives to popular strategies such as low-rank approximation and standard quantization, showcasing the potential impact and creativity of AI research at Rice.

LLMs are neural network systems that learn from and process language data. These algorithms are equipped with parameters, or variables, that determine how input (say, a ChatGPT prompt) gets turned into output (an email draft).

The “large” in LLM points to the trend over the past decade to equip the models with more and more parameters and data, since more parameters and more training data generally translate into more capable models. In turn, this has resulted in a significant increase in the computational power and memory needed to train and deploy the models; hence LLMs’ notoriously large memory and energy footprint.
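To get a sense of the scale, a quick back-of-the-envelope calculation (using a hypothetical model size rather than figures from the Rice papers) shows how parameter count alone drives memory requirements:

```python
# Rough estimate of the memory needed just to hold an LLM's weights.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1e9

# A hypothetical 7-billion-parameter model stored as 16-bit numbers needs
# roughly 14 GB before counting activations or conversation memory.
print(weight_memory_gb(7e9, 2))   # ~14.0
# Halving the precision to 8 bits per parameter halves the footprint.
print(weight_memory_gb(7e9, 1))   # ~7.0
```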

One of the Rice team’s papers presented at NeurIPS explores a concept Shrivastava calls “parameter sharing,” introducing Sketch Structured Transforms (SS1)—a method for handling the vast tables of numbers, called weight matrices or working memory, that AI models rely on to make predictions and decisions.

SS1 leverages parameter sharing, a fundamental idea of probabilistic algorithms, to reduce the model’s memory and computation needs while maintaining its expressivity and accuracy. For example, when applied to popular LLMs, the SS1 technique sped up processing times by over 11% without requiring additional fine-tuning.
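As a rough illustration of the parameter-sharing idea, the hashing-style sketch below maps a large “virtual” weight matrix onto a much smaller bank of stored values, so that many weights reuse the same number. This is only a conceptual stand-in, not the SS1 transform itself, which the paper designs so that the sharing preserves expressivity and accuracy while also reducing computation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_virtual = 1024 * 1024          # entries the full weight matrix would have
n_shared = 64 * 1024             # entries actually stored (16x fewer)
shared_bank = rng.standard_normal(n_shared).astype(np.float32)

# Map every virtual weight to a slot in the shared bank with a cheap hash,
# so many virtual weights share the same stored parameter.
idx = np.arange(n_virtual, dtype=np.int64)
slots = (idx * 2654435761) % n_shared
virtual_weights = shared_bank[slots].reshape(1024, 1024)

x = rng.standard_normal(1024).astype(np.float32)
y = virtual_weights @ x                                # acts like a full 1024x1024 layer
print(virtual_weights.nbytes // shared_bank.nbytes)    # 16x fewer numbers stored
```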

Today, LLMs—and more broadly, foundation models—rely on expensive, power-hungry hardware called GPUs (graphics processing units) to perform the millions of calculations they need. This means that foundation models are often confined to data centers owned by Big Tech companies or require expensive hardware out of reach for most people and smaller organizations.


Shrivastava’s team has developed an algorithm that allows LLMs to run efficiently on standard computer processors (CPUs) instead of GPUs. This work, outlined in a second paper presented at NeurIPS, leverages CPUs’ own hardware capabilities to redesign how the calculations happen: The NoMAD Attention algorithm replaces the costly multiply-add operations at the heart of attention with fast lookups, using a feature of CPUs’ memory architecture in a way that’s faster and less resource-intensive.

“Our algorithm makes everything run twice as fast without any accuracy loss,” said Tianyi Zhang, a Rice doctoral student in Shrivastava’s research group and first author on two of the papers presented at NeurIPS.
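The sketch below illustrates the lookup idea in plain NumPy using a hypothetical product-quantization scheme: cached keys are coded against a small set of reference vectors ahead of time, so each attention score for a new query becomes a sum of precomputed table entries instead of a full dot product. It is a conceptual sketch only; the actual NoMAD Attention implementation described in the paper performs its lookups inside CPU SIMD registers.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_keys, n_sub, n_centroids = 64, 512, 16, 16
sub_dim = d // n_sub

keys = rng.standard_normal((n_keys, d)).astype(np.float32)

# Offline: split each cached key into sub-vectors and snap each one to its
# nearest centroid (simple product quantization with untrained centroids).
centroids = np.stack([
    keys[rng.choice(n_keys, n_centroids, replace=False),
         s * sub_dim:(s + 1) * sub_dim]
    for s in range(n_sub)
])                                                     # (n_sub, n_centroids, sub_dim)
codes = np.empty((n_keys, n_sub), dtype=np.int64)
for s in range(n_sub):
    sub = keys[:, s * sub_dim:(s + 1) * sub_dim]
    codes[:, s] = ((sub[:, None, :] - centroids[s][None]) ** 2).sum(-1).argmin(1)

# Online: build one small table of query-centroid dot products, then every
# attention score is just a sum of table lookups rather than a multiply-add
# pass over the full key vector.
query = rng.standard_normal(d).astype(np.float32)
table = np.einsum('sd,scd->sc', query.reshape(n_sub, sub_dim), centroids)
approx_scores = table[np.arange(n_sub), codes].sum(axis=1)

exact_scores = keys @ query
print(np.corrcoef(approx_scores, exact_scores)[0, 1])  # correlation with exact scores
```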

This breakthrough means that in the near future, advanced AI tools might not just live in the cloud but could run directly on a phone or laptop.

Another challenge AI researchers face is managing context memory. Large AI models do not just need powerful processors—they also require enormous amounts of high-speed memory to store their “thoughts.” For example, LLMs like ChatGPT keep a temporary “notepad” of everything they have seen in a conversation. Known as the key-value cache, or KV-cache, this memory grows as the conversation continues, quickly straining even the most advanced systems.
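A rough calculation with hypothetical but typical model dimensions (not figures from the paper) shows why this notepad gets heavy so quickly:

```python
# Hypothetical model dimensions, chosen only to illustrate the scale.
layers, heads, head_dim, bytes_per_value = 32, 32, 128, 2        # 16-bit cache entries
per_token = 2 * layers * heads * head_dim * bytes_per_value      # keys + values
print(per_token / 1e3, "KB of cache per token")                  # ~524 KB

context_tokens = 100_000
print(per_token * context_tokens / 1e9, "GB for a long conversation")  # ~52 GB
```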

In a third paper, the team introduced “coupled quantization,” a method for compressing this memory without losing the quality of the model’s responses. Traditional methods compress each piece of information individually, but Shrivastava’s team realized that this approach misses a key part of the picture: The different pieces of memory are interconnected. By compressing related pieces together, their method achieves much greater efficiency.
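The toy example below contrasts the two approaches on a pair of correlated values, with a tiny k-means standing in for codebook construction. It only illustrates the intuition that compressing related values together loses less information at the same bit budget; it is not the paper’s actual coupled-quantization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly correlated "channels," standing in for related pieces of cache memory.
a = rng.standard_normal(10_000).astype(np.float32)
b = 0.9 * a + 0.1 * rng.standard_normal(10_000).astype(np.float32)
x = np.stack([a, b], axis=1)

# Independent 1-bit quantization: each value keeps only its sign times a per-channel scale.
independent = np.sign(x) * np.abs(x).mean(axis=0)

# Coupled quantization at the same budget: 4 codewords (2 bits) per pair,
# i.e. still 1 bit per value, but the codebook can follow the correlation.
codebook = x[rng.choice(len(x), 4, replace=False)]
for _ in range(20):                                    # a few k-means (Lloyd) iterations
    assign = ((x[:, None, :] - codebook[None]) ** 2).sum(-1).argmin(1)
    codebook = np.stack([x[assign == k].mean(0) if np.any(assign == k) else codebook[k]
                         for k in range(4)])
coupled = codebook[assign]

print("independent MSE:", float(((x - independent) ** 2).mean()))   # larger error
print("coupled MSE:    ", float(((x - coupled) ** 2).mean()))       # smaller error, same bits
```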


“We found that we could shrink the memory down to just one bit per piece of information—basically the smallest possible size—while still preserving the model’s performance,” Zhang said. “To my knowledge, we are the first to achieve this.”

Shrivastava’s work reflects a broader vision for the future of AI, one where advanced AI is available to everyone, not just tech giants. Only a handful of organizations currently have the resources to train and fine-tune LLMs, leaving most companies reliant on prebuilt systems. Shrivastava said he sees a future where every organization could create its own AI tools tailored to its specific needs without breaking the bank.

But getting there will require more than just technical breakthroughs. As Shrivastava points out, “We’re only scratching the surface of what AI can do, and already the energy and computing demands are significant. If we want a future where AI solves problems in health care, climate science, etc., we need to make it vastly more efficient. It is clear that the next frontier of efficiency in AI will come via algorithms.”

Provided by
Rice University

