Researchers explore how to bring larger neural networks closer to the energy efficiency of biological brains


The more lottery tickets you buy, the higher your chances of winning, but spending more than you win is obviously not a wise strategy. Something similar happens in AI powered by deep learning: we know that the larger a neural network is (i.e., the more parameters it has), the better it can learn the task we set for it.

However, the strategy of making it infinitely large during training is not only impossible but also extremely inefficient. Scientists have tried to imitate the way biological brains learn, which is highly resource-efficient, by providing machines with a gradual training process that starts with simpler examples and progresses to more complex ones—a model known as “curriculum learning.”

Surprisingly, however, they found that this seemingly sensible strategy is irrelevant for overparameterized (very large) networks.

A study in the Journal of Statistical Mechanics: Theory and Experiment sought to understand why this “failure” occurs, suggesting that these overparameterized networks are so “rich” that they tend to learn by following a path based more on quantity (of resources) than quality (input organized by increasing difficulty).

This might actually be good news, as it suggests that by carefully adjusting the initial size of the network, curriculum learning could still be a viable strategy, potentially promising for creating more resource-efficient, and therefore less energy-consuming, neural networks.

There is great excitement around neural network-based AI like ChatGPT: every day, a new bot or feature emerges that everyone wants to try, and the phenomenon is also growing in scientific research and industrial applications. This requires increasing computing power, and therefore energy consumption, and concerns regarding both the energy sources needed and the emissions produced by this sector are on the rise. Making this technology capable of doing more with less is thus crucial.


Neural networks are computational models made up of many “nodes” performing calculations, with a distant resemblance to the networks of neurons in biological brains, capable of learning autonomously based on the input they receive. For example, they “see” a vast number of images and learn to categorize and recognize content without direct instruction.

Among experts, it is well known that the larger a neural network is during the training phase (i.e., the more parameters it uses), the more precisely it can perform the required tasks. The intuition, known in technical jargon as the "Lottery Ticket Hypothesis," is that a bigger network contains more candidate subnetworks, so it is more likely to hold one that is well placed to learn the task, much like buying more lottery tickets. The significant drawback is that this requires a massive amount of computing resources, with all the associated problems (increasingly powerful computers are needed, which demand more and more energy).
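To make the lottery metaphor concrete, here is a back-of-the-envelope illustration (ours, not the study's model): if each candidate subnetwork, or "ticket," independently has a small chance of being well-initialized for the task, the probability that the network contains at least one winner grows quickly with the number of tickets it holds.

```python
# Toy illustration of the "more tickets, more chances" intuition behind
# the Lottery Ticket Hypothesis (a hypothetical calculation, not the study's model).
# Assumption: each candidate subnetwork ("ticket") independently has a
# small probability p of being well-initialized for the task.

def prob_at_least_one_winner(n_tickets: int, p: float = 1e-4) -> float:
    """Probability that at least one of n independent tickets wins."""
    return 1.0 - (1.0 - p) ** n_tickets

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tickets -> P(at least one winner) = {prob_at_least_one_winner(n):.3f}")
```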

To find a solution, many scientists have looked at where this type of problem appears to have been, at least partially, solved: biological brains. Running on just two or three meals a day, our brains perform tasks that, for a neural network, would require supercomputers and huge amounts of energy. How do they do it?

The order in which we learn things might be the answer. “If someone has never played the piano and you put them in front of a Chopin piece, they’re unlikely to make much progress learning it,” explains Luca Saglietti, a physicist at Bocconi University in Milan, who coordinated the study. “Normally, there’s a whole learning path spanning years, starting from playing ‘Twinkle Twinkle Little Star’ and eventually leading to Chopin.”


When input is provided to machines in order of increasing difficulty, this is called "curriculum learning." The most common approach, however, is to feed input in random order to highly powerful, overparameterized networks.
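In practice, the difference between the two regimes comes down to how the training examples are ordered before they reach the network. The sketch below is a generic illustration; the difficulty scores are hypothetical placeholders, since real curricula rely on task-specific measures.

```python
import random

# Toy training set: each example carries a difficulty score.
# The scores here are hypothetical placeholders; real curricula derive
# difficulty from task-specific measures (label noise, input complexity, etc.).
examples = [
    {"input": "twinkle twinkle little star", "difficulty": 1},
    {"input": "fur elise", "difficulty": 3},
    {"input": "chopin ballade no. 1", "difficulty": 9},
]

# Standard training: present the examples in random order.
random_order = examples[:]
random.shuffle(random_order)

# Curriculum learning: present the examples from easiest to hardest.
curriculum_order = sorted(examples, key=lambda ex: ex["difficulty"])

for ex in curriculum_order:
    print(ex["input"])  # a real training loop would feed each example to the network
```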

Once the network has learned, the number of parameters can be cut drastically, to less than 10% of the initial amount, because most of them are no longer needed. However, if you start training with only 10% of the parameters, the network fails to learn. So, while a trained AI might eventually fit on our phones, during training it requires massive servers.
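The shrinking step alluded to here is known as pruning. The following sketch shows a generic form of magnitude pruning with NumPy (an illustration of the idea, not the procedure used in the study): keep only the largest 10% of weights by absolute value and zero out the rest.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
    """Zero out all but the largest `keep_fraction` of weights by magnitude."""
    flat = np.abs(weights).ravel()
    k = max(1, int(keep_fraction * flat.size))
    # Threshold = magnitude of the k-th largest weight.
    threshold = np.partition(flat, -k)[-k]
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))      # stand-in for a trained weight matrix
w_pruned = magnitude_prune(w, 0.1)   # roughly 90% of entries become zero
print(f"nonzero fraction: {(w_pruned != 0).mean():.2f}")
```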

Scientists have wondered whether curriculum learning could save resources. But research so far suggests that for highly overparameterized networks, curriculum learning makes little difference: presenting examples in a curated order does not seem to improve training performance.

The new work by Saglietti and colleagues attempted to understand why.

“What we’ve seen is that an overparameterized neural network doesn’t need this path because, instead of being guided through learning by examples, it’s guided by the fact that it has so many parameters—resources that are already close to what it needs,” explains Saglietti.

In other words, even if you offer it optimized learning data, the network prefers to rely on its vast processing resources, finding parts within itself that, with a few tweaks, can already perform the task.


This is actually good news, as it does not mean that networks cannot take advantage of curriculum learning, but that, given the high number of initial parameters, they are pushed in a different direction. In principle, therefore, one could find a way to start with smaller networks and adopt curriculum learning.

“This is one part of the hypothesis explored in our study,” Saglietti explains.

“At least within the experiments we conducted, we observed that if we start with smaller networks, the effect of the curriculum—showing examples in a curated order—begins to show improvement in performance compared to when the input is provided randomly. This improvement is greater than when you keep increasing the parameters to the point where the order of the input no longer matters.”

More information:
Stefano Sarao Mannelli et al, Tilting the odds at the lottery: the interplay of overparameterisation and curricula in neural networks, Journal of Statistical Mechanics: Theory and Experiment (2024). DOI: 10.1088/1742-5468/ad864b

Provided by
International School of Advanced Studies (SISSA)


