A new programming language for high-performance computing, with much less code

Paper overview. Credit: arXiv (2024). DOI: 10.48550/arxiv.2411.07211

Many companies invest heavily in hiring talent to create the high-performance library code that underpins modern artificial intelligence systems. NVIDIA, for instance, developed some of the most advanced high-performance computing (HPC) libraries, creating a competitive moat that has proven difficult for others to breach.

But what if a couple of students, within a few months, could compete with state-of-the-art HPC libraries with a few hundred lines of code, instead of tens or hundreds of thousands?

That’s what researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown with a new programming language called Exo 2.

Exo 2 belongs to a new category of programming languages that MIT Professor Jonathan Ragan-Kelley calls “user-schedulable languages” (USLs). Instead of hoping that an opaque compiler will auto-generate the fastest possible code, USLs put programmers in the driver’s seat, allowing them to write “schedules” that explicitly control how the compiler generates code. This enables performance engineers to transform simple programs that specify what they want to compute into complex programs that do the same thing as the original specification, but much, much faster.
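To make the idea concrete, here is a toy sketch in plain Python, not in Exo 2 syntax. The function names (`matmul_spec`, `matmul_tiled`) and the tile size are assumptions made up for illustration. The first function states what to compute; the second is what a hand-applied "schedule" (tiling the two outer loops) might turn it into. Pure Python will not show a speedup. The point is only that the loop structure changes while the result stays identical.

```python
# Toy illustration only: plain Python, not Exo 2 code.

def matmul_spec(A, B, n):
    """Specification: the simplest loop nest that says what to compute."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, n, tile=2):
    """Same computation after tiling the i and j loops by hand."""
    C = [[0] * n for _ in range(n)]
    for io in range(0, n, tile):          # outer loop over row tiles
        for jo in range(0, n, tile):      # outer loop over column tiles
            # min(...) handles a ragged final tile when tile does not divide n
            for i in range(io, min(io + tile, n)):
                for j in range(jo, min(jo + tile, n)):
                    for k in range(n):
                        C[i][j] += A[i][k] * B[k][j]
    return C


n = 5
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]

# Integer inputs, so the two versions must agree exactly.
assert matmul_spec(A, B, n) == matmul_tiled(A, B, n)
```

In a user-schedulable language, the programmer would write only the specification and then request the tiling through scheduling operations, rather than rewriting the loops by hand as done here.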

One of the limitations of existing USLs (like the original Exo) is their relatively fixed set of scheduling operations, which makes it difficult to reuse scheduling code across different “kernels” (the individual components in a high-performance library).

In contrast, Exo 2 enables users to define new scheduling operations externally to the compiler, facilitating the creation of reusable scheduling libraries.
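A rough way to picture a reusable scheduling library is a set of ordinary functions built out of a few primitive rewrites. The sketch below is a toy model in plain Python and does not use Exo 2's API. The `Nest` class and the `split`, `reorder`, and `tile2d` functions are hypothetical names invented for this example. Here `split` and `reorder` play the role of built-in primitives, and `tile2d` is a "user-defined" operation that lives outside them and can be reused on any two-loop nest.

```python
from itertools import product

# Toy model only: hypothetical names, not Exo 2's API.


class Nest:
    """A loop nest: (name, extent) pairs, outermost first, plus a function
    that recovers the original index variables from the current ones."""

    def __init__(self, loops, recover=None):
        self.loops = list(loops)
        self.recover = recover or (lambda env: env)

    def points(self):
        """Yield original-index environments in execution order."""
        names = [name for name, _ in self.loops]
        for values in product(*(range(extent) for _, extent in self.loops)):
            yield self.recover(dict(zip(names, values)))


def split(nest, name, factor):
    """Primitive: split loop `name` into `name+'o'` (outer) and `name+'i'` (inner)."""
    idx = [n for n, _ in nest.loops].index(name)
    extent = nest.loops[idx][1]
    if extent % factor != 0:
        raise ValueError("toy split requires the factor to divide the extent")
    new_loops = [(name + "o", extent // factor), (name + "i", factor)]
    loops = nest.loops[:idx] + new_loops + nest.loops[idx + 1:]
    previous = nest.recover

    def recover(env):
        env = dict(env)
        env[name] = env.pop(name + "o") * factor + env.pop(name + "i")
        return previous(env)

    return Nest(loops, recover)


def reorder(nest, order):
    """Primitive: permute the loops into the given order."""
    extents = dict(nest.loops)
    if sorted(order) != sorted(extents):
        raise ValueError("order must be a permutation of the loop names")
    return Nest([(name, extents[name]) for name in order], nest.recover)


def tile2d(nest, a, b, ta, tb):
    """User-defined operation: tile a two-loop nest using only the primitives."""
    nest = split(nest, a, ta)
    nest = split(nest, b, tb)
    return reorder(nest, [a + "o", b + "o", a + "i", b + "i"])


def as_pairs(nest):
    return [(p["i"], p["j"]) for p in nest.points()]


base = Nest([("i", 4), ("j", 6)])
tiled = tile2d(base, "i", "j", 2, 3)

# Same set of iterations, visited in a different order.
assert sorted(as_pairs(base)) == sorted(as_pairs(tiled))
assert as_pairs(base) != as_pairs(tiled)
```

This toy only reorders iterations and assumes they are independent of one another. It does no safety checking of its own beyond the divisibility test in `split`.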

Lead author Yuka Ikarashi, an MIT Ph.D. student in electrical engineering and computer science and CSAIL affiliate, says that Exo 2 can reduce total schedule code by a factor of 100 and deliver performance competitive with state-of-the-art implementations on multiple platforms, including implementations of the Basic Linear Algebra Subprograms (BLAS) that power many machine learning applications. This makes it an attractive option for engineers in HPC focused on optimizing kernels across different operations, data types, and target architectures.

“It’s a bottom-up approach to automation, rather than doing an ML/AI search over high-performance code,” says Ikarashi. “What that means is that performance engineers and hardware implementers can write their own scheduling library, which is a set of optimization techniques to apply on their hardware to reach the peak performance.”

One major advantage of Exo 2 is that it reduces the amount of coding effort needed at any one time by reusing the scheduling code across applications and hardware targets.

The researchers implemented a scheduling library with roughly 2,000 lines of code in Exo 2, encapsulating reusable optimizations that are linear-algebra specific and target-specific (the AVX512, AVX2, and Neon instruction sets and the Gemmini hardware accelerator). This library consolidates scheduling efforts across more than 80 high-performance kernels with up to a dozen lines of code each, delivering performance comparable to, or better than, MKL, OpenBLAS, BLIS, and Halide.

Exo 2 includes a novel mechanism called “Cursors” that provides what they call a “stable reference” for pointing at the object code throughout the scheduling process. Ikarashi says that a stable reference is essential for users to encapsulate schedules within a library function, as it renders the scheduling code independent of object-code transformations.
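Why a stable reference matters can be seen in a small toy, again in plain Python and not a description of how Exo 2 implements Cursors. The `Stmt`, `Cursor`, and `Program` classes are hypothetical and exist only for this sketch. A position-based reference (a raw list index) silently points at the wrong statement once a transformation inserts code, while a reference tied to the statement's identity keeps pointing at the same statement.

```python
import itertools

# Toy model only: hypothetical classes, not Exo 2's Cursor implementation.

_next_id = itertools.count()


class Stmt:
    def __init__(self, text):
        self.text = text
        self.uid = next(_next_id)  # identity that survives edits around it


class Cursor:
    """Refers to a statement by identity rather than by position."""

    def __init__(self, program, uid):
        self.program = program
        self.uid = uid

    def index(self):
        for pos, stmt in enumerate(self.program.stmts):
            if stmt.uid == self.uid:
                return pos
        raise LookupError("statement no longer exists")

    def stmt(self):
        return self.program.stmts[self.index()]


class Program:
    def __init__(self, texts):
        self.stmts = [Stmt(t) for t in texts]

    def find(self, text):
        for stmt in self.stmts:
            if stmt.text == text:
                return Cursor(self, stmt.uid)
        raise LookupError(text)

    def insert_before(self, cursor, text):
        self.stmts.insert(cursor.index(), Stmt(text))


prog = Program(["load a", "c = a * b", "store c"])
raw_index = 1                      # position-based reference to "c = a * b"
cur = prog.find("c = a * b")       # identity-based reference to the same statement

prog.insert_before(cur, "load b")  # a transformation changes the code

assert prog.stmts[raw_index].text == "load b"  # the raw index is now stale
assert cur.stmt().text == "c = a * b"          # the cursor still resolves correctly
```

A library function that takes a cursor as an argument can therefore keep working on "the same loop" or "the same statement" after earlier scheduling steps have rewritten the code around it.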

“We believe that USLs should be designed to be user-extensible, rather than having a fixed set of operations,” says Ikarashi. “In this way, a language can grow to support large projects through the implementation of libraries that accommodate diverse optimization requirements and application domains.”

Exo 2’s design allows performance engineers to focus on high-level optimization strategies while ensuring that the underlying object code remains functionally equivalent through the use of safe primitives. In the future, the team hopes to expand Exo 2’s support for different types of hardware accelerators, like GPUs. Several ongoing projects aim to improve the compiler analysis itself, in terms of correctness, compilation time, and expressivity.

The study is published on the arXiv preprint server.

More information:
Yuka Ikarashi et al, Exo 2: Growing a Scheduling Language, arXiv (2024). DOI: 10.48550/arxiv.2411.07211


Massachusetts Institute of Technology


This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation:
Exo 2: A new programming language for high-performance computing, with much less code (2025, March 13)
retrieved 13 March 2025
from

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
