Novel Code Generation Technique Improves Machine Learning Speed

April 18, 2024

Authored by:

April Horency

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in many artificial intelligence (AI) and machine learning (ML) applications. Because these applications operate on large input data, optimizing SpMM performance is attracting increasing research attention. Professor of Electrical and Computer Engineering Howie Huang and his co-authors, former student Qiang Fu, Ph.D. ’23, and Dr. Thomas Rolinger of the Laboratory for Physical Sciences, advanced this line of research in their award-winning study, “JITSpMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication,” honored at the 2024 International Symposium on Code Generation and Optimization (CGO).

CGO, renowned as the leading conference for cutting-edge research at the interface of computer hardware and software, covers a wide range of optimization and code generation techniques and related issues. With over fifteen years of experience, Huang works at the intersection of graph algorithms, computer architectures, and systems. The research conducted in his GraphLab is directed toward innovating high-performance computing and ML algorithms and systems, particularly for processing large graph datasets. In the new study, he and Fu presented a novel code generation technique aimed at optimizing SpMM, which earned them the Distinguished Paper Award at CGO 2024.

“While our previous research was nominated for the Best Paper Award in the past, it is an honor to receive the Distinguished Paper Award at CGO this year,” said Huang. “This is a testament to the significance and caliber of this work recognized by the research community.”

Most existing solutions for SpMM computation follow an ahead-of-time (AOT) compilation approach, in which human-readable code is converted into machine-readable binary code entirely before execution. However, this approach has proven suboptimal because it leads to unnecessary data access and redundant computation.
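For readers unfamiliar with the computation, the following is a minimal sketch of a generic SpMM kernel of the kind an AOT approach has to produce. It is an illustration only, not the Intel baselines or the authors' code. It assumes the sparse matrix is stored in the common compressed sparse row (CSR) layout, and the function and variable names are hypothetical. Because the kernel is written before any input is seen, the width of the dense matrix is just another loop bound that is read at run time.

```python
def spmm_csr(row_ptr, col_idx, vals, dense, n_cols):
    """Generic CSR SpMM: C = A @ B, with A sparse (CSR) and B dense (list of rows).

    Hypothetical helper for illustration; nothing about the input is known
    when this function is written, so every bound is a run-time value.
    """
    n_rows = len(row_ptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Visit only the stored (non-zero) entries of row i of A.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[k]
            b_row = dense[col_idx[k]]
            # The dense width is only known at run time, so this stays a loop.
            for j in range(n_cols):
                out[i][j] += a * b_row[j]
    return out


# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form
ROW_PTR = [0, 2, 3]
COL_IDX = [0, 2, 1]
VALS = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

RESULT = spmm_csr(ROW_PTR, COL_IDX, VALS, B, 2)
print(RESULT)  # [[11.0, 14.0], [9.0, 12.0]]
```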

To overcome these inefficiencies, the research team designed JITSpMM, a just-in-time (JIT) assembly code generation framework that accelerates SpMM computation on multi-core central processing units (CPUs). In this approach, they analyze the input data before generating the binary code, ensuring the code suits both the input matrix and the underlying computer architecture. Using information available only at runtime allows them to optimize the generated code, significantly shortening the time required to train AI/ML models.
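The general idea of generating code after inspecting the input can be sketched in a few lines. The example below is a simplified stand-in, not JITSpMM itself: JITSpMM emits assembly instructions, whereas this sketch emits Python source and compiles it with the built-in `compile` and `exec`. The function name `generate_kernel` and the choice to specialize on the dense matrix width are assumptions made for illustration. Once the width is known, the inner loop can be fully unrolled and each output row accumulated in local variables before being stored once.

```python
def generate_kernel(n_cols):
    """Emit and compile a CSR SpMM kernel specialized for a known dense width.

    Hypothetical illustration: the width is baked into the generated source,
    so the per-column loop disappears from the kernel.
    """
    acc_init = "\n".join(f"        acc{j} = 0.0" for j in range(n_cols))
    acc_update = "\n".join(
        f"            acc{j} += a * b[{j}]" for j in range(n_cols)
    )
    acc_store = ", ".join(f"acc{j}" for j in range(n_cols))
    source = (
        "def kernel(row_ptr, col_idx, vals, dense):\n"
        "    out = []\n"
        "    for i in range(len(row_ptr) - 1):\n"
        f"{acc_init}\n"
        "        for k in range(row_ptr[i], row_ptr[i + 1]):\n"
        "            a = vals[k]\n"
        "            b = dense[col_idx[k]]\n"
        f"{acc_update}\n"
        f"        out.append([{acc_store}])\n"
        "    return out\n"
    )
    namespace = {}
    # Compile the generated text at run time, after the input has been seen.
    exec(compile(source, "<jit_spmm>", "exec"), namespace)
    return namespace["kernel"], source


# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form
ROW_PTR = [0, 2, 3]
COL_IDX = [0, 2, 1]
VALS = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Inspect the input first, then generate a kernel that fits it.
kernel, source = generate_kernel(len(B[0]))
JIT_RESULT = kernel(ROW_PTR, COL_IDX, VALS, B)
print(JIT_RESULT)  # [[11.0, 14.0], [9.0, 12.0]]
```

Printing `source` shows the unrolled statements (`acc0 += a * b[0]`, `acc1 += a * b[1]`) that replace the inner loop. A real JIT framework would make analogous decisions at the level of machine instructions and registers.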

To evaluate JITSpMM, Huang and Fu compared it to two AOT baselines from Intel. Their results showed performance improvements with JITSpMM of as much as 3.8x.

“Training large models can span from days to months and our ability to significantly accelerate this process can have a profound impact, providing many ML/AI applications a much-needed speed boost,” Huang stated.