Computing & Wireless : Computing Methods
Available for non-exclusive licensing
- Kazushige Goto, Texas Advanced Computing Center
The Basic Linear Algebra Subprograms (BLAS) library is a collection of routines for low-level linear algebra operations, including matrix multiplication. Users of high-end supercomputers often rely on BLAS routines as building blocks for a wide array of scientific and engineering software. Consequently, optimizing key BLAS routines can measurably increase the productivity of both workstation-level computers and expensive supercomputing resources.
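For readers unfamiliar with the operation, the general matrix multiply in BLAS computes C := alpha * A * B + beta * C. The following is a minimal reference sketch of that definition in plain Python, without the transpose options of the real routine. The function name `gemm` and the list-of-lists layout are illustrative assumptions and not part of any BLAS interface; an optimized library computes the same result far faster.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a * b) + beta * c for matrices stored as lists of rows.

    Illustrative reference only: a is m x k, b is k x n, c is m x n.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Start from the scaled copy of c, then accumulate the product into it.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            aip = alpha * a[i][p]
            for j in range(n):
                out[i][j] += aip * b[p][j]
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)
print(result)  # [[38.5, 44.5], [86.5, 100.5]]
```

The triple loop performs on the order of m * n * k multiply-add operations, which is why this routine dominates the run time of many applications.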
GotoBLAS is an optimized implementation of the BLAS library and is available for a range of computing architectures. It speeds up the most commonly used components of the BLAS library and has increased the performance of a variety of scientific applications, in some cases by as much as 50 percent.
GotoBLAS focuses on optimizing the matrix multiplication routine, a computationally intensive standard matrix operation that can significantly slow processing. While GotoBLAS uses performance-enhancing cache management techniques similar to those of other BLAS implementations, it achieves superior performance on a broad spectrum of supercomputing architectures by reducing the overhead caused by translation lookaside buffer (TLB) misses, an issue that causes significant performance degradation but is generally not addressed by other BLAS implementations.
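The general idea behind this kind of optimization can be sketched as a blocked matrix multiplication in which each block is first copied ("packed") into a small contiguous buffer, so that the inner loops touch a limited set of memory pages. The sketch below is a hypothetical illustration of that loop structure only and is not the GotoBLAS implementation. The names `pack_block` and `blocked_matmul` and the block size are assumptions, and Python lists do not give real control over memory layout, so the code demonstrates correctness of the blocking scheme rather than any speedup.

```python
def pack_block(mat, r0, r1, c0, c1):
    """Copy rows r0..r1-1 and columns c0..c1-1 into a flat row-major list."""
    return [mat[i][j] for i in range(r0, r1) for j in range(c0, c1)]


def blocked_matmul(a, b, block=2):
    """Multiply a (m x k) by b (k x n) one packed block pair at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for p0 in range(0, k, block):
        p1 = min(p0 + block, k)
        kb = p1 - p0
        for j0 in range(0, n, block):
            j1 = min(j0 + block, n)
            nb = j1 - j0
            # Pack the kb x nb block of b once and reuse it for every row block.
            b_packed = pack_block(b, p0, p1, j0, j1)
            for i0 in range(0, m, block):
                i1 = min(i0 + block, m)
                a_packed = pack_block(a, i0, i1, p0, p1)
                # Small kernel that reads only the two packed buffers.
                for i in range(i1 - i0):
                    row = c[i0 + i]
                    for p in range(kb):
                        aip = a_packed[i * kb + p]
                        for j in range(nb):
                            row[j0 + j] += aip * b_packed[p * nb + j]
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
b = [[9.0, 8.0], [7.0, 6.0], [5.0, 4.0]]
# Unblocked reference product used to check the blocked result.
expected = [[sum(a[i][p] * b[p][j] for p in range(3)) for j in range(2)]
            for i in range(3)]
print(blocked_matmul(a, b) == expected)  # True
```

In a compiled implementation the packed buffers would be sized to fit the cache and the address range covered by the TLB, which is the design choice the paragraph above describes.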
GotoBLAS is fully developed and commercially implemented with updated versions available as new computing architectures emerge. Currently, GotoBLAS is supported on the following architectures: Itanium2, Alpha 21264, Power 3/4/5, Pentium 4/Xeon (32-bit and 64-bit architectures), Opteron, Blue Gene, PPC970MP, and Sparc IV.
BLAS routines are used in a diverse set of computing applications, including fluid dynamics, structural mechanics, reservoir modeling, acoustics, graphics and visualization, Fourier transforms, and linear solvers. In addition, the underlying performance of BLAS is a key element in the LINPACK benchmark, which is used to measure the performance of the fastest supercomputers in the world and to rank them. GotoBLAS is frequently used in LINPACK benchmark runs because of its performance improvements over other available implementations.