A High-Performance GPU Implementation of the Classic Barnes Hut N-Body Simulation Algorithm

Computing & Wireless : Utility Software

Available for licensing


  • Martin Burtscher, Ph.D. , ICES
  • Keshav Pingali, Ph.D. , Computer Science

Background/unmet need

The Barnes Hut algorithm is widely used in n-body simulations such as modeling the motion of galaxies. It works by partitioning the 3D space around the bodies into octants until each partition contains one or zero bodies. This partitioning allows to quickly compute the approximate forces that the bodies induce upon each other. Nevertheless, for interesting problem sizes, even this algorithm is time-consuming. Since graphics processing units (GPUs) have many more processing cores than conventional microprocessors, they are ideal candidates for executing such computationally expensive algorithms.

Invention Description

This code implements the classical Barnes Hut n-body algorithm in the Compute Unified Device Architecture (CUDA) programming language for running on a GPU. Unlike most other CUDA programs, it performs complex traversals of an irregular tree-based data structure. The code is fully parallelized within and across thread blocks and is heavily optimized to minimize memory accesses and thread divergence. It demonstrates that complex irregular algorithms can be executed efficiently on GPUs.


  • Implements Barnes Hut n-body algorithm in CUDA
  • GPU code runs much faster than fast CPU code
  • GPU solution is more energy efficient and cost effective


  • CUDA implementation
  • Code simulates one time step with 5 million bodies in 5.25 seconds on GPU—74 times faster than on a CPU
  • Eight times more energy efficient and six times more cost efficient than CPU alternative

Market potential/applications

Any OS that supports CUDA

Development Stage

Lab/bench prototype