Machine learning, a form of artificial intelligence, enjoys unprecedented success in commercial applications. However, the use of machine learning in high performance computing for science has been limited. Why? Advanced machine learning tools weren’t designed for big data sets, like those used to study stars and planets. A team from Intel, the National Energy Research Scientific Computing Center (NERSC), and Stanford University changed that situation. They developed the first 15-petaflop deep-learning software and demonstrated its ability to handle large data sets via test runs on the Cori supercomputer.
Using machine learning techniques on supercomputers, scientists could extract insights from large, complex data sets. Powerful instruments, such as accelerators, produce massive data sets. The new software could enable the world’s largest supercomputers to apply deep learning to such data. The resulting insights could benefit Earth systems modeling, fusion energy, and astrophysics.
Machine learning techniques hold potential for enabling scientists to extract valuable insights from large, complex data sets being produced by accelerators, light sources, telescopes, and computer simulations. While these techniques have had great success in a variety of commercial applications, their use in high performance computing for science has been limited because existing tools were not designed to work with the terabyte- to petabyte-sized data sets found in many science domains.
To address this problem, a collaboration among Intel, the National Energy Research Scientific Computing Center, and Stanford University has been tackling the challenges that arise when applying deep learning techniques, a form of machine learning, to terabyte- and petabyte-sized data sets. The team developed the first 15-petaflop deep-learning software. They demonstrated its scalability for data-intensive applications by executing a number of training runs using large scientific data sets. The runs used physics- and climate-based data sets on Cori, a supercomputer located at the National Energy Research Scientific Computing Center. They achieved a peak rate between 11.73 and 15.07 petaflops (single-precision) and an average sustained performance of 11.41 to 13.47 petaflops. (A petaflop is a million billion floating-point operations per second.)
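To give a sense of scale, the reported rates can be converted into raw operations per second. The short Python sketch below is only an illustration of that arithmetic; the function name and the dictionary labels are made up for this example and do not come from the team's software.

```python
# One petaflop = 10^15 floating-point operations per second.
PETA = 10**15

def petaflops_to_flops(petaflops: float) -> float:
    """Convert a rate in petaflops to floating-point operations per second."""
    return petaflops * PETA

# Rates quoted in the article (single precision). The labels are illustrative.
reported_petaflops = {
    "peak (low end)": 11.73,
    "peak (high end)": 15.07,
    "sustained (low end)": 11.41,
    "sustained (high end)": 13.47,
}

for label, rate in reported_petaflops.items():
    # Print each rate in scientific notation, e.g. 1.507e+16 for 15.07 petaflops.
    print(f"{label}: {rate} petaflops = {petaflops_to_flops(rate):.3e} operations per second")
```

At the high end of the peak range, 15.07 petaflops corresponds to roughly 15 million billion operations every second.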