A global consortium of scientists from federal laboratories, research institutes, academia, and industry has formed to address the challenges of building large-scale artificial intelligence (AI) systems and advancing trustworthy and reliable AI for scientific discovery. The Trillion Parameter Consortium (TPC) brings together teams of researchers engaged in creating large-scale generative AI models to address key challenges in advancing AI for science. These challenges include developing scalable model architectures and training strategies; organizing and curating scientific data for training models; optimizing AI libraries for current and future exascale computing platforms; and developing deep evaluation platforms to assess progress on scientific task learning, reliability, and trust.

Toward these ends, TPC will:

  • Build an open community of researchers interested in creating state-of-the-art large-scale generative AI models aimed broadly at advancing progress on scientific and engineering problems by sharing methods, approaches, tools, insights, and workflows.
  • Incubate, launch, and coordinate projects voluntarily to avoid duplication of effort and to maximize the impact of the projects in the broader AI and scientific community.
  • Create a global network of resources and expertise to facilitate the next generation of AI and bring together researchers interested in developing and using large-scale AI for science and engineering.

The consortium has formed a dynamic set of foundational work areas addressing three facets of the complexities of building large-scale AI models:

  • Identifying and preparing high-quality training data, with teams organized around the unique complexities of various scientific domains and data sources.
  • Designing and evaluating model architectures, performance, training, and downstream applications.
  • Developing crosscutting and foundational capabilities such as innovations in model evaluation strategies with respect to bias, trustworthiness, and goal alignment, among others.

TPC aims to provide the community with a venue in which multiple large model-building initiatives can collaborate to leverage global efforts, with flexibility to accommodate the diverse goals of individual initiatives. TPC includes teams that are undertaking initiatives to leverage emerging exascale computing platforms to train LLMs — or alternative model architectures — on scientific research including papers, scientific codes, and observational and experimental data to advance innovation and discoveries.

Trillion-parameter models represent the frontier of large-scale AI, with only the largest commercial AI systems currently approaching this scale.

Training LLMs with this many parameters requires exascale-class computing resources, such as those being deployed at several U.S. Department of Energy (DOE) national laboratories and at multiple TPC founding partners in Japan, Europe, and elsewhere. Even with such resources, training a state-of-the-art one-trillion-parameter model will require months of dedicated time, which is intractable on all but the largest systems. Consequently, such efforts will involve large, multi-disciplinary, multi-institutional teams. TPC is envisioned as a vehicle to support collaboration and cooperative efforts among and within such teams.

“At our laboratory and at a growing number of partner institutions around the world, teams are beginning to develop frontier AI models for scientific use and are preparing enormous collections of previously untapped scientific data for training,” said Rick Stevens, associate laboratory director of computing, environment and life sciences at DOE’s Argonne National Laboratory and professor of computer science at the University of Chicago. “We collaboratively created TPC to accelerate these initiatives and to rapidly create the knowledge and tools necessary for creating AI models with the ability to not only answer domain-specific questions but to synthesize knowledge across scientific disciplines.”

The founding partners of TPC are from the following organizations (listed alphabetically by organization, each with a point of contact):

  • AI Singapore: Leslie Teo
  • Allen Institute For AI: Noah Smith
  • AMD: Michael Schulte
  • Argonne National Laboratory: Ian Foster
  • Barcelona Supercomputing Center: Mateo Valero Cortes
  • Brookhaven National Laboratory: Shantenu Jha
  • CalTech: Anima Anandkumar
  • CEA: Christoph Calvin
  • Cerebras Systems: Andy Hock
  • CINECA: Laura Morselli
  • CSC - IT Center for Science: Per Öster
  • CSIRO: Aaron Quigley
  • ETH Zürich: Torsten Hoefler
  • Fermi National Accelerator Laboratory: Jim Amundson
  • Flinders University: Rob Edwards
  • Fujitsu Limited: Koichi Shirahata
  • HPE: Nic Dube
  • Intel: Koichi Yamada
  • Juelich Supercomputing Center: Thomas Lippert
  • Kotoba Technologies, Inc.: Jungo Kasai
  • LAION: Jenia Jitsev
  • Lawrence Berkeley National Laboratory: Stefan Wild
  • Lawrence Livermore National Laboratory: Brian Van Essen
  • Leibniz Supercomputing Centre: Dieter Kranzlmüller
  • Los Alamos National Laboratory: Jason Pruet
  • Microsoft: Shuaiwen Leon Song
  • National Center for Supercomputing Applications: Bill Gropp
  • National Institute of Advanced Industrial Science and Technology (AIST): Yoshio Tanaka
  • National Renewable Energy Laboratory: Juliane Mueller
  • National Supercomputing Centre, Singapore: Tin Wee Tan
  • NCI Australia: Jingbo Wang
  • New Zealand eScience Infrastructure: Nick Jones
  • Northwestern University: Pete Beckman
  • NVIDIA: Giri Chukkapalli
  • Oak Ridge National Laboratory: Prasanna Balaprakash
  • Pacific Northwest National Laboratory: Neeraj Kumar
  • Pawsey Institute: Mark Stickells
  • Princeton Plasma Physics Laboratory: William Tang
  • RIKEN: Makoto Taiji
  • Rutgers University: Shantenu Jha
  • SambaNova: Marshall Choy
  • Sandia National Laboratories: John Feddema
  • Seoul National University: Jiook Cha
  • SLAC National Accelerator Laboratory: Daniel Ratner
  • Stanford University: Sanmi Koyejo
  • STFC Rutherford Appleton Laboratory, UKRI: Jeyan Thiyagalingam
  • Texas Advanced Computing Center: Dan Stanzione
  • Thomas Jefferson National Accelerator Facility: Malachi Schram
  • Together AI: Ce Zhang
  • Tokyo Institute of Technology: Rio Yokota
  • Université de Montréal: Irina Rish
  • University of Chicago: Rick Stevens
  • University of Delaware: Ilya Safro
  • University of Illinois Chicago: Michael Papka
  • University of Illinois Urbana-Champaign: Lav Varshney
  • University of New South Wales: Tong Xie
  • University of Tokyo: Kengo Nakajima
  • University of Utah: Manish Parashar
  • University of Virginia: Geoffrey Fox

TPC contact: Charlie Catlett

Learn more at tpc.dev.