Join Georgia Tech at SC12!

Georgia Tech HPC faculty, students, and staff will be participating in paper presentations, tutorials, panels, and birds-of-a-feather sessions, among other activities, at SC12. This site provides a quick snapshot of who we are, what we're doing, and how to contact us or join our conversations. Visit us at the GT booth #2043 or at any of the activities listed below!

Check back here for updates during the conference…

Birds of a Feather

Fifth Graph500 List
Session Leaders: David A. Bader (Georgia Tech), Richard Murphy, Marc Snir
ABSTRACT: Data-intensive applications represent increasingly important workloads but are ill-suited for most of today's machines. The Graph500 has demonstrated the challenges of even simple analytics. Backed by a steering committee of over 30 international HPC experts from academia, industry, and national laboratories, this effort serves to benchmark data-intensive workloads for the community. This BOF will unveil the fifth Graph500 list and delve into the specification for the second kernel. We will further explore the new energy metrics for the Green Graph500 and unveil the first results.

Tuesday, Nov. 13, 12:15-1:15 p.m.
Room: 255-BC

Scientific Application Performance in Heterogeneous Supercomputing Clusters
Session Leaders: Wen-mei Hwu, Jeffrey Vetter (Georgia Tech and Oak Ridge National Laboratory), Nacho Navarro

ABSTRACT: Many current and upcoming supercomputers are heterogeneous CPU-GPU computing clusters. Accordingly, application groups are porting scientific applications and libraries to these heterogeneous supercomputers, and industry vendors have been actively collaborating with system teams as well as application groups. Although these teams work on diverse applications and target different systems, many lessons and challenges are shared among them. With this BOF, we aim to bring together system teams and application groups to discuss their experiences, results, lessons, and challenges to date. We hope to form a collaborative community moving forward.

Tuesday, Nov. 13, 5:30-7 p.m.
Room: 155-F

Cyber Security’s Big Data, Graphs, and Signatures
Session Leader: Daniel M. Best
Panelists: David A. Bader (Georgia Tech), TBA

ABSTRACT: Cyber security grows in complexity and network connectivity every day. Today's problems are no longer limited to identifying malware with hash functions. Interesting problems, such as coordinated cyber events, involve hundreds of millions to billions of nodes and a similar number of edges, or more. Nodes and edges go beyond single-attribute objects to become multivariate entities depicting complex relationships with varying degrees of importance. To unravel cyber security's big data, novel and efficient algorithms are needed to investigate graphs and signatures. To foster discussion, we bring together domain experts from various research communities to talk about current techniques and the grand challenges under investigation.

Tuesday, Nov. 13, 5:30-7 p.m.
Room: 250-AB


Graph Analytics in Big Data
Session Leaders: Amar Shan, Shoaib Mufti
Panelists: David A. Bader (Georgia Tech), TBA

ABSTRACT: Data-intensive computing, popularly known as Big Data, has grown enormously in importance over the past five years. However, most data-intensive computing is focused on conventional analytics: searching, aggregating, and summarizing a data set. Graph analytics goes beyond conventional analytics to search for patterns of relationships, a capability with important applications in many HPC areas, ranging from climate science to healthcare and life sciences to intelligence. The purpose of this BOF is to bring together practitioners of graph analytics. Presentations and discussions will cover system architectures and software designed specifically for graph analytics; applications; and benchmarking.

Thursday, Nov. 15, 12:15-1:15 p.m.
Room: 255-EF

Workshops

2nd International Workshop on Network-aware Data Management
Keynote Speaker: Karsten Schwan (Georgia Tech)
Title: Data-intensive and Cloud Applications in Large-scale Datacenter Systems

ABSTRACT: Data-intensive applications have been evolving from their original focus on offline mining of business data into broader domains, including the online inspection and analysis of large-scale web data used for rapid response to current conditions. Processing such 'data in motion' brings new challenges to the domain of data-intensive computing. This talk will articulate some of those challenges, present representative solutions, and describe potential avenues for future work, in light of several constraints seen for this broad class of datacenter applications, including their use of shared underlying datacenter infrastructure, their support by datacenter operators, and the time-constrained operation inherent in their execution. Future research opportunities in this space include application acceleration via GPGPUs as well as new ways to enrich the open-source infrastructures used to run these codes.

Sunday, Nov. 11, 9:15 a.m.
Room: 155-A

Broader Engagement, HPC Educators

HPC: suddenly relevant to mainstream CS education?

Presenter: Matthew Wolf (Georgia Tech)

ABSTRACT: Significant computer science curriculum initiatives are underway, with parallel and distributed computing and the impacts of multi-/many-core infrastructures and ubiquitous cloud computing playing a pivotal role. The developing guidelines will impact millions of students worldwide, and many emerging geographies are looking to use them to boost competitive advantage. Does this mainstream focus on ubiquitous parallelism draw HPC into the core of computer science, or does it make HPC’s particular interests more remote from the cloud/gaming/multi-core emphasis? Following the successful model used at SC10 & SC11, the session will be highly interactive. An initial panel will lay out some of the core issues, with experts from multiple areas in education and industry. Following this will be a lively, moderated discussion to gather ideas from participants about industry and research needs as well as the role of academia in HPC.

Wednesday, Nov. 14, 1:30-5 p.m.
Room: 355-A

Tutorials

Scalable Heterogeneous Computing on GPU Clusters

Presenters: Jeffrey Vetter (Georgia Tech and Oak Ridge National Laboratory), Allen Malony, Philip Roth, Kyle Spafford, Jeremy Meredith

ABSTRACT: This tutorial is aimed at attendees with intermediate-level experience in parallel programming with MPI and some background in GPU programming with CUDA or OpenCL. It provides a comprehensive overview of techniques for porting, analyzing, and accelerating applications on scalable heterogeneous computing systems using MPI together with OpenCL, CUDA, and OpenACC directive-based compilers. First, we will review our methodology and software environment for identifying and selecting portions of applications to accelerate with a GPU, motivated by several application case studies. Second, we will present an overview of several performance and correctness tools that provide measurement, profiling, and tracing information about applications running on these systems. Third, we will present a set of best practices for optimizing these applications, including GPU and NUMA optimization techniques and optimizing the interactions between MPI and GPU programming models. A hands-on session on the NSF Keeneland System will follow each part, giving participants the opportunity to investigate these techniques and performance optimizations on such a system. Existing tutorial codes and benchmark suites will be provided to facilitate individual discovery, and participants may also bring and work on their own applications.

Sunday, Nov. 11, 8:30 a.m. – 5 p.m.
Room: 355-E
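
A recurring pattern covered by MPI+GPU tutorials of this kind is binding each MPI rank to one of its node's GPUs. A minimal sketch in C, assuming a round-robin rank-to-device mapping (our illustration, not code from the tutorial materials):

    /* Minimal sketch: bind each MPI rank to a GPU via the CUDA runtime.
       Assumes a round-robin mapping of ranks to devices; illustration
       only, not taken from the tutorial materials. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank = 0, ndevices = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaGetDeviceCount(&ndevices);

        /* Each rank selects one of the node's GPUs. */
        cudaSetDevice(rank % ndevices);
        printf("rank %d using GPU %d of %d\n", rank, rank % ndevices, ndevices);

        MPI_Finalize();
        return 0;
    }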

Papers

Early Evaluation of Directive-Based GPU Programming Models for Productive Exascale Computing
Authors: Seyong Lee, Jeffrey S. Vetter (Georgia Tech and Oak Ridge National Laboratory)

ABSTRACT: Graphics Processing Unit (GPU)-based parallel computer architectures have shown increasing popularity as a building block for high performance computing, and possibly for future exascale computing. However, their programming complexity remains a major hurdle to widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models. These models provide different levels of abstraction and require different levels of programming effort to port and optimize applications. Understanding the differences among these new models provides valuable insight into their applicability and performance potential. In this paper, we evaluate existing directive-based models by porting thirteen application kernels from various scientific domains to CUDA GPUs, which in turn allows us to identify important issues in the functionality, scalability, tunability, and debuggability of the existing models. Our evaluation shows that directive-based models can achieve reasonable performance compared to hand-written GPU codes.

Tuesday, Nov. 13, 1:30-2 p.m.
Room: 355-EF
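
For readers unfamiliar with the directive-based style the paper evaluates: such models let a compiler generate the GPU code and data transfers from annotations on ordinary loops. A minimal OpenACC sketch in C (our illustration, not code from the paper):

    /* Minimal OpenACC sketch: offload a vector addition to a GPU.
       Without OpenACC support the pragma is ignored and the loop
       runs on the CPU. Illustration only, not from the paper. */
    void vadd(const float *a, const float *b, float *c, int n) {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }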

Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors (Finalist: Best Paper Award)
Authors: Jongsoo Park, Ping Tak Peter Tang, Mikhail Smelyanskiy, Daehyun Kim, Thomas Benson (Georgia Tech)

ABSTRACT: Tackling computationally challenging problems with high efficiency often requires combining algorithmic innovation, advanced architecture, and thorough exploitation of parallelism. We demonstrate this synergy through synthetic aperture radar (SAR) via backprojection, an image reconstruction method that can require hundreds of TFLOPS. Computation cost is significantly reduced by our new approximate strength reduction algorithm; data movement cost is economized by software locality optimizations facilitated by advanced architectural support; and parallelism is fully harnessed across a variety of patterns and granularities. We deliver a throughput of over 35 billion backprojections per second per compute node on a Sandy Bridge-based cluster equipped with Intel Knights Corner coprocessors. This corresponds to processing a 3K×3K image within a second on a single node. Our study extends to other settings: backprojection is applicable elsewhere, including medical imaging; approximate strength reduction is a general code transformation technique; and many-core processors are emerging as a solution for energy-efficient computing.

Tuesday, Nov. 13, 2:30-3 p.m.
Room: 255-BC
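
For context, classical strength reduction replaces an expensive per-iteration operation with a cheap incremental update; the paper's "approximate strength reduction" generalizes this idea. A textbook example of the classical transformation in C (our illustration, not the paper's algorithm):

    /* Classical strength reduction: out[i] = step * i computed with
       a running sum instead of a multiply per iteration.
       Textbook illustration only, not the paper's algorithm. */
    void scale_indices(float *out, float step, int n) {
        float x = 0.0f;
        for (int i = 0; i < n; i++) {
            out[i] = x;   /* replaces out[i] = step * i */
            x += step;
        }
    }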

Classifying Soft Error Vulnerabilities in Extreme-Scale Scientific Applications Using a Binary Instrumentation Tool
Authors: Dong Li, Jeffrey Vetter (Georgia Tech and Oak Ridge National Laboratory), Weikuan Yu

ABSTRACT: Extreme-scale scientific applications are at significant risk of being hit by soft errors on future supercomputers. To better understand soft error vulnerabilities in scientific applications, we have built an empirical fault injection and consequence analysis tool, BIFIT, to evaluate how soft errors impact applications. BIFIT is designed with the capability to inject faults at specific targets: an execution point and a data structure. We apply BIFIT to three scientific applications and investigate their vulnerability to soft errors. We classify each application's individual data structures in terms of their vulnerabilities and generalize these classifications. Our study reveals that these scientific applications have a wide range of sensitivities to both the time and the location of a soft error; yet we are able to identify relationships between vulnerabilities and classes of data structures. These classifications can be used to apply appropriate resiliency solutions to each data structure within an application.

Wednesday, Nov. 14, 1:30-2 p.m.
Room: 255-EF
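
To make the fault injection idea concrete: soft errors are commonly emulated in software by flipping a single bit of a chosen value at a chosen execution point. A minimal sketch in C (our illustration, not BIFIT's implementation):

    /* Emulate a soft error by flipping one bit of a double.
       Illustration only, not BIFIT code. */
    #include <stdint.h>
    #include <string.h>

    double flip_bit(double value, int bit) {  /* bit in [0, 63] */
        uint64_t bits;
        memcpy(&bits, &value, sizeof bits);   /* reinterpret the bytes */
        bits ^= (uint64_t)1 << bit;           /* inject the upset */
        memcpy(&value, &bits, sizeof bits);
        return value;
    }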

Optimizing the Computation of N-Point Correlations on Large-Scale Astronomical Data
Authors: William B. March, Kenneth Czechowski, Marat Dukhan, Thomas Benson, Dongryeol Lee, Richard Vuduc, Edmond Chow, Alexander G. Gray (Georgia Tech), Andrew J. Connolly

ABSTRACT: The n-point correlation functions (npcf) are powerful statistics that are widely used for data analyses in astronomy and other fields. These statistics have played a crucial role in fundamental physical breakthroughs, including the discovery of dark energy. Unfortunately, directly computing the npcf at a single value requires O(N^n) time for N points, with n taking values of 2, 3, 4, or even larger. Astronomical data sets can contain billions of points, and the next generation of surveys will generate terabytes of data per night. To meet these computational demands, we present a highly tuned npcf computation code that shows an order-of-magnitude speedup over the current state of the art. This enables a much larger 3-point correlation computation on the galaxy distribution than was previously possible. We show a detailed performance evaluation on many different architectures.

Thursday, Nov. 15, 11-11:30 a.m.
Room: 255-EF
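
To see where the O(N^n) cost comes from, consider the simplest case, n = 2: a brute-force 2-point count must examine all N(N-1)/2 pairs. A minimal sketch in C (our illustration; the paper's tuned code relies on far more sophisticated algorithms and data structures):

    /* Brute-force 2-point correlation: count pairs within distance r.
       O(N^2) work, which is why billion-point surveys need better
       algorithms. Illustration only, not the paper's code. */
    #include <stddef.h>

    long count_pairs(const double (*p)[3], size_t n, double r) {
        long count = 0;
        double r2 = r * r;
        for (size_t i = 0; i < n; i++)
            for (size_t j = i + 1; j < n; j++) {
                double dx = p[i][0] - p[j][0];
                double dy = p[i][1] - p[j][1];
                double dz = p[i][2] - p[j][2];
                if (dx*dx + dy*dy + dz*dz <= r2)
                    count++;
            }
        return count;
    }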

Aspen – A Domain Specific Language for Performance Modeling
Authors: Kyle L. Spafford, Jeffrey S. Vetter (Georgia Tech and Oak Ridge National Laboratory)

ABSTRACT: We present a new approach to analytical performance modeling using Aspen, a domain-specific language. Aspen (Abstract Scalable Performance Engineering Notation) fills an important gap in existing performance modeling techniques and is designed to enable rapid exploration of new algorithms and architectures. It includes a formal specification of an application's performance behavior and an abstract machine model. We provide an overview of Aspen's features and demonstrate how it can be used to express a performance model for a three-dimensional Fast Fourier Transform (FFT). We then demonstrate the composability and modularity of Aspen by importing and reusing the FFT model in a molecular dynamics model. We have also created a number of tools that allow scientists to balance application and system factors quickly and accurately.

Thursday, Nov. 15, 11:30 a.m. – 12 p.m.
Room: 355-EF

Cray Cascade – A Scalable HPC System Based on a Dragonfly Network
Session Chair: Jeffrey Vetter (Georgia Tech and Oak Ridge National Laboratory)
Authors: Gregory Faanes, Abdulla Bataineh, Duncan Roweth, Tom Court, Edwin Froese, Bob Alverson, Tim Johnson, Joe Kopnick, Michael Higgins, James Reinhard

ABSTRACT: Higher global bandwidth requirements for many applications and lower network costs have motivated the use of the Dragonfly network topology for high performance computing systems. In this paper we present the architecture of the Cray Cascade system, a distributed memory system based on the Dragonfly network topology. We describe the structure of the system, its Dragonfly network, the routing algorithms, and a set of advanced features supporting both mainstream high performance computing applications and emerging global address space programming models. With a combination of performance results from prototype systems and simulation data for large systems, we demonstrate the value of the Dragonfly topology and the benefits obtained through extensive use of adaptive routing.

Thursday, Nov. 15, 3:30-4 p.m.
Room: 255-BC
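
For readers unfamiliar with the topology: a Dragonfly network organizes routers into densely connected groups and links each group directly to the others, so any two nodes are only a few hops apart while the number of expensive long-reach global links stays small; adaptive routing then steers traffic around congested global links.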

GRAPE-8 – An Accelerator for Gravitational N-Body Simulation with 20.5 GFLOPS/W Performance
Session Chair: Jeffrey Vetter (Georgia Tech and Oak Ridge National Laboratory)
Authors: Junichiro Makino, Hiroshi Daisaka

ABSTRACT: In this paper, we describe the design and performance of GRAPE-8, an accelerator processor for gravitational N-body simulations. It is designed to evaluate gravitational interactions with a cutoff between particles. The cutoff function is useful for schemes like TreePM or Particle-Particle Particle-Tree, in which the gravitational force is divided into short-range and long-range components. A single GRAPE-8 processor chip integrates 48 pipeline processors. The effective number of floating-point operations per interaction is around 40; thus the peak performance of a single GRAPE-8 processor chip is 480 Gflops. A GRAPE-8 processor card houses two GRAPE-8 chips and one FPGA chip for the PCI Express interface. The total power consumption of the board is 46 W, for a theoretical peak performance per watt of 20.5 Gflops/W. The effective performance of the total system, including the host computer, is around 5 Gflops/W. This is more than a factor of two higher than the highest number on the current Green500 list.

Thursday, Nov. 15, 4-4:30 p.m.
Room: 255-BC
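
As a rough consistency check on the quoted figures (our arithmetic; the per-cycle rate is our assumption, not stated in the abstract): at one interaction per pipeline per clock, 48 pipelines × 40 flops per interaction is 1,920 flops per cycle, so a 480 Gflops chip implies a clock near 250 MHz; two such chips on a 46 W card give roughly 2 × 480 / 46 ≈ 20.9 Gflops/W at peak, close to the quoted 20.5 Gflops/W.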

SGI UV2 – A Fused Computation and Data Analysis Machine
Session Chair: Jeffrey Vetter (Georgia Tech and Oak Ridge National Laboratory)
Authors: Gregory M. Thorson, Michael Woodacre

ABSTRACT: UV2 is SGI's second-generation Data Fusion system, designed to meet the latest challenges facing users in computation and data analysis. Its unique ability to perform both functions on a single platform enables efficient, easy-to-manage workflows. The platform has a hybrid infrastructure, leveraging the latest Intel EP processors to provide industry-leading computation. With its high-bandwidth, extremely low-latency NumaLink6 interconnect, plus vectorized synchronization and data movement, UV2 provides industry-leading data-intensive capability. It supports a single operating system (OS) image of up to 64 TB and 4K threads. Multiple OS images can be deployed on a single NL6 fabric, which has a single flat address space of up to 8 PB and 256K threads. These capabilities allow extreme performance across a broad range of programming models and languages, including OpenMP, MPI, UPC, CAF, and SHMEM. The architecture, implementation, and performance are detailed.

Thursday, Nov. 15, 4:30-5 p.m.
Room: 255-BC

Technical Committees

Technical Papers

  • Jeffrey S. Vetter, SC12 Technical Papers Co-Chair
  • Edmond Chow, Technical Papers Algorithms Chair
  • Rich Vuduc, Technical Papers Program Committee Member (Performance, Energy, Dependability)
  • Ada Gavrilovska, Technical Papers Program Committee Member (Systems Software)
  • Karsten Schwan, Technical Papers Program Committee Member (Systems Software)

Doctoral Showcase

  • David A. Bader

Awards

  • David A. Bader, George Michael Memorial HPC Ph.D. Fellowship Committee Member

Workshops

  • David A. Bader, Program Committee Member, IA3 Workshop on Irregular Applications: Architectures & Algorithms

Tutorials

  • Jeffrey Vetter
  • Matthew Wolf
  • Joel Saltz (Emory University and Georgia Tech)