The Role of Low Latency File Access in Accelerating AI Workloads

The use of artificial intelligence (AI) is rapidly moving from the lab into the mainstream. The reason? Businesses believe AI can deliver operational cost savings, improve decision making, enhance customer interactions, speed data mining, and boost data security. The number of companies using AI has grown by 270% in the past four years, and organizations now need to design high-performance computing architectures for AI workloads.

Supporting AI efforts requires high-performance computing (HPC) capabilities to perform rapid analysis, tune neural network models, and conduct machine learning by examining large datasets. Fortunately, the HPC requirements for AI are similar to those of other compute-intensive applications (e.g., Big Data analytics, forecasting, modeling, and finite element simulations) that are also increasingly being introduced into the enterprise. That means many high-performance compute, storage, and networking technologies are already available, having made their way from supercomputing centers and academic labs into the enterprise.

However, several factors determine which infrastructure elements are needed for a specific AI application. Many AI efforts need speedy execution. That is the case for AI applications that power autonomous systems, engage customers in real time via chat or natural language, or identify outliers to prevent fraud. Such applications must run their analyses and deliver actionable information in real time.

Selecting the right technology solution

To achieve the necessary performance, AI deployments typically make use of expensive GPU processing arrays. For workloads to run cost-effectively, the system must sustain data rates high enough to keep those processors busy rather than waiting for input. That, in turn, dictates the use of ultrafast interconnect technology and tightly coupled high-performance storage.
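To make the point about keeping processors fed more concrete, here is a minimal back-of-the-envelope sketch in Python. The function names and every number in it (GPU count, samples per second, sample size, storage rate) are hypothetical assumptions chosen for illustration, not measurements of any particular system.

```python
def required_read_gb_per_s(num_gpus, samples_per_sec_per_gpu, bytes_per_sample):
    """Aggregate storage read rate (GB/s) needed so that no GPU waits for input."""
    bytes_per_sec = num_gpus * samples_per_sec_per_gpu * bytes_per_sample
    return bytes_per_sec / 1e9


def gpu_utilization(supplied_gb_per_s, required_gb_per_s):
    """Fraction of time GPUs can stay busy if storage is the only bottleneck."""
    return min(1.0, supplied_gb_per_s / required_gb_per_s)


# Hypothetical workload: 8 GPUs, each consuming 2,000 samples/s of 150 KB each.
demand = required_read_gb_per_s(8, 2_000, 150_000)
print(f"Storage must sustain about {demand:.1f} GB/s")

# Hypothetical storage system that delivers only 1.2 GB/s.
print(f"GPU utilization at 1.2 GB/s: {gpu_utilization(1.2, demand):.0%}")
```

Under these assumed figures, a storage system that delivers half the required rate leaves the GPUs idle roughly half the time, which is why storage and interconnect are sized together with the GPUs.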

Looking deeper, some of the GPU options include:

  • NVIDIA Tesla P100 GPU accelerators for PCIe-based servers. According to NVIDIA, Tesla P100 with NVIDIA NVLink delivers up to a 50X performance boost for the top HPC applications and all deep learning frameworks.
  • NVIDIA Tesla V100 Tensor Core, powered by NVIDIA Volta architecture, is a data center GPU to accelerate HPC and AI workloads.
  • NVIDIA T4 GPU, which accelerates cloud workloads, is used for HPC, deep learning training and inference, machine learning, and data analytics.
  • GeForce RTX 2080 Ti is NVIDIA’s flagship graphics card based on the NVIDIA Turing™ GPU architecture and ultra-fast GDDR6 memory.

Systems that use these and other GPUs to accelerate AI workloads need high-performance interconnect technologies to make cost-effective use of their performance capabilities. The interconnect technologies of choice include InfiniBand, Omni-Path, and remote direct memory access (RDMA).

What are their capabilities?

  • InfiniBand is a computer-networking communications standard used in HPC systems that features very high throughput and very low latency. It is used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems.
  • Omni-Path (also Omni-Path Architecture or OPA) is a high-performance communication architecture from Intel. It delivers low communication latency and high throughput.
  • RDMA is an industry-standard technique that supports what is known as zero-copy networking by enabling the network adapter to move data directly to or from application memory. Because this bypasses the operating system and avoids CPU involvement in the data path, it offers much lower latency than conventional network stacks.
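The difference between throughput and latency in the list above can be illustrated with a simple transfer-time model: time per transfer is a fixed latency plus size divided by bandwidth. The Python sketch below uses this textbook approximation; the two links and their latency and bandwidth figures are hypothetical and do not represent any of the products named here.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_gb_per_s):
    """Approximate time (microseconds) to move one message over a link."""
    serialization_us = size_bytes / (bandwidth_gb_per_s * 1e9) * 1e6
    return latency_us + serialization_us


# Two hypothetical links with the same bandwidth but different latency.
LINKS = {"low-latency link": 2.0, "high-latency link": 50.0}  # latency in us
BANDWIDTH_GB_PER_S = 10.0  # assumed identical for both links

for size in (4 * 1024, 1024 * 1024, 1024 ** 3):  # 4 KiB, 1 MiB, 1 GiB
    for name, latency_us in LINKS.items():
        t = transfer_time_us(size, latency_us, BANDWIDTH_GB_PER_S)
        print(f"{size:>12} bytes over {name}: {t:,.1f} us")
```

In this model, latency dominates the small transfers and becomes negligible for the largest one, which suggests why workloads that read many small files are especially sensitive to interconnect latency.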

The plethora of GPU and interconnect technology choices is a double-edged sword. On the plus side, the right combination will produce an optimized system to accelerate a specific AI application. On the downside, many businesses do not have expertise in these technologies and need help selecting the best solution for their application and optimizing a system’s performance.

When selecting elements in a system, it’s critical to determine which interconnect solution provides the low-latency file access that AI and HPC workloads need to achieve higher performance and scalability.

These challenges exist when configuring a system for any HPC application, but they are especially important for AI applications. AI workloads need fast access to data both to reduce training time in deep learning and to support fast decision making in production environments.

Determining which is best for you

So how do you determine which storage is best for an AI application? Beyond basics like weighing cost against performance when choosing between hard drives and solid-state or flash drives, there are storage file system and architecture issues to consider. Do you use a distributed architecture? Do you need a parallel file system? The bottom line: AI applications need storage solutions that offer the highest-throughput, lowest-latency data access for CPU- and GPU-intensive AI and HPC workloads.
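One way to start comparing storage options is to measure read latency and throughput directly. The following is a minimal sketch in Python using only the standard library; the function name `measure_random_reads`, the block size, and the read count are assumptions made for illustration. Because it reads a small temporary file that is likely to sit in the operating system's page cache, the numbers it prints will be optimistic; a realistic test would point it at a large file on the storage system under evaluation.

```python
import os
import random
import statistics
import tempfile
import time


def measure_random_reads(path, block_size=4096, num_reads=1000):
    """Time small reads at random block-aligned offsets.

    Returns (median latency in microseconds, throughput in MB/s).
    """
    num_blocks = os.path.getsize(path) // block_size
    latencies = []
    # buffering=0 disables Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(num_reads):
            offset = random.randrange(num_blocks) * block_size
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
    median_us = statistics.median(latencies) * 1e6
    mb_per_s = (num_reads * block_size) / sum(latencies) / 1e6
    return median_us, mb_per_s


# Demo on a throwaway 4 MiB file (hypothetical stand-in for a real dataset).
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(os.urandom(4 * 1024 * 1024))
    tmp.close()
    median_us, mb_per_s = measure_random_reads(tmp.name)
    print(f"median read latency: {median_us:.1f} us, throughput: {mb_per_s:.1f} MB/s")
finally:
    tmp.close()  # safe to call twice
    os.remove(tmp.name)
```

Running the same measurement against each candidate file system, with block sizes that match the application's access pattern, gives a rough first comparison before investing in a fuller benchmark.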

In the final analysis, to optimize the running of AI workloads and make the most efficient use of expensive GPU arrays, compute solutions must bring together the right GPU, high-performance storage, and interconnect technologies. These technologies must be tightly integrated and tuned to optimize the solution’s performance when running AI workloads.
