The use of machine learning (ML) is on the rise. From a data scientist’s perspective, the computing challenge is twofold: how to ingest more data faster to train machine learning algorithms, and how to scale processing power. Looking deeper, the issue is that ML applications parse ever-growing amounts of data, which demands enormous parallel processing capability spread across large numbers of cores.
Traditional computing systems based on standard CPUs will not suffice. They cannot process the data, train the machine learning algorithms, or run the ML applications against new data efficiently. In most cases, scaling legacy systems to the required processing level is too costly. Even if the investment is made, the time it takes to train and run ML applications is impractical for the needs of the business.
What’s needed is an infrastructure update that delivers the required parallel processing performance at a reasonable cost. In most cases, the best solution is one that combines multithreaded CPUs with GPUs, large memory, high-performance interconnects, and HPC storage solutions offering high I/O throughput and low latency.
Organizations have been scaling their systems for things like Big Data and the use of more sophisticated analytics for years. In most cases, solutions based on traditional CPU architectures were enough.
Why is machine learning different? Why do these traditional solutions not fit the bill?
Many organizations found their installed systems were hitting a wall because of the amounts of data involved and the nature of the ML algorithms. Training models took too long to run due to computing limitations.
When confronted with this problem, organizations have looked for systems suited to the requirements of ML workloads and algorithms. Such systems often include HPC servers with greater processor performance, systems that scale up (vs. scale out), high-bandwidth I/O solutions, and accelerator technologies such as GPUs or FPGAs.
Solutions with these characteristics get to the heart of the problem for machine learning: core starvation. CPUs are designed primarily for serial processing, while machine learning training and applications must run in parallel across many more cores than CPUs can provide. Accelerators overcome this problem: GPUs offer thousands of cores, and custom-designed processors (ASICs, FPGAs) complement CPU processing capabilities.
Such accelerators offer a massively parallel architecture that economically delivers the needed parallel compute performance. Going hand-in-hand with the use of accelerators, systems that can scale to meet the demands of ML applications also must have high-speed interconnects, increased memory size, and fast storage.
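The data-parallel pattern described above can be sketched in a few lines. This is an illustrative toy, not PSSC Labs or NVIDIA code: the dataset shards and the `shard_loss` function are hypothetical stand-ins for the per-batch work an ML training job distributes across cores or GPU streams.

```python
# Minimal sketch of data parallelism: a training-style reduction (here, a
# sum of squared errors over shards of a dataset) is identical whether run
# serially on one core or fanned out across worker processes. The function
# and variable names are illustrative, not from any specific ML framework.
from concurrent.futures import ProcessPoolExecutor


def shard_loss(shard):
    """Sum of squared errors for one shard against a fixed target of zero."""
    return sum(x * x for x in shard)


def total_loss_serial(shards):
    """One core walks every shard in turn - the 'core starvation' baseline."""
    return sum(shard_loss(s) for s in shards)


def total_loss_parallel(shards, workers=4):
    """Each worker process handles shards independently; results are reduced."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(shard_loss, shards))


if __name__ == "__main__":
    data = [list(range(i, i + 1000)) for i in range(0, 8000, 1000)]
    assert total_loss_serial(data) == total_loss_parallel(data)
```

Because each shard’s contribution is independent, the workload scales with the number of workers; GPUs push the same idea to thousands of lightweight cores rather than a handful of processes.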
Technology from an Experienced Partner
PSSC Labs has delivered tens of thousands of custom-engineered HPC servers to higher education, government agencies, small/medium businesses, and large enterprise organizations across 36 countries.
PSSC Labs delivers integrated High Performance Computing (HPC) solutions for ML that tightly couple and optimize hardware and software. Its PowerServe Uniti servers include the latest components from Intel® and NVIDIA®.
PSSC Labs GPU options include:
- NVIDIA Tesla P100 GPU accelerators for PCIe-based servers. Tesla P100 with NVIDIA NVLink delivers up to a 50X performance boost for the top HPC applications and all deep learning frameworks.
- NVIDIA Tesla V100 Tensor Core, powered by NVIDIA Volta architecture, is a data center GPU to accelerate HPC and AI workloads.
- NVIDIA T4 GPU, which accelerates cloud workloads, is used for HPC, deep learning training and inference, machine learning, and data analytics.
- GeForce RTX 2080 Ti is NVIDIA’s flagship graphics card, based on the NVIDIA Turing™ GPU architecture and ultra-fast GDDR6 memory.
Systems that use these and other GPUs to scale ML workloads need high-performance interconnect technologies to make cost-effective use of their performance capabilities. Interconnect technologies available include InfiniBand, Omni-Path, and remote direct memory access (RDMA).
Most important, PSSC Labs’ family of HPC systems are machine-learning-ready systems purpose-built for an organization’s needs. The solutions come production ready, which is critical from a data scientist’s perspective: clients do not need to spend time on IT issues and are assured the systems they are using for their ML efforts can scale to meet the demands of their applications.
To learn more about our HPC systems specifically designed to meet the needs of data scientists across various industries, click the button below to schedule a meeting with one of our knowledgeable Solutions Architects.