9 GPUs to Compare when Building an HPC System – Updated September 2020


Graphics Processing Units (GPUs) are built to speed up how a computer creates and renders images for output to a display device. They also ensure that graphics are displayed properly and improve graphics performance. GPUs are found in all kinds of hardware, from desktops and laptops to cellphones and gaming consoles.

When working with a vendor to build a custom high-performance system, it is extremely important to select each component for your specific needs. We work with our clients to determine those needs and help them pick each component for their workloads, but it’s always great when our clients have some familiarity with the pieces of their system. Below we break down some important things that we look at when helping our clients pick the right GPU.

NVIDIA has recently released a new series of graphics cards, the GeForce RTX 30 Series. This series is the second generation of the NVIDIA RTX PC gaming platform and is regarded as the fastest line of discrete graphics cards on the market today. Built with enhanced Ray Tracing Cores and Tensor Cores, new streaming multiprocessors, and high-speed GDDR6 and GDDR6X memory, the series gives users the power needed to rip through the most demanding games.

GPU Options to Consider

  1. GeForce RTX™ 3070

    Base/Boost Core Clock: 1500/1725 MHz
    Bandwidth (GB/s): 448
    Memory Size (GB): 8
    NVIDIA CUDA Cores: 5888

  2. GeForce RTX™ 3080

    Base/Boost Core Clock: 1440/1710 MHz
    Bandwidth (GB/s): 760
    Memory Size (GB): 10
    NVIDIA CUDA Cores: 8704

  3. GeForce RTX™ 3090

    Base/Boost Core Clock: 1395/1695 MHz
    Bandwidth (GB/s): 936
    Memory Size (GB): 24
    NVIDIA CUDA Cores: 10496

  4. NVIDIA A100

    NVIDIA built the A100 GPU specifically for scientific computing, graphics, and data analytics in data centers. Despite launching this product amid a global pandemic, NVIDIA CEO Jensen Huang stated, “It’s our best data center GPU ever made, and it capitalizes on nearly a decade of our data center experience.”
    Researchers and engineers alike need to be able to analyze, visualize, and turn massive datasets into actionable insights. The problem is that these datasets are often scattered across multiple servers, which bogs down the entire process. The A100 delivers the compute power, memory, and scalability that organizations need to tackle their massive workloads.
    The NVIDIA A100 has more than 54 billion transistors. It’s the world’s largest 7nm processor. A100 can also efficiently scale to thousands of GPUs or, with NVIDIA Multi-Instance GPU (MIG) technology, be partitioned into seven GPU instances to accelerate workloads of all sizes.

    Base/Boost Core Clock: 1430/1480 MHz
    Bandwidth (GB/s): 1555
    Memory Size (GB): 40

  5. NVIDIA V100

    The NVIDIA V100 GPU is powered by the NVIDIA Volta architecture and comes in both 16 and 32 GB configurations. It also offers the performance of up to 100 CPUs in a single GPU, meaning data scientists, researchers, and engineers can spend much less time optimizing memory usage and more time on their artificial intelligence efforts. Even better, with 640 Tensor Cores, Tesla V100 is the world’s first GPU to break the 100 teraFLOPS (TFLOPS) barrier of deep learning performance.

    Base/Boost Core Clock: 1246/1380 MHz
    Bandwidth (GB/s): 897
    Memory Size (GB): 16 or 32

  6. GeForce 2080 Ti

    GeForce GPUs are another family of GPUs developed by NVIDIA. These GPUs are popular in gaming systems due to their ultra-fast processing speeds. In fact, this GPU in particular is considered the world’s ultimate gaming GPU. Its release into the market marked the introduction of NVIDIA’s Turing microarchitecture, the first in the industry to implement real-time hardware ray tracing in a consumer product. With a base core clock speed of 1350 MHz, the GeForce 2080 Ti is quite powerful and delivers a solid framerate, even when advanced features are enabled.

    Base/Boost Core Clock: 1350/1545 MHz
    Memory (MT/s): 14000
    Bandwidth (GB/s): 616

  7. Tesla T4

    The Tesla T4 GPU was launched in September of 2018. NVIDIA built this GPU to accelerate workloads in the HPC, deep learning, machine learning, data analytics, and graphics spaces. Tesla products are primarily used in simulations and large-scale calculations, as well as for high-end image generation in professional and scientific fields.
    The Tesla T4 is built on a rather large chip, yet it is a single-slot card with a power draw rated at 70W maximum, so it doesn’t require an additional power connector. Unlike the fully unlocked GeForce RTX 2080 SUPER, which uses the same GPU but has all 3072 shaders enabled, the Tesla T4 has some shading units disabled to reach the product’s target shader count. It features 2560 shading units, 160 texture mapping units, and 64 ROPs. Also included are 320 Tensor Cores, which help improve the speed of machine learning applications.

    Base/Boost Core Clock: 585/1590 MHz
    Bandwidth (GB/s): 320
    Memory Size (GB): 16

  8. Quadro RTX 6000

    Quadro has long been the de facto standard for enterprise desktop graphics for digital designers and artists, but the Quadro RTX 6000 also stands out as the world’s first ray tracing GPU. Its launch brought significant graphics advancements to professional workflows. In general, though dependent on application, this GPU is faster than the GeForce 2080 Ti, making it one of the fastest graphics cards available.

    Base/Boost Core Clock: 1440/1770 MHz
    Memory (MT/s): 14000
    Bandwidth (GB/s): 672
    Memory Size (GB): 24

  9. Quadro RTX 8000

    The Quadro RTX 8000 graphics card is optimized for workstation applications like CAD and 3D modeling, letting artists and designers push the boundaries of what is possible in their line of work. This Quadro RTX product is built to work with the largest and most complex ray tracing, deep learning, and visual computing workloads.

    Base/Boost Core Clock: 1440/1770 MHz
    Memory (MT/s): 14000
    Bandwidth (GB/s): 672
    Memory Size (GB): 48
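
Where a spec sheet lists both a CUDA core count and a boost clock, the two can be combined into a rough theoretical peak for single-precision (FP32) throughput. The sketch below uses the common rule of thumb that each CUDA core can retire one fused multiply-add (two floating-point operations) per clock cycle. That rule is an assumption, the function name `peak_fp32_tflops` is purely illustrative, and real workloads will land below this ceiling.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    """Rough theoretical peak FP32 throughput in TFLOPS.

    Assumes one fused multiply-add (2 floating-point operations)
    per CUDA core per clock cycle at the boost clock.
    """
    flops_per_second = 2 * cuda_cores * boost_clock_mhz * 1e6
    return flops_per_second / 1e12


# GeForce RTX 3080 figures from the list above:
# 8704 CUDA cores and a 1710 MHz boost clock.
rtx_3080 = peak_fp32_tflops(cuda_cores=8704, boost_clock_mhz=1710)
print(f"RTX 3080 theoretical peak: {rtx_3080:.1f} TFLOPS")  # prints 29.8 TFLOPS
```

A figure like this is useful for comparing cards within one architecture, but it says nothing about memory capacity or bandwidth, which often matter more for large datasets.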

As a custom manufacturer of high-performance computing and big data systems that use GPUs and graphics cards like those listed above, we know how important it is to build your system with the right components. That’s why we work closely with our clients to understand their workflow and data needs, then build the system that will work best for their organization.
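
As a simple illustration of matching a card to a workload, the sketch below shortlists GPUs whose memory can hold a given working set. The memory sizes come from the list in this article; the `Gpu` type, the `shortlist` function, the 18 GB working set, and the 20% headroom factor are all hypothetical choices made for the example, not a sizing method we prescribe.

```python
from typing import List, NamedTuple


class Gpu(NamedTuple):
    name: str
    memory_gb: int


# Memory sizes as listed in this article.
CATALOG = [
    Gpu("GeForce RTX 3070", 8),
    Gpu("GeForce RTX 3080", 10),
    Gpu("GeForce RTX 3090", 24),
    Gpu("Quadro RTX 6000", 24),
    Gpu("Quadro RTX 8000", 48),
]


def shortlist(catalog: List[Gpu], working_set_gb: float,
              headroom: float = 1.2) -> List[Gpu]:
    """Return cards with enough memory for the working set plus headroom,
    smallest sufficient card first."""
    needed_gb = working_set_gb * headroom
    fits = [gpu for gpu in catalog if gpu.memory_gb >= needed_gb]
    return sorted(fits, key=lambda gpu: gpu.memory_gb)


# Hypothetical workload: an 18 GB working set that must fit on one card.
for gpu in shortlist(CATALOG, working_set_gb=18):
    print(f"{gpu.name}: {gpu.memory_gb} GB")
```

Memory capacity is only a first filter; bandwidth, clock speed, and software support narrow the choice from there.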

If you have questions about these components or any of our other products, please feel free to contact us at 4sales@pssclabs.com.