When it comes time to add high-performance storage to your HPC cluster, the file system can become a critical limiting factor that has to be addressed. A storage expansion can sometimes cost as much as the HPC cluster itself. This is where BeeGFS (formerly FhGFS) and other parallel file systems come in as a cost-effective, highly scalable, high-performance alternative to traditional storage platforms.
BeeGFS is a leading parallel cluster file system. As a parallel file system, BeeGFS spreads user data across multiple servers so that I/O-intensive workloads are served by many servers at once. By increasing the number of servers and disks in a system, users can scale the performance and capacity of their storage to the level they need. BeeGFS makes it possible to start with a small storage cluster and grow it into an enterprise-wide, multi-petabyte system (comprising hundreds or thousands of nodes) as seamlessly as possible.
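To make the idea of spreading data across servers concrete, here is a minimal sketch of round-robin striping. It is an illustration under stated assumptions, not BeeGFS's actual implementation: the `StripePattern` class, the 512 KiB chunk size, and the target names `storage01` to `storage04` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripePattern:
    """Hypothetical stripe layout: fixed-size chunks dealt round-robin to targets."""
    chunk_size: int   # bytes per chunk (illustrative value, not a BeeGFS default)
    targets: tuple    # storage target names, in stripe order

    def locate(self, offset):
        """Map a file offset to (target, offset within that target's data)."""
        chunk_index = offset // self.chunk_size
        target = self.targets[chunk_index % len(self.targets)]
        # Each target stores every len(targets)-th chunk back to back.
        local_chunk = chunk_index // len(self.targets)
        return target, local_chunk * self.chunk_size + offset % self.chunk_size


def bytes_per_target(pattern, file_size):
    """Count how many bytes of a file of the given size land on each target."""
    totals = {t: 0 for t in pattern.targets}
    offset = 0
    while offset < file_size:
        target, _ = pattern.locate(offset)
        # Advance to the end of the current chunk or the end of the file.
        step = min(pattern.chunk_size - offset % pattern.chunk_size,
                   file_size - offset)
        totals[target] += step
        offset += step
    return totals


# Assumed example values: 4 targets, 512 KiB chunks, a 4 MiB file.
pattern = StripePattern(
    chunk_size=512 * 1024,
    targets=("storage01", "storage02", "storage03", "storage04"),
)
print(pattern.locate(0))
print(bytes_per_target(pattern, 8 * 512 * 1024))
```

In this simplified model, a 4 MiB file lands evenly on all four targets, so a large sequential read can draw on every server at once. Adding targets widens the stripe, which is the intuition behind scaling performance by adding servers.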
Understanding BeeGFS Architecture
BeeGFS uses metadata servers to coordinate file placement and striping among storage servers, and to inform clients about file details when necessary. When users access file content, their clients contact the storage servers directly and communicate with multiple servers simultaneously, giving applications truly parallel access to file data. Metadata can likewise be distributed across multiple servers to minimize access latency. A significant benefit of BeeGFS is that, since it is a software-defined storage solution that clients mount like any other file system, applications don’t need to be rewritten or customized. The architecture is built for user convenience and flexibility, and the payoff for users is faster data retrieval and higher productivity.
Depending on how users access the BeeGFS file system, a gateway node may function as an intermediary between clients and the cluster file system. The gateway node gives both Windows and Linux users simple access to the BeeGFS file system.
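The access pattern described above (one metadata lookup, then direct, concurrent reads from the storage servers) can be sketched with in-memory stand-ins. Everything here is a hypothetical simplification for illustration: the `metadata` and `storage` dictionaries, the function names, and the 4-byte chunk size are assumptions and do not reflect BeeGFS's real protocol or API.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # tiny chunk size so the example is easy to follow (assumption)

# Hypothetical stand-in for storage servers: target -> {path: bytes held there}
storage = {"storage01": {}, "storage02": {}, "storage03": {}}
# Hypothetical stand-in for the metadata service: path -> (targets, file size)
metadata = {}


def write_file(path, data, targets):
    """Record the stripe pattern in metadata, then deal chunks round-robin."""
    metadata[path] = (targets, len(data))
    for t in targets:
        storage[t][path] = b""
    for i in range(0, len(data), CHUNK):
        target = targets[(i // CHUNK) % len(targets)]
        storage[target][path] += data[i:i + CHUNK]


def read_chunk(path, targets, index):
    """Fetch one chunk straight from the target that holds it."""
    target = targets[index % len(targets)]
    local = (index // len(targets)) * CHUNK
    return storage[target][path][local:local + CHUNK]


def read_file(path):
    """One metadata lookup, then chunk reads issued to the targets concurrently."""
    targets, size = metadata[path]
    n_chunks = -(-size // CHUNK)  # ceiling division
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        # pool.map preserves input order, so chunks are reassembled correctly.
        chunks = pool.map(lambda i: read_chunk(path, targets, i), range(n_chunks))
        return b"".join(chunks)


data = b"The quick brown fox jumps"
write_file("/demo/file.txt", data, ("storage01", "storage02", "storage03"))
print(read_file("/demo/file.txt") == data)
```

The point of the sketch is the division of labor: the metadata lookup happens once and returns only the layout, while the bulk data moves directly between the client and several storage targets in parallel.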
Comparing BeeGFS to Other Parallel File Systems
When compared to other popular parallel file systems like Gluster and Lustre, BeeGFS stands out for its simplicity and functionality. From our experience, BeeGFS offers the best overall combination of features, performance, and scalability, combined with ease of deployment and management. Although alternative parallel file systems offer their own benefits in terms of simplicity (Gluster) or performance/scalability (Lustre), it’s BeeGFS that allows us to deploy turnkey storage solutions for our customers. Note: While in many cases we recommend BeeGFS, the PSSC Labs team is also able to utilize other parallel file systems, if required.
To really see the power of BeeGFS in action, the PSSC Labs team ran benchmark tests using FIO (Flexible I/O) benchmark software. We configured a small 10-node HPC cluster writing to a 9-node BeeGFS cluster built from flash storage nodes. The results were very impressive. With over 50 GB/sec sustained reads and writes and over 10 million IOPS, BeeGFS can certainly deliver the data transfer rates necessary for the most demanding HPC environments. As expected, the first machines run slightly slower than the others, which is unavoidable given the additional overhead of serving as the management node. Overall, the performance benchmarks exceeded our highest expectations. Benchmarks can be found in the image below.
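As a rough sanity check on the aggregate figures quoted above, the per-node averages can be worked out directly. This is a back-of-the-envelope sketch that assumes load is spread evenly across nodes, which the benchmark did not measure (and the slower management node suggests it is only approximately true). The constant and function names are illustrative.

```python
# Figures quoted in the benchmark description; both are lower bounds ("over").
STORAGE_NODES = 9
CLIENT_NODES = 10
AGGREGATE_GB_PER_SEC = 50
AGGREGATE_IOPS = 10_000_000


def per_node(total, nodes):
    """Average share per node, ASSUMING an even spread (not a measured value)."""
    return total / nodes


gb_per_storage_node = per_node(AGGREGATE_GB_PER_SEC, STORAGE_NODES)
gb_per_client_node = per_node(AGGREGATE_GB_PER_SEC, CLIENT_NODES)
iops_per_storage_node = per_node(AGGREGATE_IOPS, STORAGE_NODES)

print(f"Per storage node: at least {gb_per_storage_node:.2f} GB/sec")
print(f"Per client node:  at least {gb_per_client_node:.2f} GB/sec")
print(f"Per storage node: at least {iops_per_storage_node:,.0f} IOPS")
```

Under that even-spread assumption, each storage node sustains roughly 5.6 GB/sec and about 1.1 million IOPS, and each client node drives about 5 GB/sec.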
Creating Your Parallux Storage Clusters with BeeGFS
The PSSC Labs Parallux Storage Cluster is delivered fully application-optimized with popular parallel file systems, including BeeGFS. This system is built for maximum IOPS performance, with the ability to scale to your unique requirements, including advanced artificial intelligence and HPC workloads. Our Parallux Storage Cluster systems have been deployed at some of the most esteemed higher education institutions, government agencies, and commercial organizations around the country, including Villanova, Cal Poly San Luis Obispo, the U.S. Navy, the U.S. Army, and many more. Like the entire PSSC Labs product line, our Parallux Storage Cluster is delivered to you production-ready.
The Parallux Storage Cluster is always built to your exact specifications, particularly regarding metadata nodes, data nodes, and gateway nodes. Our metadata nodes are small, lightweight boxes built with SSDs or NVMe drives, designed for fast data seek times. Data nodes are larger storage systems that we typically configure with RAID 6 to protect user data. We can also add gateway nodes, perfect for users who will be accessing their system from Linux or Windows operating systems. Our solutions are truly customizable from the ground up, so each component, even beyond those mentioned above, can be designed exactly for your needs. To learn more about the Parallux Storage Cluster, visit us at https://pssclabs.com/products/storage-cluster/ or call us directly at (949) 380-7288.