For customers looking to deploy an advanced NFS Storage Solution (NSS) configuration with higher aggregate capacity and better aggregate performance, the Extra Large (XL) configuration with two mount points is now an available option. This configuration increases the total usable capacity to as much as 160 TB (across two file systems) while continuing to provide the high availability and performance of the existing configurations. The XL configuration builds upon the existing Dell NFS Storage Solution with High Availability (NSS-HA) Large configuration. Details on the NSS-HA solution are available at http://content.dell.com/us/en/highered/d/business~solutions~whitepapers~en/Documents~dell-hpc-nssha-sg.pdf.aspx
This blog post describes the XL configuration, its performance, and its HA functionality. For an introduction to NSS-HA, refer to the overview blog post. Figure 1 shows the XL configuration.
The XL configuration has the following components:
Compared with NSS-HA Medium and Large configurations, we have one more MD3200 storage array to manage for the XL configuration. To make our HA solution simple and consistent, our strategy was to introduce two active/passive pairs for the XL configuration to implement the HA feature:
If server 1 suffers a catastrophic failure, file system 1, which it originally hosted, fails over to server 2. At that point, server 2 hosts both file systems. In other words, server 2 exports two active NFS mount points. This strategy has been verified in our lab. The pros and cons are:
Pros:
Two * Large configurations = XL configuration.
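The failover behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the actual HA cluster configuration: the class name `XLCluster`, its methods, and the identifiers `server1`, `server2`, `fs1`, and `fs2` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class XLCluster:
    """Toy model of the two active/passive pairs in the XL configuration."""

    # Preferred (active) server for each file system. Names are hypothetical.
    preferred: dict = field(
        default_factory=lambda: {"fs1": "server1", "fs2": "server2"}
    )
    # Servers that are currently up.
    healthy: set = field(default_factory=lambda: {"server1", "server2"})

    def fail(self, server: str) -> None:
        """Mark a server as failed."""
        self.healthy.discard(server)

    def placement(self) -> dict:
        """Return which server currently hosts each file system."""
        result = {}
        for fs, owner in self.preferred.items():
            if owner in self.healthy:
                # The preferred server is up, so it keeps the file system.
                result[fs] = owner
            else:
                # Fail over to a surviving server, or None if none is left.
                survivors = sorted(self.healthy)
                result[fs] = survivors[0] if survivors else None
        return result
```

In this model, a fresh `XLCluster()` places `fs1` on `server1` and `fs2` on `server2`. After `fail("server1")`, `placement()` reports both file systems on `server2`, which matches the case where one server exports two active NFS mount points.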
Please note that the results presented in the following figures are aggregate throughput. For example, the “4 concurrent clients” test case means that four clients concurrently access each file system, and the result is the aggregate read or write throughput of those clients. That is, for the NSS-HA Large configuration, four client threads access one file system, while for the NSS-HA XL configuration, eight client threads in total concurrently access the two file systems (four client threads per file system).
Figure 2 shows the test bed used in our evaluation. The test bed consists of 64 PowerEdge R410 servers in an HPC cluster used as I/O clients. The clients can access the two NFS servers via either 10GbE or InfiniBand. The NFS servers are set up in an HA cluster and provide access to the backend PowerVault storage arrays.
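The way client threads are counted in each test case can be sketched as follows. The function name and parameters are hypothetical and exist only to illustrate the arithmetic; the Large configuration is assumed to have one file system and the XL configuration two, as described in this post.

```python
def total_client_threads(clients_per_fs: int, num_file_systems: int) -> int:
    """Total concurrent client threads for a given test case.

    clients_per_fs: the number quoted in the test case name
                    (e.g. 4 for the "4 concurrent clients" case).
    num_file_systems: 1 for NSS-HA Large, 2 for NSS-HA XL.
    """
    return clients_per_fs * num_file_systems


# "4 concurrent clients" test case
large_threads = total_client_threads(4, 1)  # one file system
xl_threads = total_client_threads(4, 2)     # two file systems
```

Here `large_threads` is 4 and `xl_threads` is 8, so each XL data point reflects twice as many client threads as the Large data point with the same label.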
Let’s answer our first question: is there any performance difference between the single-server and two-server cases?
Answer: Yes. As the number of concurrent clients increases, there are significant read and write performance differences between the single-server and two-server cases. Figures 3 and 4 show that when there is a small number of concurrent clients per file system (four or fewer), the peak performance of the two cases does not differ significantly (less than 20% for writes and 10% for reads). However, as the number of concurrent clients per file system increases, the gap grows: with 32 concurrent clients, we observed a difference of almost 40% for writes and 60% for reads.
These differences are expected. When a small number of clients access the storage system, even a single server can provide sufficient system resources, including CPU, memory, and I/O bandwidth, to process the incoming I/O requests. Thus, the performance difference between the single-server and two-server cases is small. As the number of concurrent clients increases, the system resources in a single server become insufficient to handle so many concurrent I/O requests, while the aggregate system resources in the two-server case remain sufficient. As a result, a significant performance difference is observed between the two cases.
Now, let’s answer our second question: what is the performance gain when we compare the aggregate throughput of the NSS-HA XL and Large configurations?
Answer: For the two-server case, the aggregate read and write throughputs are close to twice those of the NSS-HA Large configuration, as the system resources are always sufficient in that case. For the single-server case, the aggregate read and write throughputs are close to twice those of the NSS-HA Large configuration only when there is a small number of concurrent clients. As the number of concurrent clients increases, the performance gain is smaller than in the two-server case due to the limited system resources in a single server.
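The two comparisons above (the gap between the single-server and two-server cases, and the gain of XL over Large) can be expressed as simple ratios. The sketch below is an assumption about how such figures might be computed: this post does not state which case is used as the baseline for the percentages, so the two-server case is assumed here. The function names are hypothetical, and the throughput values are placeholders rather than measured results.

```python
def relative_gap(two_server: float, single_server: float) -> float:
    """Fractional throughput shortfall of the single-server case,
    relative to the two-server case (assumed baseline)."""
    return (two_server - single_server) / two_server


def scaling_factor(xl_aggregate: float, large_aggregate: float) -> float:
    """How many times the Large throughput the XL configuration delivers."""
    return xl_aggregate / large_aggregate


# Placeholder throughput values in MB/s, NOT measurements from this study.
gap = relative_gap(two_server=1000.0, single_server=600.0)
gain = scaling_factor(xl_aggregate=1900.0, large_aggregate=1000.0)
```

With these placeholder inputs, `gap` is 0.4 (a 40% difference) and `gain` is 1.9 (close to twice the Large throughput).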
We observed performance behavior similar to the IB case, although for a different reason: here, network bandwidth is the limiting factor in the single-server case. Figures 5 and 6 show the performance results.
To verify the failover functionality of the NSS-HA XL configuration, we conducted tests with various initial states and different failures. Table 1 lists our results.