Dell Community

Latest Blog Posts
  • General HPC

    HPC Applications Performance on V100

    Authors: Frank Han, Rengan Xu, Nishanth Dandapanthula.

    HPC Innovation Lab. August 2017

     

    Overview

    This is one of two articles in our Tesla V100 blog series. In this blog, we present initial benchmark results of NVIDIA® Tesla® Volta-based V100™ GPUs on four different HPC benchmarks, along with a comparative analysis against the previous-generation Tesla P100 GPUs. We are also releasing another blog in this V100 series, which discusses V100 and deep learning applications; if you haven’t read it yet, we highly recommend taking a look here.

    PowerEdge C4130 with V100 GPU support

    The NVIDIA® Tesla® V100 accelerator is one of the most advanced accelerators available in the market right now and was launched within one year of the P100 release. In fact, Dell EMC is the first in the industry to integrate Tesla V100 and bring it to market. As was the case with the P100, V100 supports two form factors: V100-PCIe and the mezzanine version V100-SXM2. The Dell EMC PowerEdge C4130 server supports both types of V100 and P100 GPU cards. Table 1 below notes the major enhancements in V100 over P100:

    Table 1: The comparison between V100 and P100

                                                   |            PCIe                    |            SXM2
                                                   | P100   | V100  | Improvement      | P100   | V100  | Improvement
    Architecture                                   | Pascal | Volta |                  | Pascal | Volta |
    CUDA Cores                                     | 3584   | 5120  |                  | 3584   | 5120  |
    GPU Max Clock rate (MHz)                       | 1329   | 1380  |                  | 1481   | 1530  |
    Memory Clock rate (MHz)                        | 715    | 877   | 23%              | 715    | 877   | 23%
    Tensor Cores                                   | N/A    | 640   |                  | N/A    | 640   |
    Tensor Cores/SM                                | N/A    | 8     |                  | N/A    | 8     |
    Memory Bandwidth (GB/s)                        | 732    | 900   | 23%              | 732    | 900   | 23%
    Interconnect Bandwidth, Bi-Directional (GB/s)  | 32     | 32    |                  | 160    | 300   |
    Deep Learning (TFlops)                         | 18.6   | 112   | 6x               | 21.2   | 125   | 6x
    Single Precision (TFlops)                      | 9.3    | 14    | 1.5x             | 10.6   | 15.7  | 1.5x
    Double Precision (TFlops)                      | 4.7    | 7     | 1.5x             | 5.3    | 7.8   | 1.5x
    TDP (Watt)                                     | 250    | 250   |                  | 300    | 300   |

    V100 not only significantly improves performance and scalability as will be shown below, but also comes with new features. Below are some highlighted features important for HPC Applications:

    • Second-Generation NVIDIA NVLink™

      All four V100-SXM2 GPUs in the C4130 are connected by NVLink™ and each GPU has six links. The bi-directional bandwidth of each link is 50 GB/s, so the bi-directional bandwidth between different GPUs is 300 GB/s. This is useful for applications requiring a lot of peer-to-peer data transfers between GPUs.

    • New Streaming Multiprocessor (SM)

      The single precision and double precision capability of the new SM is 50% higher than that of the previous P100 for both the PCIe and SXM2 form factors. The TDP (Thermal Design Power) of both cards is the same, which means the V100 is ~1.5 times more energy efficient than the previous P100.

    • HBM2 Memory: Faster, Higher Efficiency

      The 900 GB/s peak memory bandwidth delivered by V100 is 23% higher than P100. In addition, DRAM utilization has been improved from 76% to 95%, which allows for a 1.5x improvement in delivered memory bandwidth (a quick check of this figure follows the feature list below).

      More in-depth details of all the new features of the V100 GPU can be found on this NVIDIA website.
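    As a quick check of the delivered-bandwidth claim above, here is a minimal Python sketch using only the peak bandwidth and DRAM utilization figures quoted in this blog; it is an estimate, not a measurement:

        # Rough estimate of delivered HBM2 bandwidth from the figures quoted above.
        # Inputs: peak bandwidth (GB/s) and DRAM utilization (76% for P100, 95% for V100).

        p100_peak, p100_util = 732.0, 0.76
        v100_peak, v100_util = 900.0, 0.95

        p100_delivered = p100_peak * p100_util   # ~556 GB/s
        v100_delivered = v100_peak * v100_util   # ~855 GB/s

        print(f"P100 delivered: {p100_delivered:.0f} GB/s")
        print(f"V100 delivered: {v100_delivered:.0f} GB/s")
        print(f"Improvement: {v100_delivered / p100_delivered:.2f}x")  # ~1.5x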

       

    Hardware and software specification update

     

    All the performance results in this blog were measured on a PowerEdge C4130 server using Configuration G (4x V100-PCIe) and Configuration K (4x V100-SXM2). Both of these configurations were used previously in our P100 testing. Except for the GPUs, the hardware components remain identical to those used in the P100 tests: dual Intel Xeon E5-2690 v4 processors, 256GB of memory (16x 16GB 2400 MHz DIMMs) and an NFS file system mounted via IPoIB on InfiniBand EDR. Complete specification details are included in our previous blog. If you are interested in other C4130 configurations besides G and K, you can find them in our K80 blog.

    There are some changes on the software front. In order to unleash the power of the V100, it was necessary to use the latest version of every software component. Table 2 lists the versions used for this set of performance tests. To keep the comparison fair, the P100 tests were rerun with the new software stack to normalize for the upgraded software.

    Table 2: The changes in software versions

    Software       | Current Version              | Previous version in P100 blog
    OS             | RHEL 7.3                     | RHEL 7.2
    GPU Driver     | 384.59                       | 361.77 / 375.20
    CUDA Toolkit   | 9.0.103RC                    | 8.0.44
    OpenMPI        | 1.10.7 & 2.1.2               | 1.10.1 & 2.0.1
    HPL            | Compiled with sm7.0          | Compiled with sm6.0
    HPCG           | Compiled with sm7.0          | -
    AMBER          | 16, AmberTools17 update 20   | 16, AmberTools16 update 3
    LAMMPS         | patch_17Aug2017              | 30Sep16

     

    p2pBandwidthLatencyTest

     

    p2pBandwidthLatencyTest is a micro-benchmark included in the CUDA SDK. It tests card-to-card bandwidth and latency with and without GPUDirect™ Peer-to-Peer enabled. Since the full output matrix is fairly long, only the unidirectional P2P result is listed below as an example to demonstrate how to verify NVLink speed on both V100 and P100.

    In theory, the V100 has six 25 GB/s unidirectional links, giving 150 GB/s of throughput, while the previous P100-SXM2 has only four 20 GB/s links, delivering 80 GB/s. The results of p2pBandwidthLatencyTest on both cards are in Table 3. “D\D” represents “device-to-device”, that is, the bandwidth available between two devices (GPUs). The achievable bandwidth of GPU0 was calculated by aggregating the second, third and fourth values in the first line, which represent the throughput from GPU0 to GPU1, GPU2 and GPU3 respectively.

    Table 3: Unidirectional peer-to-peer bandwidth

    Unidirectional P2P=Enabled Bandwidth Matrix (GB/s). Four GPU cards in the server.

    P100
    D\D        0         1         2         3
    0      231.53     18.36     18.31     36.39
    1       18.31    296.74     36.54     18.33
    2       18.35     36.08    351.51     18.36
    3       36.59     18.42     18.42    354.79

    V100
    D\D        0         1         2         3
    0      727.38     47.88     47.90     47.93
    1       47.92    725.61     47.88     47.89
    2       47.91     47.91    726.41     47.95
    3       47.96     47.89     47.90    725.02

    It is clearly seen that V100-SXM2 on C4130 configuration K is significantly faster than P100-SXM2, on:

    1. Achievable throughput. V100-SXM2 has 47.88 + 47.9 + 47.93 = 143.71 GB/s of aggregate achievable throughput, which is 95.8% of the theoretical 150 GB/s and significantly higher than the 73.06 GB/s (91.3% of theoretical) achieved by P100-SXM2 (see the short sketch after this list). The bandwidth for bidirectional traffic is twice that of unidirectional traffic and is also very close to the theoretical 300 GB/s throughput.

    2. Real world applications. Symmetric access is key for real world applications. On P100, each GPU has four links: three of them connect to each of the other three GPUs, and the remaining fourth link connects to one of those same GPUs. So there are two links between GPU0 and GPU3, but only one link between GPU0 and GPU1 and between GPU0 and GPU2. This is not symmetrical. The p2pBandwidthLatencyTest numbers above show this imbalance: the value from GPU0 to GPU3 reaches 36.39 GB/s, which is double the bandwidth between GPU0 and GPU1 or GPU0 and GPU2. In most real world applications, it is common for developers to treat all cards equally and not take such architectural differences into account. Therefore it is likely that the faster pairs of GPUs will have to wait for the slowest transfer, which means that 18.31 GB/s is the effective speed between all pairs of GPUs.

      On the other hand, V100 has a symmetrical design with six links, as seen in Figure 1. GPU0 to GPU1, GPU2 or GPU3 all have two links between each pair, so 47.88 GB/s is the achievable link bandwidth for each pair, which is 2.6 times faster than the P100.
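    For reference, here is a small Python sketch of the aggregation described in point 1 above; the matrix values are copied from Table 3, and parsing of the raw p2pBandwidthLatencyTest output is left out:

        # Aggregate achievable P2P throughput from GPU0, using the first row of the
        # unidirectional bandwidth matrices in Table 3 (values in GB/s; the diagonal
        # entry is device-to-itself and is ignored).

        p100_row0 = [231.53, 18.36, 18.31, 36.39]
        v100_row0 = [727.38, 47.88, 47.90, 47.93]

        def gpu0_aggregate(row):
            return sum(row[1:])          # GPU0 -> GPU1, GPU2, GPU3

        print(f"P100 GPU0 aggregate: {gpu0_aggregate(p100_row0):.2f} GB/s "
              f"({gpu0_aggregate(p100_row0) / 80.0:.1%} of the 80 GB/s theoretical)")
        print(f"V100 GPU0 aggregate: {gpu0_aggregate(v100_row0):.2f} GB/s "
              f"({gpu0_aggregate(v100_row0) / 150.0:.1%} of the 150 GB/s theoretical)")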


    Figure 1: V100 and P100 Topologies on C4130 configuration K

     

    High Performance Linpack (HPL)

    Figure 2: HPL Multi-GPU results with V100 and P100 on C4130 configuration G and K

    Figure 2 shows the HPL performance on the C4130 platform with 1, 2 and 4 V100-PCIe and V100-SXM2 cards installed. P100 performance numbers are also listed for comparison. The following can be observed:

    1. Both P100 and V100 scale well; performance increases as more GPUs are added.

    2. V100 is ~30% faster than P100 on both PCIe (Config G) and SMX2 (Config K).

    3. A single C4130 server with 4x V100 reaches over 20 TFlops on PCIe (Config G).

      HPL is a system level benchmark and its performance is limited by other components such as the CPU, memory and PCIe bandwidth. Configuration G is a balanced design with two PCIe links between the CPUs and the GPUs, which is why it outperforms configuration K with 4x GPUs in the HPL benchmark. We do see some other applications perform better in Configuration K, since SXM2 (Config K) supports NVLink, a higher core clock speed and peer-to-peer data transfers; these are described below.
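    As a quick sanity check on the HPL numbers, the sketch below estimates the GPU portion of theoretical peak and the resulting efficiency. The "over 20 TFlops" figure is the measured result quoted above; the CPU contribution to Rpeak is deliberately ignored, so this slightly overstates the true system-level efficiency:

        # Back-of-the-envelope HPL efficiency for 4x V100-PCIe in one C4130 (Config G).
        # Double-precision peak comes from Table 1; the measured Rmax ("over 20 TFlops")
        # is the result quoted above.  CPU peak is ignored here.

        v100_pcie_dp_tflops = 7.0
        num_gpus = 4
        measured_rmax_tflops = 20.0              # "over 20 TFlops" from Figure 2

        gpu_rpeak = v100_pcie_dp_tflops * num_gpus   # 28 TFlops
        print(f"GPU Rpeak: {gpu_rpeak:.1f} TFlops")
        print(f"Efficiency (vs GPU peak only): {measured_rmax_tflops / gpu_rpeak:.0%}")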

       

    HPCG

    Figure 3: HPCG Performance results with 4x V100 and P100 on C4130 configuration G and K

    HPCG, the High Performance Conjugate Gradients benchmark, is another well-known metric for HPC system ranking. Unlike HPL, its performance is strongly influenced by memory bandwidth. Thanks to the faster and more efficient HBM2 memory of the V100, the observed performance improvement is 44% over P100 on both Configuration G and K.

    AMBER

    Figure 4: AMBER Multi-GPU results with V100 and P100 on C4130 configuration G and K

    Figure 4 illustrates AMBER’s results with the Satellite Tobacco Mosaic Virus (STMV) dataset. On the SXM2 system (Config K), AMBER scales weakly with 2 and 4 GPUs. Even though the scaling is not strong, V100 shows a noticeable improvement over P100, giving a ~78% increase in single card runs; in fact, 1x V100 is 23% faster than 4x P100. On the PCIe (Config G) side, 1 and 2 cards perform similarly to SXM2, but the 4 card result drops sharply. This is because PCIe (Config G) only supports Peer-to-Peer access between GPU0/1 and GPU2/3, not among all four GPUs. AMBER has redesigned the way data transfers among GPUs to address the PCIe bottleneck, so it relies heavily on Peer-to-Peer access for performance with multiple GPU cards. Hence a fast, direct interconnect like NVLink between all GPUs in SXM2 (Config K) is vital for AMBER multi-GPU performance.

    Figure 5: AMBER Multi-GPU Aggregate results with V100 and P100 on C4130 configuration G and K

    To compensate for a single job’s weak scaling on multiple GPUs, there is another use case promoted by the AMBER developers: running multiple jobs on the same node concurrently, where each job uses only 1 or 2 GPUs. Figure 5 shows the results of 1-4 individual jobs on one C4130 with V100s, and the numbers indicate that these individual jobs have little impact on each other. This is because AMBER is designed to run almost entirely on the GPUs and has very low dependency on the CPU. The aggregate throughput of multiple individual jobs scales linearly in this case. Without any card to card communication, the 5% better performance on SXM2 is attributable to its higher clock speed.
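    As an illustration of this multi-job use case (not the exact scripts used in the lab), one independent single-GPU AMBER run can be pinned to each card with CUDA_VISIBLE_DEVICES. The pmemd.cuda binary is AMBER’s standard GPU executable; the input and output file names below are placeholders:

        # Launch one independent single-GPU AMBER run per GPU on a single node.
        # Each process sees exactly one device via CUDA_VISIBLE_DEVICES.
        # Input/output file names are illustrative placeholders.
        import os
        import subprocess

        NUM_GPUS = 4
        procs = []
        for gpu in range(NUM_GPUS):
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
            cmd = ["pmemd.cuda", "-O",
                   "-i", "mdin", "-p", "prmtop", "-c", "inpcrd",
                   "-o", f"mdout.gpu{gpu}"]
            procs.append(subprocess.Popen(cmd, env=env))

        for p in procs:
            p.wait()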

     

    LAMMPS

     

    Figure 6: LAMMPS 4-GPU results with V100 and P100 on C4130 configuration G and K

    Figure 6 shows LAMMPS performance on both configurations G and K. The test dataset is the Lennard-Jones liquid dataset, which contains 512,000 atoms, and LAMMPS was compiled with the Kokkos package. V100 is 71% and 81% faster than P100 on Config G and Config K respectively. Comparing V100-SXM2 (Config K) and V100-PCIe (Config G), the former is 5% faster due to NVLink and the higher CUDA core frequency.

     

    Conclusion

     

    Figure 7: V100 Speedups on C4130 configuration G and K

    The C4130 server with NVIDIA® Tesla® V100™ GPUs demonstrates exceptional performance for HPC applications that require faster computational speed and the highest data throughput. Applications like HPL and HPCG benefit from the additional PCIe links between CPU and GPU offered by Dell PowerEdge C4130 configuration G. On the other hand, applications like AMBER and LAMMPS are boosted by C4130 configuration K, owing to P2P access, the higher bandwidth of NVLink and the higher CUDA core clock speed. Overall, a PowerEdge C4130 with Tesla V100 GPUs performs 1.24x to 1.8x faster than a C4130 with P100 for HPL, HPCG, AMBER and LAMMPS.

     

  • General HPC

    Deep Learning on V100

    Authors: Rengan Xu, Frank Han, Nishanth Dandapanthula.

    HPC Innovation Lab. September 2017

    Overview

    In this blog, we will introduce the NVIDIA Tesla Volta-based V100 GPU and evaluate it with different deep learning frameworks. We will compare the performance of the V100 and P100 GPUs, and we will evaluate both types of V100: V100-PCIe and V100-SXM2. The results indicate that in training the V100 is ~40% faster than the P100 with FP32 and >100% faster with FP16, and in inference the V100 is 3.7x faster than the P100. This is one blog in our Tesla V100 series; another blog in this series covers general HPC application performance on V100, and you can read it here.

    Introduction to V100 GPU

    At the 2017 GPU Technology Conference (GTC), NVIDIA announced the Volta-based V100 GPU. As with the P100, there are two types of V100: V100-PCIe and V100-SXM2. V100-PCIe GPUs are interconnected by PCIe buses and the bi-directional bandwidth is up to 32 GB/s. V100-SXM2 GPUs are interconnected by NVLink; each GPU has six links and the bi-directional bandwidth of each link is 50 GB/s, so the bi-directional bandwidth between different GPUs is up to 300 GB/s. A new type of core added in V100 is the tensor core, which was designed specifically for deep learning. These cores are essentially a collection of ALUs for performing 4x4 matrix operations: specifically a fused multiply-add (A*B+C), multiplying two 4x4 FP16 matrices together and then adding the result to an FP16/FP32 4x4 matrix to generate a final 4x4 FP16/FP32 matrix. By fusing the matrix multiplication and add in one unit, the GPU can achieve high FLOPS for this operation. A single Tensor Core performs the equivalent of 64 FMA operations per clock (128 FLOPS total), and with 8 such cores per Streaming Multiprocessor (SM), that is 1024 FLOPS per clock per SM. By comparison, even with pure FP16 operations, the standard CUDA cores in an SM only generate 256 FLOPS per clock. So in scenarios where these cores can be used, V100 is able to deliver 4x the performance of P100. The detailed comparison between V100 and P100 is in Table 1.
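    The peak tensor throughput quoted in Table 1 can be reproduced from the per-clock figures above. In this sketch the SM count of 80 follows from 640 tensor cores at 8 per SM, and the boost clock is the V100-SXM2 value from Table 1:

        # Deriving V100-SXM2 peak tensor throughput from the per-clock figures above.
        fma_per_tensor_core = 64               # 4x4x4 FMAs per clock
        flops_per_fma = 2                      # multiply + add
        tensor_cores_per_sm = 8
        num_sms = 640 // tensor_cores_per_sm   # 80 SMs
        boost_clock_ghz = 1.530                # V100-SXM2 max clock, Table 1

        flops_per_clock = (fma_per_tensor_core * flops_per_fma
                           * tensor_cores_per_sm * num_sms)
        peak_tflops = flops_per_clock * boost_clock_ghz / 1000.0
        print(f"{flops_per_clock} FLOPS/clock -> {peak_tflops:.0f} TFlops")   # ~125 TFlops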


    Table 1: The comparison between V100 and P100

    Testing Methodology

    As in our previous deep learning blog, we use the three most popular deep learning frameworks: NVIDIA’s fork of Caffe (NV-Caffe), MXNet and TensorFlow. Both NV-Caffe and MXNet have been optimized for V100. TensorFlow does not yet have an official release that supports V100, but we applied some patches obtained from the TensorFlow developers so that it is also optimized for V100 in these tests. For the dataset, we again use the ILSVRC 2012 dataset, whose training set contains 1,281,167 training images and 50,000 validation images. For the test neural network, we chose Resnet50, as it is a computationally intensive network. To get the best performance, we used the CUDA 9-rc compiler and the cuDNN library in all three frameworks, since they are optimized for V100. The testing platform is Dell EMC’s PowerEdge C4130 server. The C4130 server has multiple configurations; we evaluated both configuration G (PCIe GPUs) and configuration K (SXM2 GPUs). The differences between configuration G and configuration K are shown in Figure 1. There are mainly two: configuration G has two x16 PCIe links connecting the dual CPUs to the four GPUs, while configuration K has only one x16 PCIe bus connecting one CPU to the four GPUs; in addition, the GPUs are connected by PCIe buses in configuration G but by NVLink in configuration K. The other hardware and software details are shown in Table 2.

      

    Figure 1: Comparison between configuration G and configuration K

    Table 2: The hardware configuration and software details


    In this experiment, we trained the various deep learning frameworks with one pass over the whole dataset, since we were comparing only the training speed, not the training accuracy. Other important input parameters for the different deep learning frameworks are listed in Table 3. For NV-Caffe and MXNet, we doubled the batch size for the FP16 tests, since FP16 consumes half the memory for floating point values compared to FP32. As TensorFlow does not support FP16 yet, we did not evaluate its FP16 performance in this blog. Because of implementation differences, NV-Caffe consumes more memory than MXNet and TensorFlow for the same neural network, so its batch size in FP32 mode is only half of that used in MXNet and TensorFlow. In NV-Caffe, if FP16 is used, the data types of several parameters need to be changed. These parameters are as follows: solver_data_type controls the data type for the master weights; default_forward_type and default_backward_type control the data type for the training values; and default_forward_math and default_backward_math control the data type for the matrix-multiply accumulator. In this blog we used FP16 for the training values, FP32 for the matrix-multiply accumulator and FP32 for the master weights. We will explore other combinations in future blogs. In MXNet, we tried different values for the parameter “--data-nthreads”, which controls the number of threads used for data decoding.
     

    Table 3: Input parameters used in different deep learning frameworks

     

    Performance Evaluation

    Figure 2, Figure 3, and Figure 4 show the performance of V100 versus P100 with NV-Caffe, MXNet and TensorFlow, respectively, and Table 4 shows the performance improvement of V100 compared to P100. From these results, we can draw the following conclusions:

    • In both the PCIe and SXM2 versions, V100 is >40% faster than P100 in FP32 for both NV-Caffe and MXNet. This matches the theoretical speedup, because FP32 is single precision floating point and V100 is 1.5x faster than P100 in single precision. With TensorFlow, V100 is more than 30% faster than P100. Its improvement is lower than the other two frameworks, which we attribute to different algorithm implementations in these frameworks.

    • In both the PCIe and SXM2 versions, V100 is >2x faster than P100 in FP16. Based on the specifications, V100 tensor performance is ~6x that of P100 FP16. The reason the actual speedup does not match the theoretical speedup is that not all data are stored in FP16, so not all operations are tensor operations (the fused matrix multiply and add operation).

    • On V100, FP16 performance is close to 2x that of FP32. This is because FP16 requires only half the storage of FP32, and therefore we could double the batch size in FP16 to improve the computation speed (see the short sketch after this list).

    • In MXNet, we set “--data-nthreads” to 16 instead of the default value of 4. The default value is often sufficient to decode more than 1K images per second, but that is still not fast enough for the V100 GPU. In our testing, we found the default value of 4 is enough for P100, but for V100 we needed to set it to at least 12 to achieve good performance, with a value of 16 being ideal.
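    The batch-size doubling in FP16 follows directly from the storage argument above. A minimal illustration, assuming activation memory dominates and scales linearly with batch size (a simplification), and using an illustrative batch size rather than the values in Table 3:

        # Why the FP16 runs use twice the batch size of the FP32 runs:
        # per-value storage halves, so (to first order) the same GPU memory
        # budget holds twice as many samples' worth of activations.
        bytes_fp32, bytes_fp16 = 4, 2
        fp32_batch = 64                          # illustrative batch size only

        fp16_batch = fp32_batch * bytes_fp32 // bytes_fp16
        print(fp16_batch)                        # 128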

    Figure 2: Performance of V100 vs P100 with NV-Caffe

    Figure 3: Performance of V100 vs P100 with MXNet

     

     

    Figure 4: Performance of V100 vs P100 with TensorFlow

    Table 4: Improvement of V100 compared to P100

     

    Since V100 supports both deep learning training and inference, we also tested inference performance on V100 using the latest TensorRT 3.0.0. The testing was done in FP16 mode on both V100-SXM2 and P100-PCIe and the result is shown in Figure 5. We used batch size 39 for V100 and 10 for P100. Different batch sizes were chosen so that their inference latencies would be close to each other (~7 ms in the figure). The result shows that, at comparable latency, the inference throughput of V100 is 3.7x that of P100.
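    The 3.7x figure follows from throughput = batch size / latency. With the batch sizes above and latencies near 7 ms (the measured latencies are only approximately equal, so this is a rough check rather than the exact reported ratio):

        # Approximate inference throughput from the batch sizes and ~7 ms latencies above.
        latency_s = 0.007
        v100_batch, p100_batch = 39, 10

        v100_ips = v100_batch / latency_s        # ~5570 images/s
        p100_ips = p100_batch / latency_s        # ~1430 images/s
        print(f"V100: {v100_ips:.0f} img/s, P100: {p100_ips:.0f} img/s, "
              f"ratio: {v100_ips / p100_ips:.1f}x")   # ~3.9x with identical latencies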

     Figure 5: Resnet50 inference performance on V100 vs P100

    Conclusions and Future Work

    After evaluating the performance of V100 with three popular deep learning frameworks, we conclude that in training V100 is more than 40% faster than P100 in FP32 and more than 100% faster in FP16, and in inference V100 is 3.7x faster than P100. This demonstrates the performance benefit when the V100 tensor cores are used. In future work, we will evaluate different data type combinations in FP16 and study the accuracy impact of FP16 in deep learning training. We will also evaluate TensorFlow with FP16 once support is added to the software. Finally, we plan to scale the training to multiple nodes with these frameworks.

  • General HPC

    14G with Skylake – how much better for HPC?

    Garima Kochhar, Kihoon Yoon, Joshua Weage. HPC Innovation Lab. 25 Sep 2017.

    The document below presents performance results on Skylake-based Dell EMC 14th generation systems for a variety of HPC benchmarks and applications (STREAM, HPL, WRF, BWA, ANSYS Fluent, STAR-CCM+ and LS-DYNA). It compares the performance of these new systems to previous generations, going as far back as Westmere (Intel Xeon 5500 series) in Dell EMC's 11th generation servers, showing the potential improvements when moving to this latest generation of server technology.

  • Hotfixes

    Optional Hotfix 654339 for 8.6 MR3 Password Management Released

    This is an optional hotfix for: 

    • Password Management Role

     

     The following is a list of issues resolved in this release.

    Feature            | Description                                                                                   | Feature ID
    Password Manager   | The same error message is displayed for the incorrect password, access denied and other errors. | 647264

    This hotfix is available for download at: https://support.quest.com/vworkspace/kb/233001 

  • Hotfixes

    Mandatory Hotfix 654335 for 8.6 MR3 Windows Connector Released

    This is a Cumulative Mandatory hotfix and can be installed on the following vWorkspace roles:

    • Windows Connector

    The following is a list of issues resolved in this release.

    Feature            | Description                                                                                                                        | Feature ID
    Windows connector  | Focus changes to PNTSC after launching a pinned application via the task bar                                                       | 627437
    Windows connector  | The Win+M key combination does not minimize the full screen session if the keyboard is set to ‘On the local’                       | 654197
    Windows connector  | Published app with multiple modal dialogue boxes does not respond                                                                  | 633773
    Windows connector  | Seamless application focus blinks if a seamless application with several modal windows was activated after the local application   | 654260
    Windows connector  | The “Web site or address” field is not focused on the Welcome screen after a configuration has been deleted                        | 598945
    Windows connector  | The Search field is not cleared after deleting and then downloading a configuration                                                | 588327
    Windows connector  | The Search applications function does not work after the ‘Back door’ settings are closed                                           | 596787
    Windows connector  | There are black borders around vWorkspace Seamless applications                                                                    | 653831
    Windows connector  | PNTSC hangs on logoff when using the Olympus VD client                                                                             | 653863
    Windows connector  | After minimizing and maximizing the session, it is spanned over both monitors incorrectly                                          | 654028
    Windows connector  | User’s credentials should not be updated in the connection automatically if Password Manager hasn’t changed the user’s password    | 654116
    Windows connector  | If there are two monitors, only half of the remote desktop is shown when the Span Multi-Monitor option is not selected             | 653940
    Windows connector  | The remote session is not maximized on the second monitor after it was minimized on the first monitor and moved to the second one  | 654403
    Windows connector  | It is impossible to connect to the application using Kerberos Authentication                                                       | 654004
    Windows connector  | Connector closes if switched between two seamless applications with an opened printer preferences page and changed default paper   | 654219

    This hotfix is available for download at: https://support.quest.com/vworkspace/kb/232979 

  • Hotfixes

    Mandatory Hotfix 654337 for 8.6 MR3 Web Access Released

    This is a mandatory hotfix and can be installed on the following vWorkspace roles -

     

    • Web Access

     

     The following is a list of issues resolved in this release.

    Feature     | Description                                                                                        | Feature ID
    Web Access  | UPN login causes a black screen when logging into a full desktop, but works fine via Web Access    | 654026
    Web Access  | Notification message about user account locking has become more informative                        | 654143
    Web Access  | Connector installation from Web Access does not work                                               | 453332
    Web Access  | Writing issues (keyboard) in some applications like Wireshark and Cisco Packet Tracer using HTML5  | 653999
    Web Access  | Connector Policy breaks Web Access                                                                 | 654128
    Web Access  | Web Access multi-line text in Browser messages does not work properly                              | 654217
    Web Access  | Session does not launch if the user uses UPN credentials and NLA is disabled on the server side    | 654482

    This hotfix is available for download at: https://support.quest.com/vworkspace/kb/232974 

  • Hotfixes

    Mandatory Hotfix 654514 for 8.6 MR3 PNTools/RDSH Released

    This is a Mandatory hotfix and can be installed on the following vWorkspace roles:

    • Remote Desktop Session Host (RDSH)
    • PNTools (VDI)

     The following is a list of issues resolved in this release.

    Feature           | Description                                                           | Feature ID
    PNTools and RDSH  | There are black borders around the vWorkspace Seamless applications   | 653831

    This hotfix is available for download at: https://support.quest.com/vworkspace/kb/232958

  • Hotfixes

    Mandatory Hotfix 654338 for 8.6 MR3 Management Console Released

    This is a mandatory hotfix and can be installed on the following vWorkspace roles -

     

    • vWorkspace Management Console

     

     The following is a list of issues resolved in this release.

    Feature             | Description                                                                                                                                                   | Feature ID
    Management Console  | Wizards in the MC display the ‘_’ symbol instead of the ‘&’ symbol                                                                                            | 653743
    Management Console  | The Active Directory Path item is not disabled if the Workgroup item is selected on the Domain or Workgroup page in the New Operating System Customization Properties wizard | 653745
    Management Console  | Web Access multi-line text in Browser messages does not work properly                                                                                         | 654217

    This hotfix is available for download at: https://support.quest.com/vworkspace/kb/232952 

  • Hotfixes

    Mandatory Hotfix 654336 for 8.6 MR3 Connection Broker Released

    This is a cumulative mandatory hotfix and can be installed on the following vWorkspace roles -

     

    • vWorkspace Connection Broker

     

     The following is a list of issues resolved in this release.

    Feature            | Description                                                                                                   | Feature ID
    Connection Broker  | VMware Standard Clones with MS Sysprep do not retain MAC address                                              | 622600
    Connection Broker  | Reprovision using task automation does not work when using a new template                                     | 627696
    Connection Broker  | VM created from a Windows Server 2016 template on a SCVMM server does not initialize                          | 653746
    Connection Broker  | VM RAM on VMware resets to the template value after reprovisioning                                            | 653747
    Connection Broker  | Thread handle leak in the broker service                                                                      | 653771
    Connection Broker  | VM reprovision can cause a duplicate MAC address in VMware                                                    | 653939
    Connection Broker  | VMs created in the Hyper-V Failover Cluster with High Availability enabled are reprovisioned without a VHD    | 654101
    Connection Broker  | Linked reprovision to an Existing Template or New Template doesn’t work                                       | 654142
    Connection Broker  | After adding a Hyper-V host to the MC, getting Hyper-V volume information fails and the host does not initialize | 654160
    Connection Broker  | On VMware, the VM Virtual Disks setting resets to the template's value after reprovisioning                   | 654312

    This hotfix is available for download at: https://support.quest.com/vworkspace/kb/232947

  • Dell TechCenter

    Rack Scale & Ready: DSS 9000

    Author: Robert Tung, Sr. Consultant, Product Management, Extreme Scale Infrastructure 

    This month, Dell EMC Extreme Scale Infrastructure (ESI) group is releasing the latest version of the DSS 9000 – an open, agile and efficient rack scale infrastructure solution.  It’s specifically designed using hyperscale principles to help carriers and service providers accelerate the shift to software-defined data centers, to realize their digital transformation goals, and to stay competitive in a rapidly changing business environment.

    Carriers and service providers are competing in arenas that are being upended by trends like Big Data Analytics, the Internet of Things, and Edge Computing. In some cases they are directly competing with industry-leading hyperscale cloud pioneers. They need infrastructure that can easily and rapidly scale to answer the demands for compute and storage – and that can be readily managed at the rack and datacenter level.

    Scale-out hardware design

    One of the major challenges to rapidly growing IT environments is the ability to scale with minimum disruption to business operations. The DSS 9000 is designed to make scaling easy – both at the rack level and within the rack. As a pre-integrated rack solution, DSS 9000 deployment is as simple as rolling the pre-configured rack into place and plugging it in. It is highly flexible and available in multiple heights that can be tailored to meet specific workload needs using modular blocks that can be loaded with sleds of varying widths containing compute and storage resources. This flexible approach allows infrastructure to be easily and rapidly changed or scaled as needed over time.

    Flexibility extends to the design of the sleds as well. The DSS 9000 offers three variable-width server sleds: full-width (DSS 9600), half-width (DSS 9620) and third-width (DSS 9630). Each supports two Intel® Xeon® Scalable processors, 16 DDR4 DIMMs, and varying amounts of supported storage. Compute-intensive environments can scale up to 96 nodes per rack using the third-width sleds.

    DSS 9620 half-width server

    The DSS 9000 also supports a full-width JBOD storage sled that holds up to twelve 3.5” drives and can be daisy-chained to a head node to provide up to 288 TB of additional storage.

    Innovative rack management

    One of the most distinctive features of the DSS 9000 is its Rack Manager component, which provides enhanced rack-level management functions through a single point of control. At its core is an embedded CentOS server powered by an Intel® Atom™ processor that handles the increasingly sophisticated management tasks common in today’s data centers. It is connected to each block in the rack and, through them, to each individual node using a 1 Gbit management network that is independent of the data network. It is also connected to the power bays and the cooling fans, which are disaggregated from the sleds to improve overall reliability and serviceability.

    Using Rack Manager with its Redfish-based command line interface, you can query and manage all of the devices in the rack. This allows you to consolidate rack-wide operations, including pushing firmware updates to every device in the rack or controlling power and cooling at the block level, among many other possibilities.
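    As an illustration of what a Redfish-based workflow against Rack Manager could look like, the sketch below walks the standard Redfish service root and lists the systems it exposes. The host name and credentials are placeholders, and the exact resource tree exposed by Rack Manager may differ from this generic example:

        # Minimal Redfish walk: list the systems a management endpoint exposes and
        # print their power state.  /redfish/v1 and the Systems collection are part
        # of the DMTF Redfish standard; host and credentials below are placeholders.
        import requests

        RACK_MANAGER = "https://rack-manager.example.com"   # placeholder address
        AUTH = ("admin", "password")                         # placeholder credentials

        root = requests.get(f"{RACK_MANAGER}/redfish/v1", auth=AUTH, verify=False).json()
        systems_uri = root["Systems"]["@odata.id"]

        systems = requests.get(f"{RACK_MANAGER}{systems_uri}", auth=AUTH, verify=False).json()
        for member in systems["Members"]:
            system = requests.get(f"{RACK_MANAGER}{member['@odata.id']}",
                                  auth=AUTH, verify=False).json()
            print(system.get("Name"), system.get("PowerState"))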

    Rack scale management greatly reduces the time consumed by day-to-day operations across massive infrastructure environments and also reduces the number of management switches required compared to typical datacenters.

    Open networking… open everything

    The DSS 9000 is designed without an embedded fabric in support of an open networking approach.  Data center managers can implement whatever vendor’s networking stack makes the most sense for them. The DSS 9000 supports any standard 19” switch via its Open IT bay capability.

    In fact, the Open IT bay capability allows the rack configuration to accommodate any standard 19” server, storage or networking equipment.  The Redfish API compliance also makes it possible for the DSS 9000 to support the rack-wide management of this ancillary third-party gear. Such open technologies and interoperability are part of what led the Open Compute Project (OCP) Foundation to recognize the DSS 9000 as OCP-INSPIRED™.

    More choices for workload optimization

    The Extreme Scale Infrastructure organization has more than a decade of experience working with customers to deliver tailored, hyperscale-inspired features to optimize their infrastructure for specific workloads. That history of innovation has led us to integrate the latest compute, networking and storage options into the DSS 9000 to allow customers to meet their needs now and into the future.

    For more information refer to the DSS 9000 website or email ESI@dell.com.