Dell Community

Latest Blog Posts
  • vWorkspace - Blog

    Mandatory Hotfix 654125 for 8.6 MR3 Android Connector Released

    This is a mandatory hotfix for: 


    • Android Connector


    The following is a list of issues resolved in this release.



    Feature ID


    It is impossible to launch any application with non-English user name if Cred SSP is enabled for the farm properties



    This hotfix is available for download at: 

  • Windows 10 IoT Enterprise (WIE10) - Blog

    Flash Module Failure with WF (Write Filter) Enabled permanently

    Just a friendly reminder that Wyse Thin Clients are not intended to be used as PCs, i.e., with their Write Filter turned off permanently. (This excludes routine maintenance where the WF is turned off for Windows Updates or application installation.)

    Updated Terms:

    Please be advised that Dell Wyse Windows Embedded Thin Clients are intended to be used as thin clients only and not as personal computers. Dell Wyse is not responsible for, and will not warrant, support, repair or replace, any thin client device or component that is not used for its intended purpose. As an example, and without limitation, any operation of a Wyse Windows Embedded Thin Client with the write filter turned off during regular use (except as required for image upgrades, applying security patches, registry changes and application installation) is beyond the scope of the intended purpose, will prematurely wear out your Flash/SSD storage and will invalidate the thin client product warranty. In addition, enabling the Windows Page File is beyond the scope of the intended purpose and will invalidate the thin client product warranty.

  • vWorkspace - Blog

    Mandatory Hotfix 654037 for 8.6 MR3 Mac Connector Released

    This is a mandatory hotfix for: 


    • Mac Connector


    The following is a list of issues resolved in this release.



    Feature ID


    Picture is pasted instead of copied text when copying from Excel 2016 to Mac version of Excel



    It is impossible to launch any application with non-English user name if Cred SSP is enabled for the farm properties




    This hotfix is available for download at: 

  • vWorkspace - Blog

    Mandatory Hotfix 654123 for 8.6 MR3 Linux Connector Released

    This is a mandatory hotfix for: 


    • Linux Connector


    The following is a list of issues resolved in this release.



    Feature ID


    It is impossible to launch any application with non-English user name if Cred SSP is enabled for the farm properties





    This hotfix is available for download at: 


  • Security - Blog

    Dell Technical Support | BitLocker Webinar

    Dell Social Media | Webinar Series  


    Thanks to everyone that tuned in for this month’s technical support webinar. Don’t worry if you missed the live session though!


    We’ve uploaded a recording of the webinar to our support page <Here> so you can share, and catch up.


    You can also download the deck used in the presentation <Here>.


    For all those who were able to join, we hope you found our presentation both enjoyable and informative. To learn more about our Technical Support Webinar series and sign up for future events at  


    For additional BitLocker support find below a few of the most popular solutions we’ve identified. 


    In this quick 30-minute session we’ll take you through an introduction to BitLocker, along with some basic and advanced troubleshooting tips and tricks that answer the questions we’ve identified as trending on social media, specifically regarding BitLocker key prompts.

     Webinar | Archive

    Want to learn more? You can find recordings of our previous Webinars below and at

    Create & Use Dell Windows 10 Media

    Here we discuss how to create bootable USB media for the installation of Windows 10 on your Dell computer, general operating system installation advice, and how to use command prompt for driver installation (pre-OS).

    Ubuntu Basics & Installation

    In this quick 40-minute session we’ll take you through the steps to complete an Ubuntu OS installation while keeping your current Windows install and all your data intact. We’ll also provide a brief introduction to Ubuntu and talk with Barton George, who will share his insights into Dell’s Project Sputnik Ubuntu collaboration.

    Thunderbolt & TB16 Troubleshooting

    In this webinar we take you through some of the intricacies relating to Thunderbolt technology, including technical specification overviews and comparisons, along with Docking/Adapter hardware solutions available from Dell that will help you best take advantage of the technology.



    Have a topic that you’d be interested to see discussed?

    Submit your suggestions to us @DellCaresPRO

    Dell-Shawn B

    Dell EMC | Social Media

  • vWorkspace - Blog

    What's new for vWorkspace - March 2017

    Updated monthly, this publication provides you with new and recently revised information and is organized in the following categories: Documentation, Notifications, Patches, Product Life Cycle, Release, Knowledge Base Articles.

    Subscribe to the RSS (Use IE only)


    Knowledgebase Articles


    226670 - Mandatory Hotfix 653995 for 8.6 MR3 Connection Broker

    This mandatory hotfix addresses the following issues: Broker CPU usage has increased and log file size...

    Created: March 1, 2017


     226832 - Users getting Your active session has expired when trying to log in to web portal after upgrade

    After upgrading vWorkspace to 8.6.3 users immediately receive the message "Your active session has expired. Please log in to continue" and are...

    Created: March 6, 2017


    227674 - Video: How to Configure vWorkspace Web Access for Defender two factor authentication

    Created: March 27, 2017


    227676 - Quest Defender and Quest Desktop Virtualization integration

    Created: March 27, 2017


    227768 - What are the subversion for the different Service Packs of vWorkspace 8.6.x

    vWorkspace in its version 8.6 has 3 different Service Packs on top of its Main Release Those are however not the version number you will...

    Created: March 29, 2017


    227781 - How to set the minimum memory with HyperV

    The Hyper-V role in Windows 2012 has an improved Dynamic memory feature that adds a property called Minimum Memory. This allows you to specify a...

    Created: March 29, 2017


    227791 - Broker service will not start after server updates.

    Server updates were performed on the vWorkspace Connection Broker and now the broker service will not start.

    Created: March 29, 2017



    137081 - Are Generation 2 Hyper-V Virtual machines supported in vWorkspace

    When attempting to create a template machine, the following error is seen when trying to install the instant provisioning tools: Floppy disk...

    Revised: March 2, 2017


    225565 - Is VMware 6.5 currently supported?

    Is VMware vSphere 6.5 supported in any of the current versions of vWorkspace?

    Revised: March 7, 2017


    226670 - Mandatory Hotfix 653995 for 8.6 MR3 Connection Broker

    This mandatory hotfix addresses the following issues: Broker CPU usage has increased and when logging...

    Revised: March 17, 2017


    155517 - What are the Requirements for a Connector Broker 8.5?

    What are the requirement to deploy the Connection Broker properly?

    Revised: March 22, 2017 


    Product Life Cycle -vWorkspace

    Revised: March 2017

  • General HPC

    Virtualized HPC Performance with VMware vSphere 6.5 on a Dell PowerEdge C6320 Cluster

    This article presents performance comparisons of several typical MPI applications — LAMMPS, WRF, OpenFOAM, and STAR-CCM+ — running on a traditional, bare-metal HPC cluster versus a virtualized cluster running VMware’s vSphere virtualization platform. The tests were performed on a 32-node, EDR-connected Dell PowerEdge C6320 cluster, located in the Dell EMC HPC Innovation Lab in Austin, Texas. In addition to performance results, virtual cluster architecture and configuration recommendations for optimal performance are described.

    Why HPC virtualization

    Interest in HPC virtualization and cloud has grown rapidly. While much of the interest stems from gaining the general value of cloud technologies, there are specific benefits of virtualizing HPC and supporting it in a cloud environment, such as centralized operation, cluster resource sharing, research environment reproducibility, multi-tenant data security, fault isolation and resiliency, dynamic load balancing, and efficient power management. Figure 1 illustrates several HPC virtualization benefits.

    Despite the potential benefits of moving HPC workloads to a private, public, or hybrid cloud, performance concerns have been a barrier to adoption. We focus here on the use of on-premises, private clouds for HPC, environments in which appropriate tuning can be applied to deliver maximum application performance. HPC virtualization performance is primarily determined by two factors: hardware virtualization support and virtual infrastructure capability. With advances in both VMware vSphere and x86 microprocessor architecture, throughput applications can generally run at close to full speed in the VMware virtualized environment, with less than 5% performance degradation compared to native, and often just 1 – 2% [1]. MPI applications are by nature more challenging: they require sustained and intensive communication between nodes, which makes them sensitive to interconnect performance. With our continued performance optimization efforts, we see decreasing overheads running these challenging HPC workloads [2], and this blog post presents some MPI results as examples.

    Figure 1: Illustration of several HPC virtualization benefits

    Testbed Configuration

    As illustrated in Figure 2, the testbed consists of 32 Dell PowerEdge C6320 compute nodes and one management node. vCenter [3], the vSphere management component, as well as NFS and DNS are running in virtual machines (VMs) on the management node. VMware DirectPath I/O technology [4] (i.e., passthrough mode) is used to allow the guest OS (the operating system running within a VM) to directly access the EDR InfiniBand device, which shortens the message delivery path by bypassing the network virtualization layer to deliver best performance. Native tests were run using CentOS on each host, while virtual tests were run with the VMware ESXi hypervisor running on each host along with a single virtual machine running the same CentOS version.

    Figure 2: Testbed Virtual Cluster Architecture

    Table 1 shows all cluster hardware and software details, and Table 2 shows a summary of BIOS and vSphere settings.

    Table 1: Cluster Hardware and Software Details

    Dell PowerEdge C6320
    Dual 10-core Intel Xeon E5-2660 v3 processors @ 2.6GHz (Haswell)
    128GB DDR4
    Mellanox ConnectX-4 VPI adapter card; EDR IB (100Gb/s)

    VMware vSphere
    ESXi hypervisor
    vCenter management server

    BIOS, Firmware and OS
    OS Distribution (virtual and native): CentOS 7.2

    OFED and MPI
    Open MPI (LAMMPS, WRF and OpenFOAM)
    Intel MPI (STAR-CCM+)

    Table 2: BIOS and vSphere Settings

    BIOS Settings
    Hardware-assisted virtualization
    Power profile: Performance Per Watt (OS)
    Logical processor
    Node interleaving: Disabled (default)

    vSphere Settings
    ESXi power policy: Balanced (default)
    DirectPath I/O: Enabled for EDR InfiniBand
    VM size: 20 virtual CPUs, 100GB memory
    Virtual NUMA topology (vNUMA): Auto detected (default)
    Memory reservation: Fully reserved
    CPU Scheduler affinity: None (default)


    Figures 3-6 show native versus virtual performance ratios with the settings in Table 2 applied. A value of 1.0 means that virtual performance is identical to native. Applications were benchmarked using a strong scaling methodology — problem sizes remained constant as job sizes were scaled. In the Figure legends, ‘nXnpY’ indicates a test run on X nodes using a total of Y MPI ranks. Benchmark problems were selected to achieve reasonable parallel efficiency at the largest scale tested. All MPI processes were consecutively mapped from node 1 to node 32.
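
    The ‘nXnpY’ legend labels and the performance ratio described above can be made concrete with a short Python sketch. The helper names and the timings in the usage lines are hypothetical. The sketch also assumes the ratio is computed from elapsed times as native time divided by virtual time, so that higher is better; the post does not state the exact metric used for each application.

```python
import re

def parse_label(label):
    """Split a legend label such as 'n32np640' into (nodes, total MPI ranks)."""
    match = re.fullmatch(r"n(\d+)np(\d+)", label)
    if match is None:
        raise ValueError(f"not an nXnpY label: {label!r}")
    return int(match.group(1)), int(match.group(2))

def virtual_to_native_ratio(native_seconds, virtual_seconds):
    """Assumed metric: native time / virtual time (1.0 means identical)."""
    return native_seconds / virtual_seconds

def degradation_percent(ratio):
    """Convert a performance ratio into a percentage slowdown."""
    return (1.0 - ratio) * 100.0

nodes, ranks = parse_label("n32np640")
print(nodes, ranks, ranks // nodes)  # 32 640 20 (20 ranks per node)

# Made-up timings, purely to show the arithmetic.
ratio = virtual_to_native_ratio(native_seconds=100.0, virtual_seconds=125.0)
print(f"ratio={ratio:.2f}, degradation={degradation_percent(ratio):.0f}%")
```

    Under this assumed definition, a ratio of 0.75 corresponds to the 25% degradation quoted for the worst case.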

    As can be seen from the results, the majority of tests show degradations under 5%, though overheads increase as we scale. At the highest scale tested (n32np640), performance degradation varies by application and benchmark problem, with the largest degradation seen with LAMMPS atomic fluid (25%) and the smallest seen with STAR-CCM+ EmpHydroCyclone_30M (6%). Single-node STAR-CCM+ results are anomalous and currently under study. As we continue our performance optimization work, we expect to report better and more scalable results in the future.

    Figure 3: LAMMPS native vs. virtual performance. Higher is better.

    Figure 4: WRF native vs. virtual performance. Higher is better.


    Figure 5: OpenFOAM native vs. virtual performance. Higher is better.

    Figure 6: STAR-CCM+ native vs. virtual performance. Higher is better.

    Best Practices

    The following configurations are suggested to achieve optimal virtual performance for HPC. For more comprehensive vSphere performance guidance, please see [5] and [6].


    • Enable hardware-assisted virtualization features, e.g., Intel VT.
    • Enable logical processors. Though logical processors (hyper-threading) usually do not help HPC performance, enable them but configure the virtual CPUs (vCPUs) of a VM to each use a physical core, leaving the extra threads/logical cores for ESXi hypervisor helper threads to run.
    • Configure BIOS settings to allow ESXi the most flexibility in using power management features. To allow ESXi to control power-saving features, set the power policy to the “OS Controlled” profile.
    • Leave node interleaving disabled to let the ESXi hypervisor detect NUMA and apply NUMA optimizations.


    • Configure EDR InfiniBand in DirectPath I/O mode for each VM
    • Properly size VMs:

    MPI workloads are CPU-heavy and can make use of all cores, thus requiring a large VM. However, CPU or memory overcommit would greatly impact performance. In our tests, each VM is configured with 20 vCPUs, using all physical cores, and 100 GB of fully reserved memory, leaving some free memory to cover the ESXi hypervisor’s memory overhead.

    • ESXi power management policy:

    There are four ESXi power management policies: “High Performance”, “Balanced” (default), “Low Power” and “Custom”. Though the “High Performance” policy would slightly increase the performance of latency-sensitive workloads, in situations in which a system’s load is low enough to allow Turbo to operate, it will prevent the system from going into C/C1E states, leading to lower Turbo boost benefits. The “Balanced” power policy will reduce host power consumption while having little or no impact on performance. It’s recommended to use this default.

    • Virtual NUMA

    Virtual NUMA (vNUMA) exposes NUMA topology to the guest OS, allowing NUMA-aware OSes and applications to make efficient use of the underlying hardware. This is an out-of-the-box feature in vSphere.
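
    The VM sizing guidance in the list above (no CPU or memory overcommit, fully reserved memory, some headroom left for the hypervisor) can be written as a simple check. This is only a sketch: the `Host` and `VM` classes and the `sizing_problems` function are hypothetical names introduced for illustration and are not part of any vSphere API. The host values are the per-node figures from Table 1.

```python
from dataclasses import dataclass

@dataclass
class Host:
    physical_cores: int
    memory_gb: int

@dataclass
class VM:
    vcpus: int
    memory_gb: int
    memory_reserved_gb: int

def sizing_problems(host, vms):
    """Return a list of sizing problems; an empty list means the layout is fine."""
    problems = []
    total_vcpus = sum(vm.vcpus for vm in vms)
    total_memory = sum(vm.memory_gb for vm in vms)
    if total_vcpus > host.physical_cores:
        problems.append(f"CPU overcommit: {total_vcpus} vCPUs on {host.physical_cores} cores")
    if total_memory >= host.memory_gb:
        problems.append("no memory headroom left for hypervisor overhead")
    for index, vm in enumerate(vms):
        if vm.memory_reserved_gb < vm.memory_gb:
            problems.append(f"VM {index}: memory is not fully reserved")
    return problems

# One VM per host, as in the tests: 20 vCPUs and 100 GB on a 20-core, 128 GB node.
host = Host(physical_cores=20, memory_gb=128)
vm = VM(vcpus=20, memory_gb=100, memory_reserved_gb=100)
print(sizing_problems(host, [vm]))  # []
```

    How much free memory the hypervisor actually needs depends on the configuration; the check only verifies that some headroom exists.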

    Conclusion and Future Work

    Virtualization holds promise for HPC, offering new capabilities and increased flexibility beyond what is available in traditional, unvirtualized environments. These values are only useful, however, if high performance can be maintained. In this short post, we have shown that performance degradations for a range of common MPI applications can be kept under 10%, with our highest scale testing showing larger slowdowns in some cases. With throughput applications running at very close to native speeds, and with the results shown here, it is clear that virtualization can be a viable and useful approach for a variety of HPC use-cases. As we continue to analyze and address remaining sources of performance overhead, the value of the approach will only continue to expand.

    If you have any technical questions regarding VMware HPC virtualization, please feel free to contact us!


    These results have been produced in collaboration with our Dell Technology colleagues in the Dell EMC HPC Innovation Lab who have given us access to the compute cluster used to produce these results and to continue our analysis of remaining performance overheads.


    1. J. Simons, E. DeMattia, and C. Chaubal, “Virtualizing HPC and Technical Computing with VMware vSphere,” VMware Technical White Paper.
    2. N. Zhang and J. Simons, “Performance of RDMA and HPC Applications in Virtual Machines using FDR InfiniBand on VMware vSphere,” VMware Technical White Paper.
    3. vCenter Server for vSphere Management, VMware Documentation.
    4. DirectPath I/O, VMware Documentation.
    5. VMware Performance Team, “Performance Best Practices for VMware vSphere 6.0,” VMware Technical White Paper, ***/digitalmarketing/vmware/en/pdf/techpaper/vmware-perfbest-practices-vsphere6-0-white-paper.pdf.
    6. Bhavesh Davda, “Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs,” VMware Technical White Paper.

    Na Zhang is a member of the technical staff working on HPC within VMware’s Office of the CTO. Her current focus is on the performance of, and solutions for, HPC virtualization. Na has a Ph.D. degree in Applied Mathematics from Stony Brook University. Her research primarily focused on the design and analysis of parallel algorithms for large- and multi-scale simulations running on supercomputers.

  • General HPC

    Deep Learning Inference on P40 GPUs

    Authors: Rengan Xu, Frank Han and Nishanth Dandapanthu. Dell EMC HPC Innovation Lab. Mar. 2017

    Introduction to P40 GPU and TensorRT

    Deep Learning (DL) has two major phases: training and inference (also called testing or scoring). The training phase builds a deep neural network (DNN) model from a large amount of existing data, and the inference phase uses the trained model to make predictions from new data. Inference can be done in the data center, in embedded systems, in automotive and mobile devices, etc. Inference usually must respond to user requests as quickly as possible (often in real time). To meet the low-latency requirement of inference, NVIDIA® launched the Tesla® P4 and P40 GPUs. Aside from high floating point throughput and efficiency, both GPUs introduce two new optimized instructions designed specifically for inference computations: the 8-bit integer (INT8) 4-element vector dot product (DP4A) and the 16-bit 2-element vector dot product (DP2A). Deep learning researchers have found that FP16 can achieve the same inference accuracy as FP32, and many applications require only INT8 or lower precision to keep an acceptable inference accuracy. Tesla P4 delivers a peak of 21.8 INT8 TIOP/s (Tera Integer Operations per Second), while P40 delivers a peak of 47.0 INT8 TIOP/s. This blog focuses only on the P40 GPU.
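
    To make the DP4A operation concrete, the following Python sketch emulates a 4-element INT8 dot product accumulated into a 32-bit integer. It is an illustration of the arithmetic only, not GPU code: the function names are hypothetical, and the wraparound of the accumulator to a signed 32-bit range is an assumption about overflow behavior.

```python
import struct

def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]

def unpack_int8x4(word):
    """Recover the four signed 8-bit lanes from a 32-bit word."""
    return struct.unpack("<4b", struct.pack("<I", word))

def dp4a(a_word, b_word, c):
    """Return c + dot(a, b) over four signed 8-bit lanes as a signed 32-bit value."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    total = c + sum(x * y for x, y in zip(a, b))
    # Assumed overflow behavior: wrap to the signed 32-bit range.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total

a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
print(dp4a(a, b, 10))  # 10 + (5 - 12 - 21 - 32) = -50
```

    One such operation performs four multiplies and four adds on data that fits in a single 32-bit register, which is where the throughput advantage over FP32 comes from.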

    TensorRT™, previously called GIE (GPU Inference Engine), is a high-performance deep learning inference engine for production deployment of deep learning applications that maximizes inference throughput and efficiency. TensorRT gives users the ability to take advantage of the fast reduced-precision instructions provided in the Pascal GPUs. TensorRT v2 supports the INT8 reduced-precision operations that are available on the P40.

    Testing Methodology

    This blog quantifies the performance of deep learning inference using TensorRT on Dell’s PowerEdge C4130 server, which is equipped with 4 Tesla P40 GPUs. Since TensorRT is only available for Ubuntu OS, all the experiments were done on Ubuntu. Table 1 shows the hardware and software details. The inference benchmark we used was giexec from the TensorRT sample codes. This sample code uses synthetic images filled with random non-zero numbers to simulate real images. Two classic neural networks were tested: AlexNet (2012 ImageNet winner) and GoogLeNet (2014 ImageNet winner), which is much deeper and more complicated than AlexNet.

    We measured the inference performance in images/sec, the number of images that can be processed per second. To measure the performance improvement of the current-generation P40 GPU, we also compared its performance with that of the previous-generation M40 GPU. The most important goal of this testing is to measure the inference performance in INT8 mode compared to FP32 mode. P40 uses the new Pascal architecture and supports the new INT8 instructions. The previous-generation M40 uses the Maxwell architecture and does not support INT8 instructions. The theoretical INT8 and FP32 performance of both M40 and P40 is shown in Table 2. We measured FP32 performance on both devices, and both FP32 and INT8 performance on the P40.
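
    The images/sec metric can be sketched in a few lines of Python. This is not the giexec benchmark: `images_per_second` and `fake_infer` are hypothetical stand-ins, and the warm-up and iteration counts are arbitrary assumptions chosen only to show how the number is derived.

```python
import time

def images_per_second(infer, batch, iterations=10, warmup=2):
    """Time repeated calls to infer(batch) and return throughput in images/sec."""
    for _ in range(warmup):
        infer(batch)  # untimed runs so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(iterations):
        infer(batch)
    elapsed = time.perf_counter() - start
    return iterations * len(batch) / elapsed

def fake_infer(batch):
    """Placeholder workload standing in for a real inference call."""
    return [sum(image) for image in batch]

# Synthetic "images" filled with non-zero numbers, 128 per batch.
batch = [[1.0] * 1000 for _ in range(128)]
print(f"{images_per_second(fake_infer, batch):.0f} images/sec")
```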

    Table 1: Hardware configuration and software details

    PowerEdge C4130 (configuration G)
    2 x Intel Xeon CPU E5-2690 v4 @2.6GHz (Broadwell)
    256GB DDR4 @ 2400MHz
    400GB SSD
    4x Tesla P40 with 24GB GPU memory

    Software and Firmware
    Operating System: Ubuntu 14.04
    CUDA and driver version: 8.0.44 (375.20)
    TensorRT Version: 2.0 EA

    Table 2: Comparison between Tesla M40 and P40

    Columns: Tesla M40, Tesla P40
    Rows: INT8 (TIOP/s), FP32 (TFLOP/s)

    Performance Evaluation

    In this section, we will present the inference performance with TensorRT on GoogLeNet and AlexNet. We also implemented the benchmark with MPI so that it can be run on multiple P40 GPUs within a node. We will also compare the performance of P40 with M40. Lastly we will show the performance impact when using different batch sizes.

    Figure 1 shows the inference performance with the TensorRT library for both GoogLeNet and AlexNet. We can see that INT8 mode is ~3x faster than FP32 for both neural networks. This is expected, since the theoretical speedup of INT8 over FP32 is 4x if only multiplications are performed and no other overhead is incurred. However, there are kernel launches, occupancy limits, data movement and math other than multiplications, so the speedup is reduced to about 3x.
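
    One way to reason about the gap between the theoretical 4x and the observed ~3x is an Amdahl's-law style estimate. This model is our assumption, not an analysis from the measurements: it supposes that some fraction of the FP32 run time is sped up by exactly 4x and the rest is unchanged. The function names are hypothetical.

```python
def overall_speedup(accelerated_fraction, factor):
    """Amdahl's law: speedup when only part of the run time is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

def fraction_for_speedup(observed, factor):
    """Invert Amdahl's law: fraction of run time that must be accelerated."""
    return (1.0 - 1.0 / observed) / (1.0 - 1.0 / factor)

fraction = fraction_for_speedup(observed=3.0, factor=4.0)
print(round(fraction, 3))                        # 0.889
print(round(overall_speedup(fraction, 4.0), 3))  # 3.0
```

    Under this simplified model, a 3x overall speedup would mean roughly 89% of the FP32 run time benefits from the 4x INT8 rate, with the remainder spent on work that INT8 does not accelerate.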

    Figure 1: Inference performance with TensorRT library

    Dell’s PowerEdge C4130 supports up to 4 GPUs in a server. To make use of all GPUs, we implemented the inference benchmark using MPI so that each MPI process runs on its own GPU. Figure 2 and Figure 3 show the multi-GPU inference performance on GoogLeNet and AlexNet, respectively. When using multiple GPUs, linear speedup was achieved for both neural networks. This is because each GPU processes its own images and there is no communication or synchronization among the GPUs.
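
    The one-process-per-GPU layout can be sketched without MPI itself. In the sketch below, `shard` and `gpu_for_rank` are hypothetical helpers and the even split of images is an assumption; in a real MPI program the rank would come from the MPI runtime (for example `MPI.COMM_WORLD.Get_rank()` in mpi4py) rather than from a loop.

```python
def shard(num_images, num_ranks, rank):
    """Return the half-open [start, stop) range of images handled by one rank."""
    base, extra = divmod(num_images, num_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def gpu_for_rank(rank, gpus_per_node=4):
    """Map a rank to a GPU index so each process drives its own device."""
    return rank % gpus_per_node

# Simulate four ranks on one 4-GPU node splitting 10 images.
for rank in range(4):
    print(rank, gpu_for_rank(rank), shard(10, 4, rank))
```

    Because the shards do not overlap and no rank needs another rank's results, the processes never have to communicate, which is consistent with the linear scaling reported above.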

    Figure 2: Multi-GPU inference performance with TensorRT GoogLeNet

    Figure 3: Multi-GPU inference performance with TensorRT AlexNet

    To highlight the performance advantage of the P40 GPU and its native support for INT8, we compared the inference performance of P40 with that of the previous-generation M40 GPU. The results are shown in Figure 4 and Figure 5 for GoogLeNet and AlexNet, respectively. In FP32 mode, P40 is 1.7x faster than M40, and INT8 mode on P40 is 4.4x faster than FP32 mode on M40.

    Figure 4: Inference performance comparison between P40 and M40

    Figure 5: Inference performance comparison between P40 and M40

    Deep learning inference can be applied in different scenarios. Some scenarios require a large batch size, and some require no batching at all (i.e., a batch size of 1). We therefore also measured how performance changes with batch size; the result is shown in Figure 6. Note that the purpose here is not to compare the performance of GoogLeNet and AlexNet but to check how performance changes with batch size for each neural network. It can be seen that without batch processing the inference performance is very low, because the GPU is not assigned enough work to keep it busy. The larger the batch size, the higher the inference performance, although the rate of improvement slows. At a batch size of 4096, GoogLeNet stopped running because the GPU memory required for this neural network exceeds the GPU memory limit. AlexNet was still able to run because it is a less complicated neural network than GoogLeNet and therefore requires less GPU memory. So the largest batch size is limited only by GPU memory.
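
    A batch-size sweep of this kind can be organized as a small harness that records throughput per batch size and notes where a run no longer fits in memory. The sketch is illustrative only: `sweep_batch_sizes` and `fake_run` are hypothetical, the memory limit of 2048 is invented, and the throughput curve is a made-up saturating function, not measured data.

```python
def sweep_batch_sizes(run, batch_sizes):
    """Call run(batch_size) for each size; record None when memory runs out."""
    results = {}
    for size in batch_sizes:
        try:
            results[size] = run(size)
        except MemoryError:
            results[size] = None  # the network does not fit at this batch size
    return results

def fake_run(batch_size, max_batch=2048):
    """Hypothetical benchmark: fails above max_batch, otherwise saturates."""
    if batch_size > max_batch:
        raise MemoryError(f"batch size {batch_size} exceeds device memory")
    return 1000.0 * batch_size / (batch_size + 8.0)  # made-up curve

for size, rate in sweep_batch_sizes(fake_run, [1, 16, 256, 4096]).items():
    print(size, "out of memory" if rate is None else f"{rate:.0f} images/sec")
```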

    Figure 6: Inference performance with different batch sizes

    Conclusions and Future Work

    In this blog, we presented deep learning inference performance with the NVIDIA® TensorRT library on P40 and M40 GPUs. INT8 mode on P40 is about 3x faster than FP32 mode on P40 and 4.4x faster than FP32 mode on the previous-generation M40 GPU. Multiple GPUs increase inference performance linearly because no communication or synchronization is needed between them. We also noticed that a larger batch size leads to higher inference performance and that the largest batch size is limited only by GPU memory size. In future work, we will evaluate the inference performance of real-world deep learning applications.

  • Data Security

    Ding Dong...Baidu at the Door

    - by Matt Halsey

    Have a connected home? Have an internet connection? Then you too can have a conversation with Chinese website Baidu.

    Huge Vulnerability Discovered in the Ring Doorbell

    This article highlights the intrinsic need for a means to secure IoT devices.

    It was only a few months ago that the Mirai botnet, using home video surveillance cameras, was able to launch the largest DDoS attack in history.

    Read the article.

    Then you can read the comments from someone claiming to be the head of security at Ring, named Matt, here (italics added):

    Hi I'm the VP of Security at Ring and I thought it might be helpful to give you all some background on what you are seeing.

    Occasionally at the end of live call or motion, we will lose connectivity. Rather than abandoning the entire call, we send the last few audio packets that are corrupted anyway to a non-routable address on a protocol no one uses. The right way to do that is to use a virtual interface or the loopback to discard the packets. The choice to send it to somewhere across the world and let the ISP deal with blocking is a poor design choice that the teams on working on addressing ASAP.

    From a risk/disclosure perspective, it's relatively benign but like the everyone else, when my team first saw it in the wild we had similar concerns.

    i will circle back when we have updated firmware.


    Ring Pro doorbell - calling China?

    So what to do:

    1. Go to the Industrial Internet Consortium and see how Dell and EMC, now Dell|EMC and Dell Technologies, are helping to secure the IoT world.

    2. Realize that IoT security is in its infancy, if not earlier: it is where server security was when we used to leave Telnet, TFTP, and FTP ports open on our internet-facing servers.

    3. Be ready to help our customers understand that encryption, especially our products, can help protect them when vendors of IoT devices don't finish their job in securing the devices.

  • Dell Big Data - Blog

    Getting started with machine-generated data

    By Brett Roberts with Debra Slapak


    The amount of machine-generated data being created each day is massive and--as we all know--can be extremely valuable. Insights extracted from this data have the potential to help you improve operational efficiency, customer experience, security and much more. But getting started can present real challenges and really big questions, such as "How do we consolidate all of this complex data and analyze it to deliver actionable insights?" Dell EMC works with Splunk to address these challenges and simplify those first steps.


    Splunk’s proven platform for real-time operational intelligence helps reduce the complexity of harnessing machine-generated data by providing users with an end-to-end platform to collect, search, analyze and visualize this data.  For the Splunk platform to be used to its full potential, organizations need infrastructure that meets or exceeds Splunk’s reference architecture specifications. Dell EMC has partnered with Splunk to create highly-optimized and powerful solutions that help solve machine-generated data challenges. Read more in a recently posted blog about how Splunk and Dell EMC can help you on your journey to valuable insights with machine-generated data.