Dell Community

Blog Group Posts
Blog | Group | Posts
Application Performance Monitoring Blog | Foglight APM | 105
Blueprint for HPC - Blog | Blueprint for High Performance Computing | 0
Custom Solutions Engineering Blog | Custom Solutions Engineering | 9
Data Security | Data Security | 8
Dell Big Data - Blog | Dell Big Data | 68
Dell Cloud Blog | Cloud | 42
Dell Cloud OpenStack Solutions - Blog | Dell Cloud OpenStack Solutions | 0
Dell Lifecycle Controller Integration for SCVMM - Blog | Dell Lifecycle Controller Integration for SCVMM | 0
Dell Premier - Blog | Dell Premier | 3
Dell TechCenter | TechCenter | 1,862
Desktop Authority | Desktop Authority | 25
Featured Content - Blog | Featured Content | 0
Foglight for Databases | Foglight for Databases | 35
Foglight for Virtualization and Storage Management | Virtualization Infrastructure Management | 256
General HPC | High Performance Computing | 229
High Performance Computing - Blog | High Performance Computing | 35
Hotfixes | vWorkspace | 66
HPC Community Blogs | High Performance Computing | 27
HPC GPU Computing | High Performance Computing | 18
HPC Power and Cooling | High Performance Computing | 4
HPC Storage and File Systems | High Performance Computing | 21
Information Management (Welcome to the Dell Software Information Management blog! Our top experts discuss big data, predictive analytics, database management, data replication, and more.) | Information Management | 229
KACE Blog | KACE | 143
Life Sciences | High Performance Computing | 12
OMIMSSC - Blogs | OMIMSSC | 0
On Demand Services | Dell On-Demand | 3
Open Networking: The Whale that swallowed SDN | TechCenter | 0
Product Releases | vWorkspace | 13
Security - Blog | Security | 3
SharePoint for All | SharePoint for All | 388
Statistica | Statistica | 24
Systems Developed by and for Developers | Dell Big Data | 1
TechCenter News | TechCenter Extras | 47
The NFV Cloud Community Blog | The NFV Cloud Community | 0
Thought Leadership | Service Provider Solutions | 0
vWorkspace - Blog | vWorkspace | 512
Windows 10 IoT Enterprise (WIE10) - Blog (Wyse Thin Clients running Windows 10 IoT Enterprise) | Windows 10 IoT Enterprise (WIE10) | 6
Latest Blog Posts
  • vWorkspace - Blog

    What's new for vWorkspace - April / May / June 2018

    Now updated quarterly, this publication provides you with new and recently revised information, organized into the following categories: Documentation, Notifications, Patches, Product Life Cycle, Release, and Knowledge Base Articles.

    Subscribe to the RSS (Use IE only)

     

    Knowledgebase Articles

    New 

    254153 - Mandatory Hotfix 655084 for 8.6 MR3 Connection Broker

    This is a Mandatory hotfix and can be installed on the following vWorkspace roles: Connection Broker Please see the full Release Notes...

    Created: April 16, 2018

     

    254155 - Mandatory Hotfix 655082 for 8.6 MR3 Management Console

    This is a Mandatory hotfix and can be installed on the following vWorkspace roles: Management Console Please see the full Release Notes...

    Created: April 16, 2018

     

    254156 - Mandatory Cumulative Hotfix 655081 for Web Access

    This is a Mandatory hotfix and can be installed on the following vWorkspace roles:   Web Access   Please see the full Release Notes...

    Created: April 16, 2018

     

    254158 - Mandatory Cumulative Hotfix 655085 for 8.6 MR3 Windows connector

    This is a Mandatory hotfix and can be installed on the following vWorkspace roles: Windows Connector Please see the full Release Notes...

    Created: April 16, 2018

     

    254164 - Mandatory Hotfix 655086 for PNTools / RDSH role

    This is a Mandatory hotfix and can be installed on the following vWorkspace roles: Remote Desktop Session Host (RDSH) PNTools (VDI...

    Created: April 16, 2018

     

    254191 - Mandatory Hotfix 655071 for 8.6 MR3 Android Connector

    This mandatory hotfix addresses the following issues:   Two-Factor Authentication stabilization BYOD stabilization Password Manager...

    Created: April 17, 2018

     

    255135 - Mandatory Hotfix 655127 for 8.6 MR3 Mac connector

    This cumulative mandatory hotfix addresses the following issues: The macOS Connector might stop working if the macOS device is an Active...

    Created: May 4, 2018

     

     255201 - Mandatory Hotfix 655128 for 8.6 MR3 iOS Connector

    This mandatory hotfix addresses the following issues:   Fixes and improvements of the BYOD feature Fixes and improvements of the...

    Created: May 7, 2018

     

     255659 - An unexpected error has occurred in the program. Failed to activate control VB.UserControl

    When launching the vWorkspace Management console you receive the following error:   An unexpected error has occurred in the program. Failed to...

    Created: May 17, 2018

     

    Revised

    178358 - Windows 10 Support

    Microsoft Windows 10 is now supported as of vWorkspace 8.6.2; Creator Update (1703) is added with vWorkspace 8.6.3

    Revised: April 2, 2018

     

    225565 - Is VMware 6.7 currently supported

    Is VMware vSphere 6.7 supported in any of the current versions of vWorkspace?

    Revised: April 18, 2018

     

    255659 - An unexpected error has occurred in the program. Failed to activate control VB.UserControl

    When launching the vWorkspace Management console you receive the following error:   An unexpected error has occurred in the program. Failed to...

    Revised: June 18, 2018

     

    58105 - Troubleshooting connectivity issues with vWorkspace and VDI/TS systems.

    Various errors can occur if communications or connectivity issues exist between the vWorkspace Connection Broker(s) and the Virtual Desktop...

    Revised: June 25, 2018

     

     56803 - How to Enable Connection Broker Logging

    Steps to enable Connection Broker logging in vWorkspace.

    Revised: June 29, 2018

     

     88861 - How to repair or fully rebuild Windows WMI Repository

    How to repair or fully rebuild Windows WMI Repository

    Revised: June 29, 2018

     

     105373 - Video: How to enable the Diagnostics Tool

    How to use the diagnostic tool in vWorkspace 8.x.

    Revised: June 29, 2018

     

     

    Product Life Cycle - vWorkspace

     

  • Life Sciences

    The Complexity of Learning Concept - Machine Learning in Genomics #1

    How much data do we need to use a ML algorithm?

    Although this is the most common question, it is hard to answer because the amount of data needed depends mainly on how complex the learning concept is. In Machine Learning (ML), learning complexity can be broken down into informational and computational complexity. Informational complexity considers two aspects: how many training examples are needed (sample complexity) and how fast a learner/model's estimate can converge to the true population parameters (rate of convergence). Computational complexity refers to the types of algorithms and the computational resources required to extract the learner/model's prediction within a reasonable time. As you can guess, this blog covers informational complexity to answer the question.

    Learn from an example – ‘To be or Not to be banana’

    Let's try to learn what a banana is. In this example, banana is the learning concept (one hypothesis, that is, 'to be' or 'not to be banana'), and the various descriptions associated with banana can be features[i] such as color and shape. Unlike a human, who does not require non-banana information to classify a banana, a typical machine learning algorithm requires counter-examples. Although One Class Classification (OCC) is widely used for outlier or anomaly detection, it is a harder problem than conventional binary/multi-class classification.

    Let's place another concept, 'Apple', into this example and turn the exercise into a binary classification. By doing this, we just made the learning concept simpler: 'to be banana = not apple' and 'not to be banana = apple'. This is a little counter-intuitive, since adding an additional learning concept makes the model simpler; however, OCC is essentially one class versus all others, and the number of all other cases is practically infinite. This is where we are in terms of ML: one of the simplest learning activities for a human is among the most difficult problems to solve in ML. Before generating some data for banana, we need to define some terms.

    • Instances[ii] X describe bananas with features such as color (f1 = yellow, green, or red, |f1| = 3), shape (f2 = cylinder or sphere, |f2| = 2) and a class label (C → {banana, apple}, |C| = 2). The values for color and shape need to be enumerated. For example, we can assign integers to each value, such as (Yellow=1, Green=2, Red=3), (Cylinder=1, Sphere=2) and (banana=0, apple=1) (Table 1).
    • The target function t generates a prediction for 'is this banana or apple' as a number in the range 0 ≤ t(xi) ≤ 1. Typically, we want the prediction t(xi) to be as close as possible to c(xi), for 0 ≤ i ≤ n, where n is the total number of samples.
    • The hypothesis space H can be defined as the conjunction of the features and the target function, h(xi) = (f1i, f2i, t(xi)).
    • The training examples S must contain roughly the same number of banana (0) and apple (1) examples. A sample is described as s(xi) = (f1i, f2i, c(xi)).

    Sample complexity – estimating the size of the training data set in a quick and dirty way

    Ideally, we want the training sample set S to cover all the possible combinations of features with respect to t, as you can see in Table 1. There are three possible values for f1 and two possible values for f2, and there are two classes in this example. Therefore, the number of all possible instances is |X| = |f1| x |f2| x |C| = 3 x 2 x 2 = 12. However, f2 is a lucky feature[iii] that is mutually exclusive between banana and apple, so |f2| is effectively 1 in this case. In addition, we can subtract one case because there is no red banana. For this example, only 5 instances exhaust the entire instance space. In general, the required number of training samples (|S| = n, where n is the number of unique samples in the set) grows exponentially with the number of features (columns) in a data set. If we assume that all features are binary, like a simple yes or no value, then |X| = 2 x 2 x 2 = 2^3 = 8. Two to the power of the number of columns is the minimum n in the simplest case. This only works when the values of all the features are discrete. If we use gradient color values for Color (the RGB palette with 256 levels per channel ranges from 0 to 16777215 in decimal), the required number of training samples increases quite significantly, because you now need to multiply by 16,777,216 for f1 if all possible colors exist in the data.

    It is worth noting that the number of instances we calculate here does not guarantee that a learner/model converges properly. If the amount of data is equal to or below this number, it is simply too small for most algorithms, except for ML algorithms that evaluate one feature at a time, such as a decision tree. As a rough rule of thumb, many statisticians say that a sample size of 30 is large enough; this rule can be applied to a regression-based ML algorithm that assumes one smooth linear decision boundary. Although the optimal n differs on a case-by-case basis, it is not a bad idea to start from a total number of samples of N = |X| x 30.
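    As a quick back-of-the-envelope check, the counts above can be reproduced in a few lines of Python. This is an illustrative sketch added for clarity, not part of the original post; the 30-samples-per-instance factor is simply the rule of thumb mentioned above.

    ```python
    from itertools import product

    # Feature domains and class labels from the banana/apple example
    colors = {"Yellow": 1, "Green": 2, "Red": 3}   # f1, |f1| = 3
    shapes = {"Cylinder": 1, "Sphere": 2}          # f2, |f2| = 2
    classes = {"Banana": 0, "Apple": 1}            # C,  |C| = 2

    # Full instance space before applying any domain knowledge
    X = list(product(colors.values(), shapes.values(), classes.values()))
    print(len(X))        # |X| = 3 x 2 x 2 = 12

    # f2 is a "lucky" feature (mutually exclusive between the classes), so it
    # effectively contributes a factor of 1; removing the impossible
    # red-banana case leaves the 5 instances listed in Table 1.
    effective = 3 * 1 * 2 - 1
    print(effective)     # 5

    # Rough rule of thumb from the text: ~30 samples per distinct instance
    print(effective * 30)  # N = |X| x 30 = 150 training samples to start from
    ```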

    Table 1: Training sample set S

    Instance      f1: Color (Yellow=1, Green=2, Red=3)   f2: Shape (Cylinder=1, Sphere=2)   C: Class (Banana=0, Apple=1)
    x1, Banana 1  1                                      1                                  0
    x2, Banana 2  2                                      1                                  0
    x3, Apple 1   1[iv]                                  2                                  1
    x4, Apple 2   2                                      2                                  1
    x5, Apple 3   3                                      2                                  1

    Learning curve – an accurate way to estimate the size of the training data set

    In ML, a learning curve is a plot of classification accuracy against training set size. This is not an estimation method; it requires building a classification model multiple times with different training set sizes. It is a good technique for sanity-checking a model for underfitting (high bias) and overfitting (high variance), and it can also be used to optimize and improve performance.

    Figure 1 shows two examples of learning curves that represent underfitting (left-side plot) and overfitting (right-side plot). These are plots of the training and cross-validation error (root mean square error in this example) as the size of the training set increases. In the underfitting case (left-side plot), both the training and cross-validation errors are very high, and increasing the training set size does not help. This indicates that the features in the data set are not very relevant to the target learning concept; the examples are confusing the model. The right-side plot, on the other hand, shows an overfitting case, where the validation error is much higher than the training error. As the training set grows, the training error climbs while the cross-validation error continues to decrease, but a large gap remains between the two. The performance of an overfitted model usually looks good during training; however, it will fail miserably on real-life data.
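    For readers who want to reproduce such a plot, here is a minimal sketch using scikit-learn's learning_curve helper. The synthetic dataset and logistic-regression estimator are placeholders chosen for illustration, not the models discussed above.

    ```python
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    # Placeholder binary-classification data standing in for the banana/apple example
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Train and cross-validation scores at increasing training-set sizes
    sizes, train_scores, cv_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 8), cv=5, scoring="accuracy")

    plt.plot(sizes, train_scores.mean(axis=1), "o-", label="training score")
    plt.plot(sizes, cv_scores.mean(axis=1), "o-", label="cross-validation score")
    plt.xlabel("Training set size")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.show()
    # A persistent gap between the two curves suggests overfitting;
    # two low, converged curves suggest underfitting.
    ```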

    Back in the day, not a single ML paper was accepted without a learning curve. Without this simple plot, the entire performance claim would be unverifiable.

    Resources

    Internal web page

    External web page

     

    Contacts
    Americas
    Kihoon Yoon
    Sr. Principal Systems Dev Eng
    Kihoon.Yoon@dell.com
    +1 512 728 4191



    [i] Or attributes. A feature is an individual measurable property or characteristic of a phenomenon being observed.

    [ii] An instance is also referred to as an example or a sample.

    [iii] If a feature like f2 exists in a data set, we could make a 100% accurate prediction simply by looking at f2.

    [iv] Is there a yellow apple? Yes, Golden Delicious is yellow.

  • Dell TechCenter

    DellEMC Release of VMware vSphere 6.7

    This blog post is written by Vijay Kumar from the Hypervisor Engineering team, Dell EMC.

    VMware vSphere 6.7 is the next major release after the vSphere 6.5 Update 1 update release. vSphere 6.7 was released by VMware on April 17, 2018. It is notable for the number of bug fixes it contains as well as a number of feature updates. vSphere 6.7 is supported on Dell EMC's 13th and 14th generations of PowerEdge servers.

    This version of ESXi is now factory installed and shipped from Dell. Customers who buy Dell EMC's 14th generation of servers can opt for factory installation on either the BOSS-S1 (Boot Optimized Storage Solution) M.2 drives or the dual IDSDM SD cards. The Dell customized version of ESXi 6.7 is posted on the Dell support page.

    VMware ESXi 6.7 New Features

    Refer to VMware's white paper on what's new with vSphere 6.7. Some of the highlights are below:

    • NVDIMM-N (pMEM) 
    • TPM 2.0 support
    • 4KN enablement
    • QuickBoot
    • Instant Clone
    • Per-VM EVC
    • RoCE v2

    Dell vSphere 6.7 documents posted on the Dell support page:

    • VMware vSphere 6.7 on Dell EMC PowerEdge Servers Compatibility Matrix
    • VMware ESXi vMotion Support on Dell EMC PowerEdge Servers Compatibility Matrix
    • VMware vSphere ESXi 6.7 on Dell EMC PowerEdge Servers Installation Instructions and Important Information Guide
    • VMware vSphere ESXi 6.7 on Dell EMC PowerEdge Systems Image Customization Information
    • VMware vSphere 6.7 on Dell EMC PowerEdge Servers Release Notes
    • VMware vSphere 6.7 on Dell EMC PowerEdge Servers Getting Started Guide

    References

  • Dell TechCenter

    Top 5 Most Exciting Dockercon 2018 Announcements

    Yet another amazing Dockercon!

    I attended Dockercon 2018 last week in San Francisco, one of the most geographically blessed cities of Northern California, at one of the largest convention and exhibition complexes, the Moscone Center. With over 5,000 attendees from around the globe, 100+ sponsors, hallway tracks, workshops, and hands-on labs, Dockercon allowed developers, sysadmins, product managers, and industry evangelists to come together and share their wealth of experience around container technology. This time I was lucky enough to get a chance to visit Docker HQ on Townsend Street for the first time. It was an emotional as well as a proud feeling to be part of such a vibrant community home.

    This Dockercon, there were a couple of exciting announcements. Three of the new features were targeted at Docker EE, while two were for Docker Desktop. Here's a rundown of what I think are the five most exciting announcements made last week:

    In this blog post, I will go through each of the announcements in detail.

    1. Federated Application Management in Docker Enterprise Edition

     

    With an estimated 85% of today's enterprise IT organizations employing a multi-cloud strategy, it has become more critical that customers have a 'single pane of glass' for managing their entire application portfolio. Most enterprise organisations have a hybrid and multi-cloud strategy. Containers have helped make applications portable, but let us accept the fact that even though containers are portable today, managing them is still a nightmare. The reasons:

    • Each Cloud is managed under a separate operational model, duplicating efforts
    • Different security and access policies across each platform
    • Content is hard to distribute and track
    • Poor Infrastructure utilisation still remains
    • Emergence of Cloud-hosted K8s is exacerbating the challenges with managing containerised applications across multiple Clouds

    This time Docker introduced new application management capabilities for Docker Enterprise Edition that will allow organisations to federate applications across Docker Enterprise Edition environments deployed on-premises and in the cloud, as well as across cloud-hosted Kubernetes. This includes Azure Kubernetes Service (AKS), AWS Elastic Container Service for Kubernetes (EKS), and Google Kubernetes Engine (GKE). The federated application management feature will automate the management and security of container applications on premises and across Kubernetes-based cloud services. It will provide a single management platform to enterprises so that they can centrally control and secure the software supply chain for all their containerized applications.

    With this announcement, undoubtedly Docker Enterprise Edition is the only enterprise-ready container platform that can deliver federated application management with a secure supply chain. Not only does Docker give you your choice of Linux distribution or Windows Server, the choice of running in a virtual machine or on bare metal, running traditional or microservices applications with either Swarm or Kubernetes orchestration, it also gives you the flexibility to choose the right cloud for your needs.

    If you want to read more about it, please refer to the official blog.

    2. Kubernetes Support for Windows Server Container in Docker Enterprise Edition

    The partnership between Docker and Microsoft is not new. They have been working together since 2014 to bring containers to Windows and .NET applications. This DockerCon, Docker & Microsoft both shared the next step in this partnership with the preview and demonstration of Kubernetes support on Windows Server with Docker Enterprise Edition.

    With this announcement, Docker is the only platform to support production-grade Linux and Windows containers, as well as dual orchestration options with Swarm and Kubernetes.

    There has been a rapid rise of Windows containers as organizations recognize the benefits of containerisation and want to apply them across their entire application portfolio and not just their Linux-based applications.

    Docker and Microsoft brought container technology into Windows Server 2016, ensuring consistency for the same Docker Compose file and CLI commands across both Linux and Windows. Windows Server ships with a Docker Enterprise Edition engine, meaning all Windows containers today are based on Docker. Recognizing that most enterprise organizations have both Windows and Linux applications in their environment, we followed that up in 2017 with the ability to manage mixed Windows and Linux clusters in the same Docker Enterprise Edition environment, enabling support for hybrid applications and driving higher efficiencies and lower overhead for organizations. Using Swarm orchestration, operations teams could support different application teams with secure isolation between them, while also allowing Windows and Linux containers to communicate over a common overlay network.

    If you want further details, refer to the official blog.

    3. Docker Desktop Template-Based Workflows for Enterprise Developers

    Dockercon 2018 was NOT just for enterprise customers, but also for developers. Docker Desktop is getting new template-based workflows which will enable developers to build new containerized applications without having to learn Docker commands or write Dockerfiles. These template-based workflows will also help development teams share their own practices within the organisation.

    On the first day of Dockercon, the Docker team previewed an upcoming Docker Desktop feature that will make it easier than ever to design your own container-based applications. For a certain set of developers, the current iteration of Docker Desktop has everything one might need to containerize an application, but it does require an understanding of the Dockerfile and Compose file specifications to get started, and of the Docker CLI to build and run applications.

    In the upcoming Docker Desktop release, you can expect the features below:

    • You will see a new option, "Design New application", in the Preference Pane UI, as shown below.

    • It is a 100% graphical tool.
    • This tool is a gift for anyone who doesn't want to write Dockerfiles or Docker Compose files.
    • Once a user clicks the button to start the "Custom application" workflow, they are presented with a list of services which they can add to the application.
    • Each selected service will eventually become a container in the final application, but Docker Desktop takes care of creating the Dockerfiles and Compose files in later steps.
    • In this beta release, one can do some basic customization to each service, such as changing versions, port numbers, and a few other options depending on the service selected.
    • When all the services are selected, one is ready to proceed: give the application a name, specify where to store the files that will be generated, and then hit the "Assemble" button.
    • The assemble step creates the Dockerfiles for each service, the Compose file used to start the entire application, and, for most services, some basic code stubs, giving one enough to start the application.

     

    If you’re interested in getting early access to the new app design feature in Docker Desktop then please sign up at beta.docker.com.

    4. Making Compose Easier to Use with Application Packages

    Soon after Dockercon, one of the most promising tools announced for developers was Docker Application Packages (docker-app). The "docker-app" is an experimental utility to help make Compose files more reusable and sharable.

    What problem do Application Packages solve?

    Compose files do a great job of describing a set of related services. Not only are Compose files easy to write, they are generally easy to read as well. However, a couple of problems often emerge:

    1. You have several environments where you want to deploy the application, with small configuration differences
    2. You have lots of similar applications

    Fundamentally, Compose files are not easy to share between concerns. Docker Application Packages aim to solve these problems and make Compose more useful for development and production.

    In my next blog post, I will talk more about this tool. If you want to try your hand at it, head over to https://github.com/docker/app

    5. Upcoming Support for Serverless Platform under Docker EE

    Recently, the Function as a Service (FaaS) programming paradigm has gained a lot of traction in the cloud community. At first, only large cloud providers offered such services, through products such as AWS Lambda, Google Cloud Functions, or Azure Functions with a pay-per-invocation model, but since then interest has grown among developers and enterprises in building their own solutions on an open source model.

    This Dockercon, Docker identified at least 9 different frameworks, of which the following six were already confirmed to be supported under the upcoming Docker Enterprise Edition: OpenFaaS, nuclio, Gestalt, Riff, Fn, and OpenWhisk. Docker, Inc. started an open source repository to document how to install all these frameworks on Kubernetes on Docker EE, with the goal of providing a benchmark of these frameworks: the docker serverless benchmark GitHub repository. Pull requests are welcome to document how to install other serverless frameworks on Docker EE.

    Did you find this blog helpful? I am really excited about the upcoming Docker days and feel that these upcoming features will really excite the community. If you have any questions, join me this July 7th at the Docker Bangalore Meetup Group at the Nutanix office, where I am going to go deeper into the Dockercon 2018 announcements. See you there.

  • General HPC

    New NVIDIA V100 32GB GPUs, Initial performance results

    Deepthi Cherlopalle, HPC and AI Innovation Lab. June 2018

    GPUs are useful for accelerating large matrix operations, analytics, deep learning workloads and several other use cases. NVIDIA introduced the Pascal line of their Tesla GPUs in 2016, the Volta line of GPUs in 2017, and recently announced their latest Tesla GPU based on the Volta architecture with 32GB of GPU memory. The V100 GPU is available in both PCIe and NVLink versions, allowing GPU-to-GPU communication over PCIe or over NVLink. The NVLink version of the GPU is also called an SXM2 module.

    This blog will give an introduction to the new Volta V100-32GB GPUs and compare the HPL performance between different V100 models. Tests were performed using a Dell EMC PowerEdge C4140 with both PCIe and SXM2 configurations.  There are several other platforms which support GPUs:  PowerEdge R740, PowerEdge R740XD, PowerEdge R840, and PowerEdge R940xa.  A similar study was conducted in the past comparing the performance of the P100 and V100 GPUs with the HPL, HPCG, AMBER, and LAMMPS applications. 

    Table 1 below provides an overview of Volta device specifications.

    Table 1: GPU Specifications

                                    V100-PCIe       V100-SXM2
    GPU Architecture                Volta           Volta
    NVIDIA Tensor Cores             640             640
    NVIDIA CUDA Cores               5120            5120
    GPU Max Clock Rate              1380 MHz        1530 MHz
    Double precision performance    7 TFLOPS        7.8 TFLOPS
    Single precision performance    14 TFLOPS       15.7 TFLOPS
    GPU memory                      16/32 GB        16/32 GB
    Interconnect Bandwidth          32 GB/s         300 GB/s
    System Interface                PCIe Gen3       NVIDIA NVLink
    Max Power Consumption           250 W           300 W

    The PowerEdge C4140 server is an accelerator-optimized server with support for two Intel Xeon Scalable processors and four NVIDIA Tesla GPUs (PCIe or NVLink) in a 1U form factor. The PCIe version of the GPUs uses standard PCIe Gen3 connections between GPU and CPU. The NVLink configuration allows GPU-to-GPU communication over the NVLink interconnect. Applications that can take advantage of the higher NVLink bandwidth and the higher clock rate of the V100-SXM2 module can benefit from this option. The PowerEdge C4140 platform is available in four different configurations: B, C, K, and G. The configurations are distinct in their PCIe lane layout and NVLink capability and are shown in Figure 1 through Figure 4.

    In Configuration B, GPU-to-GPU communication is through a PCIe switch, and the PCIe switch is connected to a single CPU. In Configurations C and G, two GPUs are connected to each CPU; however, in Configuration C the two GPUs are directly connected to each CPU, whereas in Configuration G the GPUs are connected to the CPU via a PCIe switch. The PCIe switch in Configuration G is logically divided into two virtual switches, mapping two GPUs to each CPU. In Configuration K, GPU-to-GPU communication is over NVLink, with all GPUs connected to a single CPU. As seen in the figures below, all the configurations have additional x16 slots available apart from the GPU slots.

    Figure 1: PowerEdge C4140 Configuration B                                          Figure 2: PowerEdge C4140 Configuration C

    Figure 3: PowerEdge C4140 Configuration G                                         Figure 4: PowerEdge C4140 Configuration K

    The PowerEdge C4140 platform can support a variety of Intel Xeon CPU models, up to 1.5 TB of memory with 24 DIMM slots, multiple network adapters and provides several local storage options. For more information on this server click here.

    To evaluate the performance difference between the V100-16GB and the V100-32GB GPUs, a series of tests were conducted. These tests were run on a single PowerEdge C4140 server with the configurations detailed below in Table 2-4.

    Table 2: Tested Configurations Details

    Table 3: Hardware Configuration

     

    Table 4: Software/Firmware Configuration:

    HPL performance

    High Performance Linpack (HPL) is a standard HPC benchmark used to measure computing power. It is also used as a reference benchmark by the Top500 list to rank supercomputers worldwide. This benchmark provides a measurement of the peak computational performance of the entire system. There are a few parameters that are significant in this benchmark:

    • N is the problem size
    • NB is the block size
    • Rpeak is the theoretical peak of the system.
    • Rmax is the maximum measured performance achieved on the system.
    • The efficiency is defined as the ratio of Rmax to Rpeak.

     The resultant performance of HPL is reported in GFLOPS.

    N is the problem size provided as input to the benchmark and determines the size of the dense linear matrix that is solved by HPL. HPL performance tends to increase with increasing N until limits of system memory, CPU, or data communication bandwidth begin to limit the performance. For a GPU system, the highest HPL performance commonly occurs when the problem size is close to the size of the GPUs' memory, and performance is higher when a larger problem size fits in that memory.
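    As a rough illustration of how N relates to GPU memory, the sketch below applies the common sizing heuristic of filling most of the aggregate GPU memory with the N x N double-precision matrix. The 90% fill factor and the NB value used here are assumptions for the example, not settings taken from this study.

    ```python
    import math

    def suggest_hpl_n(num_gpus, mem_per_gpu_gb, fill=0.90, nb=384):
        """Rough HPL problem size: the N x N double-precision matrix (8 bytes
        per element) should fill about `fill` of the aggregate GPU memory,
        rounded down to a multiple of the block size NB."""
        total_bytes = num_gpus * mem_per_gpu_gb * 1024**3 * fill
        n = int(math.sqrt(total_bytes / 8))
        return (n // nb) * nb

    print(suggest_hpl_n(4, 16))   # 4x V100-16GB -> N around 87,500
    print(suggest_hpl_n(4, 32))   # 4x V100-32GB -> N around 124,000
    ```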

    In this section of the blog, the HPL performance of the NVIDIA V100-16GB and the V100-32GB GPUs is compared using PowerEdge C4140 configuration B and K (refer to Table 2). Recall that configuration B uses PCIe V100s with 250W power limit and configuration K uses SXM2 V100s with higher clocks and 300W power limit. Figure 5 shows the maximum performance that can be achieved on different configurations. We measured a 14% improvement when running HPL on V100-32GB with PCIe versus V100-16GB with PCIe, and there was a 16% improvement between V100-16GB SXM2 and V100-32GB SXM2.  The size of the GPU memory made a big difference in terms of performance as the larger memory GPU can accommodate a larger problem size, a larger N.

    As seen in Table 1, the 16GB and 32GB versions of the V100 PCIe, and likewise of the V100 SXM2, have the same number of cores, double precision performance, and interconnect bandwidth; they differ only in GPU memory capacity. We also measured a ~6% HPL performance improvement from the PCIe to the SXM2 GPUs, which is a small delta for HPL, but deep learning frameworks like TensorFlow and Caffe show a much larger performance improvement.

    Running HPL using only CPUs yields ~2.3TFLOPS with the Xeon Gold 6148; therefore, one PowerEdge C4140 system with four GPUs provides floating point capabilities equal to about nine two-socket Intel Xeon 6148 servers.

     

    Figure 5:  HPL Performance on different C4140 configurations.

    Figure 6 and Figure 7 show the performance of the V100 16GB vs the 32GB GPU for different values of N. Table 2 shows the configurations used for this test. These graphs help us visualize how the GPU cards perform with different problem sizes. As explained above, the problem size is calculated based on the size of the GPU memory; the 32GB GPU can accommodate a larger problem size than the 16GB GPU. When a problem size larger than what will fit in GPU memory is executed on a GPU system, the system memory attached to the CPU is used, and this leads to a drop in performance as the data must move between system memory and GPU memory. For ease of understanding, the test data is split into two different graphs.

     

    Figure 6:  HPL performance with different problem sizes (N)

    In Figure 6 we notice that the HPL performance of both cards is similar until the problem size (N) approximately fills up the V100-16GB memory; the same problem size fills only about half the memory of the V100-32GB GPUs. In the second graph, Figure 7, we notice that the performance of the V100 16GB GPU drops as it cannot fit larger problem sizes in GPU memory and must start to use system host memory. The 32GB GPU continues to perform better with larger and larger N until the problem size reaches the maximum capacity of the V100 32GB memory.

     

    Figure 7:  HPL performance with different problem sizes (N)

    Conclusion and Future work:

    PowerEdge C4140 is one of the most prominent GPU based server options for HPC related solutions. We measured a 14-17% improvement in HPL performance when moving from the smaller memory V100-16GB GPU to the larger memory V100-32GB GPU. For memory bound applications, the new Volta 32GB cards would be the preferred option.

    For future work, we will run molecular dynamics applications and deep learning workloads and compare the performance between different Volta cards and C4140 configurations.

    Please contact HPC innovation lab if you’d like to evaluate the performance of your application on PowerEdge Servers.

     

  • General HPC

    Collaboration Showcase: Dell EMC, TACC and Intel join forces on Stampede2 performance studies

    Dell EMC Solutions, June 2018

     

    The Stampede2 system is the result of a collaboration between the Texas Advanced Computing Center (TACC), Dell EMC, and Intel. Stampede2 consists of 1,736 Dell EMC PowerEdge C6420 nodes with dual-socket Intel Skylake processors and 4,204 Dell EMC PowerEdge C6320p nodes with Intel Knights Landing bootable processors, for a total of 5,940 compute nodes, plus 24 additional login and management servers and Dell EMC Networking H-series switches, all interconnected by an Intel Omni-Path Architecture (OPA) fabric.

     

    Two technical white papers were recently published through the joint efforts of TACC, Dell EMC and Intel. One white paper describes the Network Integration and Testing Best Practices on the Stampede2 cluster. The other white paper discusses the Application Performance of Intel Skylake and Intel Knights Landing Processors on Stampede2 and highlights the significant performance advantage of the Intel Skylake processor at multi-node scale in four commonly used applications: NAMD, LAMMPS, Gromacs, and WRF. For build details, please contact your Dell EMC representative. If you have a VASP license, we are happy to share VASP benchmark results as well.

     

    Deploying Intel Omni-Path Architecture Fabric in Stampede2 at the Texas Advanced Computing Center: Network Integration and Testing Best Practices (H17245)

     

    Application Performance of Intel Skylake and Intel Knights Landing Processors on Stampede2 (H17212)

  • Dell TechCenter

    Persistent Memory (NVDIMM-N) support on Dell EMC PowerEdge servers and VMware ESXi

    This blog is written by the Dell Hypervisor Engineering team.

    Persistent Memory (also known as Non-Volatile Memory (NVM)) is a type of random access memory that retains its contents even when system power goes down, whether from an unexpected power loss, a user-initiated shutdown, or a system crash. Dell EMC introduced support for NVDIMM-N with its 14th generation of PowerEdge servers. VMware announced support for NVDIMM-N from vSphere ESXi 6.7 onwards. The NVDIMM-N resides in a standard CPU memory slot, placing data closer to the processor, which reduces latency and delivers maximum performance. This document details the support stance for NVDIMM-N with VMware ESXi specific to Dell EMC PowerEdge servers, and provides insight into the use cases where NVDIMM is involved and the associated behavior caveats.

    Dell EMC support for Persistent Memory (PMem) and VMware ESXi

    Dell EMC started supporting PMem (also known as Non-Volatile Memory (NVM)) with its 14th generation of PowerEdge servers, while VMware introduced support for NVDIMM with the vSphere 6.7 release. Refer to the section "Server Hardware Configuration" in the NVDIMM-N user guide for the PowerEdge server models that support NVDIMM-N. The server support matrix is the same for VMware ESXi as well. The hardware and firmware requirements for NVDIMM-N to function properly with ESXi are documented in the user guide. Dell EMC highly recommends that customers go through the user guide before getting started with NVDIMM-N.
     
    Refer to the Dell EMC whitepaper to learn about the use cases and the utilities available to monitor and manage NVDIMM-N on VMware ESXi.

  • Dell TechCenter

    Why is a 4K drive recommended for OS installation?

    Overview

    This blog helps explain why the transition happened from 512-byte sector disks to 4096-byte sector disks, and why a 4096-byte (4K) sector disk should be chosen for OS installation. It first explains the sector layout to understand the need for the migration, then gives the reasoning behind the migration, and finally covers the benefits of 4K sector drives over 512-byte sector drives.

    Sector layout

    A sector is the minimum storage unit of a hard disk drive. It is a subdivision of a track on a hard disk drive. The sector size is an important factor in the design of an operating system because it represents the atomic unit of I/O operations on a hard disk drive. In Linux, you can check the size of the disk sectors using the "fdisk -l" command.

                                                                  Figure-1: The disk sector size in Linux

    As shown in Figure-1, both the logical and physical sectors are 512 bytes long for this Linux system.
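    Besides "fdisk -l", the same information is exposed by the kernel in sysfs; the short Python sketch below reads it directly. The device name sda is just an example for illustration; adjust it for your system.

    ```python
    # Read the logical and physical sector sizes the kernel reports in sysfs.
    dev = "sda"  # example device name; change as needed

    with open(f"/sys/block/{dev}/queue/logical_block_size") as f:
        logical = int(f.read())
    with open(f"/sys/block/{dev}/queue/physical_block_size") as f:
        physical = int(f.read())

    print(f"{dev}: logical={logical} bytes, physical={physical} bytes")
    # 512/512   -> classic 512n drive (as in Figure-1)
    # 512/4096  -> 512e (4K physical sectors with 512-byte emulation)
    # 4096/4096 -> 4K native (4Kn)
    ```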

    The sector layout is structured as follows:
    1) Gap section: Each sector on a drive is separated by a gap section.
    2) Sync section: It indicates the beginning of the sector.
    3) Address Mark section: It contains information related to sector identification, e.g. the sector's number and location.
    4) Data section: It contains the actual user data.
    5) ECC section: It contains error correction codes that are used to repair and recover data that might be damaged during the disk read/write process.

    Each sector stores a fixed amount of user data, traditionally 512 bytes for hard disk drives. But because of better data integrity at higher densities and more robust error correction capabilities, newer HDDs now store 4096 bytes (4K) in each sector.

    Need for large sector

    The number of bits stored on a given length of track is termed the areal density. Increasing areal density is a trend in the disk drive industry, not only because it allows greater volumes of data to be stored in the same physical space but also because it improves the transfer speed at which the medium can operate. With the increase in areal density, each sector now consumes a smaller and smaller amount of space on the hard drive surface. This creates a problem because the physical size of the sectors on hard drives has shrunk but media defects have not. If the data in a hard drive sector occupies a smaller area, error correction becomes challenging, because a media defect of a given size damages a higher percentage of the data in a disk with a small sector area than in a disk with a large sector area.

    There are two approaches to solving this problem. The first approach is to invest more disk space in ECC bytes to assure continued data reliability. But investing more disk space in ECC bytes leads to lower disk format efficiency. Disk format efficiency is defined as (number of user data bytes x 100) / total number of bytes on disk. Another disadvantage is that the more ECC bits are included, the more processing power the disk controller requires to process the ECC algorithm.

    The second approach is to increase the size of the data block and slightly increase the ECC bytes for each data block. With the increase in data block size, the overhead required for each sector to store control information such as the gap, sync, and address mark sections is reduced. The ECC bytes per sector increase, but the overall ECC bytes required for the disk are reduced because of the larger sectors. Reducing the overall amount of space used for error correction code improves format efficiency, and the increased ECC bytes for each sector make it possible to use more efficient and powerful error-correction algorithms. Thus, the transition to a larger sector size has two benefits: improved reliability and greater disk capacity.

    Why 4K only?

    From a throughput perspective, the ideal block size should be roughly equal to the characteristic size of a typical data transaction. We have to acknowledge that the average file size today is more than 512 bytes. Nowadays, applications in modern systems use data in large blocks, much larger than the traditional 512-byte sector size. Block sizes that are too small cause too much transaction overhead, while with very large block sizes each transaction transfers a large amount of unnecessary data.

    The size of a standard transaction in relational database systems is 4K. The consensus in the hard disk drive industry has been that a physical block size of 4K provides a good compromise. It also corresponds to the page size used by operating systems and processors.

    Benefits

    • Improvement in Format Efficiency

      Figure-2: 512 bytes block vs 4096 bytes block

     

    Figure-3: Format Efficiency improvement in 4K disk

                                  512-byte sector format    4096-byte sector format
     Gap, sync & address mark     15 bytes                  15 bytes
     User data                    512 bytes                 4096 bytes
     Error-correcting code        50 bytes                  100 bytes
     Total                        577 bytes                 4211 bytes
     Format efficiency            88.7%                     97.3%

                                                         Table-1: Format Efficiency improvement in 4K disk

     

    As we see in Figure-2, 4K sectors are 8 times as large as traditional 512-byte ones. Hence, for the same data payload, one needs 8 times less gap, sync, and address mark overhead and 4 times less error correction code. Reducing the amount of space used for error correction code and other non-data sections improves the format efficiency of the 4K format. The format efficiency improvement is shown in Figure-3 and Table-1: there is a gain of 8.6 percentage points in format efficiency for the 4K sector disk over the 512-byte sector disk.
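    The efficiency figures in Table-1 can be reproduced with a few lines of arithmetic; the small sketch below simply plugs in the byte counts from the table.

    ```python
    def format_efficiency(user_data, gap_sync_addr=15, ecc=50):
        """Format efficiency = user data bytes * 100 / total bytes in the sector."""
        total = gap_sync_addr + user_data + ecc
        return 100.0 * user_data / total

    # 512-byte sector: 15 + 512 + 50   = 577 bytes  -> ~88.7%
    print(round(format_efficiency(512, ecc=50), 1))
    # 4K sector:       15 + 4096 + 100 = 4211 bytes -> ~97.3%
    print(round(format_efficiency(4096, ecc=100), 1))
    ```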

    • Reliability and Error Correction

    Figure-4: Effect of media defect on disk density

    As shown in Figure-4, the effect of a media defect on a disk with higher areal density is greater than on a disk with lower areal density. As areal density increases, we need more ECC bytes to retain the same level of error correction capability. The 4K format provides enough space to expand the ECC field from 50 to 100 bytes to accommodate new ECC algorithms. The enhanced ECC coverage improves the ability to detect and correct processed data errors beyond the 50-byte defect length associated with the 512-byte sector format.

    4K drive Support on OS & Dell PowerEdge Servers

    4K data disks are supported on Windows Server 2012, but as boot disks they are only supported in UEFI mode. For Linux, 4K hard drives require a minimum of RHEL 6.1 or SLES 11 SP2, and 4K boot drives are only supported in UEFI mode. Kernel support for 4K drives is available in kernel versions 2.6.31 and above.
    PERC H330, H730, H730P, H830, FD33xS, and FD33xD cards support 4K block size disk drives, which enables you to use the storage space efficiently. 4K disks can be used on the Dell PowerEdge servers supporting the above PERC cards.

    Conclusion

    The physical size of each sector on the disk has become smaller as a result of the increase in areal densities in disk drives. If disk defects do not shrink at the same rate, then we expect more sectors to be corrupted, and we need stronger error correction capability for each sector. Disk drives with larger physical sectors and more ECC bytes per sector provide enhanced data protection and correction algorithms. The 4K format helps achieve better format efficiency and improves reliability and error correction capability. This transition will result in better user experiences; hence, a 4K drive should be chosen for OS installation.

    References:

    http://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/512e_4Kn_Disk_Formats_120413.pdf

    https://www.seagate.com/files/www-content/product-content/enterprise-performance-savvio-fam/enterprise-performance-15k-hdd/_cross-product/_shared/doc/seagate-fast-format-white-paper-04tp699-1-1701us.pdf

  • Life Sciences

    De Novo Assembly with SPAdes

    Overview

    We published the whitepaper, "Dell EMC PowerEdge R940 makes De Novo Assembly easier", last year to study the behavior of SOAPdenovo2 [1]. However, that whitepaper was limited to one de novo assembly application, so we want to expand our application coverage a little further. We decided to test SPAdes (2012), since it is a relatively new application and is reported to improve on the Euler-Velvet-SC assembler (2011) and SOAPdenovo[i]. Like most assemblers targeting Next Generation Sequencing (NGS) data, SPAdes is based on the de Bruijn graph algorithm. De Bruijn graph-based assemblers are more appropriate for larger datasets with more than a hundred million short reads.

    As shown in Figure 1, Greedy-Extension and overlap-layout-consensus (OLC) approaches were used in the very early next-gen assemblers [2]. Greedy-Extension's heuristic is that the highest-scoring alignment takes on another read with the highest score. However, this approach is vulnerable to imperfect overlaps and multiple matches among the reads and leads to an incomplete or arrested assembly. The OLC approach works better for long reads, such as Sanger or other technologies generating more than 100 bp (454, Ion Torrent, PacBio, and so on), due to the minimum overlap threshold. De Bruijn graph-based assemblers are more suitable for short-read sequencing technologies such as Illumina. The approach breaks the sequencing reads into successive k-mers, and the graph maps the k-mers: each k-mer forms a node, and edges are drawn between consecutive k-mers in a read.

    Figure 1 Overview of de novo short reads assemblers. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3056720/
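    To make the k-mer idea concrete, below is a toy de Bruijn graph construction in Python, using the common formulation in which the (k-1)-mers are the nodes and each k-mer contributes an edge. The reads and k value are made up for illustration; real assemblers such as SPAdes add error correction, paired-end information, and graph simplification on top of this.

    ```python
    from collections import defaultdict

    def de_bruijn_graph(reads, k):
        """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer in a
        read contributes an edge from its (k-1)-prefix to its (k-1)-suffix."""
        graph = defaultdict(list)
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                graph[kmer[:-1]].append(kmer[1:])
        return graph

    # Toy reads (not real sequencing data)
    reads = ["ACGTAC", "CGTACG", "GTACGT"]
    for prefix, suffixes in de_bruijn_graph(reads, k=4).items():
        print(prefix, "->", suffixes)
    ```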

    SPAdes is a relatively recent application based on the de Bruijn graph approach for both single-cell and multicell data. It improves on the recently released Euler-Velvet Single Cell (E+V-SC) assembler (specialized for single-cell data) and on the popular assemblers Velvet and SOAPdenovo (for multicell data).

    All tests were performed on Dell EMC PowerEdge R940 configured as shown in Table 1. The total number of cores available in the system is 96, and the total amount of memory is 1.5TB.

    Table 1 Dell EMC PowerEdge R940 Configuration
    Dell EMC PowerEdge R940
    CPU 4x Intel® Xeon® Platinum 8168 CPU, 24c @ 2.70GHz (Skylake)
    RAM 48x 32GB @2666 MHz
    OS RHEL 7.4
    Kernel 3.10.0-693.el7.x86_64
    Local Storage 12x 1.2TB 10K RPM SAS 12Gbps 512n 2.5in Hot-plug Hard Drive in RAID 0
    Interconnect Intel® Omni-Path
    BIOS System Profile Performance Optimized
    Logical Processor Disabled
    Virtualization Technology Disabled
    SPAdes Version 3.10.1
    Python Version 2.7.13

    The data used for the tests is a paired-end read set, ERR318658, which can be downloaded from the European Nucleotide Archive (ENA). The reads were generated from a blood sample used as a control to identify somatic alterations in primary and metastatic colorectal tumors. This data set contains 3.2 billion reads (BR) with a read length of 101 nucleotides.

    Performance Evaluation

    SPAdes runs three sets of de Bruijn graphs with 21-mers, 33-mers, and 55-mers consecutively. This is the main difference from SOAPdenovo2, which runs a single k-mer, either 63-mer or 127-mer.

    In Figure 2, the runtimes (wall-clock times) are plotted in days (blue bars) for various core counts: 28, 46, and 92 cores. Since we did not want to use every core of each socket, 92 cores was picked as the maximum number of cores for the system; one core per socket was reserved for the OS and other maintenance processes. Subsequent tests were done by reducing the number of cores by half. The peak memory consumption for each case is plotted as a line graph. SPAdes runs significantly longer than SOAPdenovo2 due to the multiple iterations over three different k-mers.

    Figure 2: SPAdes benchmark

    The peak memory consumption is very similar to SOAPdenovo2. Both applications require slightly less than 800GB memory to process 3.2 BR.

    Conclusion

    Utilizing more cores helps reduce the runtime of SPAdes significantly, as shown in Figure 2. For SPAdes, it is advisable to use the highest core count CPUs, such as the Intel Xeon Platinum 8180 processor with 28 cores and up to 3.80GHz, to bring down the runtime further.

    Resources

    Internal web page

    1. http://en.community.dell.com/techcenter/blueprints/blueprint_for_hpc/m/mediagallery/20444301

    External web page

    1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874646/

    Contacts
    Americas
    Kihoon Yoon
    Sr. Principal Systems Dev Eng
    Kihoon.Yoon@dell.com
    +1 512 728 4191



    [i] This refers to an earlier version of SOAPdenovo, not SOAPdenovo2.

  • Windows 10 IoT Enterprise (WIE10) - Blog

    Quick Start: Enhanced Out Of Box Experience (OOBE)

    I am excited to announce the availability of Quick Start on the newly launched Wyse 5070 WIE10 thin client. The Quick Start product runs on first boot and can be launched manually as required. Quick Start provides the end user with an enhanced first-time out-of-box experience (OOBE) and informs the user about the product details, both hardware and software. Upon walking through the screens, the end user is prompted to configure the thin client if they choose to, or simply proceed with using their brand-new Dell Wyse 5070 thin client.

    Here are some screenshots:
