Dell Community

Latest Blog Posts
  • Dell Cloud Blog

    Get started today with TP2 and get ready for Azure Stack

    At this week’s Ignite conference, you may have heard about the availability of Technical Preview 2 (TP2) for Azure Stack. We at Dell EMC are excited and encourage our customers to deploy TP2 as one of the key first steps toward adopting Azure Stack before it reaches general availability (GA).

    At this year’s Worldwide Partner Conference, Microsoft announced our partnership to deliver Azure Stack as an integrated system, targeting availability in mid-2017. For Dell EMC, this is a continuation of our joint development partnership with Microsoft, going back to the early days of developing Cloud Platform Systems, with a focus on delivering integrated systems for hybrid cloud solutions to our customers. Azure Stack is the next phase of our partnership.

    TP2 helps you prepare your infrastructure, operations, and application teams to hit the ground running when we bring you the integrated system for Azure Stack at GA. Key areas to start planning include (but are not limited to):

    1. Application needs

      1. IaaS and PaaS capabilities within TP2 and leading into GA

      2. Capacity and Performance needs

      3. Scenarios: PoC/DevTest and Production

      4. DevOps practices and infrastructure

    2. Infrastructure needs

      1. Identity and Access for Admins and Tenants: Azure AD or ADFS

      2. Network integration into your existing border devices

      3. ITSM

      4. Organization Security posture

      5. People and process

    3. Deployment scenarios

      1. Azure connected or Island

      2. Single or Multi-Region

      3. Capacity and Scale Units

    You can confidently begin your journey knowing that at GA we will bring you an integrated system that delivers compute, storage, networking, and the Azure Stack software, pre-configured and fully supported to meet your application and infrastructure needs.

    With TP2, customers can deploy a single-node Azure Stack system and explore user experiences including (but not limited to):

    1. Creating plans and offers around key Azure services

    2. Running cloud-native workloads, such as a three-tier app deployed from an ARM template

    3. Building a portfolio of cloud services using gallery items

    Our recommended TP2 system is a PowerEdge R630 designed to be cost-effective as a Proof of Concept (PoC) node, enabling you to test and develop PoCs for your use cases. This system is non-resilient and is designed to run only as a single-node PoC system. Multi-node configurations will be available at GA, and this system will not be upgradable to them. The goal for this TP2 configuration is to keep receiving updates as a single-node PoC system, through TP3 and into GA, targeting PoC and dev/test use cases.

    PoC System Specification:

    Server: PowerEdge R630 (2.5" chassis)

    Chassis configuration: 10 x 2.5" chassis

    Processor configuration:
    Default: Intel® Xeon® E5-2630 v4 2.2GHz, 25M cache, 8.0 GT/s QPI, Turbo, HT, 10C/20T (85W)
    Option 1: Intel® Xeon® E5-2640 v4 2.4GHz, 25M cache, 8.0 GT/s QPI, Turbo, HT, 10C/20T (90W)
    Option 2: Intel® Xeon® E5-2650 v4 2.2GHz, 30M cache, 9.6 GT/s QPI, Turbo, HT, 12C/24T (105W)

    Memory:
    Default: 128GB (8 x 16GB RDIMM, 2400MHz)
    Option: 256GB (8 x 32GB RDIMM, 2400MHz)

    Storage controller (internal): HBA330

    Storage, OS boot: 1 x 400GB Solid State Drive SATA Mix Use MLC 6Gbps 2.5in hot-plug drive, S3610

    Storage, cache (SSD): Default: 2 x 200GB Solid State Drive SATA Write Intensive 6Gbps 2.5in hot-plug drive, S3710

    Storage, data (HDD): Default: 6 x 1TB 7.2K RPM SATA 6Gbps 2.5in hot-plug hard drive

    Network cards (NDC): Intel X520 DP 10Gb + Intel i350 DP 1GbE

    Power supply: Redundant, 750W

    Contact one of our cloud specialists to order the system by reaching us at mscloud_sales@dell.com. If you would like to see a demo of TP2, contact your Dell account rep or channel partner rep to engage the Dell EMC Customer Solution Center.

    Links for more information on TP2:

    Microsoft TP2 Announce

    Microsoft Azure Stack PoC architecture

    TP2 deployment pre-requisites

    Download TP2

    Disclaimer: Dell does not offer support for Azure Stack TP2 at this time. Dell is actively testing and working closely with Microsoft on Azure Stack, but since it is still in development, the exact hardware components/configurations that Dell will fully support are still being determined.  The information divulged in our online documents prior to Dell launching and shipping Azure Stack may not directly reflect Dell supported product offerings with the final release of Azure Stack. We are, however, very interested in your results/feedback/suggestions! Please leave comments below.

  • Dell Big Data - Blog

    Meet us at Strata + Hadoop World and Let the Transformation Begin


    We are headed out to the Big Show for Big Data, the Strata+Hadoop World event being held September 27-29 in New York City. We look forward to meeting with partners and customers as we take a closer look at the customer journey and the possibilities that exist in driving Big Data Hadoop adoption. Dell EMC has integrated all the key components for modern digital transformation, taking you on a Big Data journey that focuses on analytics, integration, and infrastructure. We have a number of exciting discussions planned and invite you to attend the events or connect with our team directly at booth #501. We will have some great giveaways that you won’t want to miss out on. You can also join us throughout the conference for all-day Facebook LIVE videos on the Dell EMC Big Data Facebook page.

  • Dell Big Data - Blog

    Hadoop World creates a stage for the new Dell EMC portfolio

    By Armando Acosta
     

    The Strata + Hadoop World conference gets under way today, Tuesday, September 27, at the Jacob Javits Center in New York City. As always, the event will be a showcase for leading-edge technologies related to big data, analytics, machine learning and the like, but this year’s event brings some added attractions.

    For starters, the conference will be the first major event to put the spotlight on the broad portfolio of Dell EMC solutions for unlocking the value of data and enabling the data analytics journey. As individual companies, both Dell and EMC had impressive product families in this space. Now that the two have joined to form Dell EMC, the combined portfolio is arguably one of the best in the industry. In many ways, we’re talking about a “1 + 1 = 3” equation.

    The Dell EMC portfolio for big data and modern analytics includes integrated, end-to-end solutions based on validated architectures incorporating Cloudera distributions for Hadoop, Intel technologies, and analytic software, along with Dell EMC servers, storage, and networking.  The portfolio spans from starter bundles and reference architectures to integrated appliances, validated systems and engineered solutions. Our portfolio makes it easier for customers by simplifying the architecture, design, configuration/testing, deployment and management. By utilizing the Dell EMC portfolio, customers can minimize the time, effort, and resources to validate an architecture. Dell EMC has optimized the infrastructure to help free customers’ time to focus on their use cases.

    For customers, the Dell EMC portfolio equates to a tremendous amount of choice and flexibility in deployment model, allowing customers to buy, deploy and operate solutions for big data and modern analytics no matter where they are in their journey. From industry-leading integration capabilities to direct-attached and shared storage, from real-time analytics to virtualized environments and hybrid clouds, choice spans the portfolio. The Dell EMC portfolio is configured and tuned to provide leading performance to run analytics workloads, enabling faster decision making.

    Recent advances in the portfolio will be in the spotlight at the Dell EMC booth #501 at Strata + Hadoop World and will include use case-based solutions and validated systems for Cloudera Hadoop deployments. Our first iteration of the Hadoop reference architecture was published in 2011, when we partnered with Cloudera and Intel to develop a groundbreaking architecture for Apache Hadoop, which was then a young platform. Since then, hundreds of organizations have deployed big data environments based on our validated systems.

    The widespread adoption of simplified and cost-effective validated systems points to a broader theme that will permeate the Strata + Hadoop World conference: Hadoop as a maturing platform that is heading into the mainstream of enterprise IT and delivering proven business value.

    About that business value? Dell EMC and Intel commissioned Forrester Consulting to conduct a Total Economic Impact™ (TEI) study to examine the potential ROI enterprises may realize by deploying the Dell EMC | Cloudera Apache Hadoop Solution, accelerated by Intel. Based on interviews with organizations using these solutions, the TEI study identified these three-year risk-adjusted results:

    • Payback: Six months
    • Net present value (new income): $5.6 million
    • ROI: 97 percent—meaning for every $1.00 spent, $1.97 is returned

    Clearly, there are many reasons to be excited about how far we’ve come with Hadoop, and the potential to take the platform to all new levels with the Dell EMC portfolio. If you’re heading to Strata + Hadoop World, you will have many opportunities to learn more about the work Dell EMC is doing to help organizations unlock the value of their most precious commodity—their data.

    In the meantime, you can learn more at Dell.com/Hadoop.

     

    Armando Acosta is the Hadoop planning and product manager and Subject Matter Expert at Dell EMC.

     

     

     

  • Dell TechCenter

    Managing your Linux infrastructure with Ansible Part 2

    A couple of months ago I wrote a blog introducing Ansible and explaining the types of tasks that can be easily automated with it. Here I provide an overview of the most important concepts and share useful tips learned from experience over the past few months.

    Tasks: A task is the smallest unit of work. It can be an action like “Install a database”, “Install a web server”, “Create a firewall rule” or “Copy this configuration file to that server”.

    Plays: A play is made up of tasks. For example, the play “Prepare a database to be used by a web server” is made up of tasks: 1) “Install the database package” 2) “Set a password for the database administrator” 3) “Create a database” and 4) “Set access to the database”.

    Playbook: A playbook is made up of plays. A playbook could be “Prepare my web site with a database backend”, and the plays would be 1) “Set up the database server” and 2) “Set up the web server”.

    Roles: Roles are used to save and organize playbooks and allow sharing and reuse of existing roles. Following the previous examples: if you need to fully configure a web server, you can use roles that others have written and shared. Since roles are highly configurable (if written correctly), they can easily be reused to suit any given deployment requirements.

    Ansible Galaxy: Ansible Galaxy is an online repository where roles are uploaded so they can be shared with others. It is integrated with GitHub, so roles can be organized into git repositories and then shared via Ansible Galaxy.

    These definitions can be depicted as shown below:

    Please note this is just one way to organize what we want to do. We could have split installation of the database and the web server into separate playbooks and different roles. Most roles in Ansible Galaxy install and configure individual applications. For example, here is one for installing mysql and another for installing httpd.
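
    To make these building blocks concrete, below is a minimal sketch of a playbook containing one play with two tasks, using the standard yum and service modules. The file name site.yml, the inventory group dbservers, and the package name are illustrative choices for this sketch, not taken from the original post.

        ---
        # site.yml - a minimal illustrative playbook: one play, two tasks
        - name: Prepare a database to be used by a web server
          hosts: dbservers          # inventory group defined in /etc/ansible/hosts (illustrative)
          become: yes               # escalate privileges for package/service management
          tasks:
            - name: Install the database package
              yum:
                name: mariadb-server
                state: present      # idempotent - skipped if already installed

            - name: Start and enable the database service
              service:
                name: mariadb
                state: started
                enabled: yes

    A dry run of this playbook would be "ansible-playbook site.yml --check", which reports what would change without changing anything (see the tips below).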

    Tips for writing plays and playbooks

    The best source for learning Ansible is the official documentation site. And as usual, online search is your friend. I recommend starting with simple tasks like installing applications or creating users. Once you are ready, follow these guidelines:

    • When testing, use a small subset of servers so that your plays execute faster. If they are successful on one server, they will be successful on others (assuming there aren’t any hardware dependencies in your playbooks).
    • Always do a dry run to make sure all commands are working (run with the --check flag).
    • Test as often as you need to without fear of breaking things. Tasks describe a desired state, so if that state is already achieved, the task is simply skipped.
    • Be sure all host names defined in /etc/ansible/hosts are resolvable.
    • Because communication with remote hosts is done over SSH, keys have to be accepted by the control machine, so either 1) exchange keys with remote hosts prior to starting or 2) be ready to type “yes” to accept SSH key exchange requests for each remote host you want to manage.
    • Although you can combine tasks for different Linux distributions in one playbook, it’s cleaner to write a separate playbook for each distro.

    Next up

    In my next blog, I will share a role for adding the official Dell repositories for installing OpenManage Server Administrator and Dell System Update on RHEL and Ubuntu operating systems.

  • Dell TechCenter

    13G PowerEdge Server Performance Sensitivity to Memory Configuration

    Author: Bruce Wagner, September 2016 (Solutions Performance Analysis Lab)

     

    The goal of this blog is to illustrate the performance impact of DDR4 memory selection. Measurements were made on a Broadwell-EP CPU system configuration using the industry-standard benchmarks listed in Table 1 below.

     

    Table 1: Details of server and applications used with the Intel Broadwell processor

    Server: Dell PowerEdge R630
    Processor: 2 x E5-2699 v4 @2.2GHz, 22 cores, 145W, 55M L3 cache
    Memory: DDR4 product offerings including:
      - 8GB 1Rx8 2400MT/s RDIMM (DPN 888JG)
      - 32GB 2Rx8 2400MT/s RDIMM (DPN CPC7G)
      - 64GB 4Rx8 2400MT/s LR-DIMM (DPN 29GM8)
    Power Supply: 1 x 750W
    Operating System: Red Hat Enterprise Linux 7.2 (3.10.0-327.el7.x86_64)
    BIOS options: Memory Operating Mode – Optimizer; Node Interleaving – Disabled; Snoop mode – Opportunistic Snoop Broadcast; Logical Processor – Enabled; System profile – Performance
    BIOS Firmware: 2.1.7
    iDRAC Firmware: 2.30.30.30
    SPECcpu2006: Intel-optimized 16.0.0.101 linux64 binaries (http://www.spec.org/cpu2006)
    STREAM: v5.10 source from https://www.cs.virginia.edu/stream/, compiled with Intel Parallel Studio 2016 update 2

     

    Table 2 and figure 1 detail the memory subsystem within the 13G PowerEdge R630 as containing 24 DIMM sockets split into two sets of 12, one set per processor. Each 12-socket set is organized into four channels with three DIMM sockets per channel.

     

    Table 2: Memory channels (DIMM slots per channel)

    Processor   Channel 0     Channel 1     Channel 2     Channel 3
    CPU 1       A1, A5, A9    A2, A6, A10   A3, A7, A11   A4, A8, A12
    CPU 2       B1, B5, B9    B2, B6, B10   B3, B7, B11   B4, B8, B12

     

    Figure 1: Memory socket locations

     

    Figure 2: Performance Impact of Memory Type

     

    From Figure 2 we see that a memory configuration based upon Registered DIMMs (RDIMMs) provides an overall 3.1% performance advantage compared to an equivalently sized one composed of Load-Reduced DIMMs (LR-DIMMs), despite both running at 2400 MT/s. LR-DIMMs make larger-capacity memory configurations possible, but their inherently higher access latency results in reduced application performance. LR-DIMMs also impose a nearly 30% power consumption penalty over the equivalent size/speed RDIMM. LR-DIMMs should be used only when the total system memory capacity requirement dictates a 3DPC configuration.

     

     Table 3: Memory speed limits for 13G PowerEdge Models

     

     

    Figure 3: Performance Impact of DIMM Rank Organization
     

    From Figure 3 we see that a 1DPC memory configuration composed of DIMMs with dual-rank internal organization outperforms one composed of single-rank DIMMs by 14%. This is due to DRAM’s large inherent delay when reversing between read and write access on a given rank, which significantly reduces throughput on the memory channel. Given dual-rank DIMMs or multiple DIMMs per channel, the CPU’s integrated memory controller can overlap and schedule reads and writes on the memory channel to minimize the read/write turnaround impact.


     

    Figure 4: Performance Impact of Memory Speed

     

     Figure 4 shows that a 2400 MT/s memory configuration provides 14% higher overall application performance than a 2133 MT/s one, all other factors being the same. With modern 8Gb 1.2V DDR4 DIMM technology, the higher speed incurs only a nominal increase in power consumption and thermal dissipation. 2400 MT/s DIMM pricing and availability are also rapidly trending toward the commodity sweet spot.

     

    Figure 5: Performance Impact of DIMM Slot Population

    Figure 5 shows that a 2DPC population results in a slight 0.9% workload performance uplift over a 1DPC one, attributed to the same memory controller data-transfer overlap efficiency improvements discussed for Figure 3. A 3DPC result is shown to further highlight the marked performance degradation that results from the need to down-clock the memory subsystem from 2400 MT/s to 1866 MT/s.

    Figure 6: Performance Impact of DIMM Population Balance

     

    In figure 6 we see a wide disparity in overall system memory bandwidth as a result of DIMM population balance.

    Although the default Optimizer (aka Independent Channel) Memory Operating Mode supports odd numbers of DIMMs per CPU, there is a severe performance penalty in doing so.

    The full list of memory module installation guidelines can be found in the product owner’s manual available through www.dell.com.

    In summary, to maximize workload performance, the recommendation for 13G two-socket servers is to populate all available channels with two dual-rank registered 2400 MT/s DIMMs per channel.
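
    As a worked example (assuming the 32GB dual-rank RDIMMs from Table 1, an assumption for illustration), that fully populated 2DPC recommendation on an R630 gives:

    2 CPUs x 4 channels per CPU x 2 DIMMs per channel x 32GB per DIMM = 512GB of system memory running at 2400 MT/s.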

  • Dell Cloud Blog

    Get started and streamline your journey to cloud with Hybrid Cloud System for Microsoft

    At this year’s Worldwide Partner Conference, Microsoft announced our partnership to deliver Azure Stack as an integrated system, targeting availability in mid-2017. For Dell, this is a continuation of the partnership and joint development with Microsoft, going back to the early days of developing Cloud Platform Systems, with a focus on delivering integrated systems for hybrid cloud solutions to our customers. Azure Stack is the next phase of that partnership.

    Another key highlight from the Microsoft announcement was Microsoft’s recommendation to adopt Cloud Platform Systems (CPS) today to move forward in the cloud journey.

     

    “Customers ready to deploy an Azure-consistent cloud on their premises now should buy Microsoft Cloud Platform Solution (CPS). Customers will be able to use Azure Stack to manage CPS resources thereby preserving investments in CPS.” – Mike Neil, Corporate Vice President, Enterprise Cloud, Microsoft Corporation

     

    The Dell Hybrid Cloud System for Microsoft CPS Standard (DHCS), our integrated system for CPS hybrid cloud, is a great way to get started and streamline your journey to cloud. The early steps to a hybrid cloud are usually evolutionary, but they still impact applications, security, infrastructure, and the operating model. The path to Azure Stack involves steps even further along that journey, so getting started with DHCS today helps with three key areas:

    1. Rationalizing applications
    2. Adopting the cloud model
    3. Getting started with hybrid

     

    For example, in area #1: most applications today are virtualized, either traditional enterprise (Exchange, SharePoint) or so-called “n-tier” applications such as web or databases. As a first step, you inventory the applications and assess your options based on classification of applications and data, cost, security, and so forth. By the end of this step, you have identified the applications to rehost as IaaS virtual machines (VMs) in a cloud infrastructure like DHCS, and a migration plan for the applications and data. Eventually, as you retool to rebuild some of your existing applications as cloud-native or to develop new cloud-native applications, Azure Stack will provide you with a platform to develop and deliver them on premises. With our integrated system for Azure Stack, you can continue to run your traditional applications on DHCS while managing them from Azure Stack as IaaS VMs, without having to migrate or upgrade your current investments.

     

    Area #2 is the part of your journey having to do with adopting the cloud model. Orienting your business and operating model toward service delivery and consumption is key to getting the most from cloud, and takes time and experience to achieve. Adopting multi-tenancy, self-service, metering, and governance are critical first steps toward being truly cloud native. With a consumption model, you are able to increase utilization, gain control of your resources, and reduce cost risk while rapidly delivering the services your tenants need. DHCS comes ready to enable adoption of the cloud model on premises, on a software-defined infrastructure that is familiar and proven in the market today.

     

    Hybrid adoption is the final area most customers struggle with. We have identified two main hybrid use cases to get started that bring value to customers today, and integrated them out-of-the-box into DHCS. With the Backup and Site Recovery services from the Microsoft Azure public cloud, you not only get integration into Azure, but also the ability to efficiently implement a business-continuity strategy with zero CAPEX and with OPEX based on consumption for your on-premises cloud.

     

    With the Dell Hybrid Cloud System for Microsoft, you get a platform ready to rehost your applications today and deliver them as services to your tenants, enabling self-service, metering, and governance. You also get the option to consume Microsoft Azure services like Backup and Site Recovery out of the box. DHCS is built on a familiar and proven technology stack with Windows Server, System Center and Windows Azure Pack, enabling you to focus less on the workings of the technology and more on areas that transform your business as you continue to take advantage of cloud.

     

    Whether you choose to rehost the applications and adopt IaaS with DHCS or eventually re-factor applications to leverage any of the Azure platform-as-a-service capabilities, Dell will partner with you along this journey and protect your investments as you adopt DHCS today and plan for Microsoft Azure Stack tomorrow.

  • General HPC

    Application Performance Study on Intel Broadwell EX processors

    Author: Yogendra Sharma, Ashish Singh, September 2016 (HPC Innovation Lab)

    This blog describes a performance analysis of a PowerEdge R930 server powered by four Intel Xeon E7-8890 v4 @2.2GHz processors (code-named Broadwell-EX). The primary objective is to compare the performance of HPL, STREAM, and a few scientific applications (ANSYS Fluent and WRF) against the previous-generation Intel Xeon E7-8890 v3 @2.5GHz processor, code-named Haswell-EX. Below are the configurations used for this study.

    Table 1: Details of server and HPC applications used with Broadwell-EX processors
    (Haswell-EX configuration | Broadwell-EX configuration)

    Platform: PowerEdge R930 | PowerEdge R930
    Processor: 4 x Intel Xeon E7-8890 v3 @2.5GHz (18 cores), 45MB L3 cache, 165W | 4 x Intel Xeon E7-8890 v4 @2.2GHz (24 cores), 60MB L3 cache, 165W
    Memory: 1024GB = 64 x 16GB DDR4 @1866MHz RDIMMs | 1024GB = 32 x 32GB DDR4 @1866MHz RDIMMs

    BIOS settings:
    BIOS: Version 1.0.9 | Version 2.0.1
    Processor Settings > Logical Processors: Disabled | Disabled
    Processor Settings > QPI Speed: Maximum Data Rate | Maximum Data Rate
    Processor Settings > System Profile: Performance | Performance

    Software and firmware:
    Operating System: RHEL 6.6 x86_64 | RHEL 7.2 x86_64
    Intel Compiler: Version 15.0.2 | Version 16.0.3
    Intel MKL: Version 11.2 | Version 11.3
    Intel MPI: Version 5.0 | Version 5.1.3

    Benchmarks and applications:
    LINPACK: V2.1 from MKL 11.2 | V2.1 from MKL 11.3
    STREAM: v5.10, array size 1800000000, 100 iterations | v5.10, array size 1800000000, 100 iterations
    WRF: v3.5.1, input data Conus12KM, NetCDF 4.3.1.1 | v3.8, input data Conus12KM, NetCDF 4.4.0
    ANSYS Fluent: v15, input data: truck_poly_14m, sedan_4m, aircraft_2m | v16, input data: truck_poly_14m, sedan_4m, aircraft_2m

    ____________________________________________________________________________________________________________________________________

    In this section of the blog, we compare benchmark numbers for the two generations of processors on the same server platform (PowerEdge R930), as well as the performance of Broadwell-EX processors with different CPU profiles and memory snoop modes, namely Home Snoop (HS) and Cluster-on-Die (COD).

    The High Performance Linpack (HPL) benchmark is a measure of a system's floating-point computing power. It measures how fast a computer solves a dense n-by-n system of linear equations Ax = b, a common task in engineering. The HPL benchmark was run on both PowerEdge R930 servers (with Broadwell-EX and Haswell-EX) with a block size of NB=192 and a problem size of N=340992.
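
    For reference, the standard HPL operation count for a problem of size N is (2/3)N^3 + 2N^2 floating-point operations, so the N=340992 runs here each perform roughly 2.6 x 10^16 operations; the reported GFLOP/s is that operation count divided by the time to solution.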

      

    Figure 1: Comparing HPL Performance across BIOS profiles      Figure 2: Comparing HPL Performance over two generations of processors

    Figure 1 depicts the performance of the PowerEdge R930 server with Broadwell-EX processors across BIOS options. HS (Home Snoop) mode performs better than COD (Cluster-on-Die) on both system profiles, Performance and DAPC. Figure 2 compares the performance of four-socket Intel Xeon E7-8890 v3 and Intel Xeon E7-8890 v4 servers. HPL showed a 47% performance improvement with four Intel Xeon E7-8890 v4 processors on the R930 in comparison to four Intel Xeon E7-8890 v3 processors. This was due to the ~33% increase in the number of cores plus a 13% increase from the new, improved versions of the Intel compiler and Intel MKL.

    The STREAM benchmark is a synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.

     

      

    Figure 3: Comparing STREAM Performance across BIOS profiles   Figure 4: Comparing STREAM Performance over two generations of processors

     

    As per Figure 3, the memory bandwidth of the PowerEdge R930 server with Intel Broadwell-EX processors is the same across BIOS profiles. Figure 4 shows the memory bandwidth of both the Intel Xeon Broadwell-EX and Intel Xeon Haswell-EX processors in the PowerEdge R930 server. Haswell-EX and Broadwell-EX support DDR3 and DDR4 memory respectively, while the platform in this configuration supports a memory frequency of 1600MT/s for both generations of processors. Because the PowerEdge R930 platform supports the same memory frequency for both generations, both Intel Xeon processors achieve the same memory bandwidth of 260GB/s.

    The Weather Research and Forecasting (WRF) Model is a mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting needs. It features two dynamical cores, a data assimilation system, and a software architecture facilitating parallel computation and system extensibility. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers. WRF can generate atmospheric simulations using real data or idealized conditions. We used the CONUS12km and CONUS2.5km benchmark datasets for this study. CONUS12km is a single-domain, small-size benchmark (48-hour, 12km resolution case over the Continental U.S. (CONUS) domain from October 24, 2001) with a 72-second time step. CONUS2.5km is a single-domain, large-size benchmark (latter 3 hours of a 9-hour, 2.5km resolution case over the CONUS domain from June 4, 2005) with a 15-second time step. WRF decomposes the domain into tasks or patches, and each patch can be further decomposed into tiles that are processed separately; by default there is only one tile per run. If the single tile is too large to fit into the cache of the CPU and/or core, it slows down computation due to WRF’s memory bandwidth sensitivity. To reduce the size of each tile, the number of tiles can be increased by defining “numtile = x” in the input file or by setting the environment variable “WRF_NUM_TILES = x”. For both CONUS 12km and CONUS 2.5km, the number of tiles was chosen for best performance, which was 56.

      Figure 5: Comparing WRF Performance across BIOS profiles

    Figure 5 compares the WRF datasets across BIOS profiles. With the CONUS 12km data, all BIOS profiles perform equally well because of the smaller data size, while for CONUS 2.5km, Perf.COD (Performance system profile with Cluster-on-Die snoop mode) gives the best performance. As per Figure 5, the Cluster-on-Die snoop mode performs 2% better than Home Snoop mode, while the Performance system profile gives 1% better performance than DAPC.

     Figure 6: Comparing WRF Performance over two generations of processors

    Figure 6 shows the performance comparison between Intel Xeon Haswell-EX and Intel Xeon Broadwell-EX processors with PowerEdge R930 server. As shown in the graph, Broadwell-EX performs 24% better than Haswell-EX for CONUS 12KM data set and 6% better for CONUS 2.5KM.

    ANSYS Fluent is a computational fluid dynamics (CFD) software tool. Fluent includes well-validated physical modeling capabilities to deliver fast and accurate results across the widest range of CFD and multiphysics applications.

    Figure 7: Comparing Fluent Performance across BIOS profiles

    We used three different datasets for Fluent with “Solver Rating” (higher is better) as the performance metric. Figure 7 shows that all three datasets performed about 4% better with the Perf.COD (Performance system profile with Cluster-on-Die snoop mode) BIOS profile than with the others, while the DAPC.HS (DAPC system profile with Home Snoop mode) profile shows the lowest performance. Across all three datasets, the COD snoop mode performs 2% to 3% better than Home Snoop, and the Performance system profile performs 2% to 4% better than DAPC. The behavior of Fluent is consistent across the three datasets.

    Figure 8: Comparing Fluent Performance over two generations of processors

     

    As shown above in Figure 8, for all test cases on the PowerEdge R930 with Broadwell-EX, Fluent showed a 13% to 27% performance improvement in comparison to the PowerEdge R930 with Haswell-EX.

    ________________________________________________________________________________________________

     

    Conclusion:

    Overall, the Broadwell-EX processor makes the PowerEdge R930 server more powerful and more efficient. With Broadwell-EX, HPL performance increases in line with the increase in the number of cores compared to Haswell-EX. There is also a performance increase for real-world applications, depending on the nature of their computation. So it can be a good upgrade choice for those running compute-hungry applications.

     


  • Dell TechCenter

    Configuring TPM 2.0 on Dell PowerEdge 13G servers for Windows Server 2012R2 and Windows Server 2016

    Authors: Thomas Cantwell and Gong Wang

    Important! Some of the BIOS settings described in this article are not yet in released Dell BIOS. The BIOS releases that coincide with the Dell launch for Windows Server 2016 will carry these new settings.

    Introduction -

    TPM 2.0 is the latest release of Trusted Platform Module (TPM) that can be installed on Dell PowerEdge 13G servers. To properly configure the TPM on Dell PowerEdge servers, you must use different settings for Windows Server 2012R2 and Windows Server 2016 to match the OS capabilities.

    The system must be configured for UEFI boot mode prior to OS installation (Caution! If you install the OS in legacy BIOS mode, you must reinstall the OS to switch to UEFI mode).

     

     

    The Dell BIOS settings to enable the TPM and change its configuration can be found under the System Security tab in BIOS:

     

    Windows Server 2012R2 –

    To use TPM 2.0 on Windows Server 2012R2, you must install the hotfix at https://support.microsoft.com/en-us/kb/3095701. Without this hotfix, the OS will not be able to recognize the TPM.

     

    The following setting will be available in BIOS releases that coincide with the Windows Server 2016 launch from Dell. In BIOS (under TPM Advanced), set the TPM hash algorithm to SHA1, as shown. Windows Server 2012R2 supports only SHA1.

      

     

    Windows Server 2016 –

     

    Windows Server 2016 is due to ship soon. There are some important and significant BIOS setting modifications that must be made to fully leverage Windows Server 2016. These prepare the server for a TPM-trusted Guarded Host deployment, on which Shielded Virtual Machines can run. Guarded Hosts and Shielded VMs are new to Windows Server 2016.

     

    As above, the server must be configured for UEFI mode for TPM 2.0 to be fully functional; TPM 2.0 is supported in UEFI mode only. To deploy a TPM-based guarded host, the system BIOS must be configured as follows:

    • Boot Settings: UEFI

    • System Security > Secure Boot > Secure Boot Enabled (Required for Guarded Host)

    • System Security > TPM Security: On

    • BIOS settings as shown below – Windows Server 2016 supports the newer SHA256, which is more secure, so set to SHA256.

     

     

     

  • vWorkspace - Blog

    What's new for vWorkspace - August 2016

    Updated monthly, this publication provides you with new and recently revised information, organized in the following categories: Documentation, Notifications, Patches, Product Life Cycle, Releases, Knowledge Base Articles.

     Subscribe to the RSS (Use IE only)

     

    Patches

    None at this time 

     

    Downloads

    Product Release Notification – vWorkspace 8.6.1

    Type: Patch Release Created: August 2016

     

       

    Knowledgebase Articles

    New 

     

    210044 - The connector for Mac crashes when a second user tries to connect

    There are multiple users who can access a Mac computer. User A installs the Mac connector and can run it and connect to vWorkspace. User B...

    Created: August 4, 2016

     

    210083 - After Windows Update patching, EOP is not working anymore

    After updating the Gold Image via Windows Update, EOP Printers are not redirected anymore to VDI. In the Event Viewer, the following...

    Created: August 5, 2016

     

    210082 - When using HTML5 to connect to Seamless App, the keyboard layout is not the one defined on the client computer

    When using HTML5 to connect to Seamless App (Managed Application) , the keyboard layout is not the one defined on the client computer.

    Created: August 5, 2016

     

    211463 - How to Provision a Remote Desktop Session Host.

    Use the following steps to configure a template that is used to create a virtual computer within a Session Host computer group.

    Created: August 26, 2016

     

    Revised

    206895 - vWorkspace farm connection failed for specific user "CKerbTicketsForAuth::requestTicketsFromSystem:protocolStatus failed"

    When trying to connect to the vWorkspace farm, connection fails for at least one user with the error: Connection to [farm] failed...

    Revised: August 2, 2016

     

    66448 - Troubleshooting Guide for vWorkspace 7.x

    Revised: August 3, 2016

     

    209120 - Attempting to connect to Web Access using the Mac Connector results in error: Failed to load settings from Pit file

    Mac fails to connect to vWorkspace through Web Access and the SSL Gateway/Secure Access Server Error is: Failed to load settings from Pit file

    Revised: August 3, 2016

     

    204875 - Errors retrieving specific template VMs from Hyper-V

    When importing templates from Hyper-V or attempting to deploy VDIs from specific templates, the following error is generated: &quot...

    Revised: August 3, 2016

     

    90422 - How to ignore the remote (client-side) keyboard layout when establishing a remote connection

    When making a remote connection, via vWorkspace or MS RDP, by default the client-side keyboard layout is reflected. So a user connecting...

    Revised: August 3, 2016

    120050 - Multimonitor not working or Audio and microphone are not redirected when connecting to Windows 7 Professional VDI

    When connecting to the Windows 7 Professional VDI machines, multimonitor does not work and audio and microphone are not redirected.

    Revised: August 3, 2016

     

    209063 - How to clear the username box on a failed login attempt to WebAccess

    Is there any way to clear the Username box on the Web Access login page following a failed login? The password box clears but we would...

    Revised: August 3, 2016

     

    205626 - No data to be displayed in vWorkspace Reporting and Monitoring

    Foglight Monitoring software is not working and not able to monitor any systems in VDI environment

    Revised: August 3, 2016

     

    109645 - Error "Server [Server_name] could not be contacted."

    Revised: August 3, 2016

     

    94615 - "An error occurred. Please try again later." when using vWorkspace to view YouTube content

    When using vWorkspace 7.6/8.0 Flash Redirection to visit YouTube a warning message will be displayed in place of the video content, the message...

    Revised: August 3, 2016

     

    208139 - Connection Broker Lookup Test on vWorkspace Secure Access Gateway times out

    When using the Connectivity Test button to test the connection with brokers, the test times out.

    Revised: August 3, 2016

     

    97037 - Error: "CKerbTicketsForAuth::requestTicketsFromSystem: ProtocolStatus failed" When trying to connect using App Portal.

    App Portal Connection fails with this error: CKerbTicketsForAuth::requestTicketsFromSystem: ProtocolStatus failed

    Revised: August 3, 2016

     

    124356 - How To Utilize User Profiles to save desktop icon arrangement

    By using either Folder Redirection or vWorkspace User Profiles, it is easy to save a Users desktop icons. This article will take this one stage...

    Revised: August 3, 2016

     

    186075 - How to enable Desktop Integrated Mode in the 8.6.x Windows connector.

    When installing the 8.6 or higher Windows connector, an additional shortcut for the Desktop Integrated (DI) mode is not available in the start...

    Revised: August 3, 2016

     

    207442 - Launching App-V application errors with "The package containing your application is not published on this machine"

    When trying to launch an App-V application the following error is seen: <img alt="App-v error" src="https://prod-support-images-cfm.s3.amazonaws...

    Revised: August 3, 2016

     

    208223 - In Linux-based 10Zig thin clients, the USB redirection is not working

    When connecting from Windows-based 10Zig thin client devices, USB redirection is working. From Linux-based ones it doesn’t work.

    Revised: August 3, 2016

     

    207445 - Is it possible to upgrade vWorkspace in stages?

    In many cases, upgrading a large vWorkspace environment cannot be done in a short time period. Is it possible to upgrade vWorkspace in stages?

    Revised: August 9, 2016

     

    94387 - Linked clone failed CloneVmLinked SoapException thrown during CreateLinkedClone_Task!

    When trying to create a linked clone it gives the error CloneVmLinked: SoapException thrown during CreateLinkedClone_task!

    Revised: August 11, 2016

     

    114826 - Dell vWorkspace Optional Hotfix 258745 Version 7.6.1 for Linux Connector for Wyse T50

    Revised: August 22, 2016

     

    156633 - How to enable Drag and Drop to Published App

    The user can’t use Drag and Drop over Seamless Application

    Revised: August 19, 2016

     

    88354 - Installing AppSense after PNTools causes AppSense to not function correctly

    On a Windows 7 VDI if PNTools is installed first, then AppSense’s Environment manager agent is installed, then any actions; set up in...

    Revised: August 23, 2016

     

     

    Product Life Cycle

    Product Life Cycle - vWorkspace

    Revised: August 18, 2016 

     

  • Dell TechCenter

    HPC Benchmarks and Applications Performance Study on Broadwell-EP 4S Processor

    Author:  Neha Kashyap, August 2016 (HPC Innovation Lab)

    The intent of this blog is to illustrate and analyze the performance obtained on the Intel Broadwell-EP 4S processor with a focus on HPC: two synthetic benchmarks, High Performance Linpack (HPL) and STREAM, along with three applications, namely Weather Research and Forecasting (WRF), NAnoscale Molecular Dynamics (NAMD), and ANSYS Fluent. These runs were performed on a standalone PowerEdge R830 server. Combinations of system BIOS profiles and memory snoop modes are compared for better analysis.

    Table 1: Details of server and applications used with the Intel Broadwell processor

    Server: Dell PowerEdge R830
    Processor: 4 x E5-4669 v4 @2.2GHz, 22 cores, 135W, 55M L3 cache (AVX base frequency @1.7GHz)
    Memory: 32 x 16GB DDR4 @2400MHz (total = 512GB)
    Power Supply: 2 x 1600W
    Operating System: Red Hat Enterprise Linux 7.2 (3.10.0-327.el7.x86_64)
    BIOS options: System profile – Performance and Performance Per Watt (DAPC); Snoop modes – Cluster on Die (COD) and Home Snoop (HS); Logical Processor and Node Interleaving – Disabled; I/O Non-Posted Prefetch – Disabled
    BIOS Firmware: 1.0.2
    iDRAC Firmware: 2.35.35.35
    STREAM: v5.10
    HPL: from Intel MKL (problem size 253440)
    Intel Compiler: from Intel Parallel Studio 2016 update 3
    Intel MKL: 11.3
    MPI: Intel MPI 5.1.3
    NetCDF: 4.4.0
    NetCDF-Fortran: 4.4.2
    FFTW: 2.1.5
    WRF: 3.8
    NAMD: 2.11
    ANSYS Fluent: v16.0

    The server used for these results is a Dell PowerEdge 13th-generation server: a high-performance, four-socket, 2U rack server supporting massive memory density (up to 3TB). The Intel® Xeon® Processor E5-4669 v4 (Product Family E5-4600 v4) is a 14nm part supporting two snoop modes in four-socket systems, Home Snoop and Cluster on Die (COD). It is based on the micro-architecture code-named Broadwell-EP.

    The default snoop mode is Home Snoop. Cluster on Die is available only on processor models with more than 12 cores in Intel’s Broadwell series. In this mode, the socket is logically split into two NUMA domains, which are exposed to the operating system. The total number of cores and the L3 cache are divided equally between the two NUMA domains, each of which has one home agent and an equal share of cores and cache slices. A NUMA domain (cores plus home agent) is called a cluster. COD mode is best suited for highly NUMA-optimized workloads.

     

    STREAM is a synthetic HPC benchmark. It evaluates sustained memory bandwidth in MB/s by counting only the bytes that the user program requested to be loaded from or stored to memory. The “TRIAD” score reported by this benchmark is used here to analyze memory bandwidth. The operation carried out by TRIAD is: a(i) = b(i) + q*c(i).
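
    As a note on how STREAM counts that traffic (standard STREAM accounting, not spelled out in the original post): with arrays of 8-byte doubles, each TRIAD element performs two reads (b(i) and c(i)) and one write (a(i)), so STREAM credits 3 x 8 = 24 bytes per element. For arrays of N elements, the reported rate is therefore 24 x N bytes divided by the time of the fastest TRIAD pass; write-allocate traffic is not counted.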

     Figure 1: Memory Bandwidth with STREAM

    From Figure 1 it can be observed that DAPC.COD performs best. The DAPC and Performance profiles have similar performance; memory bandwidth between them varies only slightly, from 0.2% to 0.4%, for HS and COD modes respectively. The COD snoop mode performs ~2.9-3.0% better than HS.

                                       

                                 Figure 2: STREAM Memory Bandwidth on DAPC.COD   

    Figure 2 shows the STREAM Triad memory bandwidth obtained in the DAPC.COD configuration, taking into account local, local-NUMA, and remote bandwidth. The full system memory bandwidth is ~226 GB/s.

    To obtain the local memory bandwidth, the processes are bound to a socket and only the memory local to that socket is accessed (the same NUMA node, with NUMA enabled). The local NUMA node with 11 threads gives almost half the performance of the local socket with 22 threads, since the number of processors is halved. For remote access, the memory bandwidth from remote to the same socket drops by 64%, while for remote to the other socket, the processes are bound to another socket and memory remote to that socket is accessed through the QPI link (remote NUMA node). Because QPI bandwidth is limited, the remote-to-other-socket bandwidth drops another 5% compared to remote-to-same-socket.

     

    High Performance Linpack (HPL) is an industry-standard compute-intensive benchmark, traditionally used to stress the compute and memory subsystems. It measures the speed with which a computer solves a dense system of linear equations and thereby calculates the system’s floating-point computing power. It requires Intel’s Math Kernel Library (a software library for numerical linear algebra).

                                

                                   Figure 3: HPL Performance and Efficiency

    Figure 3 illustrates the HPL benchmark results. From this graph it is clear that the DAPC and Performance profiles give almost identical results, whereas there is a difference between HS and COD. With DAPC, COD is 6.2% higher than HS, and with the Performance profile, COD yields 6.6% higher performance than HS. HPL performance is reported in GFLOP/second. The efficiency is more than 100% because it is computed against the AVX base frequency.
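
    To see why efficiency can exceed 100% when peak is computed at the AVX base frequency, consider the usual peak estimate (the 16 double-precision FLOPs per core per cycle for Broadwell’s two AVX2 FMA units is our assumption, not stated in the post):

    R_peak = 4 sockets x 22 cores x 1.7 GHz x 16 FLOP/cycle ≈ 2394 GFLOP/s

    Whenever the cores sustain clocks above the 1.7 GHz AVX base frequency, the measured GFLOP/s can exceed this figure, pushing the computed efficiency past 100%.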

     

    The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system used for both atmospheric research and weather forecasting needs. It generates atmospheric simulations using real data (observations, analysis) or idealized conditions. It features two dynamical cores, a data assimilation system, and a software architecture that allows for parallel computation and system extensibility. For this study, the CONUS12km (small) and CONUS2.5km (large) datasets were used, and the computed “average time step” is the metric used to analyze performance.

    CONUS12km is a single-domain, medium-size benchmark (48-hour, 12km resolution case over the Continental U.S. (CONUS) domain from October 24, 2001) with a 72-second time step. CONUS2.5km is a single-domain, large-size benchmark (latter 3 hours of a 9-hour, 2.5km resolution case over the CONUS domain from June 4, 2005) with a 15-second time step.

    The number of tiles was chosen based on the best performance obtained by experimentation and is defined by setting the environment variable “WRF_NUM_TILES = x”, where x denotes the number of tiles. The application was compiled in “sm + dm” mode. The combinations of MPI and OpenMP processes that were used are as follows:

    Table 3: WRF application parameters used with the Intel Broadwell processor (E5-4669 v4)

    Total no. of cores: 88
    CONUS12km: 44 MPI processes x 2 OMP threads, 44 tiles
    CONUS2.5km: 44 MPI processes x 2 OMP threads, 56 tiles

                           

    Figure 4: Performance with WRF, CONUS12km      Figure 5: Performance with WRF, CONUS2.5km

    Figure 4 illustrates CONUS 12km: with COD there is a 4.5% improvement in the average time step compared with HS. For CONUS2.5km, the DAPC and Performance profiles show a variance of 0.4% to 1.6% for HS and COD respectively, and COD performs ~2.1%-3.2% better than HS. Due to its large dataset size, CONUS2.5km can more efficiently utilize a larger number of processors. For both datasets, DAPC.COD performs best.

     

    NAMD is a portable, parallel, object-oriented molecular dynamics research application designed specifically for high-performance simulations of large biomolecular systems. It is developed using Charm++. For this study, three widely used datasets were taken: ApoA1 (92,224 atoms), the standard NAMD cross-platform benchmark; F1ATPase (327,506 atoms); and STMV (a virus, 1,066,628 atoms), useful for demonstrating scaling to thousands of processors. ApoA1 is the smallest dataset and STMV the largest.

     Figure 6: Performance of NAMD on BIOS Profiles (The Lower the Better)

    The performance obtained on ApoA1 is the same across BIOS profiles, and for ATPase it is almost identical. The difference becomes visible for STMV (the largest dataset) because the larger number of atoms allows fuller utilization of the processors. For STMV, the DAPC and Performance profiles vary from 0.7% to 1.3% for HS and COD respectively, and the COD snoop mode performs ~6.1%-6.7% better than HS.

     

    Ansys Fluent is a powerful computational fluid dynamics (CFD) software tool.  “Solver Rating” (the higher the better) is considered as a metric to analyze the performance on six input data sets for Fluent.

          

    Figure 7: Performance comparison of Fluent on BIOS Profiles (The Higher the Better)

    Perf.COD is expected to perform best with Fluent; for all datasets the Solver Rating varies by 2-5% between profiles.

    To conclude, the R830 platform performs up to the mark, delivering the expected results. It is a good pick for HPC workloads, giving the best results with the DAPC.COD system BIOS profile, and a strong choice in terms of overall system architecture improvements and support for the latest processors.