At this week’s Microsoft Ignite, you may have heard about the availability of Technical Preview 2 (TP2) for Azure Stack. We at Dell EMC are excited and encourage our customers to deploy TP2 as a key first step toward adopting Azure Stack before it is generally available.
At this year’s Worldwide Partner Conference, Microsoft announced our partnership to deliver Azure Stack as an integrated system, targeting availability in mid-2017. For Dell EMC, this continues our joint development partnership with Microsoft, going back to the early days of developing Cloud Platform Systems, with a focus on delivering integrated systems for hybrid cloud solutions to our customers. Azure Stack is the next phase of our partnership.
TP2 helps you prepare your infrastructure, operations, and application teams to hit the ground running when we bring you the integrated system for Azure Stack at GA. Key areas to start planning include (but are not limited to):
IaaS and PaaS capabilities within TP2 and leading into GA
Capacity and Performance needs
Scenarios: PoC/DevTest and Production
DevOps practices and infrastructure
Identity and Access for Admins and Tenants: Azure AD or ADFS
Network integration into your existing border devices
Organization Security posture
People and process
Azure-connected or island (disconnected)
Single or Multi-Region
Capacity and Scale Units
You can confidently begin your journey knowing that at GA we will bring you an Integrated System to deliver Compute, Storage, Networking and the Azure Stack software pre-configured and fully supported to meet your application and infrastructure needs.
With TP2, customers can deploy a single-node Azure Stack system and explore user experiences including (but not limited to):
Create plans and offers around key Azure Services
Run cloud-native workloads, such as a three-tier app deployed from an Azure Resource Manager (ARM) template
Build a portfolio of cloud services using gallery items
Our recommended TP2 system is an R630 designed to be a cost-effective Proof of Concept (PoC) node that lets you test and develop PoCs for your use cases. This system is non-resilient and is designed to run only as a single-node PoC system. Multi-node configurations will be available at GA, and this system will not be upgradable to them. This TP2 configuration is designed so that we can continue delivering updates to a single-node PoC system through TP3 and into GA, targeting PoC and dev/test use cases.
PoC System Specification:
R630 (2.5" Chassis)
10 x 2.5" drive bays
Default: Intel® Xeon® E5-2630 v4 2.2GHz, 25M Cache, 8.0GT/s QPI, Turbo, HT, 10C/20T (85W)
Option 1: Intel® Xeon® E5-2640 v4 2.4GHz, 25M Cache, 8.0GT/s QPI, Turbo, HT, 10C/20T (90W)
Option 2: Intel® Xeon® E5-2650 v4 2.2GHz, 30M Cache, 9.6GT/s QPI, Turbo, HT, 12C/24T (105W)
Default: 128GB (8 x 16GB RDIMM, 2400MT/s)
Option: 256GB (8 x 32GB RDIMM, 2400MT/s)
Storage - OS Boot:
1 x 400GB Solid State Drive SATA Mix Use MLC 6Gbps 2.5in Hot-plug Drive, S3610
Storage - Cache (SSD):
Default: 2 x 200GB Solid State Drive SATA Write Intensive 6Gbps 2.5in Hot-plug Drive, S3710
Storage - Data (HDD):
Default: 6x 1TB 7.2K RPM SATA 6Gbps 2.5in Hot-plug Hard Drive
NDC: Intel X520 DP 10Gb + Intel i350 DP 1GbE
Contact one of our Cloud Specialists to order the system, or reach us at email@example.com. If you would like to see a demo of TP2, contact your Dell account rep or channel partner rep to engage the Dell EMC Customer Solution Center.
Microsoft TP2 Announce
Microsoft Azure Stack PoC architecture
TP2 deployment pre-requisites
Disclaimer: Dell does not offer support for Azure Stack TP2 at this time. Dell is actively testing and working closely with Microsoft on Azure Stack, but since it is still in development, the exact hardware components/configurations that Dell will fully support are still being determined. The information divulged in our online documents prior to Dell launching and shipping Azure Stack may not directly reflect Dell supported product offerings with the final release of Azure Stack. We are, however, very interested in your results/feedback/suggestions! Please leave comments below.
We are headed out to the big show for Big Data, the Strata+Hadoop World event being held September 27-29 in New York City. We look forward to meeting with partners and customers as we take a closer look at the customer journey and the possibilities that exist in driving Big Data Hadoop adoption. Dell EMC has integrated all the key components for modern digital transformation, taking you on a Big Data journey that focuses on analytics, integration, and infrastructure. We have a number of exciting discussions planned and invite you to attend the events or connect with our team directly at booth #501. We will have some great giveaways that you won’t want to miss. You can also join us throughout the conference for all-day Facebook LIVE videos on the Dell EMC Big Data Facebook page.
By Armando Acosta
The Strata + Hadoop World conference gets under way today, Tuesday, September 26, at the Jacob Javits Center in New York City. As always, the event will be a showcase for leading-edge technologies related to big data, analytics, machine learning and the like, but this year’s event brings some added attractions.
For starters, the conference will be the first major event to put the spotlight on the broad portfolio of Dell EMC solutions for unlocking the value of data and enabling the data analytics journey. As individual companies, both Dell and EMC had impressive product families in this space. Now that the two have combined into one newly formed company, the joint portfolio is arguably one of the best in the industry. In many ways, we’re talking about a “1 + 1 = 3” equation.
The Dell EMC portfolio for big data and modern analytics includes integrated, end-to-end solutions based on validated architectures incorporating Cloudera distributions for Hadoop, Intel technologies, and analytic software, along with Dell EMC servers, storage, and networking. The portfolio spans from starter bundles and reference architectures to integrated appliances, validated systems and engineered solutions. Our portfolio makes it easier for customers by simplifying the architecture, design, configuration/testing, deployment and management. By utilizing the Dell EMC portfolio, customers can minimize the time, effort, and resources to validate an architecture. Dell EMC has optimized the infrastructure to help free customers’ time to focus on their use cases.
For customers, the Dell EMC portfolio equates to a tremendous amount of choice and flexibility in deployment model, allowing customers to buy, deploy and operate solutions for big data and modern analytics no matter where they are in their journey. From industry-leading integration capabilities to direct-attached and shared storage, from real-time analytics to virtualized environments and hybrid clouds, choice spans the portfolio. The Dell EMC portfolio is configured and tuned to provide leading performance to run analytics workloads, enabling faster decision making.
Recent advances in the portfolio will be in the spotlight at the Dell EMC booth #501 at Strata + Hadoop World and will include use case-based solutions and validated systems for Cloudera Hadoop deployments. Our first iteration of the Hadoop reference architecture was published in 2011, when we partnered with Cloudera and Intel to develop a groundbreaking architecture for Apache Hadoop, which was then a young platform. Since then, hundreds of organizations have deployed big data environments based on our validated systems.
The widespread adoption of simplified and cost-effective validated systems points to a broader theme that will permeate the Strata + Hadoop World conference. That is one of Hadoop as a maturing platform that is heading into the mainstream of enterprise IT and delivering proven business value.
As for that business value: Dell EMC and Intel commissioned Forrester Consulting to conduct a Total Economic Impact™ (TEI) study examining the potential ROI enterprises may realize by deploying the Dell EMC | Cloudera Apache Hadoop Solution, accelerated by Intel. Based on interviews with organizations using these solutions, the TEI study identified the following three-year risk-adjusted results:
Clearly, there are many reasons to be excited about how far we’ve come with Hadoop, and the potential to take the platform to all new levels with the Dell EMC portfolio. If you’re heading to Strata + Hadoop World, you will have many opportunities to learn more about the work Dell EMC is doing to help organizations unlock the value of their most precious commodity—their data.
In the meantime, you can learn more at Dell.com/Hadoop.
Armando Acosta is the Hadoop planning and product manager and Subject Matter Expert at Dell EMC.
A couple of months ago I wrote a blog introducing Ansible and explained the type of tasks that can be easily automated with Ansible. Here I provide an overview of the most important concepts and share useful tips learned from experience in the past few months.
Tasks: A task is the smallest unit of work. It can be an action like “Install a database”, “Install a web server”, “Create a firewall rule” or “Copy this configuration file to that server”.
Plays: A play is made up of tasks. For example, the play “Prepare a database to be used by a web server” is made up of tasks: 1) “Install the database package” 2) “Set a password for the database administrator” 3) “Create a database” and 4) “Set access to the database”.
Playbook: A playbook is made up of plays. A playbook could be “Prepare my web site with a database backend”, and the plays would be 1) “Set up the database server” and 2) “Set up the web server”.
Roles: Roles are used to save and organize playbooks, and they allow sharing and reuse of existing roles. Following the previous examples: if you need to fully configure a web server, you can use roles that others have written and shared. Since roles are highly configurable (if written correctly), they can easily be reused to suit any given deployment’s requirements.
Ansible Galaxy: Ansible Galaxy is an online repository where roles are uploaded so they can be shared with others. It is integrated with GitHub, so roles can be organized into git repositories and then shared via Ansible Galaxy.
These definitions can be depicted as shown below:
Please note this is just one way to organize what we want to do. We could have split the installation of the database and the web server into separate playbooks and different roles. Most roles in Ansible Galaxy install and configure individual applications. For example, here is one for installing mysql and another for installing httpd.
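As a minimal sketch of how these pieces fit together (the host group, package, and database names below are hypothetical examples, not from any specific shared role), a playbook containing the single “prepare a database” play from the earlier example might look like:

```yaml
# site.yml -- a minimal illustrative playbook.
# Module names (yum, service, mysql_db) are standard Ansible modules;
# the "dbservers" group and "webapp" database are made-up examples.
- name: Prepare a database to be used by a web server
  hosts: dbservers
  become: true
  tasks:
    - name: Install the database package
      yum:
        name: mariadb-server
        state: present

    - name: Start and enable the database service
      service:
        name: mariadb
        state: started
        enabled: true

    - name: Create a database
      mysql_db:
        name: webapp
        state: present
```

You would run this with `ansible-playbook site.yml` against an inventory that defines the `dbservers` group; moving these tasks into a role is a matter of relocating them into the role’s `tasks/main.yml`.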
Tips for writing plays and playbooks
The best source for learning Ansible is the official documentation site. And as usual, online search is your friend. I recommend starting with simple tasks like installing applications or creating users. Once you are ready, follow these guidelines:
In my next blog, I will share a role for adding the official Dell repositories for installing OpenManage Server Administrator and Dell System Update on RHEL and Ubuntu operating systems.
Author: Bruce Wagner, September 2016 (Solutions Performance Analysis Lab)
The goal of this blog is to illustrate the performance impact of DDR4 memory selection. Measurements were made on a Broadwell-EP CPU system configuration using the industry-standard benchmarks listed in Table 1.
Table 1: Detail of Server and Applications used with Intel Broadwell processor
Dell PowerEdge R630
2 x E5-2699 v4 @2.2GHz, 22 core, 145W, 55M L3 Cache
DDR4 product offerings including:
8GB 1Rx8 2400MT/s RDIMM (DPN 888JG)
32GB 2Rx8 2400MT/s RDIMM (DPN CPC7G)
64GB 4Rx8 2400MT/s LR-DIMM (DPN 29GM8)
1 x 750W
Red Hat Enterprise Linux 7.2 (3.10.0-327.el7.x86_64)
Memory Operating Mode – Optimizer
Node Interleaving – Disabled
Snoop mode – Opportunistic Snoop Broadcast
Logical Processor – Enabled
System profile – Performance
Intel optimized 126.96.36.199 linux64 binaries (http://www.spec.org/cpu2006)
v5.10 source from https://www.cs.virginia.edu/stream/
Intel Parallel Studio 2016 update2 compilation
Table 2 and Figure 1 detail the memory subsystem of the 13G PowerEdge R630: 24 DIMM sockets split into two sets of 12, one set per processor. Each 12-socket set is organized into four channels with three DIMM sockets per channel.
Table 2: Memory channels
Channel 0 DIMM Slots
Channel 1 DIMM Slots
Channel 2 DIMM Slots
Channel 3 DIMM Slots
A1, A5, A9
A2, A6, A10
A3, A7, A11
A4, A8, A12
B1, B5, B9
B2, B6, B10
B3, B7, B11
B4, B8, B12
Figure 1: Memory socket locations
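For context before the measured results below, the theoretical peak bandwidth of this layout follows directly from the channel count and transfer rate. A quick sketch (this is simple peak arithmetic, not a measured or Dell-published figure, and the helper name is our own):

```python
def peak_mem_bandwidth_gbs(sockets, channels_per_socket, mts, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s:
    transfers per second x 8-byte DDR4 data-bus width, summed over all channels."""
    return sockets * channels_per_socket * mts * 1e6 * bus_bytes / 1e9

# Two-socket R630, four channels per socket, DDR4-2400:
print(peak_mem_bandwidth_gbs(2, 4, 2400))  # 153.6 GB/s theoretical peak
```

Real sustained bandwidth (e.g., as measured by STREAM) lands well below this peak, which is why rank organization and slot population, discussed next, matter.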
Figure 2: Performance Impact of Memory Type
From Figure 2 we see that a memory configuration based on Registered DIMMs (RDIMMs) provides a consistent 3.1% performance advantage over an equivalently sized one composed of Load-Reduced DIMMs (LR-DIMMs), despite both running at 2400 MT/s. LR-DIMMs make larger memory capacities possible, but their inherently higher access latency results in reduced application performance. LR-DIMMs also impose a nearly 30% power-consumption penalty over equivalent size/speed RDIMMs. LR-DIMMs should be used only when the total system memory capacity requirement dictates a 3DPC configuration.
Table 3: Memory speed limits for 13G PowerEdge Models
Figure 3: Performance Impact of DIMM Rank Organization
From Figure 3 we see that a 1DPC memory configuration composed of dual-rank DIMMs outperforms one composed of single-rank DIMMs by 14%. This is due to DRAM’s large inherent delay when alternating between read and write access on a given rank, which significantly reduces throughput on the memory channel. Given dual-rank DIMMs or multiple DIMMs per channel, the CPU’s integrated memory controller can overlap the scheduling of reads and writes on the memory channel to minimize the read/write turnaround impact.
Figure 4: Performance Impact of Memory Speed
Figure 4 shows that a 2400 MT/s memory configuration provides 14% higher overall application performance than a 2133 MT/s one, all other factors being the same. Modern 8Gbit 1.2V DDR4 DIMM technology is such that the higher speed incurs only a nominal increase in power consumption and thermal dissipation. Pricing and availability of 2400 MT/s DIMMs are also rapidly trending toward the commodity sweet spot.
Figure 5: Performance Impact of DIMM Slot Population
Figure 5 shows that a 2DPC population results in a slight 0.9% workload performance uplift over a 1DPC one attributed to the same memory controller data transfer overlap efficiency improvements as discussed for figure 3. A 3DPC result is shown to further highlight the marked performance degradation that results from the necessity to down clock the memory subsystem from 2400 MT/s to 1866 MT/s.
Figure 6: Performance Impact of DIMM Population Balance
In Figure 6 we see a wide disparity in overall system memory bandwidth as a result of DIMM population balance.
Although the default Optimizer (aka Independent Channel) Memory Operating Mode supports odd numbers of DIMMs per CPU, there is a severe performance penalty in doing so.
The full list of memory module installation guidelines can be found in the product owner’s manual available through www.dell.com.
In summary, to maximize workload performance, the recommendation for 13G two-socket servers is to populate all available channels with two dual-rank, registered 2400 MT/s DIMMs per channel.
At this year’s Worldwide Partner Conference, Microsoft announced our partnership to deliver Azure Stack as an integrated system, targeting availability in mid-2017. For Dell, this continues the partnership and joint development with Microsoft, going back to the early days of developing Cloud Platform Systems, with a focus on delivering integrated systems for hybrid cloud solutions to our customers. Azure Stack is the next phase of that partnership.
Another key highlight from the Microsoft announcement was Microsoft’s recommendation to adopt Cloud Platform Systems (CPS) today to move forward in the cloud journey.
“Customers ready to deploy an Azure-consistent cloud on their premises now should buy Microsoft Cloud Platform Solution (CPS). Customers will be able to use Azure Stack to manage CPS resources thereby preserving investments in CPS.” – Mike Neil, Corporate Vice President, Enterprise Cloud, Microsoft Corporation
The Dell Hybrid Cloud System for Microsoft CPS Standard (DHCS), our integrated system for CPS hybrid cloud, is a great way to get started and streamline your journey to cloud. The early steps to a hybrid cloud for customers are usually evolutionary, but still impact applications, security, infrastructure, and the operating model. The path to Azure Stack is in steps even further along that journey; so getting started with DHCS today helps with three key areas:
So for example (area #1), most applications today are virtualized, whether traditional enterprise applications (Exchange, SharePoint) or so-called “n-tier” applications such as web servers or databases. As a first step, you inventory the applications and assess your options based on classification of applications and data, cost, security, and so forth. By the end of this step, you have identified the applications to rehost as IaaS virtual machines (VMs) in a cloud infrastructure like DHCS, along with a migration plan for the applications and data. Eventually, as you re-tool to rebuild some of your existing applications as cloud-native, or to develop new cloud-native applications, Azure Stack will provide a platform to develop and deliver them on premises. With our integrated system for Azure Stack, you can continue to run your traditional applications on DHCS while managing them from Azure Stack as IaaS VMs, without having to migrate or upgrade your current investments.
Area #2 is the part of your journey toward cloud having to do with adopting the cloud model. Orienting your business and operating model toward service delivery and consumption is key to getting the most from cloud, and it takes time and experience to achieve. Adopting multi-tenancy, self-service, metering, and governance are critical first steps toward being truly cloud native. With a consumption model, you can increase utilization, gain control of your resources, and reduce cost risk while rapidly delivering the services your tenants need. DHCS comes ready to enable adoption of the cloud model on premises, on a software-defined infrastructure that is familiar and proven in the market today.
Hybrid adoption is the final area most customers struggle with. We have identified two main hybrid use cases to get started that bring value to customers today, and integrated them out-of-the-box into DHCS. With the Backup and Site Recovery services from the Microsoft Azure public cloud, you not only get integration into Azure, but also the ability to efficiently implement a business-continuity strategy with zero CAPEX and with OPEX based on consumption for your on-premises cloud.
With the Dell Hybrid Cloud System for Microsoft, you get a platform ready to rehost your applications today and deliver them as services to your tenants, enabling self-service, metering, and governance. You also get the option to consume Microsoft Azure services like Backup and Site Recovery out of the box. DHCS is built on a familiar and proven technology stack with Windows Server, System Center and Windows Azure Pack, enabling you to focus less on the workings of the technology and more on areas that transform your business as you continue to take advantage of cloud.
Whether you choose to rehost the applications and adopt IaaS with DHCS or eventually re-factor applications to leverage any of the Azure platform-as-a-service capabilities, Dell will partner with you along this journey and protect your investments as you adopt DHCS today and plan for Microsoft Azure Stack tomorrow.
Author: Yogendra Sharma, Ashish Singh, September 2016 (HPC Innovation Lab)
This blog describes a performance analysis of the PowerEdge R930 server powered by four Intel Xeon E7-8890 v4 @2.2GHz processors (code-named Broadwell-EX). The primary objective is to compare the performance of HPL, STREAM, and a few scientific applications (ANSYS Fluent and WRF) against the previous-generation Intel Xeon E7-8890 v3 @2.5GHz processor, code-named Haswell-EX. Below are the configurations used for this study.
4 x Intel Xeon E7-8890 v3 @2.5GHz (18 cores) 45MB L3 cache 165W
4 x Intel Xeon E7-8890 v4 @2.2GHz (24 cores) 60MB L3 cache 165W
1024 GB = 64 x 16GB DDR4 @1866MHz RDIMMS
1024 GB = 32 x 32GB DDR4 @1866MHz RDIMMS
Processor Settings > Logical Processors
Processor Settings > QPI Speed
Maximum Data Rate
Processor Settings > System Profile
Software and Firmware
RHEL 6.6 x86_64
RHEL 7.2 x86_64
Benchmark and Applications
V2.1 from MKL 11.2
V2.1 from MKL 11.3
v5.10, Array Size 1800000000, Iterations 100
v3.5.1, Input Data Conus12KM, Netcdf-188.8.131.52
V3.8 Input Data Conus12KM, Netcdf-4.4.0
Table 1: Details of Server and HPC Applications used with Broadwell-EX processors
In this section of the blog, we compare benchmark numbers across two generations of processors on the same server platform (the PowerEdge R930), as well as the performance of Broadwell-EX processors with different CPU profiles and memory snoop modes, namely Home Snoop (HS) and Cluster-on-Die (COD).
The High Performance Linpack (HPL) benchmark is a measure of a system’s floating-point computing power. It measures how fast a computer solves a dense n-by-n system of linear equations Ax = b, a common task in engineering. HPL was run on both PowerEdge R930 servers (with Broadwell-EX and Haswell-EX) with block size NB=192 and problem size N=340992.
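The N and NB values above follow the usual HPL sizing rule of thumb: choose N so that the N x N double-precision matrix fills most of physical memory, rounded to a multiple of the block size. A hedged sketch of that rule (the 85% fraction and the helper name are our assumptions for illustration, not the exact method used in this study):

```python
import math

def hpl_problem_size(total_mem_gib, mem_fraction=0.85, nb=192):
    """Rule-of-thumb HPL problem size: the N x N matrix of 8-byte doubles
    should fill ~mem_fraction of total memory, with N rounded down
    to a multiple of the block size NB."""
    mem_bytes = total_mem_gib * 1024**3
    n = int(math.sqrt(mem_fraction * mem_bytes / 8))
    return (n // nb) * nb

# The R930 configurations here had 1024 GiB of memory; the published
# N=340992 corresponds to roughly 85% of that memory.
print(hpl_problem_size(1024))
```

Sizing N too small underutilizes the machine, while sizing it too large causes swapping, so the fraction is typically tuned per system.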
Figure 1: Comparing HPL Performance across BIOS profiles
Figure 2: Comparing HPL Performance over two generations of processors
Figure 1 depicts the performance of the PowerEdge R930 server with Broadwell-EX processors across BIOS options. Home Snoop (HS) mode performs better than Cluster-on-Die (COD) under both the Performance and DAPC system profiles. Figure 2 compares four-socket Intel Xeon E7-8890 v3 and E7-8890 v4 servers. HPL showed a 47% performance improvement with four Intel Xeon E7-8890 v4 processors on the R930 compared to four E7-8890 v3 processors. This was due to a ~33% increase in core count plus a further ~13% from improved versions of the Intel compiler and Intel MKL.
The STREAM benchmark is a synthetic benchmark that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
Figure 3: Comparing STREAM Performance across BIOS profiles
Figure 4: Comparing STREAM Performance over two generations of processors
As per Figure 3, the memory bandwidth of the PowerEdge R930 server with Intel Broadwell-EX processors is the same across BIOS profiles. Figure 4 shows the memory bandwidth of both Intel Xeon Broadwell-EX and Haswell-EX processors in the PowerEdge R930. Haswell-EX and Broadwell-EX support DDR3 and DDR4 memory, respectively, but the platform in this configuration supports a memory frequency of 1600MT/s for both generations of processors. Because the PowerEdge R930 platform supports the same memory frequency for both generations, both processors achieve the same memory bandwidth of 260GB/s in the PowerEdge R930 server.
The Weather Research and Forecasting (WRF) Model is a mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting needs. It features two dynamical cores, a data assimilation system, and a software architecture facilitating parallel computation and system extensibility. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers. WRF can generate atmospheric simulations using real data or idealized conditions. We used the CONUS12km and CONUS2.5km benchmark datasets for this study. CONUS12km is a single-domain, small-size benchmark (48 hours, 12km resolution, over the Continental U.S. (CONUS) domain from October 24, 2001) with a 72-second time step. CONUS2.5km is a single-domain, large-size benchmark (the latter 3 hours of a 9-hour, 2.5km resolution case over the CONUS domain from June 4, 2005) with a 15-second time step. WRF decomposes the domain into tasks or patches. Each patch can be further decomposed into tiles that are processed separately, but by default there is only one tile per run. If the single tile is too large to fit into the cache of the CPU and/or core, computation slows down due to WRF’s memory-bandwidth sensitivity. To reduce the tile size, the number of tiles can be increased by setting “numtile = x” in the input file or by defining the environment variable “WRF_NUM_TILES = x”. For both CONUS 12km and CONUS 2.5km, the number of tiles was set to 56, which gave the best performance.
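The tiling tweak described above is just an environment variable (or input-file) setting applied before launching WRF. A sketch of the relevant run-script lines (the launcher invocation is site-specific and shown only as a commented example):

```shell
# Split each WRF patch into more tiles so that a tile fits in CPU cache.
# 56 tiles gave the best performance for both CONUS datasets in this study;
# the right value depends on your CPU's cache size.
export WRF_NUM_TILES=56

# Equivalent alternative: add "numtile = 56" to the WRF input file instead.

# Site-specific launch (rank count here is a hypothetical example):
# mpirun -np 96 ./wrf.exe

echo "WRF_NUM_TILES=$WRF_NUM_TILES"
```

Because the tile count only changes how each patch is subdivided, it can be tuned without re-decomposing the domain or changing the MPI rank count.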
Figure 5: Comparing WRF Performance across BIOS profiles
Figure 5 demonstrates the comparison of the WRF datasets across BIOS profiles. With the CONUS 12KM data, all BIOS profiles perform equally well because of the smaller data size, while for CONUS 2.5KM, Perf.COD (Performance system profile with Cluster-on-Die snoop mode) gives the best performance. As per Figure 5, Cluster-on-Die snoop mode performs 2% better than Home Snoop mode, while the Performance system profile gives 1% better performance than DAPC.
Figure 6: Comparing WRF Performance over two generations of processors
Figure 6 shows the performance comparison between Intel Xeon Haswell-EX and Intel Xeon Broadwell-EX processors with PowerEdge R930 server. As shown in the graph, Broadwell-EX performs 24% better than Haswell-EX for CONUS 12KM data set and 6% better for CONUS 2.5KM.
ANSYS Fluent is a computational fluid dynamics (CFD) software tool. Fluent includes well-validated physical modeling capabilities that deliver fast, accurate results across the widest range of CFD and multiphysics applications.
Figure 7: Comparing Fluent Performance across BIOS profiles
We used three different datasets for Fluent, with ‘Solver Rating’ (higher is better) as the performance metric. Figure 7 shows that all three datasets performed 4% better with the Perf.COD (Performance system profile with Cluster-on-Die snoop mode) BIOS profile than with the others, while the DAPC.HS (DAPC system profile with Home Snoop mode) BIOS profile shows the lowest performance. For all three datasets, the COD snoop mode performs 2% to 3% better than Home Snoop mode, and the Performance system profile performs 2% to 4% better than DAPC. Fluent’s behavior is consistent across all three datasets.
Figure 8: Comparing Fluent Performance over two generations of processors
As shown above in Figure 8, for all test cases the PowerEdge R930 with Broadwell-EX showed a 13% to 27% performance improvement with Fluent in comparison to the PowerEdge R930 with Haswell-EX.
Overall, the Broadwell-EX processor makes the PowerEdge R930 server more powerful and more efficient. With Broadwell-EX, HPL performance increases in step with the increase in core count compared to Haswell-EX. There is also a performance increase for real applications, depending on the nature of their computation. It can be a good upgrade choice for those running compute-hungry applications.
Authors: Thomas Cantwell and Gong Wang
Important! Some of the BIOS settings described in this article are not yet in released Dell BIOS. The BIOS releases that coincide with the Dell launch of Windows Server 2016 will carry these new settings.
TPM 2.0 is the latest release of Trusted Platform Module (TPM) that can be installed on Dell PowerEdge 13G servers. To properly configure the TPM on Dell PowerEdge servers, you must use different settings for Windows Server 2012R2 and Windows Server 2016 to match the OS capabilities.
The system must be configured for UEFI boot mode prior to OS installation. (Caution! If you install the OS in legacy BIOS mode, you must reinstall the OS to switch to UEFI mode.)
If the system boot mode is set to Legacy BIOS, the TPM will be “enabled with reduced functionality” in Windows Server OS (see: https://support.microsoft.com/en-us/kb/3123365 )
Dell BIOS settings to enable the TPM and change its settings can be found under the System Security tab in BIOS:
Windows Server 2012R2 –
To use TPM 2.0 on Windows Server 2012R2, you must install a hotfix - https://support.microsoft.com/en-us/kb/3095701 . Without this hotfix, the OS will not be able to recognize the TPM.
The following setting will be available in the BIOS releases that coincide with the Dell launch of Windows Server 2016. In BIOS (under TPM Advanced), set the TPM to SHA1, as shown. Windows Server 2012R2 supports only SHA1.
Windows Server 2016 –
Windows Server 2016 is due to ship soon. Some important BIOS setting changes must be made to fully leverage Windows Server 2016. These make the server ready for TPM-trusted Guarded Host deployment, on which Shielded Virtual Machines can run. Guarded Hosts and Shielded VMs are new in Windows Server 2016.
As above, the server must be configured for UEFI mode for TPM 2.0 to be fully functional; TPM 2.0 is supported in UEFI mode only. To deploy a TPM-based guarded host, the system BIOS settings must be configured as follows:
Boot Settings: UEFI
System Security > Secure Boot > Secure Boot Enabled (Required for Guarded Host)
System Security > TPM Security: On
BIOS settings as shown below: Windows Server 2016 supports the newer, more secure SHA256, so set the TPM hash algorithm to SHA256.
Updated monthly, this publication provides you with new and recently revised information, organized in the following categories: Documentation, Notifications, Patches, Product Life Cycle, Releases, and Knowledge Base Articles.
Subscribe to the RSS (Use IE only)
None at this time
Product Release Notification – vWorkspace 8.6.1
Type: Patch Release Created: August 2016
210044 - The connector for Mac crashes when a second user tries to connect
There are multiple users who can access a Mac computer. User A installs the Mac connector and can run it and connect to vWorkspace. User B...
Created: August 4, 2016
210083 - After Windows Update patching, EOP is not working anymore
After updating the Gold Image via Windows Update, EOP Printers are not redirected anymore to VDI. In the Event Viewer, the following...
Created: August 5, 2016
210082 - When using HTML5 to connect to Seamless App, the keyboard layout is not the one defined on the client computer
When using HTML5 to connect to Seamless App (Managed Application) , the keyboard layout is not the one defined on the client computer.
211463 - How to Provision a Remote Desktop Session Host.
Use the following steps to configure a template that is used to create a virtual computer within a Session Host computer group.
Created: August 26, 2016
206895 - vWorkspace farm connection failed for specific user "CKerbTicketsForAuth::requestTicketsFromSystem:protocolStatus failed"
When trying to connect to the vWorkspace farm, connection fails for at least one user with the error: Connection to [farm] failed...
Revised: August 2, 2016
66448 - Troubleshooting Guide for vWorkspace 7.x
Revised: August 3, 2016
209120 - Attempting to connect to Web Access using the Mac Connector results in error: Failed to load settings from Pit file
Mac fails to connect to vWorkspace through Web Access and the SSL Gateway/Secure Access Server Error is: Failed to load settings from Pit file
204875 - Errors retrieving specific template VMs from Hyper-V
When importing templates from Hyper-V or attempting to deploy VDIs from specific templates, the following error is generated: "...
90422 - How to ignore the remote (client-side) keyboard layout when establishing a remote connection
When making a remote connection, via vWorkspace or MS RDP, by default the client-side keyboard layout is reflected. So a user connecting...
120050 - Multimonitor not working or Audio and microphone are not redirected when connecting to Windows 7 Professional VDI
When connecting to the Windows 7 Professional VDI machines, multimonitor does not work and audio and microphone are not redirected.
209063 - How to clear the username box on a failed login attempt to WebAccess
Is there any way to clear the Username box on the Web Access login page following a failed login? The password box clears but we would...
205626 - No data to be displayed in vWorkspace Reporting and Monitoring
Foglight monitoring software is not working and is unable to monitor any systems in the VDI environment.
109645 - Error "Server [Server_name] could not be contacted."
94615 - "An error occurred. Please try again later." when using vWorkspace to view YouTube content
When using vWorkspace 7.6/8.0 Flash Redirection to visit YouTube a warning message will be displayed in place of the video content, the message...
208139 - Connection Broker Lookup Test on vWorkspace Secure Access Gateway times out
When using the Connectivity Test button to test the connection with brokers, the test times out.
97037 - Error: "CKerbTicketsForAuth::requestTicketsFromSystem: ProtocolStatus failed" When trying to connect using App Portal.
App Portal Connection fails with this error: CKerbTicketsForAuth::requestTicketsFromSystem: ProtocolStatus failed
124356 - How To Utilize User Profiles to save desktop icon arrangement
By using either Folder Redirection or vWorkspace User Profiles, it is easy to save a user's desktop icons. This article will take this one stage...
186075 - How to enable Desktop Integrated Mode in the 8.6.x Windows connector.
When installing the 8.6 or higher Windows connector, an additional shortcut for the Desktop Integrated (DI) mode is not available in the start...
207442 - Launching App-V application errors with "The package containing your application is not published on this machine"
When trying to launch an App-V application the following error is seen: <img alt="App-v error" src="https://prod-support-images-cfm.s3.amazonaws...
208223 - In Linux-based 10Zig thin clients, the USB redirection is not working
When connecting from Windows-based 10Zig thin client devices, USB redirection works; from Linux-based devices it does not.
207445 - Is it possible to upgrade vWorkspace in stages?
In many cases, upgrading a large vWorkspace environment cannot be done in a short time period. Is it possible to upgrade vWorkspace in stages?
Revised: August 9, 2016
94387 - Linked clone failed CloneVmLinked SoapException thrown during CreateLinkedClone_Task!
When trying to create a linked clone it gives the error CloneVmLinked: SoapException thrown during CreateLinkedClone_task!
Revised: August 11, 2016
114826 - Dell vWorkspace Optional Hotfix 258745 Version 7.6.1 for Linux Connector for Wyse T50
Revised: August 22, 2016
156633 - How to enable Drag and Drop to Published App
Users cannot use drag and drop with a Seamless Application.
Revised: August 19, 2016
88354 - Installing AppSense after PNTools causes AppSense to not function correctly
On a Windows 7 VDI, if PNTools is installed first and AppSense's Environment Manager agent is installed afterwards, then any actions set up in...
Revised: August 23, 2016
Product Life Cycle
Product Life Cycle - vWorkspace
Revised: August 18, 2016
The intent of this blog is to illustrate and analyze the performance obtained on the Intel Broadwell-EP 4S processor with a focus on HPC, using two synthetic benchmarks – High Performance Linpack (HPL) and STREAM – along with three applications: Weather Research and Forecasting (WRF), NAnoscale Molecular Dynamics (NAMD), and Fluent. These runs were performed on a standalone single PowerEdge R830 server. Combinations of system BIOS profiles and memory snoop modes are compared for analysis.
Table 1: Details of the server and applications used with the Intel Broadwell processor
Server: Dell PowerEdge R830
Processors: 4 x E5-4669 v4 @ 2.2GHz, 22 cores, 135W, 55MB L3 cache (AVX base frequency @ 1.7GHz)
Memory: 32 x 16GB DDR4 @ 2400MHz (total = 512GB)
Power supplies: 2 x 1600W
BIOS options: System Profile – Performance and Performance Per Watt (DAPC); Snoop Modes – Cluster on Die (COD) and Home Snoop (HS); Logical Processor and Node Interleaving – disabled; I/O Non-Posted Prefetch – disabled
HPL: from Intel MKL (problem size 253440)
Compilers: from Intel Parallel Studio 2016 Update 3
MPI: Intel MPI 5.1.3
The server used for obtaining these results is a Dell PowerEdge 13th-generation server: a high-performance, four-socket, 2U rack server supporting massive memory density (up to 3TB). The Intel® Xeon® Processor E5-4669 v4 (product family E5-4600 v4) is a 14nm part based on the micro-architecture code-named Broadwell-EP, and it supports two snoop modes in four-socket systems: Home Snoop and Cluster on Die (COD).
The default snoop mode is Home Snoop. Cluster on Die is available only on processor models with more than 12 cores in Intel's Broadwell series. In this mode, the socket is logically split into two NUMA domains, which are exposed to the operating system: the cores and the L3 cache are divided equally between the two domains, each of which has its own home agent with an equal number of cores and cache slices. Each such NUMA domain (cores plus home agent) is called a cluster. COD mode is best suited to highly NUMA-optimized workloads.
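As a rough sketch of what COD implies for the 4 x 22-core configuration in Table 1, the resulting NUMA topology can be worked out with simple arithmetic (the numbers below are derived from the specs quoted in this post, not probed from hardware):

```python
# Sketch: NUMA topology implied by Cluster-on-Die (COD) mode on the
# 4-socket, 22-core-per-socket E5-4669 v4 system described above.
sockets = 4
cores_per_socket = 22
l3_per_socket_mb = 55  # 55MB L3 cache per processor

# Home Snoop: one NUMA domain per socket.
hs_domains = sockets
hs_cores_per_domain = cores_per_socket

# COD: each socket is split into two clusters (NUMA domains),
# each with half the cores and half the L3 slices.
cod_domains = sockets * 2
cod_cores_per_domain = cores_per_socket // 2
cod_l3_per_domain_mb = l3_per_socket_mb / 2

print(f"HS : {hs_domains} NUMA domains x {hs_cores_per_domain} cores")
print(f"COD: {cod_domains} NUMA domains x {cod_cores_per_domain} cores, "
      f"{cod_l3_per_domain_mb} MB L3 each")
```

So the operating system sees 8 NUMA domains of 11 cores each under COD, versus 4 domains of 22 cores under Home Snoop.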
STREAM is a synthetic HPC benchmark. It evaluates sustained memory bandwidth in MB/s by counting only the bytes that the user program requested to be loaded from or stored to memory. The "TRIAD" score reported by this benchmark is used to analyze memory bandwidth performance; the operation carried out by TRIAD is: a(i) = b(i) + q*c(i)
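As a minimal sketch (not the official STREAM benchmark, which is written in C/Fortran), the TRIAD kernel and its byte accounting look like this in NumPy:

```python
import time

import numpy as np

n = 10_000_000          # array length; real STREAM sizes arrays well beyond the LLC
q = 3.0
b = np.random.rand(n)
c = np.random.rand(n)

start = time.perf_counter()
a = b + q * c           # TRIAD: a(i) = b(i) + q*c(i)
elapsed = time.perf_counter() - start

# STREAM counts only the bytes the program asked to move:
# one load of b, one load of c, one store of a -> 3 * 8 bytes per element.
bytes_moved = 3 * 8 * n
print(f"TRIAD bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```

A Python version like this mainly illustrates the accounting; the measured number will be far below what the compiled, OpenMP-threaded benchmark achieves on this machine.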
Figure 1: Memory Bandwidth with STREAM
Figure 1 shows that DAPC.COD performs best. The DAPC and Performance profiles deliver similar performance: memory bandwidth varies between them by only 0.2% and 0.4% in HS and COD mode respectively. The COD snoop mode performs ~2.9-3.0% better than HS.
Figure 2: STREAM Memory Bandwidth on DAPC.COD
Figure 2 shows the STREAM Triad memory bandwidth obtained in the DAPC.COD configuration, broken down into local, local-NUMA-node and remote bandwidth. The full-system memory bandwidth is ~226 GB/s.
To obtain the local memory bandwidth, the processes are bound to a socket and only the memory local to that socket is accessed (same NUMA node, with NUMA enabled). A local NUMA node with 11 threads gives almost half the performance of the full local socket with 22 threads, since the number of processors is halved. For remote access to the same socket, memory bandwidth drops by 64% compared to local access. For remote access to the other socket, the processes are bound to one socket while accessing memory remote to it across the QPI link (remote NUMA node); because QPI bandwidth is limited, this drops a further ~5% compared to remote-to-same-socket access.
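The relationships described above can be checked with simple arithmetic, using the ~226 GB/s full-system figure as the only measured input (the derived per-socket and remote numbers below are illustrative, not measurements from Figure 2):

```python
full_system_gbps = 226.0               # measured STREAM Triad, all threads

# With four sockets, each socket sustains roughly a quarter of the
# aggregate bandwidth when accessing its own local memory.
local_socket = full_system_gbps / 4    # ~56.5 GB/s

# Under COD, a local NUMA node has 11 of the socket's 22 threads,
# so it sustains roughly half the local-socket bandwidth.
local_numa_node = local_socket / 2

# Remote-to-same-socket access drops by ~64% versus local...
remote_same_socket = local_socket * (1 - 0.64)
# ...and crossing QPI to the other socket costs a further ~5%.
remote_other_socket = remote_same_socket * (1 - 0.05)

print(f"local socket    : {local_socket:.1f} GB/s")
print(f"local NUMA node : {local_numa_node:.1f} GB/s")
print(f"remote (same)   : {remote_same_socket:.1f} GB/s")
print(f"remote (other)  : {remote_other_socket:.1f} GB/s")
```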
High Performance Linpack (HPL) is an industry-standard compute-intensive benchmark, traditionally used to stress the compute and memory subsystems. It measures the speed with which a computer solves linear equations, and thereby a system's floating-point computing power. It requires Intel's Math Kernel Library, a software library for numerical linear algebra.
Figure 3: HPL Performance and Efficiency
Figure 3 illustrates the HPL benchmark results. The DAPC and Performance profiles give almost identical results, whereas there is a clear difference between HS and COD: with DAPC, COD is 6.2% higher than HS, and with the Performance profile, COD yields 6.6% higher performance than HS. HPL performance is reported in GFLOP/second. The efficiency is more than 100% because it is calculated against the AVX base frequency.
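To see why efficiency can exceed 100%, compare the theoretical peaks at the nominal and AVX base frequencies. This sketch assumes 16 double-precision FLOPs per cycle per core for Broadwell's AVX2 FMA units, and the measured Rmax used here is hypothetical, chosen only to illustrate the ratio:

```python
sockets, cores_per_socket = 4, 22
flops_per_cycle = 16        # 2 FMA units x 4 DP lanes x 2 ops/FMA (AVX2)
cores = sockets * cores_per_socket

# Theoretical peak (Rpeak) in GFLOP/s at each frequency.
rpeak_nominal = cores * 2.2 * flops_per_cycle   # at the 2.2 GHz nominal base
rpeak_avx     = cores * 1.7 * flops_per_cycle   # at the 1.7 GHz AVX base

rmax = 2500.0   # hypothetical measured HPL result, GFLOP/s
print(f"Rpeak @ 2.2 GHz      : {rpeak_nominal:.1f} GFLOP/s")
print(f"Rpeak @ 1.7 GHz (AVX): {rpeak_avx:.1f} GFLOP/s")
print(f"Efficiency vs AVX base: {100 * rmax / rpeak_avx:.1f}%")
```

Because Rpeak is computed at the lower AVX base frequency, a run that sustains clocks above 1.7 GHz reports an efficiency above 100%.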
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system used for both atmospheric research and weather forecasting. It generates atmospheric simulations using real data (observations, analyses) or idealized conditions, and it features two dynamical cores, a data assimilation system, and a software architecture that allows for parallel computation and system extensibility. For this study, the CONUS12km (small) and CONUS2.5km (large) datasets were used, with the computed "Average Time Step" as the metric for analyzing performance.
CONUS12km is a single-domain, medium-size benchmark (a 48-hour, 12km-resolution case over the Continental U.S. (CONUS) domain from October 24, 2001) with a time step of 72 seconds. CONUS2.5km is a single-domain, large-size benchmark (the latter 3 hours of a 9-hour, 2.5km-resolution case over the CONUS domain from June 4, 2005) with a time step of 15 seconds.
The number of tiles is chosen based on the best performance obtained by experimentation, and is defined by setting the environment variable WRF_NUM_TILES = x, where x denotes the number of tiles. The application was compiled in "sm + dm" (hybrid shared- plus distributed-memory) mode. The combinations of MPI and OpenMP processes used are as follows:
Table 3: WRF application parameters used with the Intel Broadwell processor
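The kinds of MPI x OpenMP combinations referred to above can be enumerated for the 88 available cores. This is only a sketch of the search space; it does not reproduce the specific combinations from Table 3:

```python
total_cores = 88   # 4 sockets x 22 cores

# Enumerate hybrid decompositions where MPI ranks x OpenMP threads
# exactly fill the machine (the "sm + dm" build of WRF runs this way).
combos = [(mpi, total_cores // mpi)
          for mpi in range(1, total_cores + 1)
          if total_cores % mpi == 0]

for mpi, omp in combos:
    print(f"{mpi:3d} MPI ranks x {omp:2d} OpenMP threads")
```

In practice only a few of these points are benchmarked, and the best split depends on the dataset size and the NUMA layout (e.g. one rank per COD cluster with 11 threads each).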
Figure 4: Performance with WRF, CONUS12km
Figure 5: Performance with WRF, CONUS2.5km
Figure 4 illustrates the CONUS12km results: with COD there is a 4.5% improvement in the average time step compared with HS. For CONUS2.5km, the DAPC and Performance profiles show a variance of 0.4% and 1.6% for HS and COD respectively, and COD performs ~2.1%-3.2% better than HS. Due to its larger dataset size, CONUS2.5km can utilize a larger number of processors more efficiently. For both datasets, DAPC.COD performs best.
NAMD is a portable, parallel, object-oriented molecular dynamics research application designed specifically for high-performance simulation of large biomolecular systems. It is developed using Charm++. For this study, three widely used datasets were taken: ApoA1 (92,224 atoms), the standard NAMD cross-platform benchmark; F1ATPase (327,506 atoms); and STMV (a virus, 1,066,628 atoms), which is useful for demonstrating scaling to thousands of processors. ApoA1 is the smallest dataset and STMV the largest.
Figure 6: Performance of NAMD on BIOS Profiles (The Lower the Better)
The performance obtained on ApoA1 is the same across BIOS profiles, and for ATPase it is almost identical. The difference becomes visible with STMV, the largest dataset, since the greater number of atoms allows better utilization of the processors. For STMV, the DAPC and Performance profiles vary by 0.7% and 1.3% for HS and COD respectively, and the COD snoop mode performs ~6.1%-6.7% better than HS.
ANSYS Fluent is a powerful computational fluid dynamics (CFD) software tool. "Solver Rating" (the higher the better) is the metric used to analyze performance on six input datasets for Fluent.
Figure 7: Performance comparison of Fluent on BIOS Profiles (The Higher the Better)
As expected, Perf.COD performs best with Fluent. Across all datasets, the Solver Rating varies by 2-5%.
To conclude, the R830 platform performs up to the mark, delivering the expected results across all workloads. It is a good pick for HPC workloads, giving the best results with the DAPC.COD system BIOS profile, and a strong choice in terms of overall system architecture improvements and support for the latest processors.