Dell Community

Blog | Group | Posts
Application Performance Monitoring Blog | Foglight APM | 105
Blueprint for HPC - Blog | Blueprint for High Performance Computing | 0
Custom Solutions Engineering Blog | Custom Solutions Engineering | 9
Data Security | Data Security | 8
Dell Big Data - Blog | Dell Big Data | 68
Dell Cloud Blog | Cloud | 42
Dell Cloud OpenStack Solutions - Blog | Dell Cloud OpenStack Solutions | 0
Dell Lifecycle Controller Integration for SCVMM - Blog | Dell Lifecycle Controller Integration for SCVMM | 0
Dell Premier - Blog | Dell Premier | 3
Dell TechCenter | TechCenter | 1,860
Desktop Authority | Desktop Authority | 25
Featured Content - Blog | Featured Content | 0
Foglight for Databases | Foglight for Databases | 35
Foglight for Virtualization and Storage Management | Virtualization Infrastructure Management | 256
General HPC | High Performance Computing | 228
High Performance Computing - Blog | High Performance Computing | 35
Hotfixes | vWorkspace | 66
HPC Community Blogs | High Performance Computing | 27
HPC GPU Computing | High Performance Computing | 18
HPC Power and Cooling | High Performance Computing | 4
HPC Storage and File Systems | High Performance Computing | 21
Information Management | Information Management | 229
KACE Blog | KACE | 143
Life Sciences | High Performance Computing | 11
OMIMSSC - Blogs | OMIMSSC | 0
On Demand Services | Dell On-Demand | 3
Open Networking: The Whale that swallowed SDN | TechCenter | 0
Product Releases | vWorkspace | 13
Security - Blog | Security | 3
SharePoint for All | SharePoint for All | 388
Statistica | Statistica | 24
Systems Developed by and for Developers | Dell Big Data | 1
TechCenter News | TechCenter Extras | 47
The NFV Cloud Community Blog | The NFV Cloud Community | 0
Thought Leadership | Service Provider Solutions | 0
vWorkspace - Blog | vWorkspace | 511
Windows 10 IoT Enterprise (WIE10) - Blog | Windows 10 IoT Enterprise (WIE10) | 6
Latest Blog Posts
  • General HPC

    Collaboration Showcase: Dell EMC, TACC and Intel join forces on Stampede2 performance studies

    Dell EMC Solutions, June 2018

     

    The Stampede2 system is the result of a collaboration between the Texas Advanced Computing Center (TACC), Dell EMC and Intel. Stampede2 consists of 1,736 Dell EMC PowerEdge C6420 nodes with dual-socket Intel Skylake processors and 4,204 Dell EMC PowerEdge C6320p nodes with Intel Knights Landing bootable processors, for a total of 5,940 compute nodes, plus 24 additional login and management servers and Dell EMC Networking H-series switches, all interconnected by an Intel Omni-Path Architecture (OPA) fabric.

     

    Two technical white papers were recently published through the joint efforts of TACC, Dell EMC and Intel. One white paper describes Network Integration and Testing Best Practices on the Stampede2 cluster. The other discusses the Application Performance of Intel Skylake and Intel Knights Landing Processors on Stampede2 and highlights the significant performance advantage of the Intel Skylake processors at multi-node scale in four commonly used applications: NAMD, LAMMPS, Gromacs and WRF. For build details, please contact your Dell EMC representative. If you have a VASP license, we are happy to share VASP benchmark results as well.

     

    Deploying Intel Omni-Path Architecture Fabric in Stampede2 at the Texas Advanced Computing Center: Network Integration and Testing Best Practices (H17245)

     

    Application Performance of Intel Skylake and Intel Knights Landing Processors on Stampede2 (H17212)

  • Dell TechCenter

    Persistent Memory (NVDIMM-N) support on Dell EMC PowerEdge servers and VMware ESXi

    This blog is written by the Dell Hypervisor Engineering team.

    Persistent Memory (also known as Non-Volatile Memory (NVM)) is a type of random access memory that retains its contents even when system power is lost, whether through an unexpected power loss, a user-initiated shutdown, or a system crash. Dell EMC introduced support for NVDIMM-N with its 14th generation of PowerEdge servers, and VMware announced support for NVDIMM-N from vSphere ESXi 6.7 onwards. The NVDIMM-N resides in a standard CPU memory slot, placing data closer to the processor, which reduces latency and maximizes performance. This document details the support stance for NVDIMM-N and VMware ESXi specific to Dell EMC PowerEdge servers, and provides insight into the use cases where NVDIMM is involved and the associated behavior caveats.

    Dell EMC support for Persistent Memory (PMem) and VMware ESXi

    Dell EMC started supporting PMem (also known as Non-Volatile Memory (NVM)) with its 14th generation of PowerEdge servers, while VMware introduced support for NVDIMM with the vSphere 6.7 release. Refer to the section "Server Hardware Configuration" in the NVDIMM-N user guide for the PowerEdge server models that support NVDIMM-N; the server support matrix is the same for VMware ESXi. The hardware and firmware requirements for NVDIMM-N to function properly under ESXi are documented in the user guide. Dell EMC highly recommends that customers review the user guide before getting started with NVDIMM-N.
     
    Refer to the Dell EMC white paper to learn about the use cases and the utilities available to monitor and manage NVDIMM-N on VMware ESXi.
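    Once an NVDIMM-N region is exposed to a guest (for example as a virtual NVDIMM backed by a DAX-capable filesystem), an application can access it with ordinary loads and stores instead of block I/O. The sketch below is a minimal, hypothetical illustration of that access pattern from a Linux guest; the mount point and file name are assumptions, and this is not Dell EMC or VMware tooling.

    ```python
    # Minimal sketch: byte-addressable writes to a file on a DAX-mounted
    # filesystem backed by an NVDIMM-N namespace. Paths are hypothetical.
    import mmap
    import os

    PMEM_FILE = "/mnt/pmem0/journal.dat"   # assumed ext4/xfs mount with -o dax
    SIZE = 4096

    fd = os.open(PMEM_FILE, os.O_CREAT | os.O_RDWR, 0o600)
    os.ftruncate(fd, SIZE)                  # reserve space in the pmem-backed file

    with mmap.mmap(fd, SIZE) as buf:
        buf[0:16] = b"persistent hello"     # load/store access, no block I/O path
        buf.flush()                          # msync: make the update durable
    os.close(fd)
    ```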

  • Dell TechCenter

    Why is a 4K drive recommended for OS installation?

    Overview

    This blog explains why hard disk drives transitioned from 512-byte sectors to 4096-byte sectors, and why a 4096-byte (4K) sector disk should be chosen for OS installation. It first describes the sector layout to show why the migration was needed, then gives the reasoning behind the migration, and finally covers the benefits of a 4K sector drive over a 512-byte sector drive.

    Sector layout

    A sector is the minimum storage unit of a hard disk drive and is a subdivision of a track. The sector size is an important factor in operating system design because it represents the atomic unit of I/O operations on a hard disk drive. In Linux, you can check the disk sector size with the "fdisk -l" command.

                                                                  Figure-1: The disk sector size in Linux

    As shown in Figure-1, both the logical and physical sectors are 512 bytes long on this Linux system.
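    The same information is also exposed through sysfs. The short sketch below reads the logical and physical sector sizes directly; it is only a minimal illustration, and the device name "sda" is an assumption.

    ```python
    # Minimal sketch: read a disk's logical and physical sector sizes from sysfs.
    # The device name "sda" is an assumption; adjust it for your system.
    def read_sysfs(path):
        with open(path) as f:
            return int(f.read().strip())

    dev = "sda"
    logical = read_sysfs(f"/sys/block/{dev}/queue/logical_block_size")
    physical = read_sysfs(f"/sys/block/{dev}/queue/physical_block_size")
    print(f"{dev}: logical={logical} bytes, physical={physical} bytes")
    ```

    A 512e drive reports a 512-byte logical size with a 4096-byte physical size, while a 4K native (4Kn) drive reports 4096 bytes for both.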

    The sector layout is structured as follows:
    1) Gap section: Each sector on a drive is separated by a gap section.
    2) Sync section: It indicates the beginning of the sector.
    3) Address Mark section: It contains information related to sector identification, e.g. the sector's number and location.
    4) Data section: It contains the actual user data.
    5) ECC section: It contains error correction codes that are used to repair and recover data that might be damaged during the disk read/write process.

    Each sector stores a fixed amount of user data, traditionally 512 bytes for hard disk drives. To provide better data integrity at higher densities and more robust error correction, newer HDDs now store 4096 bytes (4K) in each sector.

    Need for large sector

    The number of bits stored on a given length of track is termed the areal density. Increasing areal density is a trend in the disk drive industry, not only because it allows greater volumes of data to be stored in the same physical space but also because it improves the transfer speed at which the medium can operate. As areal density has increased, each sector has come to occupy a smaller and smaller amount of space on the hard drive surface. This creates a problem: the physical size of the sectors on hard drives has shrunk, but media defects have not. When the data in a sector occupies a smaller area, error correction becomes more challenging, because a media defect of a given size damages a higher percentage of the data in a sector on a disk where each sector occupies a small area than on a disk where each sector occupies a large area.

    There are two approaches to solving this problem. The first approach is to devote more disk space to ECC bytes to maintain data reliability. However, devoting more disk space to ECC bytes reduces the disk format efficiency, which is defined as (number of user data bytes x 100) / (total number of bytes on disk). Another disadvantage is that the more ECC bits are included, the more processing power the disk controller needs to run the ECC algorithm.

    The second approach is to increase the size of the data block and only slightly increase the ECC bytes for each data block. As the data block size increases, the overhead required for each sector to store control information such as the gap, sync and address mark sections is reduced. The ECC bytes per sector increase, but the overall ECC bytes required for the disk decrease because there are fewer, larger sectors. Reducing the overall amount of space used for error correction code improves format efficiency, and the increased ECC bytes per sector make it possible to use more efficient and powerful error-correction algorithms. Thus, the transition to a larger sector size has two benefits: improved reliability and greater disk capacity.

    Why 4K only?

    From a throughput perspective, the ideal block size should be roughly equal to the characteristic size of a typical data transaction. The average file size today is well over 512 bytes, and applications in modern systems work with data in blocks much larger than the traditional 512-byte sector. Block sizes that are too small cause too much transaction overhead, while block sizes that are too large mean that each transaction transfers a large amount of unnecessary data.

    The size of a standard transaction in relational database systems is 4K, and the consensus in the hard disk drive industry has been that a physical block size of 4K provides a good compromise. It also corresponds to the paging size used by operating systems and processors.

    Benefits

    • Improvement in Format Efficiency

      Figure-2: 512 bytes block vs 4096 bytes block

     

    Figure-3: Format Efficiency improvement in 4K disk

                               512-byte sector format   4096-byte sector format
     Gap, sync & address mark  15 bytes                 15 bytes
     User data                 512 bytes                4096 bytes
     Error-correcting code     50 bytes                 100 bytes
     Total                     577 bytes                4211 bytes
     Format efficiency         88.7%                    97.3%

                                                         Table-1: Format Efficiency improvement in 4K disk

     

    As shown in Figure-2, 4K sectors are eight times as large as traditional 512-byte sectors. For the same data payload, a 4K-formatted drive therefore needs one-eighth as many gap, sync and address mark sections and one-quarter as much error correction code. Reducing the amount of space used for error correction code and other non-data sections improves the format efficiency of the 4K format. The improvement is shown in Figure-3 and Table-1: the 4K sector disk gains 8.6% in format efficiency over the 512-byte sector disk.
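    The format efficiency figures in Table-1 follow directly from the formula given earlier; the short calculation below simply reproduces them from the per-sector byte counts in the table.

    ```python
    # Reproduce the format-efficiency numbers in Table-1:
    # efficiency = user data bytes * 100 / total bytes per sector.
    def format_efficiency(user_data, gap_sync_addr, ecc):
        total = user_data + gap_sync_addr + ecc
        return 100.0 * user_data / total

    print(f"512-byte sector:  {format_efficiency(512, 15, 50):.1f}%")   # ~88.7%
    print(f"4096-byte sector: {format_efficiency(4096, 15, 100):.1f}%") # ~97.3%
    ```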

    • Reliability and Error Correction

    Figure-4: Effect of media defect on disk density

    As shown in Figure-4, a media defect affects a disk with higher areal density more than a disk with lower areal density. As areal density increases, more ECC bytes are needed to retain the same level of error correction capability. The 4K format provides enough space to expand the ECC field from 50 to 100 bytes and to accommodate new ECC algorithms. The enhanced ECC coverage improves the ability to detect and correct data errors beyond the 50-byte defect length associated with the 512-byte sector format.

    4K drive Support on OS & Dell PowerEdge Servers

    4K data disks are supported on Windows Server 2012, but as boot disks they are supported only in UEFI mode. For Linux, 4K hard drives require a minimum of RHEL 6.1 or SLES 11 SP2, and 4K boot drives are likewise supported only in UEFI mode. Kernel support for 4K drives is available in kernel versions 2.6.31 and above.
    The PERC H330, H730, H730P, H830, FD33xS, and FD33xD cards support 4K block size disk drives, which enables you to use the storage space efficiently. 4K disks can be used on the Dell PowerEdge servers that support these PERC cards.

    Conclusion

    The physical size of each sector on the disk has become smaller as a result of increasing areal densities in disk drives. Because disk defects have not shrunk at the same rate, more sectors are expected to be corrupted, and stronger error correction capability is needed for each sector. Disk drives with larger physical sectors and more ECC bytes per sector provide enhanced data protection and correction algorithms. The 4K format achieves better format efficiency and improves reliability and error correction capability. This transition results in a better user experience, and hence a 4K drive should be chosen for OS installation.

    References:

    http://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/512e_4Kn_Disk_Formats_120413.pdf

    https://www.seagate.com/files/www-content/product-content/enterprise-performance-savvio-fam/enterprise-performance-15k-hdd/_cross-product/_shared/doc/seagate-fast-format-white-paper-04tp699-1-1701us.pdf

  • Life Sciences

    De Novo Assembly with SPAdes

    Overview

    We published the whitepaper, "Dell EMC PowerEdge R940 makes De Novo Assembly easier", last year to study the behavior of SOAPdenovo2 [1]. However, that whitepaper is limited to a single de novo assembly application, so we want to expand our application coverage a little further. We decided to test SPAdes (2012) since it is a relatively new application that is reported to improve on the Euler-Velvet-SC assembler (2011) and SOAPdenovo†. Like most assemblers targeting Next Generation Sequencing (NGS) data, SPAdes is based on a de Bruijn graph algorithm. De Bruijn graph-based assemblers are more appropriate for larger datasets with more than a hundred million short reads.

    As shown in Figure 1, Greedy-Extension and overlap-layout-consensus (OLC) approaches were used in the very early next-generation assemblers [2]. The Greedy-Extension heuristic extends the highest-scoring alignment with the read that gives the next highest score. However, this approach is vulnerable to imperfect overlaps and multiple matches among the reads, which can lead to an incomplete or arrested assembly. The OLC approach works better for long reads, such as Sanger reads or reads from other technologies generating more than 100 bp (454, Ion Torrent, PacBio, and so on), because of its minimum overlap threshold. De Bruijn graph-based assemblers are more suitable for short-read sequencing technologies such as Illumina. The approach breaks the sequencing reads into successive k-mers and maps them onto a graph: each k-mer forms a node, and edges are drawn between consecutive k-mers in a read.

    Figure 1 Overview of de novo short reads assemblers. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3056720/
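    To make the k-mer idea concrete, here is a minimal, hypothetical sketch of building such a graph from short reads. It only illustrates the data structure described above; real assemblers such as SPAdes add error correction, graph simplification and paired-end handling on top of this idea.

    ```python
    # Minimal sketch of a de Bruijn-style graph built from short reads:
    # each k-mer is a node, and an edge links consecutive k-mers in a read.
    from collections import defaultdict

    def kmer_graph(reads, k):
        graph = defaultdict(set)
        for read in reads:
            kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
            for a, b in zip(kmers, kmers[1:]):
                graph[a].add(b)
        return graph

    reads = ["ACGTACGA", "CGTACGAT"]   # toy reads for illustration
    for node, successors in sorted(kmer_graph(reads, k=4).items()):
        print(node, "->", sorted(successors))
    ```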

    SPAdes is a relatively recent application based on the de Bruijn graph approach for both single-cell and multicell data. It improves on the recently released Euler-Velvet Single Cell (E+V-SC) assembler (specialized for single-cell data) and on the popular assemblers Velvet and SOAPdenovo (for multicell data).

    All tests were performed on Dell EMC PowerEdge R940 configured as shown in Table 1. The total number of cores available in the system is 96, and the total amount of memory is 1.5TB.

    Table 1 Dell EMC PowerEdge R940 Configuration
    Dell EMC PowerEdge R940
    CPU 4x Intel® Xeon® Platinum 8168 CPU, 24c @ 2.70GHz (Skylake)
    RAM 48x 32GB @2666 MHz
    OS RHEL 7.4
    Kernel 3.10.0-693.el7.x86_64
    Local Storage 12x 1.2TB 10K RPM SAS 12Gbps 512n 2.5in Hot-plug Hard Drive in RAID 0
    Interconnect Intel® Omni-Path
    BIOS System Profile Performance Optimized
    Logical Processor Disabled
    Virtualization Technology Disabled
    SPAdes Version 3.10.1
    Python Version 2.7.13

    The data used for the tests is a paired-end read set, ERR318658, which can be downloaded from the European Nucleotide Archive (ENA). The reads were generated from a blood sample used as a control to identify somatic alterations in primary and metastatic colorectal tumors. This data set contains 3.2 Billion Reads (BR) with a read length of 101 nucleotides.

    Performance Evaluation

    SPAdes builds three de Bruijn graphs, with 21-mers, 33-mers and 55-mers, consecutively. This is the main difference from SOAPdenovo2, which runs a single k-mer size, either 63 or 127.

    In Figure 2, the runtimes (wall-clock times) are plotted in days (blue bars) for various core counts: 28, 46 and 92 cores. Since we did not want to use every core of each socket, 92 cores was chosen as the maximum for the system; one core per socket was reserved for the OS and other maintenance processes. Subsequent tests were done by reducing the number of cores roughly in half. Peak memory consumption for each case is plotted as a line graph. SPAdes runs significantly longer than SOAPdenovo2 due to the multiple iterations over three different k-mer sizes.

    Figure 2 SPAdes benchmark (runtime and peak memory consumption)

    The peak memory consumption is very similar to that of SOAPdenovo2; both applications require slightly less than 800 GB of memory to process 3.2 BR.
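    For reference, an assembly run like the ones benchmarked here is typically launched along the following lines. The input file names, thread count and memory limit below are hypothetical placeholders rather than the exact benchmark command.

    ```python
    # Hypothetical sketch of launching a SPAdes paired-end assembly from Python.
    # File names, thread count and memory limit are placeholders.
    import subprocess

    cmd = [
        "spades.py",
        "-1", "ERR318658_1.fastq.gz",   # forward reads of the paired-end set
        "-2", "ERR318658_2.fastq.gz",   # reverse reads of the paired-end set
        "-k", "21,33,55",               # the three k-mer sizes SPAdes iterates over
        "-t", "92",                     # worker threads
        "-m", "1400",                   # memory limit in GB
        "-o", "spades_out",             # output directory
    ]
    subprocess.run(cmd, check=True)
    ```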

    Conclusion

    Utilizing more cores helps to reduce the runtime of SPAdes significantly, as shown in Figure 2. For SPAdes, it is advisable to use the highest core-count CPUs, such as the Intel Xeon Platinum 8180 processor with 28 cores and a 3.80 GHz turbo frequency, to bring the runtime down further.

    Resources

    Internal web page

    1. http://en.community.dell.com/techcenter/blueprints/blueprint_for_hpc/m/mediagallery/20444301

    External web page

    1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874646/

    Contacts
    Americas
    Kihoon Yoon
    Sr. Principal Systems Dev Eng
    Kihoon.Yoon@dell.com
    +1 512 728 4191



    † Refers to an earlier version of SOAPdenovo, not SOAPdenovo2.

  • Windows 10 IoT Enterprise (WIE10) - Blog

    Quick Start: Enhanced Out Of Box Experience (OOBE)

    I am excited to announce the availability of Quick Start on the newly launched Wyse 5070 WIE10 thin client. Quick Start runs on first boot and can also be launched manually as required. It provides the end user with an enhanced first-time out-of-box experience (OOBE) and informs the user about the product details, both hardware and software. After walking through the screens, the end user is prompted to configure the thin client if they choose to, or to simply proceed with using their brand-new Dell Wyse 5070 thin client.

    Here are some screenshots:

    And:

  • Windows 10 IoT Enterprise (WIE10) - Blog

    Fix issue: Windows Thin Clients frequently rebooting

    A bit of background on the write filter: Microsoft provides the UWF, or Unified Write Filter, for Windows 10 IoT Enterprise thin clients (earlier write filters such as EWF and FBWF were available for WES7, WE8S, etc.). The write filter starts on boot (enabling or disabling it prompts a reboot) and captures any and all writes to disk in an overlay called the write filter cache. This cache can live in RAM (typically the case for thin clients, and what we are discussing today) or on storage. So, applications think they are writing persistently to disk when in actuality they are writing to volatile RAM, and those writes are lost when the unit is rebooted. Since these are typically VDI applications writing temporary data, and the user-generated data lives in the back-end VDI infrastructure, this is actually ideal. While not entirely relevant to this discussion, I should note that the UWF does provide a mechanism to bypass itself, with file, folder and registry exclusions for programs such as Windows Defender that need to frequently persist their writes.
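    As a rough illustration of that exclusion mechanism, the hypothetical sketch below drives the built-in uwfmgr.exe tool from an elevated Python session to show the current UWF configuration and add a write-through folder. The excluded path is an example only, and exclusions take effect after the next reboot.

    ```python
    # Hypothetical sketch: query UWF state and add a file exclusion by calling
    # the built-in uwfmgr.exe tool (run from an elevated session on the client).
    import subprocess

    def uwfmgr(*args):
        """Run a uwfmgr.exe command and return its text output."""
        result = subprocess.run(["uwfmgr.exe", *args],
                                capture_output=True, text=True, check=True)
        return result.stdout

    print(uwfmgr("get-config"))                # current and next-session UWF settings
    uwfmgr("file", "add-exclusion",
           r"C:\ProgramData\ExampleApp\Logs")  # example path allowed to write through
    ```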

    The write filter has two main functions:

    1. Protect the storage medium (usually eMMC or flash) from excessive writes, thus extending the life of the thin client
    2. Protect the OS and user data from malware and viruses, and prevent the end user from filling up the disk with non-critical data such as 4K videos

    Typical Windows thin clients have at most 1 GB of UWF cache (some with 8 GB of RAM have up to 2 GB). Once this UWF cache fills up, the OS starts complaining about low memory or low UWF cache size. Usually, once the UWF cache reaches a critical 90% level, the unit has to be rebooted. In most cases this doesn't happen for weeks, but in some deployments it happens almost daily or more often (causes include excessively verbose logging by some applications, browser cache, etc.). This is an industry-wide issue for all Windows thin clients running with the write filter enabled.

    I am excited to announce the release of our brand-new patent-pending product, Overlay Optimizer, which solves this very issue. Without getting into the details (aka the "sausage-making"), I will say that Overlay Optimizer ensures that your Windows thin client doesn't need to reboot as frequently as it used to. Not only does this mean greater system up-time and therefore a much better end-user experience; you can also avoid the need to upgrade your thin clients from, say, 4 GB of RAM to 8 GB. This patent-pending software is only available on Dell thin clients and will help our customers extract more performance and up-time from their thin clients.

    Overlay Optimizer is available for all Dell Thin Clients running Windows 10 IoT Enterprise and can be downloaded for free from here:

    https://downloads.dell.com/wyse/OverlayOptimizer/1.0/

    Hope this helps. Please comment if this does help solve your issues.

  • Life Sciences

    RNA-Seq pipeline benchmark with Dell EMC Ready Bundle for HPC Life Sciences

    Differentially Expressed Gene (DEG) Analysis with the Tuxedo Pipeline

    Overview

    Gene expression analysis is as important as identifying Single Nucleotide Polymorphisms (SNPs), InDels or chromosomal restructuring. Ultimately, all physiological and biochemical events depend on the final gene expression products, proteins. Many quantitative scientists who are not biologists tend to oversimplify the flow of genetic information and forget what the actual roles of proteins are. Simplification is the beginning of most scientific fields; however, it is too optimistic to think that this practice also works for biology. Although all human organs contain an identical genomic composition, the actual proteins expressed in the various organs are completely different. Ideally, a technology that could quantify the entire set of proteins in a cell would accelerate the progress of Life Science significantly; however, we are far from achieving that. Here, in this blog, we test one popular RNA-Seq data analysis pipeline known as the Tuxedo pipeline. The Tuxedo suite offers a set of tools for analyzing a variety of RNA-Seq data, including short-read mapping, identification of splice junctions, transcript and isoform detection, differential expression, visualization, and quality control metrics.
    A typical RNA-Seq data set consists of multiple samples, as shown in Figure 1. Although the number of sample sets depends on the biological experimental design, two sets of samples are typically compared, for example normal vs. cancer samples or untreated vs. treated samples.

    Figure 1 Tested Tuxedo pipeline workflow

    All the samples are aligned individually in Step 1. In this pipeline, the Tophat process uses Bowtie 2 version 2.3.1 as the underlying short-read aligner. In Step 3, the Cuffmerge job depends on all the previous jobs from Step 2: the results from the Cufflinks jobs are collected at this step, and the multiple Cufflinks assemblies are merged together, which is required for the Cuffdiff step. Cuffmerge also runs Cuffcompare in the background and automatically filters out transcription artifacts. Cuffnorm generates tables of expression values that are properly normalized for library size, and these tables can be used for other statistical analyses instead of CummeRbund. In Step 5, the CummeRbund step is set up to generate three plots, gene density, gene box and volcano plots, using an R script.
    A performance study of an RNA-Seq pipeline is not trivial because the nature of the workflow requires non-identical input files. 185 RNA-Seq paired-end read data sets were collected from public data repositories. All the read data files contain around 25 Million Fragments (MF)[1] and have similar read lengths. The samples for a test were randomly selected from the pool of 185 paired-end read files. Although these randomly selected data have no biological meaning, they certainly put the tests under a worst-case scenario with a very high level of noise.
    The test cluster configurations are summarized in Table 1.

    Table 1 Test cluster configurations
    8x Dell EMC PowerEdge C6420
    CPU 2x Xeon® Gold 6148 20c 2.4GHz (Skylake)
    RAM 12x 16GB @2666 MHz
    OS RHEL 7.4
    Interconnect Intel® Omni-Path
    BIOS System Profile Performance Optimized
    Logical Processor Disabled
    Virtualization Technology Disabled

    The test cluster and the H600 storage system were connected via 4x 100GbE links between two Dell Networking Z9100-ON switches. Each compute node was connected to the test-cluster-side Dell Networking Z9100-ON switch via a single 10GbE link. The four storage nodes in the Dell EMC Isilon H600 were connected to the other switch via 4x 40GbE links. The configuration of the storage is listed in Table 2.

    Table 2 Storage configurations
    Dell EMC Isilon H600
    Number of nodes 4
    CPU per node Intel® Xeon™ CPU E5-2680 v4 @2.40GHz
    Memory per node 256GB
    Storage Capacity Total usable space: 126.8 TB, 31.7 TB per node
    SSD L3 Cache 2.9 TB per node
    Network Front end network: 40GbE, Back end network: IB QDR
    OS Isilon OneFS v8.1.0.0 B_8_1_0_011

    Performance Evaluation


    Two sample Test – Bare-minimum

    DEG analysis requires at least two samples. In Figure 2, each step described in Figure 1 is submitted to the Slurm job scheduler with the proper dependencies; for example, the Cuffmerge step must wait until all the Cufflinks jobs are completed. The two samples, say one normal and one treated sample, each begin with a Tophat step followed by a Cufflinks step. Upon the completion of all the Cufflinks steps, Cuffmerge aggregates the gene expressions across all the samples provided. Then the subsequent steps, Cuffdiff and Cuffnorm, begin. The output of Cuffnorm can be used for other statistical analyses, while Cuffdiff generates gene expression differences at the gene level as well as the isoform level. The CummeRbund step uses the R package CummeRbund to visualize the results, as shown in Figure 3. The total runtime[2] with 38 cores and two PowerEdge C6420s is 3.15 hours.

    Figure 2 Tuxedo pipeline with two samples
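    As an illustration of how these dependencies can be expressed, the sketch below submits hypothetical Tophat/Cufflinks/Cuffmerge job scripts to Slurm with afterok dependencies. The script names are placeholders, not the actual benchmark scripts.

    ```python
    # Hypothetical sketch: chaining Tuxedo pipeline steps in Slurm so that
    # Cuffmerge starts only after every Cufflinks job has finished (afterok).
    # The *.sbatch script names are placeholders.
    import subprocess

    def sbatch(script, dependency=None):
        """Submit a job script with 'sbatch --parsable' and return its job ID."""
        cmd = ["sbatch", "--parsable"]
        if dependency:
            cmd.append(f"--dependency=afterok:{dependency}")
        cmd.append(script)
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout.strip()

    cufflinks_ids = []
    for sample in ["sample1", "sample2"]:
        tophat_id = sbatch(f"tophat_{sample}.sbatch")
        cufflinks_ids.append(sbatch(f"cufflinks_{sample}.sbatch",
                                    dependency=tophat_id))

    # Cuffmerge waits for all Cufflinks jobs; Cuffdiff and Cuffnorm wait for Cuffmerge.
    cuffmerge_id = sbatch("cuffmerge.sbatch", dependency=":".join(cufflinks_ids))
    sbatch("cuffdiff.sbatch", dependency=cuffmerge_id)
    sbatch("cuffnorm.sbatch", dependency=cuffmerge_id)
    ```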

    Figure 3 shows differentially expressed genes in red, with significantly lower p-values (Y-axis) than the other gene expressions shown in black. The X-axis is the fold change in log base 2, and these fold changes for each gene are plotted against the p-values. More samples will yield a better gene expression estimate. The upper right plot shows gene expression in sample 2 compared with sample 1, whereas the lower left plot shows gene expression in sample 1 compared with sample 2. Gene expressions shown as black dots are not significantly different between the two samples.

     

    Figure 3 Volcano plot of the Cuffdiff results

    Throughput Test – Single pipeline with more than two samples - Biological / Technical replicates

    Typical RNA-Seq studies consist of multiple samples, sometimes hundreds of different samples: normal versus disease, or untreated versus treated. These samples tend to be noisy for biological reasons; hence, the analysis requires a rigorous data preprocessing procedure.
    Here, we tested various numbers of samples (all different RNA-Seq data sets selected from the 185 paired-end read data sets) to see how much data can be processed by the 8-node PowerEdge C6420 cluster. As shown in Figure 4, the runtimes with 2, 4, 8, 16, 32 and 64 samples grow exponentially as the number of samples increases. The Cuffmerge step does not slow down as the number of samples grows, while the Cuffdiff and Cuffnorm steps slow down significantly. In particular, the Cuffdiff step becomes a bottleneck for the pipeline since its running time grows exponentially (Figure 5). Although Cuffnorm's runtime also increases exponentially, like Cuffdiff's, this can be ignored since Cuffnorm's runtime is bounded by Cuffdiff's runtime.

    Figure 4 Runtime and throughput results

    Figure 5 Behaviors of Cuffmerge, Cuffdiff and Cuffnorm

    Conclusion

    The throughput test results show that an 8-node PowerEdge C6420 cluster with Isilon H600 can process roughly 1 Billion Fragments, which is a little more than 32 samples with ~50 million paired reads (25 MF) each, through the Tuxedo pipeline illustrated in Figure 1.

    Since the Tuxedo pipeline is relatively faster than other popular pipelines, it is hard to generalize these results or use them directly for sizing an HPC system. However, they provide a good reference point to help design a right-sized HPC system.

    Resources

    Internal web page
    External web page

    Contacts

    Americas
    Kihoon Yoon
    Sr. Principal Systems Dev Eng
    Kihoon.Yoon@dell.com
    +1 512 728 4191

    [1] “For RNA sequencing, determining coverage is complicated by the fact that different transcripts are expressed at different levels. This means that more reads will be captured from highly expressed genes, and few reads will be captured from genes expressed at low levels. When planning RNA sequencing experiments, researchers usually think in terms of numbers of millions of reads to be sampled.” – cited from https://www.illumina.com/documents/products/technotes/technote_coverage_calculation.pdf
    [2] Runtime refers to wall-clock time throughout this blog.


  • Custom Solutions Engineering Blog

    PowerEdge 14th Generation Server System Profile Performance Comparison

    Written by Bruce Wagner

    This white paper compares the compute throughput and energy consumption of the user-selectable system power profiles available on the 14th generation PowerEdge R740 2U/2S rack server.

  • Dell TechCenter

    Delivering high density NVMe based storage systems

    Shane Kavanagh – Member Technical Staff, ESI Architecture, Dell EMC

    For years now the tech industry has been talking about the “data tsunami” – the ongoing trend for increasing amounts of data that need to be stored and analyzed. This is primarily driven by the explosion in active connected devices and the desire to use the data they collect to provide better services (i.e., more efficient homes, smart cities, self-driving vehicles). While this trend has been going on for quite some time, its end is nowhere in sight.

     


    In response to this pressing market need, the industry is delivering ever-increasing storage density to match the growth in generated data. But more than just density is at play. As data centers apply these solutions at scale, cost becomes a limiting concern, and as ever, performance is always a consideration. So, the challenge is really to provide greater density at lower cost and acceptable performance.


    The density challenge

    The storage density challenge for the industry is to try to deliver one petabyte of storage into a 1U form factor. There are multiple ways that this can be achieved. One approach is to use 10 x 128 TB U.2 SSD devices. But at today’s prices that would be cost prohibitive. You could consider using a custom form factor in your solution, but this makes it difficult to leverage cost and supply benefits of the high volume offerings in the market and requires changes to platform designs.

     

    In response to the cost challenge of deploying large SSDs, an innovative approach that the Dell EMC Extreme Scale Infrastructure group is employing with a select group of customers is to use smaller, relatively low capacity, lower cost SSDs in standard form factors (i.e., M.2 devices) integrated in proven platforms (i.e., PowerEdge C4140). This allows us to provide highly dense NVMe based systems at costs approaching today’s SATA SSD cost points – and this approach has the added benefit of a more granular failure domain.

    The solution we are exploring with these customers delivers M.2 devices on a PCIe card that conforms to a standard GPU adapter size, making it easy to plug into existing platforms that accommodate GPUs. (See example illustration.)

    Wrigley card

    One of the keys to success with this approach is the inclusion of a high performance PCIe switch that fans out the PCIe lanes among the M.2 devices.

     

    At today's M.2 capacities this results in almost 100 TB per card, but M.2 device capacities are expected to double in the next year, allowing the card to approach almost 200 TB of capacity. Once this higher capacity is reached, placing four of these cards in a PowerEdge C4140 provides in excess of half a petabyte, and as M.2 capacities grow, this design readily scales beyond one petabyte in 1U.

    Performance
    Keep in mind, while this dense storage capacity is being delivered at SATA level costs, it is also significantly faster. Because we are delivering SSDs using the NVMe interface, the system will have performance levels well in excess of those available with SATA in an equivalent 1U system.


    When delivered with a bandwidth-optimized system like the PowerEdge C4140 and paired with two 100 Gb NICs, this solution can deliver 200 Gb/s of bandwidth in a 1U form factor. So, in just 5U that quickly adds up to 1 Tb/s of throughput and millions of IOPS, along with more than 1.5 petabytes of storage (readily scaling to 3 PB when the M.2 capacities double!).

                                                             4 high density NAND modules in a C4140

    Data Driven Workloads

    This high density all-flash solution is ideal for handling the sustained ingest of massive amounts of data, for example, as a front end in an edge computing architecture. It can work in conjunction with a Machine Learning backend, or any number of IoT functions that require large amounts of data to feed real-time analytics, like self-driving vehicles, satellite imagery, and weather telemetry.

    Conclusion

    Impressive storage density, extremely high bandwidth, and easy, technology-paced scalability: Dell EMC can offer large-scale customers innovative, all-flash solutions for their toughest data challenges. Inquiries about Extreme Scale Infrastructure solutions can be made at ESI@dell.com.

     


     

  • Dell TechCenter

    Dell EMC PowerEdge Servers Certified for VMware vSphere 6.7

    This blog post is written by Revathi A from the Dell Hypervisor Engineering team.

    • VMware vSphere 6.7 is the next major release from VMware and has been certified on Dell EMC PowerEdge (PE) servers. The following are some of the highlights of the vSphere 6.7 release on Dell EMC PowerEdge servers.
    • Dell EMC has certified 68 servers, covering 300+ configurations (server and feature certifications), for the VMware vSphere 6.7 GA release. Refer to the VMware HCL for the complete list of VMware-certified Dell servers.
    • VMware introduced support for TPM 2.0 in vSphere 6.7. This feature is certified and supported only on 13th and 14th generation Dell EMC PowerEdge servers.

                     NOTE: This list gets updated as new Dell EMC platforms launch. We recommend reviewing the VMware HCL page for updates.

    • vSphere 6.7 supports up to 12 TB of physical memory, and the PowerEdge R830 and R940 server models are certified for a maximum memory of 6 TB.
    • VMware vDGA (Virtual Dedicated Graphics Acceleration) is a feature offered by VMware, currently enabled with NVIDIA and AMD GPUs. This feature is certified on the PowerEdge R740, R730 and T640 server models. Refer to the VMware HCL for vDGA.
    • For a list of certified SD cards on Dell servers, refer to Table-2 in the guide "VMware vSphere 6.7 on Dell PowerEdge Servers Compatibility Matrix" available on the Dell support site.