BladeBit Version: v3.0.0-alpha4
August 28, 2023
Initial Version v1.0 (August 28, 2023)
Chia emerged in 2021 with a focus on being a power-efficient cryptocurrency that leveraged hard drive space rather than the heavy computation performed by GPUs or ASICs. Chia farmers created plots ("plotting") and kept them online to potentially earn cryptocurrency ("farming"). Many Chia farmers optimized their operations for power efficiency, with some running modest equipment that drew little electricity. In the early days of Chia, farmers could also boost profits by growing their storage faster than other farms, capturing a proportionally large share of the then-small network space. With the growth of the total net space, however, total farm size and power efficiency have become the key factors for profitability.
Today in 2023, the situation has changed significantly from the early days of Chia. Power costs have risen in many parts of the world, and Chia's crypto price has dropped from its 2021 highs. For dedicated farmers, however, a new consideration has emerged: plot compression. Several flavors of plot compression (e.g. MadMax Gigahorse, NoSSD, BladeBit CUDA) allow farmers to store more plots in the same amount of storage space. The trade-off is increased GPU or CPU utilization, resulting in higher energy costs. Chia farmers understandably have a variety of questions before deciding whether to create new compressed plots or stay with their existing uncompressed plots.
This article focuses on analysis of BladeBit CUDA compression. As of this writing, BladeBit CUDA development is still in progress so it is possible that upcoming versions may have a different performance profile. The tested version of the software supports a variety of compression settings (Compression levels 1 through 9) and can also generate uncompressed plots (Compression level of 0). Note: Compression level 8 is not supported in this version and compression level 9 plot sizes are not finalized.
In this article, we examine the performance of BladeBit CUDA to answer several key questions:
How long does it take to create BladeBit compressed plots using a GPU?
What is the GPU power consumption required to create varying levels of compressed plots?
What are the achieved compressed plot sizes for varying levels of compression?
For CUDA farming, how much CPU and GPU usage are required to farm plots?
How much benefit does a higher-end CPU provide over commodity desktop CPUs when farming compressed plots without a GPU?
Is a GPU absolutely required for high compression levels or can a powerful CPU be utilized to farm highly compressed plots?
What are the power consumption trade-offs of CPU vs GPU farming?
For these tests, we survey several systems, each with different CPUs and GPUs, to evaluate their relative merits for compressed farming. We use one workstation (a Dell Precision 5860 with a Xeon W5-2455X CPU) and, for comparison, three desktop platforms using the Core i9 12900k, Core i9 11900k, and Core i9 9900k, respectively.
For our workstation configuration, we use the Dell Precision 5860 workstation released in April 2023. This single-socket workstation uses Intel's 4th Generation Xeon Scalable processor ("Sapphire Rapids") and has eight registered ECC DDR5 memory slots. The Dell Precision 5860 is orderable with the Xeon W-24XX series CPUs (4-channel memory, up to 24 cores/48 threads, 64 PCIe 5.0 lanes); the higher-end Xeon W-34XX series is available in the larger Dell Precision 7960 model. We added 256 GB of aftermarket DDR5 and NVMe drives, and replaced the original NVIDIA T400 4 GB card with an RTX 3070.
The Precision 5860 can readily be configured with 256 GB or more of RAM, which meets or exceeds the amount of memory required by BladeBit CUDA. It also includes 10 Gbit Ethernet, which can quickly copy generated BladeBit plots out to remote storage. In a distributed Chia setup with one plotter and multiple other nodes holding drives, reducing plot transfer time frees up the plotter to resume plotting work sooner. Desktop platforms can add 10 Gbit PCIe Ethernet adapters at additional cost. As the Precision 5860 is a tower-based system, it is also easy to add a commodity RTX GPU for CUDA.
We also measured BladeBit CUDA performance on desktop-class systems to evaluate their applicability for compressed CPU farming. While creating compressed plots with the tested version of BladeBit CUDA requires a GPU and 256 GB of RAM, farming the plots can optionally be done without a GPU and with less memory.
Hardware Configuration Summary

| System | CPU | RAM | GPU |
|---|---|---|---|
| Dell Precision 5860 | Intel Xeon W5-2455X (30 MB L3, 12 cores / 24 threads, 200 W) | 256 GB (4x64 GB) Kingston DDR5 Registered ECC (KSM48R40BD4TMM-64HMR) | RTX 3070 FE (900-1G142-2510-000) |
| Core i9 12900k System | Intel Core i9 12900k (30 MB L3, 8 Performance Cores/16 threads, 8 Efficient Cores with no HT) | 128 GB DDR4 (4x32 GB) | — |
| Core i9 11900k System | Intel Core i9 11900k (16 MB L3, 8 cores / 16 threads, 125 W) | 32 GB (2x16 GB) | — |
| Core i9 9900k System | Intel Core i9-9900k (16 MB cache, 8 cores / 16 threads, 95 W) | 64 GB (2x32 GB) | — |
We used the Alpha build of BladeBit CUDA (v3.0.0-alpha4) downloaded on June 23, 2023 from: https://download.chia.net/BladeBit/alpha4.3/BladeBit-cuda-plotter/DEB/BladeBit-cuda-v3.0.0-alpha4-ubuntu-x86-64.tar.gz .
We encountered errors such as "Error 1 while fetching proof for F7 2818314911" during use of Beta versions, so this analysis focuses on the Alpha 4 version. Since the writing of this whitepaper, newer versions of BladeBit have become available; we hope to test them when possible in the future.
| System | Operating System |
|---|---|
| Dell Precision 5860 | Ubuntu 22.04.2 LTS |
| Core i9 12900k | Ubuntu 20.04.6 LTS |
| Core i9 11900k | Ubuntu 22.04.2 LTS |
| Core i9 9900k | Ubuntu 20.04.4 LTS |
BladeBit CUDA has several requirements:
256 GB of system RAM. We observed CUDA memory allocation errors when the system had less RAM than this.
NVIDIA GPU with at least 8 GB of VRAM and CUDA Compute Capability version 5.2.
On some (but not all) of our Ubuntu 20.04 systems, we encountered GLIBC mismatches. However, the BladeBit CUDA binaries worked on the Ubuntu 22.04 systems we tried, and we identified a BladeBit binary that also worked on Ubuntu 20.04.
To examine CPU and DRAM power consumption, we gathered measurements from Intel's RAPL (Running Average Power Limit) capability. We queried the MSR_PKG_ENERGY_STATUS MSR for the CPU package energy data and the MSR_DRAM_ENERGY_STATUS MSR for the DRAM energy data. These MSRs contain running 32-bit counters of cumulative energy consumption. We take one sample just before starting the workload and one just after it concludes. Subtracting the starting value from the ending value (and handling any 32-bit overflow, if needed) yields the energy consumed ("Delta"). We then convert this to average watts using Delta * (2.3/10000000000) / WORKLOAD_SECONDS, similar to the method described at https://lwn.net/Articles/569674/, although with a change for an apparent misplaced parenthesis in the original write-up. Our 11900k system only supported the package power RAPL capability and not the RAPL DRAM power monitoring capability.
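The delta-and-convert step above can be sketched in a few lines. The per-tick energy constant below is taken directly from the formula in the text and is an assumption about this particular system (production tools typically derive it from the MSR_RAPL_POWER_UNIT register rather than hard-coding it):

```python
# Sketch of the RAPL energy-delta calculation described above.
ENERGY_UNIT = 2.3 / 10_000_000_000  # joules per counter tick (constant from the text; an assumption)
COUNTER_BITS = 32                   # MSR_PKG_ENERGY_STATUS counters are 32-bit and wrap

def average_watts(start_ticks: int, end_ticks: int, seconds: float,
                  energy_unit: float = ENERGY_UNIT) -> float:
    """Average package power over an interval from two 32-bit RAPL samples."""
    # Modular subtraction handles a single counter wraparound during the run.
    delta = (end_ticks - start_ticks) % (1 << COUNTER_BITS)
    joules = delta * energy_unit
    return joules / seconds

# Example: end < start because the 32-bit counter wrapped mid-workload.
start, end = 0xFFFF_FF00, 0x0000_0100
watts = average_watts(start, end, seconds=10.0)
```

A single wraparound is the most that needs handling in practice here, since the sampling interval (one workload run) is far shorter than a full counter cycle.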
We also leverage sar to gather total CPU and memory utilization values.
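sar itself only prints periodic utilization tables; a small helper like the one below could turn a `sar -u` data line into a single utilization figure. The column layout assumed here is sysstat's default for `sar -u` (with %idle as the last column), which can vary by version and locale, so this is an illustrative sketch rather than the exact post-processing we used:

```python
def cpu_utilization_from_sar(line: str) -> float:
    """Derive total CPU utilization (%) from one `sar -u` data line.

    Assumes the default sysstat layout where %idle is the final column, e.g.:
    '12:00:01 AM  all  12.31  0.00  3.05  0.52  0.00  84.12'
    """
    idle = float(line.split()[-1])
    return 100.0 - idle
```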
For GPU power measurement, we took 1-second samples from nvidia-smi, which reports the watts consumed at that point in time. (Other sources of power consumption, e.g. drives and other peripherals, are not included in our numbers.)
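A sampling loop in this style can be sketched as follows. The `--query-gpu=power.draw` and `--format=csv,noheader,nounits` options are standard nvidia-smi query flags, but this script is our illustration rather than the exact tooling behind the measurements in this article:

```python
import subprocess
import time

def parse_power_draw(line: str) -> float:
    """Parse one line of `nvidia-smi --query-gpu=power.draw
    --format=csv,noheader,nounits` output into watts."""
    return float(line.strip())

def sample_gpu_power(samples: int, interval_s: float = 1.0) -> list[float]:
    """Collect `samples` instantaneous GPU power readings, one per interval."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True).stdout
        # First line corresponds to GPU 0 on a single-GPU system.
        readings.append(parse_power_draw(out.splitlines()[0]))
        time.sleep(interval_s)
    return readings

# Usage (on a machine with an NVIDIA GPU and driver installed):
#   watts = sample_gpu_power(samples=60)
#   print(sum(watts) / len(watts))
```

Averaging 1-second instantaneous samples over the workload gives an approximation of mean GPU power; short spikes between samples are missed, which is a limitation shared by the measurement approach described above.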
On the next page, we examine the performance and power consumption of BladeBit CUDA GPU plotting.