BladeBit CUDA Performance Analysis

BladeBit Version: v3.0.0-alpha4

Science of Mining Report v1.0

August 28, 2023

Version History	Initial Version v1.0 (August 28, 2023)

1. Overview

Chia emerged in 2021 with a focus on being a power-efficient crypto-currency that leveraged hard drive space as opposed to heavy computations done by GPUs or ASICs. Chia farmers created plots ("plotting") and kept them online to potentially earn crypto-currency ("farming"). Many Chia farmers optimized their operations for power efficiency, with some operating less powerful equipment that drew only modest amounts of electricity. In the early days of Chia, a farm could also improve its profits by growing faster than other farms and so holding a proportionally large share of the then-small network storage space. However, as the total net space has grown, total farm size and power efficiency have become the key factors for profit.

Today in 2023, the situation has changed significantly from the early days of Chia. Power costs have risen in many parts of the world and Chia's crypto price has dropped since 2021. However, for dedicated farmers a new consideration has emerged: plot compression. Several flavors of plot compression (e.g. MadMax Gigahorse, NoSSD, BladeBit CUDA) allow farmers to store more plots in the same amount of storage space. The trade-off is increased GPU or CPU utilization, resulting in higher energy costs. Chia farmers understandably have a variety of questions before deciding whether to create new compressed plots or stay with their existing uncompressed plots.

This article focuses on analysis of BladeBit CUDA compression. As of this writing, BladeBit CUDA development is still in progress so it is possible that upcoming versions may have a different performance profile. The tested version of the software supports a variety of compression settings (Compression levels 1 through 9) and can also generate uncompressed plots (Compression level of 0). Note: Compression level 8 is not supported in this version and compression level 9 plot sizes are not finalized.

In this article, we examine the performance of BladeBit CUDA to answer several key questions:

How long does it take to create BladeBit compressed plots using a GPU?
What is the GPU power consumption required to create varying levels of compressed plots?
What are the achieved compressed plot sizes for varying levels of compression?
For CUDA farming, how much CPU and GPU usage is required to farm plots?
How much benefit does a higher-end CPU offer over commodity desktop CPUs for farming compressed plots without a GPU?
Is a GPU absolutely required for high compression levels or can a powerful CPU be utilized to farm highly compressed plots?
What are the power consumption trade-offs of CPU vs GPU farming?

Hardware Config

For these tests, we survey several systems, each with a different CPU and GPU combination, to evaluate the relative merits of various CPUs and GPUs for compressed farming. We use one workstation (Dell Precision 5860 with Xeon W5-2455X CPU) and, for comparison, three desktop platforms that use the Core i9 12900k, Core i9 11900k, and Core i9 9900k, respectively.

For our workstation configuration, we use the new Dell Precision 5860 workstation that was released in April 2023. This new single-socket workstation uses Intel's 4th Generation Scalable Xeon processor ("Sapphire Rapids") and has eight registered ECC DDR5 memory slots. The Dell Precision 5860 is orderable with the W-24XX series CPU (4 channel memory, up to 24 cores/48 threads, 64 PCIe 5 lanes). A higher end Xeon W-34XX is available with the larger Dell Precision 7960 model. We added 256 GB of after-market DDR5, NVME drives, and replaced the original Nvidia T400 4 GB card with an RTX 3070.

The Precision 5860 can readily be configured with 256 GB or more RAM, which meets or exceeds the amount of memory required for BladeBit CUDA. It also includes 10 Gbit ethernet, which can quickly copy generated BladeBit plots out to remote storage. In a distributed Chia setup with a plotter and multiple other nodes with drives, reducing the time required to transfer plots to other systems frees the plotter to resume plotting sooner. For desktop platforms, 10 Gbit PCIe-based ethernet devices can be readily added at additional cost. As the Precision 5860 is a tower-based system, it is easier to add a commodity RTX GPU for CUDA.

We also measured BladeBit CUDA performance on desktop-class systems to evaluate their applicability for compressed CPU farming. While creating compressed plots with the tested version of BladeBit CUDA requires a GPU and 256 GB RAM, farming the plots can optionally be done without a GPU and with less memory.

Hardware Configuration Summary

Item	Precision 5860	Core i9 12900k System	Core i9 11900k System	Core i9 9900k System
CPU	Intel Xeon W5-2455X (30 MB L3, 12 cores / 24 threads, 200 W)	Intel Core i9 12900k (30 MB L3, 8 Performance cores / 16 threads plus 8 Efficient cores without Hyper-Threading)	Intel Core i9 11900k (16 MB L3, 8 cores / 16 threads, 125 W)	Intel Core i9 9900k (16 MB L3, 8 cores / 16 threads, 95 W)
Memory	256 GB (4x64GB) Kingston DDR5 Registered ECC (KSM48R40BD4TMM-64HMR)	128 GB DDR4 (4x32 GB)	32 GB (2x16GB)	64 GB (2x32GB)
GPU	RTX 3070 FE (900-1G142-2510-000)	N/A	N/A	RTX 3090

Software Config

We used the Alpha build of BladeBit CUDA (v3.0.0-alpha4) downloaded on June 23, 2023 from: https://download.chia.net/BladeBit/alpha4.3/BladeBit-cuda-plotter/DEB/BladeBit-cuda-v3.0.0-alpha4-ubuntu-x86-64.tar.gz .

We encountered errors such as "Error 1 while fetching proof for F7 2818314911" when using the Beta versions, so this analysis focuses on the Alpha 4 version. Since the writing of this whitepaper, newer versions of BladeBit have begun arriving. We hope to test newer versions in the future where possible.

Item	Dell Precision 5860	Core i9 12900k	Core i9 11900k	Core i9 9900k
OS	Ubuntu 22.04.2 LTS	Ubuntu 20.04.6 LTS	Ubuntu 22.04.2 LTS	Ubuntu 20.04.4 LTS
Kernel	5.17.0-1033-oem	5.10.157	5.19.0-41-generic	5.15.0-75-generic
Nvidia Driver	530.30.02	N/A	N/A	530.41.03
Nvidia Toolkit	12.1	N/A	N/A	12.1

BladeBit CUDA Requirements

BladeBit CUDA has several requirements:

256 GB of system RAM. We observed CUDA memory allocation errors when the amount of system RAM was below this amount.
NVIDIA GPU with at least 8 GB of VRAM and CUDA Compute Capability 5.2 or higher.
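
The two requirements above can be checked before plotting with a short script. The following is a minimal sketch rather than part of BladeBit: the function names are hypothetical, and it assumes a recent NVIDIA driver whose nvidia-smi accepts the memory.total and compute_cap query fields. Because Linux reports slightly less than the installed RAM in MemTotal, the sketch only reports the RAM figure and leaves the comparison against 256 GB to the reader.

```python
import subprocess

MIN_VRAM_MIB = 8 * 1024      # 8 GB of VRAM, per the requirement above
MIN_COMPUTE_CAP = (5, 2)     # CUDA Compute Capability 5.2


def parse_mem_total_kib(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def gpu_meets_requirements(csv_line):
    """Check one 'memory.total, compute_cap' CSV line, e.g. '8192, 8.6'."""
    mem_str, cap_str = [field.strip() for field in csv_line.split(",")]
    major, minor = (int(part) for part in cap_str.split("."))
    return int(mem_str) >= MIN_VRAM_MIB and (major, minor) >= MIN_COMPUTE_CAP


def report_host():
    """Print the RAM size and a pass/fail line per GPU (needs nvidia-smi)."""
    with open("/proc/meminfo") as handle:
        ram_gib = parse_mem_total_kib(handle.read()) / (1024 * 1024)
    print(f"System RAM visible to Linux: {ram_gib:.1f} GiB")
    # Assumption: these query fields are supported by the installed driver.
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,compute_cap",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in output.strip().splitlines():
        status = "OK" if gpu_meets_requirements(line) else "below requirements"
        print(f"GPU ({line.strip()}): {status}")
```

Calling report_host() on a candidate plotter prints one line for system RAM and one line per detected GPU.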

On some (but not all) of our Ubuntu 20.04 systems, we encountered GLIBC version mismatches such as those shown below. The BladeBit CUDA binaries worked on the Ubuntu 22.04 systems that we tried, and we also identified a BladeBit binary that worked on Ubuntu 20.04.


./bladebit_cuda: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./bladebit_cuda)
./bladebit_cuda: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./bladebit_cuda)
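
A host can be checked for this mismatch before downloading a binary. The sketch below is illustrative only: it compares the C library version reported by Python's platform.libc_ver() against GLIBC 2.34, the highest version named in the errors above. The function names are hypothetical. For reference, a stock Ubuntu 20.04 ships glibc 2.31 and Ubuntu 22.04 ships glibc 2.35, which is consistent with the behavior we observed.

```python
import platform

# Highest GLIBC version named in the error output above.
REQUIRED_GLIBC = (2, 34)


def parse_version(text):
    """Turn a dotted version string such as '2.35' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_is_sufficient(installed, required=REQUIRED_GLIBC):
    """Return True if the installed glibc version is at least `required`."""
    return parse_version(installed) >= required


def check_host():
    """Report whether this host's glibc is new enough for the binary."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("Could not determine the glibc version on this host")
        return False
    ok = glibc_is_sufficient(version)
    print(f"glibc {version}: {'sufficient' if ok else 'too old'}")
    return ok
```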

Power-Monitoring Methods

CPU/DRAM

To examine CPU and DRAM power consumption, we gathered measurements from Intel's RAPL (Running Average Power Limit) capability. We queried the MSR_PKG_ENERGY_STATUS MSR for the CPU package data and the MSR_DRAM_ENERGY_STATUS MSR for the DRAM data. These MSRs are running 32-bit counters that accumulate the total energy consumed up to a given point in time. We take one sample just before starting the workload and one just after the workload concludes. Subtracting the starting value from the ending value (and handling any 32-bit overflow, if needed) gives the energy consumed over the run ("Delta"). We then convert this to watts/second using Delta *(2.3/10000000000)*WORKLOAD_SECONDS, similar to the method described at https://lwn.net/Articles/569674/, although with a change for an apparent misplaced parenthesis in the original write-up. Our 11900k system only supported the package power RAPL capability and not the RAPL DRAM power monitoring capability.
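
The counter arithmetic described above can be sketched as follows. This is not the script used for this report, and it deliberately uses a different conversion: it scales the raw delta by the energy unit that Intel documents in bits 12:8 of MSR_RAPL_POWER_UNIT rather than by the constant in the expression above. The function names are hypothetical, reading /dev/cpu/N/msr assumes root privileges and a loaded msr kernel module, and some server parts may use a different unit for the DRAM counter.

```python
import os
import struct

# MSR addresses from Intel's Software Developer's Manual.
MSR_RAPL_POWER_UNIT = 0x606
MSR_PKG_ENERGY_STATUS = 0x611
MSR_DRAM_ENERGY_STATUS = 0x619
COUNTER_MASK = 0xFFFFFFFF  # the energy counters are 32 bits wide


def read_msr(address, cpu=0):
    """Read one 64-bit MSR through the Linux msr driver (root required)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def energy_unit_joules(power_unit_msr):
    """Decode the energy status unit: 1 / 2**ESU joules, ESU in bits 12:8."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def counter_delta(start, end):
    """Difference between two samples, correcting for one 32-bit wrap."""
    return ((end & COUNTER_MASK) - (start & COUNTER_MASK)) & COUNTER_MASK


def average_watts(start, end, unit_joules, seconds):
    """Average power over the workload: joules consumed divided by seconds."""
    return counter_delta(start, end) * unit_joules / seconds
```

The wrap correction in counter_delta only holds if the counter wraps at most once between the two samples. With an energy unit of 2^-16 J, for example, the 32-bit counter spans 65,536 J, so a long workload on a high-power CPU may need intermediate samples.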

We also leverage sar to gather total CPU and memory utilization values.

GPU

For GPU power measurement, we took one-second samples from nvidia-smi, which reports the power draw in watts at that point in time. (Other sources of power consumption, e.g. drives and other peripherals, are not included in our numbers.)
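
One way to collect and summarize such samples is sketched below. The exact nvidia-smi invocation used for this report is not specified, so the query shown here (power.draw in CSV form for GPU 0) is an assumption, and the function names are hypothetical. The summary treats each reading as representative of the whole sampling interval, which is an approximation.

```python
import subprocess
import time


def sample_power_watts(gpu_id=0):
    """Return the instantaneous power draw of one GPU in watts."""
    # Assumption: the GPU supports power reporting; otherwise nvidia-smi
    # prints a non-numeric value and float() raises ValueError.
    output = subprocess.run(
        ["nvidia-smi", f"--id={gpu_id}", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(output.strip().splitlines()[0])


def collect_samples(duration_s, interval_s=1.0):
    """Poll the GPU every `interval_s` seconds for about `duration_s`."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(sample_power_watts())
        time.sleep(interval_s)
    return samples


def summarize(samples, interval_s=1.0):
    """Average and peak power, plus energy estimated as sum(P) * interval."""
    energy_joules = sum(samples) * interval_s
    return {
        "avg_watts": sum(samples) / len(samples),
        "peak_watts": max(samples),
        "energy_joules": energy_joules,
        "energy_wh": energy_joules / 3600.0,
    }
```

Running collect_samples() in a second terminal for the duration of a plotting run and passing the result to summarize() gives the average and peak GPU power for that run.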

On the next page, we examine the performance and power consumption of BladeBit CUDA GPU plotting.