September 2, 2023
Version History | Changes |
---|---|
Version 0.9 | Initial Version |
Version 0.91 | Fix typos |
Version 0.92 | Add info on how to use the PCM tool to get RAPL power data for CPU/DRAM |
On August 28, 2023, we released a report analyzing the performance and power efficiency of BladeBit CUDA using the BladeBit CUDA simulator. The report provided data points for the test systems in our lab and serves as a set of baseline measurements for comparison.
This short guide gives an overview of how to use the simulator to estimate the compression level your CPU/GPU will support for BladeBit CUDA, and how to gather some helpful performance diagnostics along the way.
We use Ubuntu for this overview.
Prerequisites:
BladeBit CUDA binary
Compressed BladeBit plots available for the compression levels to be tested
For GPU plotting: an Nvidia GPU with at least 8 GB of VRAM and CUDA Compute Capability 5.2.
For CPU plotting: sufficient memory; consult Table 1, which is based on our testing.
The number of BladeBit software threads and the compression level determine the total memory required. For example, testing C7 farming with 32 threads needs 10.4 GiB of RAM.
Table 1: System Memory Used (GiB), CPU Farming
Threads | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
---|---|---|---|---|---|---|---|
16 | 0.1 | 0.2 | 0.3 | 0.7 | 1.3 | 2.6 | 5.2 |
24 | 0.1 | 0.2 | 0.5 | 1.0 | 2.0 | 3.9 | 7.8 |
32 | 0.2 | 0.3 | 0.7 | 1.3 | 2.6 | 5.2 | 10.4 |
48 | 0.3 | 0.5 | 1.0 | 2.0 | 3.9 | 7.8 | 15.6 |
96 | 0.5 | 1.0 | 2.0 | 3.9 | 7.8 | 15.6 | 31.3 |
144 | 0.8 | 1.5 | 3.0 | 5.9 | 11.7 | 23.5 | 46.9 |
Our testing method was to leverage several Bash scripts to automate data collection and parsing across a wide variety of runs. There are many ways to automate data extraction and parsing; we provide a simplified example in this write-up as an introductory guide, and additional automation and command parameters can be layered on top.
First, it's necessary to make a plot for your desired compression level. The command below creates a C7 plot:
./bladebit_cuda -f $FARMER -c $POOL --compress 7 cudaplot /path/to/plot_storage
(Use your farmer public key for $FARMER and your contract address for $POOL, and replace the default /path/to/plot_storage with the location for plot storage.)
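If you plan to test multiple compression levels, the same command can be wrapped in a simple loop. This is only a sketch; $FARMER, $POOL, and the storage path are placeholders exactly as in the command above:

```shell
#!/bin/sh
# Sketch: create one plot per compression level to be tested.
# FARMER, POOL, and /path/to/plot_storage are placeholders, as above.
for level in 1 2 3 4 5 6 7; do
    ./bladebit_cuda -f "$FARMER" -c "$POOL" --compress "$level" \
        cudaplot /path/to/plot_storage
done
```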
At a minimum, all that is required to run the simulator is the ./bladebit_cuda command with your chosen options and compressed plot file. However, if you would also like to gather CPU, GPU, memory, disk, etc. utilization data, the following is a simple script that performs basic data collection along with the run.
Sample simulator run script with basic data collection for sar and nvidia-smi:
runSimulation.sh
# DURATION is set to 600 seconds or 10 minutes
# THREADS is the number of CPU threads you want to use. (For GPU plotting, the script below
# is hard-coded to 1 thread.)
# SIZE is the farm size, e.g. 250TB, 500TB, 750TB, 1PB, 2PB, etc.
# Change the plotfile to a C1,C2,C3,C4,C5,C6,C7 plot to simulate that type
DURATION=600
THREADS=8
SIZE=500TB
PLOTFILE=plot-k32-c07-2023-08-13-09-01-eaa809914afde2e5c2d0[etc].plot
# For CUDA simulations
# Run sar (prereq: sudo apt-get install sysstat)
sar -A 1 $DURATION -o sar-data.bin > /dev/null 2>&1 &
# Optional: Nvidia GPU stats, run in the background while the benchmark goes (See below for script)
./getGpu.sh &
# Start the CUDA simulation, use 1 thread
./bladebit_cuda simulate --power $DURATION -p 1 --size $SIZE $PLOTFILE
# Workload done, stop the GPU monitoring. Raw data is in nvidia-log.txt
pkill -f getGpu.sh
# For CPU-based simulations, set --no-cuda in bladebit_cuda syntax
# Run sar (prereq: sudo apt-get install sysstat)
sar -A 1 $DURATION -o sar-data-nocuda.bin > /dev/null 2>&1 &
./bladebit_cuda simulate --no-cuda --power $DURATION -p $THREADS --size $SIZE $PLOTFILE
To easily log the run output to a file, one option is to use the script utility, such as:
script
./runSimulation.sh
[Hit Control D to end script; the resulting BladeBit CUDA run log is in a file called "typescript".]
To clean up the typescript file and remove any extraneous ^M characters, run: "dos2unix -f typescript" (prereq: sudo apt-get install dos2unix)
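With the log cleaned up, the headline numbers can be pulled out with a quick grep. This sketch assumes the default "typescript" file name and matches the field names the simulator prints in its results:

```shell
# Sketch: extract the key result lines from the captured run log.
# "typescript" is the default output file name used by script(1).
grep -E 'lookup time|Proofs / Challenges|Memory used' typescript
```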
If you ssh to the system under test, you may also consider using screen (sudo apt install screen) to avoid issues with ssh sessions getting closed due to network inactivity or connection loss.
Thus, a full run could look like:
screen
script
./runSimulation.sh
[Hit Control D when the run is over to capture the output from script]
Full guides on how to use screen are available online; however, a few basic useful commands to get started are:
(Inside screen)
Control a c (Open a new screen window)
Control a " (Choose which screen window to switch to)
Control a d (Detach from your screen window)
exit (Close the current screen window)
(Outside of screen)
screen -r (Resume a previous session)
screen -dr (Detach an active screen session and resume it elsewhere)
getGpu.sh (Script to get Nvidia GPU data)
# getGpu.sh
# This script runs until it is killed, taking a measurement from nvidia-smi
# once a second and appending the output to nvidia-log.txt.
# Recommended: uncomment the line below to clear out any previous run data
# rm nvidia-log.txt
while true; do
    nvidia-smi --query-gpu=timestamp,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv >> nvidia-log.txt
    sleep 1
done
Sample nvidia-log.txt
timestamp, temperature.gpu, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2023/09/01 18:51:52.953, 55, 0 %, 0 %, 8192 MiB, 1349 MiB, 6615 MiB
...
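Because nvidia-smi prints the CSV header with every sample, a small awk filter is handy for summarizing the log. This sketch (based on the column layout shown above) averages the GPU utilization column, skipping the header lines:

```shell
# Sketch: average GPU utilization from nvidia-log.txt.
# Column 3 is "utilization.gpu [%]"; data rows start with a timestamp digit,
# so header lines (which start with "timestamp") are skipped.
awk -F', ' '$1 ~ /^[0-9]/ { gsub(/ %/, "", $3); sum += $3; n++ }
            END { if (n) printf "Average GPU utilization: %.1f %%\n", sum / n }' nvidia-log.txt
```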
RAPL
To get more insights into CPU and DRAM power consumption, the Intel Running Average Power Limit ("RAPL") CPU counters can be queried over time to monitor the power consumption of the CPU and memory.
One easy way to get this information is to install Intel's PCM tools from: https://github.com/intel/pcm
# If cmake is not installed, first do: sudo apt-get install cmake
git clone --recursive https://github.com/opcm/pcm.git
cd pcm
mkdir build
cd build
cmake ..
cmake --build . --parallel
# To get the RAPL data
cd bin
sudo ./pcm-power | grep Watts
# Sample data after initial diagnostic printouts:
S0; Consumed energy units: 354349; Consumed Joules: 21.63; Watts: 21.67; Thermal headroom below TjMax: 68
S0; Consumed DRAM energy units: 103613; Consumed DRAM Joules: 6.32; DRAM Watts: 6.34
3. Extracting data from sar
The sar utility can provide very detailed information regarding the performance of the system. This section gives a short overview of how to interact with the sar data that has been collected.
At a high level, the method is to log all sar data to one .bin file (e.g. "sar-data.bin" in this example) and then selectively extract the portions of interest into separate text files for easier processing.
CPU utilization data:
sar -u -f sar-data.bin > cpu-utilization.txt
03:21:47 PM CPU %user %nice %system %iowait %steal %idle
03:21:48 PM all 55.34 0.00 0.56 19.55 0.00 24.55
03:21:49 PM all 23.88 0.00 0.19 30.24 0.00 45.70
03:21:50 PM all 96.63 0.00 0.25 0.87 0.00 2.25
For easier import into tools like Excel, it can be useful to make these fields comma separated (CSV). This can be readily done with the line below; the tr command substitutes a comma for each run of space characters in the input:
sar -u -f sar-data.bin | tr -s ' ' ',' > cpu-utilization-csv.txt
Many other performance metrics can be extracted from the sar file, such as:
Paging data (sar -B -f sar-data.bin)
Block device data for the hard-drive/NVMe devices (sar -d -f sar-data.bin)
Memory data (sar -r -f sar-data.bin)
Network device data (sar -n DEV -f sar-data.bin)
Or extract all possible data (sar -A -f sar-data.bin)
The sar data can be readily imported into other tools such as Excel for graphing and additional analysis.
4. Interpreting BladeBit CUDA Results
After the BladeBit CUDA run, the simulator will show results such as those below. Several key fields of interest are the Worst plot lookup time and the Average full proof lookup time. The current Chia guide suggests keeping the maximum lookup time at around 5 seconds; if the times exceed 10 seconds, move to a more powerful CPU/GPU or use a lower compression level. The sample results below are representative of attempting to use a compression level and farm size that are too high for the hardware. The estimated largest farm sizes reported by the tool seem to be larger than the response time data would suggest is reasonable, so more investigation may be needed before relying upon that portion of the results.
Sample BladeBit CUDA simulator results:
[Simulator for harvester farm capacity for K32 C7 plots]
Random seed: 0x1c563ba8c9755e0269e0751810e21065a894269a5e0c964d1157575e0ad00cb2
Simulating...
Context count : 24
Thread per context instance : 0
Memory used : 8007.2MiB ( 7.8GiB )
Proofs / Challenges : 1242 / 1536 ( 80.86% )
Fetches / Challenges : 785 / 1536
Filter bits : 512
Effective partials : 23 ( 2.93% )
Total fetch time elapsed : 227.820 seconds
Average plot lookup time : 0.290 seconds
Worst plot lookup lookup time : 128.588 seconds
Average full proof lookup time: 54.967 seconds
Fastest full proof lookup time: 14.629 seconds
*** Warning *** : Your worst plot lookup time of 128.588 was over the maximum set of 8.000.
compression | plot count | size TB | size PB
------------------------------------------------
C7 | 14113 | 1182 | 1.18
5. Conclusions
This write-up provided an example of how to run the BladeBit CUDA simulator tool to estimate how well a Chia farmer's CPU or GPU could handle a given farm size and compression level. While the simulator tool itself provides useful output data, it is also often helpful to dig deeper into the data, e.g. from sar or nvidia-smi, to get additional insights. The sample scripts provided here offer a basic starting place for those who would like to evaluate BladeBit CUDA performance on their CPU/GPU hardware.
Contact info: tech@ [this website address]