# damocles-worker configuration guide

# Resource consumption at each stage of a sealing job (32 GiB sector)

Example computer specs:

  • CPU: AMD EPYC 7642 (max MHz: 2300)
  • GPU: RTX 3080
  • Memory: DIMM DDR4 Synchronous Registered (Buffered) 2933 MHz (0.3 ns)

| Stage | Concurrency | Duration | RAM | CPU | GPU | DiskIO read | DiskIO write | Remark |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WindowPoSt | 1 | ~4-10mins | ~120GiB | RAYON_NUM_THREADS*100% | ✓ | TODO | - | |
| WinningPoSt | 1 | ~1-10s | - | - | - | - | - | |
| AddPieces | 1 | ~3mins | ~210MiB | RAYON_NUM_THREADS*100% | x | <=32GiB | 32GiB | |
| TreeD | 1 | ~1min | ~47GiB | RAYON_NUM_THREADS*100% | x | 32GiB | 64GiB | 64 GiB sector ~60GiB RAM |
| PC1 | 1 | ~177mins | <=64GiB | 150% | x | - | 352GiB | |
| PC2 | 1 | ~10-13mins | ~64GiB | RAYON_NUM_THREADS*100% | ✓ | 384GiB | ~37GiB | |
| SupraPC2 | 1 | ~2-5mins | ~40GiB | ~400% | ✓ | 384GiB | ~37GiB | |
| WaitSeed | - | 75mins | - | - | x | - | - | |
| C1 | 1 | ~1s | - | - | x | - | - | |
| C2 | 1 | ~13-16mins | TODO | RAYON_NUM_THREADS*100% | ✓ | - | - | |
| SupraC2 | 1 | ~3-5mins | 128 GiB | RAYON_NUM_THREADS*100% | ✓ | - | - | |
| SnapEncode | 1 | ~3-5mins | TODO | RAYON_NUM_THREADS*100% | ✓ | ~32GiB(NFS) | ~32GiB(NFS) | It is recommended to run multiple SnapEncode tasks concurrently on each GPU to enable concurrent I/O and reduce GPU idle time |
| SnapProve | 1 | ~3-5mins | TODO | RAYON_NUM_THREADS*100% | ✓ | - | - | |
| SupraSnapProve | 1 | ~3-5mins | TODO | RAYON_NUM_THREADS*100% | ✓ | ~32GiB(NFS) | ~32GiB(NFS) | |
| Unseal | 1 | TODO | TODO | TODO | x | TODO | TODO | |

Note: The RAYON_NUM_THREADS environment variable is used to configure the number of threads used by tasks, and it defaults to the number of CPU cores.
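
The per-task RAM figures in the table can be turned into a rough memory budget. The sketch below multiplies those figures by a planned concurrency for each stage and sums the results. It assumes the worst case, in which every stage runs at full concurrency at the same moment and tasks share no memory, so it gives an upper bound rather than a measurement. The function name and the concurrency plan are hypothetical and only serve as an illustration.

```python
# Per-task RAM in GiB for a 32 GiB sector, copied from the table above.
# Stages whose RAM is listed as TODO or "-" are left out on purpose.
RAM_GIB = {"TreeD": 47, "PC1": 64, "PC2": 64}


def peak_ram_gib(concurrency):
    """Worst-case RAM if every listed stage runs at full concurrency at once."""
    unknown = set(concurrency) - set(RAM_GIB)
    if unknown:
        raise KeyError(f"no RAM figure for: {sorted(unknown)}")
    return sum(RAM_GIB[stage] * count for stage, count in concurrency.items())


# Hypothetical plan, not a recommendation.
plan = {"TreeD": 1, "PC1": 4, "PC2": 1}
print(f"{peak_ram_gib(plan)} GiB")  # 47 + 4*64 + 64 = 367 GiB
```

Comparing the result with the total memory reported by the machine gives a first indication of whether a plan is feasible before any task is started.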

# damocles-worker-util documentation

damocles-worker-util contains a set of damocles-worker related utilities. These include:

  • hwinfo (Hardware information)
  • sealcalc (Sealing calculator)

# hwinfo

hwinfo displays hardware information. Its output helps us configure damocles-worker so that the available hardware is used more effectively.

The information currently available from hwinfo is as follows:

  • CPU topology (including the number of CPU cores, NUMA Memory Node, CPU Cache, etc.)
  • Disk information
  • GPU information
  • Memory information

Parameter Description:

damocles-worker-util-hwinfo
// Display hardware information

USAGE:
    damocles-worker-util hwinfo [OPTIONS]

OPTIONS:
        --full    // display full CPU topology information
    -h, --help    // print help information

# hwinfo dependency installation

  • hwloc 2.x is used to get CPU topology information
  • OpenCL is used to get GPU information

# hwloc 2.x installation
# On Ubuntu 20.04 or later, it can be installed directly using apt
apt install hwloc=2.\*
# Source installation:
# Install necessary tools.
apt install -y wget make gcc
# Download hwloc-2.7.1.tar.gz
wget https://download.open-mpi.org/release/hwloc/v2.7/hwloc-2.7.1.tar.gz

tar -zxpf hwloc-2.7.1.tar.gz
cd hwloc-2.7.1
./configure --prefix=/usr/local
make -j$(nproc)
sudo make install
sudo ldconfig /usr/local/lib

# OpenCL installation
apt install ocl-icd-opencl-dev

# hwinfo example

Example run on a machine with two 32-core CPUs:

damocles-worker-util hwinfo

Output:

CPU topology:
Machine (503.55 GiB)
├── Package (251.57 GiB) (*** *** *** 32-Core Processor)
│   ├── NUMANode (#0 251.57 GiB)
│   ├── L3 (#0 16 MiB)
│   │   └── PU #0 + PU #1 + PU #2 + PU #3
│   ├── L3 (#1 16 MiB)
│   │   └── PU #4 + PU #5 + PU #6 + PU #7
│   ├── L3 (#2 16 MiB)
│   │   └── PU #8 + PU #9 + PU #10 + PU #11
│   ├── L3 (#3 16 MiB)
│   │   └── PU #12 + PU #13 + PU #14 + PU #15
│   ├── L3 (#4 16 MiB)
│   │   └── PU #16 + PU #17 + PU #18 + PU #19
│   ├── L3 (#5 16 MiB)
│   │   └── PU #20 + PU #21 + PU #22 + PU #23
│   ├── L3 (#6 16 MiB)
│   │   └── PU #24 + PU #25 + PU #26 + PU #27
│   └── L3 (#7 16 MiB)
│       └── PU #28 + PU #29 + PU #30 + PU #31
└── Package (251.98 GiB) (*** *** *** 32-Core Processor)
    ├── NUMANode (#1 251.98 GiB)
    ├── L3 (#8 16 MiB)
    │   └── PU #32 + PU #33 + PU #34 + PU #35
    ├── L3 (#9 16 MiB)
    │   └── PU #36 + PU #37 + PU #38 + PU #39
    ├── L3 (#10 16 MiB)
    │   └── PU #40 + PU #41 + PU #42 + PU #43
    ├── L3 (#11 16 MiB)
    │   └── PU #44 + PU #45 + PU #46 + PU #47
    ├── L3 (#12 16 MiB)
    │   └── PU #48 + PU #49 + PU #50 + PU #51
    ├── L3 (#13 16 MiB)
    │   └── PU #52 + PU #53 + PU #54 + PU #55
    ├── L3 (#14 16 MiB)
    │   └── PU #56 + PU #57 + PU #58 + PU #59
    └── L3 (#15 16 MiB)
        └── PU #60 + PU #61 + PU #62 + PU #63

Disks:
╭───────────┬─────────────┬─────────────┬────────────┬───────────────────────────────────────╮
│ Disk type │ Device name │ Mount point │ Filesystem │                 Space                 │
├───────────┼─────────────┼─────────────┼────────────┼───────────────────────────────────────┤
│ SSD       │ /dev/sda3   │ /           │ ext4       │ 346.87 GiB / 434.68 GiB (79.80% used) │
├───────────┼─────────────┼─────────────┼────────────┼───────────────────────────────────────┤
│ SSD       │ /dev/sda2   │ /boot       │ ext4       │ 675.00 MiB / 3.87 GiB (17.01% used)   │
├───────────┼─────────────┼─────────────┼────────────┼───────────────────────────────────────┤
│ SSD       │ /dev/md127  │ /mnt/mount  │ ext4       │ 4.83 TiB / 13.86 TiB (34.86% used)    │
╰───────────┴─────────────┴─────────────┴────────────┴───────────────────────────────────────╯

GPU:
╭─────────────────────────┬────────┬───────────╮
│           Name          │ Vendor │   Memory  │
├─────────────────────────┼────────┼───────────┤
│ NVIDIA GeForce RTX 3080 │ NVIDIA │ 9.78 GiB  │
├─────────────────────────┼────────┼───────────┤
│ NVIDIA GeForce RTX 3080 │ NVIDIA │ 9.78 GiB  │
├─────────────────────────┼────────┼───────────┤
│ NVIDIA GeForce RTX 3080 │ NVIDIA │ 9.78 GiB  │
╰─────────────────────────┴────────┴───────────╯

Memory:
╭──────────────┬───────────────────┬────────────┬─────────────╮
│ Total memory │    Used memory    │ Total swap │  Used swap  │
├──────────────┼───────────────────┼────────────┼─────────────┤
│ 515.63 GiB   │ 33.51 GiB (6.50%) │ 0 B        │ 0 B (0.00%) │
╰──────────────┴───────────────────┴────────────┴─────────────╯

The CPU topology in the output shows that this machine has two NUMA nodes:

  1. CPU set of NUMANode #0: 0-31
  2. CPU set of NUMANode #1: 32-63

We can use this information to adjust the external processor configuration groups ([[processors.{stage_name}]]) in the damocles-worker configuration file.

Together, the cgroup.cpuset and numa_preferred configuration items restrict an external processor to the CPUs of the specified NUMA node and make it allocate memory from that node first, which improves CPU efficiency. Since v0.5.0, damocles also supports loading NUMA-affinity hugepage memory files; when this feature is enabled, a cpuset can span NUMA nodes without a performance impact.

Example:

# damocles-worker.toml

[[processors.{stage_name}]]
numa_preferred = 0
cgroup.cpuset = "0-3"
# ...

[[processors.{stage_name}]]
numa_preferred = 1
cgroup.cpuset = "32-35"
# ...
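
Writing these blocks by hand becomes tedious when there are many processors. As a minimal sketch, the script below generates the blocks from the NUMA layout reported by hwinfo. It assumes one processor per group of 4 PUs, which matches the PUs sharing an L3 cache on the example machine and the two cpusets shown above. The helper names, the stage name pc1, and the group size are illustrative choices, not something damocles prescribes.

```python
def cpuset_groups(first_cpu, last_cpu, group_size):
    """Split an inclusive CPU range into consecutive "a-b" cpuset strings."""
    groups = []
    for start in range(first_cpu, last_cpu + 1, group_size):
        end = min(start + group_size - 1, last_cpu)
        groups.append(f"{start}-{end}" if end > start else str(start))
    return groups


def processor_blocks(stage_name, numa_nodes, group_size, per_node):
    """Render `per_node` [[processors.<stage>]] blocks for each NUMA node."""
    blocks = []
    for node_id, (first_cpu, last_cpu) in sorted(numa_nodes.items()):
        for cpuset in cpuset_groups(first_cpu, last_cpu, group_size)[:per_node]:
            blocks.append(
                f"[[processors.{stage_name}]]\n"
                f"numa_preferred = {node_id}\n"
                f'cgroup.cpuset = "{cpuset}"\n'
            )
    return "\n".join(blocks)


# NUMA layout read off the hwinfo output: node id -> (first PU, last PU).
NUMA_NODES = {0: (0, 31), 1: (32, 63)}

print(processor_blocks("pc1", NUMA_NODES, group_size=4, per_node=1))
```

With per_node=1 this prints two blocks whose cpusets are "0-3" and "32-35", the same values as in the hand-written example. Raising per_node yields further groups such as "4-7" and "36-39".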

# sealcalc

Given a set of fixed parameters, sealcalc computes the running status of the tasks in each stage over successive time periods. By adjusting the maximum concurrency of each stage and the number of sealing_threads, we can look for the settings that maximize sealing efficiency.

Parameter Description:

USAGE:
    damocles-worker-util sealcalc [OPTIONS] --tree_d_mins <tree_d_mins> --tree_d_concurrent <tree_d_concurrent> --pc1_mins <pc1_mins> --pc1_concurrent <pc1_concurrent> --pc2_mins <pc2_mins> --pc2_concurrent <pc2_concurrent> --c2_mins <c2_mins> --c2_concurrent <c2_concurrent> --sealing_threads <sealing_threads>

OPTIONS:
        --c2_concurrent <c2_concurrent>                Specifies the maximum number of concurrent c2 tasks
        --c2_mins <c2_mins>                            Specifies the time it takes to execute one c2 task, unit: minutes
        --calculate_days <calculate_days>              Calculate the total duration, unit: days [default: 30]
        --calculate_step_mins <calculate_step_mins>    Output step duration, unit: minutes [default: 60], if this value is 60, each line of results will be separated by 1 hour
        --csv                                          Output results in csv format
    -h, --help                                         Print help information
        --pc1_concurrent <pc1_concurrent>              Specifies the maximum number of concurrent pc1 tasks
        --pc1_mins <pc1_mins>                          Specifies the time it takes to execute one pc1 task, unit: minutes
        --pc2_concurrent <pc2_concurrent>              Specifies the maximum number of concurrent pc2 tasks
        --pc2_mins <pc2_mins>                          Specifies the time it takes to execute one pc2 task, unit: minutes
        --sealing_threads <sealing_threads>            Specify the number of sealing_threads
        --seed_mins <seed_mins>                        Specifies the time to wait for the seed, unit: minutes [default: 80]
        --tree_d_concurrent <tree_d_concurrent>        Specifies the maximum number of concurrent tree_d tasks
        --tree_d_mins <tree_d_mins>                    Specify the time it takes to execute one tree_d task, unit: minutes
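
When trying many parameter combinations, it can be convenient to assemble the sealcalc command line programmatically. The sketch below is a hypothetical helper (sealcalc_argv is not part of damocles). It uses only the options listed above, assumes the --name=value form is accepted, and the parameter values are purely illustrative.

```python
import shlex

# Options that USAGE lists as mandatory, in the order given there.
REQUIRED = (
    "tree_d_mins", "tree_d_concurrent",
    "pc1_mins", "pc1_concurrent",
    "pc2_mins", "pc2_concurrent",
    "c2_mins", "c2_concurrent",
    "sealing_threads",
)
# Options that have a default value and may be omitted.
OPTIONAL = ("seed_mins", "calculate_days", "calculate_step_mins")


def sealcalc_argv(params, csv=False):
    """Build the argument list for `damocles-worker-util sealcalc`."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required options: {missing}")
    unknown = [name for name in params if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError(f"unknown options: {unknown}")
    argv = ["damocles-worker-util", "sealcalc"]
    argv += [f"--{name}={params[name]}"
             for name in REQUIRED + OPTIONAL if name in params]
    if csv:
        argv.append("--csv")
    return argv


# Illustrative values only.
params = {
    "tree_d_mins": 10, "tree_d_concurrent": 2,
    "pc1_mins": 320, "pc1_concurrent": 10,
    "pc2_mins": 1, "pc2_concurrent": 5,
    "c2_mins": 2, "c2_concurrent": 2,
    "sealing_threads": 18,
}
print(shlex.join(sealcalc_argv(params, csv=True)))
```

The returned list can be passed to subprocess.run as is, and the --csv output is easier to compare across runs than the interactive table.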

# sealcalc example:

# Fixed parameters:

  • Time required for tree_d task execution: 10 minutes
  • Time required for pc1 task execution: 320 minutes
  • Time required for pc2 task execution: 1 minute
  • Time required for c2 task execution: 2 minutes

# Adjustable parameters:

  • sealing_threads number: 18
  • tree_d maximum concurrency: 2
  • pc1 maximum concurrency: 10
  • pc2 maximum concurrency: 5
  • c2 maximum concurrency: 2

damocles-worker-util sealcalc --tree_d_mins=10 --pc1_mins=320 --pc2_mins=1 --c2_mins=2 --tree_d_concurrent=2 --pc1_concurrent=10 --pc2_concurrent=5 --c2_concurrent=2 --sealing_threads=18

The output is as follows:

┌sealing calculator─────────────────────────────────────────────────────┐
│time    sealing    tree_d      pc1      pc2     wait    c2     finished│
│(mins)  threads    (...)      (...)     (...)   seed   (...)   sectors │
│                                                                       │
│0       2/18        2/2       0/10       0/5      0     0/2      0     │
│60      14/18       2/2       10/10      0/5      0     0/2      0     │
│120     18/18       0/2       10/10      0/5      0     0/2      0     │
│180     18/18       0/2       10/10      0/5      0     0/2      0     │
│240     18/18       0/2       10/10      0/5      0     0/2      0     │
│300     18/18       0/2       10/10      0/5      0     0/2      0     │
│360     18/18       0/2       10/10      2/5      6     0/2      0     │
│420     18/18       2/2       8/10       0/5      8     0/2      2     │
│480     18/18       0/2       10/10      0/5      0     0/2      10    │
│540     18/18       0/2       10/10      0/5      0     0/2      10    │
│600     18/18       0/2       10/10      0/5      0     0/2      10    │
│660     18/18       0/2       10/10      2/5      2     0/2      10    │
│720     18/18       0/2       10/10      0/5      8     0/2      10    │
│780     18/18       0/2       10/10      0/5      2     0/2      18    │
│840     18/18       0/2       10/10      0/5      0     0/2      20    │
│900     18/18       0/2       10/10      0/5      0     0/2      20    │
│960     18/18       0/2       10/10      0/5      0     0/2      20    │
│1020    18/18       0/2       10/10      0/5      8     0/2      20    │
│1080    18/18       2/2       10/10      0/5      4     0/2      26    │
│1140    18/18       0/2       10/10      0/5      2     0/2      28    │
│1200    18/18       0/2       10/10      0/5      0     0/2      30    │
│1260    18/18       0/2       10/10      0/5      0     0/2      30    │
│1320    18/18       0/2       10/10      2/5      6     0/2      30    │
│1380    18/18       2/2       10/10      0/5      6     0/2      32    │
│1440    18/18       0/2       10/10      0/5      2     0/2      38    │
│1500    18/18       0/2       10/10      0/5      0     0/2      40    │
│1560    18/18       0/2       10/10      0/5      0     0/2      40    │
│1620    18/18       0/2       10/10      2/5      2     0/2      40    │
│1680    18/18       0/2       10/10      0/5      8     0/2      40    │
│1740    18/18       0/2       10/10      0/5      2     0/2      48    │
└───────────────────────────────────────────────────────────────────────┘

Arrow keys to turn pages

Description of each column of the output result:

  • time (mins): elapsed time in minutes. Each row shows the state after one more step
  • sealing threads (running/total): number of busy sealing threads / total number of sealing threads
  • tree_d (running/total): number of running tree_d tasks / maximum number of concurrent tree_d tasks
  • pc1 (running/total): number of running pc1 tasks / maximum number of concurrent pc1 tasks
  • pc2 (running/total): number of running pc2 tasks / maximum number of concurrent pc2 tasks
  • wait seed: number of tasks waiting for the seed
  • c2 (running/total): number of running c2 tasks / maximum number of concurrent c2 tasks
  • finished sectors: number of sectors completed up to this step

We can maximize sealing efficiency by repeatedly adjusting the adjustable parameters listed above and re-running sealcalc. The resulting values can then serve as a reference when configuring damocles-worker.
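
Before running sealcalc, a back-of-the-envelope estimate can show which resource is likely to cap the steady-state throughput. The sketch below is not how sealcalc works internally. It assumes that each stage can sustain at most concurrency / duration sectors per minute, and that a sealing thread stays occupied for the whole pipeline of one sector (all stage durations plus the seed wait). The function names are hypothetical, and the values are those passed to the sealcalc command in the example, with the default seed wait of 80 minutes.

```python
def stage_rates(mins, concurrent, sealing_threads, seed_mins=80):
    """Sectors per minute each resource could sustain if it were the only limit."""
    rates = {stage: concurrent[stage] / mins[stage] for stage in mins}
    # Assumption: one sealing thread is busy for the full pipeline of a sector.
    pipeline_mins = sum(mins.values()) + seed_mins
    rates["sealing_threads"] = sealing_threads / pipeline_mins
    return rates


def bottleneck(mins, concurrent, sealing_threads, seed_mins=80):
    """Return the limiting resource and its upper bound in sectors per day."""
    rates = stage_rates(mins, concurrent, sealing_threads, seed_mins)
    name = min(rates, key=rates.get)
    return name, rates[name] * 60 * 24


mins = {"tree_d": 10, "pc1": 320, "pc2": 1, "c2": 2}
concurrent = {"tree_d": 2, "pc1": 10, "pc2": 5, "c2": 2}

name, per_day = bottleneck(mins, concurrent, sealing_threads=18)
print(name, round(per_day, 1))  # pc1 45.0
```

Under these assumptions pc1 is the limiting stage, so raising the concurrency of the other stages would not help. sealcalc remains the better tool for the details, since it also accounts for ramp-up and queueing between stages.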