# damocles-worker configuration guide
# Resource consumption at each stage of a sealing job (32 GiB sector)
Example computer specs:
- CPU: AMD EPYC 7642 (max MHz: 2300)
- GPU: RTX 3080
- Memory: DIMM DDR4 Synchronous Registered (Buffered) 2933 MHz (0.3 ns)
Stage | Concurrency | Duration | RAM | CPU | GPU | DiskIO read | DiskIO write | Remark |
---|---|---|---|---|---|---|---|---|
WindowPoSt | 1 | ~4-10mins | ~120GiB | RAYON_NUM_THREADS *100% | ✓ | TODO | - | |
WinningPoSt | 1 | ~1-10s | - | - | - | - | - | |
AddPieces | 1 | ~3mins | ~210MiB | RAYON_NUM_THREADS *100% | x | <=32GiB | 32GiB | |
TreeD | 1 | ~1min | ~47GiB | RAYON_NUM_THREADS *100% | x | 32GiB | 64GiB | 64 GiB sector ~60GiB RAM |
PC1 | 1 | ~177mins | <=64GiB | 150% | x | - | 352GiB | |
PC2 | 1 | ~10-13mins | ~64GiB | RAYON_NUM_THREADS *100% | ✓ | 384GiB | ~37GiB | |
SupraPC2 | 1 | ~2-5mins | ~40GiB | ~400% | ✓ | 384GiB | ~37GiB | |
WaitSeed | - | 75mins | - | - | x | - | - | |
C1 | 1 | ~1s | - | - | x | - | - | |
C2 | 1 | ~13-16mins | TODO | RAYON_NUM_THREADS *100% | ✓ | - | - | |
SupraC2 | 1 | ~3-5mins | 128 GiB | RAYON_NUM_THREADS *100% | ✓ | - | - | |
SnapEncode | 1 | ~3-5mins | TODO | RAYON_NUM_THREADS *100% | ✓ | ~32GiB(NFS) | ~32GiB(NFS) | It is recommended to run multiple SnapEncode tasks concurrently on each GPU to enable concurrent I/O and reduce GPU idle time |
SnapProve | 1 | ~3-5mins | TODO | RAYON_NUM_THREADS *100% | ✓ | - | - | |
SupraSnapProve | 1 | ~3-5mins | TODO | RAYON_NUM_THREADS *100% | ✓ | ~32GiB(NFS) | ~32GiB(NFS) | |
Unseal | 1 | TODO | TODO | TODO | x | TODO | TODO | |
Note: The RAYON_NUM_THREADS environment variable is used to configure the number of threads used by tasks, and it defaults to the number of CPU cores.
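As a rough planning aid, the RAM column can be turned into a back-of-the-envelope budget. The sketch below assumes the worst case, where every configured task reaches its peak at the same moment. The function name and the concurrency plan are hypothetical and not part of damocles-worker; the per-stage figures are copied from the table above for a 32 GiB sector.

```python
# Approximate peak RAM per task in GiB, copied from the table above (32 GiB sector).
# Stages listed as TODO or "-" in the table are left out on purpose.
RAM_GIB = {
    "WindowPoSt": 120,
    "TreeD": 47,
    "PC1": 64,      # the table gives "<=64GiB", so this is an upper bound
    "PC2": 64,
    "SupraPC2": 40,
    "SupraC2": 128,
}

def worst_case_ram_gib(concurrency):
    """Sum peak RAM, assuming every configured task peaks at the same time."""
    return sum(RAM_GIB[stage] * count for stage, count in concurrency.items())

# Hypothetical plan for illustration only, not a recommendation.
plan = {"TreeD": 1, "PC1": 4, "PC2": 1}
print(worst_case_ram_gib(plan))  # 47 + 4*64 + 64 = 367
```

Real usage is usually lower because stages rarely peak together, so the result is best read as a ceiling to compare against the installed memory.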
# damocles-worker-util documentation
damocles-worker-util contains a set of damocles-worker related utilities. These include:
- hwinfo (Hardware information)
- sealcalc (Sealing calculator)
# hwinfo
hwinfo displays hardware information, so that damocles-worker can be configured to match the machine and use its hardware more effectively.
The information currently available from hwinfo is as follows:
- CPU topology (including the number of CPU cores, NUMA Memory Node, CPU Cache, etc.)
- Disk information
- GPU information
- Memory information
Parameter Description:
damocles-worker-util-hwinfo
// Display hardware information
USAGE:
damocles-worker-util hwinfo [OPTIONS]
OPTIONS:
--full // display full CPU topology information
-h, --help // print help information
# hwinfo dependency installation
- hwloc 2.x is used to get CPU topology information
- OpenCL is used to get GPU information
# hwloc 2.x installation
# On Ubuntu 20.04 or later, it can be installed directly using apt
apt install hwloc=2.\*
# Source installation:
# Install necessary tools.
apt install -y wget make gcc
# Download hwloc-2.7.1.tar.gz
wget https://download.open-mpi.org/release/hwloc/v2.7/hwloc-2.7.1.tar.gz
tar -zxpf hwloc-2.7.1.tar.gz
cd hwloc-2.7.1
./configure --prefix=/usr/local
make -j$(nproc)
sudo make install
sudo ldconfig /usr/local/lib
# OpenCL installation
apt install ocl-icd-opencl-dev
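After installing, it can be useful to confirm that the dynamic linker can actually see both libraries. The following is a minimal sketch using Python's standard `ctypes.util.find_library`; the helper name `find_hwinfo_libs` is hypothetical, and the check only reports whether a library is resolvable, not whether the installed hwloc is a 2.x release.

```python
import ctypes.util

def find_hwinfo_libs():
    """Return the resolved library name (or None) for each hwinfo dependency."""
    return {name: ctypes.util.find_library(name) for name in ("hwloc", "OpenCL")}

for name, resolved in find_hwinfo_libs().items():
    print(f"{name}: {resolved or 'not found'}")
```

If either entry prints "not found" after a source installation, re-running ldconfig for the install prefix is a reasonable first thing to try.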
# hwinfo example
Example run on a machine with two 32-core CPUs:
damocles-worker-util hwinfo
output:
CPU topology:
Machine (503.55 GiB)
├── Package (251.57 GiB) (*** *** *** 32-Core Processor)
│ ├── NUMANode (#0 251.57 GiB)
│ ├── L3 (#0 16 MiB)
│ │ └── PU #0 + PU #1 + PU #2 + PU #3
│ ├── L3 (#1 16 MiB)
│ │ └── PU #4 + PU #5 + PU #6 + PU #7
│ ├── L3 (#2 16 MiB)
│ │ └── PU #8 + PU #9 + PU #10 + PU #11
│ ├── L3 (#3 16 MiB)
│ │ └── PU #12 + PU #13 + PU #14 + PU #15
│ ├── L3 (#4 16 MiB)
│ │ └── PU #16 + PU #17 + PU #18 + PU #19
│ ├── L3 (#5 16 MiB)
│ │ └── PU #20 + PU #21 + PU #22 + PU #23
│ ├── L3 (#6 16 MiB)
│ │ └── PU #24 + PU #25 + PU #26 + PU #27
│ └── L3 (#7 16 MiB)
│ └── PU #28 + PU #29 + PU #30 + PU #31
└── Package (251.98 GiB) (*** *** *** 32-Core Processor)
├── NUMANode (#1 251.98 GiB)
├── L3 (#8 16 MiB)
│ └── PU #32 + PU #33 + PU #34 + PU #35
├── L3 (#9 16 MiB)
│ └── PU #36 + PU #37 + PU #38 + PU #39
├── L3 (#10 16 MiB)
│ └── PU #40 + PU #41 + PU #42 + PU #43
├── L3 (#11 16 MiB)
│ └── PU #44 + PU #45 + PU #46 + PU #47
├── L3 (#12 16 MiB)
│ └── PU #48 + PU #49 + PU #50 + PU #51
├── L3 (#13 16 MiB)
│ └── PU #52 + PU #53 + PU #54 + PU #55
├── L3 (#14 16 MiB)
│ └── PU #56 + PU #57 + PU #58 + PU #59
└── L3 (#15 16 MiB)
└── PU #60 + PU #61 + PU #62 + PU #63
Disks:
╭───────────┬─────────────┬─────────────┬────────────┬───────────────────────────────────────╮
│ Disk type │ Device name │ Mount point │ Filesystem │ Space │
├───────────┼─────────────┼─────────────┼────────────┼───────────────────────────────────────┤
│ SSD │ /dev/sda3 │ / │ ext4 │ 346.87 GiB / 434.68 GiB (79.80% used) │
├───────────┼─────────────┼─────────────┼────────────┼───────────────────────────────────────┤
│ SSD │ /dev/sda2 │ /boot │ ext4 │ 675.00 MiB / 3.87 GiB (17.01% used) │
├───────────┼─────────────┼─────────────┼────────────┼───────────────────────────────────────┤
│ SSD │ /dev/md127 │ /mnt/mount │ ext4 │ 4.83 TiB / 13.86 TiB (34.86% used) │
╰───────────┴─────────────┴─────────────┴────────────┴───────────────────────────────────────╯
GPU:
╭─────────────────────────┬────────┬───────────╮
│ Name │ Vendor │ Memory │
├─────────────────────────┼────────┼───────────┤
│ NVIDIA GeForce RTX 3080 │ NVIDIA │ 9.78 GiB │
├─────────────────────────┼────────┼───────────┤
│ NVIDIA GeForce RTX 3080 │ NVIDIA │ 9.78 GiB │
├─────────────────────────┼────────┼───────────┤
│ NVIDIA GeForce RTX 3080 │ NVIDIA │ 9.78 GiB │
╰─────────────────────────┴────────┴───────────╯
Memory:
╭──────────────┬───────────────────┬────────────┬─────────────╮
│ Total memory │ Used memory │ Total swap │ Used swap │
├──────────────┼───────────────────┼────────────┼─────────────┤
│ 515.63 GiB │ 33.51 GiB (6.50%) │ 0 B │ 0 B (0.00%) │
╰──────────────┴───────────────────┴────────────┴─────────────╯
The CPU topology output shows that this machine has two NUMA nodes:
- CPU set of NUMANode #0: 0-31
- CPU set of NUMANode #1: 32-63
We can use this information to adjust the external processor configuration groups ([[processors.{stage_name}]]) in the damocles-worker configuration file. Setting cgroup.cpuset together with numa_preferred restricts an external processor to the CPUs of the specified NUMA node and makes it allocate memory from that node first, which improves CPU efficiency. (From v0.5.0 onward, damocles supports loading NUMA-affinity hugepage memory files; if this feature is enabled, the cpuset can span NUMA nodes without a performance impact.)
Example:
# damocles-worker.toml
[[processors.{stage_name}]]
numa_preferred = 0
cgroup.cpuset = "0-3"
# ...
[[processors.{stage_name}]]
numa_preferred = 1
cgroup.cpuset = "32-35"
# ...
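The two cpusets in this example ("0-3" and "32-35") each cover the PUs under one L3 cache in the hwinfo output. A minimal sketch of how such cpuset strings could be generated for every L3 group is shown below. It assumes that PU numbers are contiguous within each NUMA node and that every L3 group holds four PUs, as in the sample output; the helper `l3_cpusets` is hypothetical and not part of damocles-worker-util.

```python
def l3_cpusets(first_pu, last_pu, pus_per_l3=4):
    """Split an inclusive PU range into cpuset strings, one per L3 group."""
    return [
        f"{start}-{min(start + pus_per_l3 - 1, last_pu)}"
        for start in range(first_pu, last_pu + 1, pus_per_l3)
    ]

# NUMA node -> inclusive PU range, read off the hwinfo output above.
numa_nodes = {0: (0, 31), 1: (32, 63)}

for node, (first, last) in numa_nodes.items():
    for cpuset in l3_cpusets(first, last):
        # Each printed pair corresponds to one [[processors.{stage_name}]] entry.
        print(f'numa_preferred = {node}, cgroup.cpuset = "{cpuset}"')
```

On machines where hwinfo reports a different layout (for example interleaved PU numbering), the ranges should be taken from the actual output rather than computed this way.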
# sealcalc
Given fixed parameters, sealcalc computes the status of the tasks in each stage at each time step, so that the maximum concurrency of each stage and sealing_threads can be adjusted to maximize sealing efficiency.
Parameter Description:
USAGE:
damocles-worker-util sealcalc [OPTIONS] --tree_d_mins <tree_d_mins> --tree_d_concurrent <tree_d_concurrent> --pc1_mins <pc1_mins> --pc1_concurrent <pc1_concurrent> --pc2_mins <pc2_mins> --pc2_concurrent <pc2_concurrent> --c2_mins <c2_mins> --c2_concurrent <c2_concurrent> --sealing_threads <sealing_threads>
OPTIONS:
--c2_concurrent <c2_concurrent> Specifies the maximum number of concurrent c2 tasks
--c2_mins <c2_mins> Specifies the time it takes to execute one c2 task, unit: minutes
--calculate_days <calculate_days> Calculate the total duration, unit: days [default: 30]
--calculate_step_mins <calculate_step_mins> Output step duration, unit: minutes [default: 60], if this value is 60, each line of results will be separated by 1 hour
--csv Output results in csv format
-h, --help Print help information
--pc1_concurrent <pc1_concurrent> Specifies the maximum number of concurrent pc1 tasks
--pc1_mins <pc1_mins> Specifies the time it takes to execute one pc1 task, unit: minutes
--pc2_concurrent <pc2_concurrent> Specifies the maximum number of concurrent pc2 tasks
--pc2_mins <pc2_mins> Specifies the time it takes to execute one pc2 task, unit: minutes
--sealing_threads <sealing_threads> Specify the number of sealing_threads
--seed_mins <seed_mins> Specifies the time to wait for the seed, unit: minutes [default: 80]
--tree_d_concurrent <tree_d_concurrent> Specifies the maximum number of concurrent tree_d tasks
--tree_d_mins <tree_d_mins> Specify the time it takes to execute one tree_d task, unit: minutes
# sealcalc example:
# Fixed parameters:
- Time required for tree_d task execution: 10 minutes
- Time required for pc1 task execution: 320 minutes
- Time required for pc2 task execution: 1 minute
- Time required for c2 task execution: 2 minutes
# Adjustable parameters:
- sealing_threads number: 18
- tree_d maximum concurrency: 2
- pc1 maximum concurrency: 10
- pc2 maximum concurrency: 5
- c2 maximum concurrency: 2
damocles-worker-util sealcalc --tree_d_mins=10 --pc1_mins=320 --pc2_mins=1 --c2_mins=2 --tree_d_concurrent=2 --pc1_concurrent=10 --pc2_concurrent=5 --c2_concurrent=2 --sealing_threads=18
The output is as follows:
┌sealing calculator─────────────────────────────────────────────────────┐
│time sealing tree_d pc1 pc2 wait c2 finished│
│(mins) threads (...) (...) (...) seed (...) sectors │
│ │
│0 2/18 2/2 0/10 0/5 0 0/2 0 │
│60 14/18 2/2 10/10 0/5 0 0/2 0 │
│120 18/18 0/2 10/10 0/5 0 0/2 0 │
│180 18/18 0/2 10/10 0/5 0 0/2 0 │
│240 18/18 0/2 10/10 0/5 0 0/2 0 │
│300 18/18 0/2 10/10 0/5 0 0/2 0 │
│360 18/18 0/2 10/10 2/5 6 0/2 0 │
│420 18/18 2/2 8/10 0/5 8 0/2 2 │
│480 18/18 0/2 10/10 0/5 0 0/2 10 │
│540 18/18 0/2 10/10 0/5 0 0/2 10 │
│600 18/18 0/2 10/10 0/5 0 0/2 10 │
│660 18/18 0/2 10/10 2/5 2 0/2 10 │
│720 18/18 0/2 10/10 0/5 8 0/2 10 │
│780 18/18 0/2 10/10 0/5 2 0/2 18 │
│840 18/18 0/2 10/10 0/5 0 0/2 20 │
│900 18/18 0/2 10/10 0/5 0 0/2 20 │
│960 18/18 0/2 10/10 0/5 0 0/2 20 │
│1020 18/18 0/2 10/10 0/5 8 0/2 20 │
│1080 18/18 2/2 10/10 0/5 4 0/2 26 │
│1140 18/18 0/2 10/10 0/5 2 0/2 28 │
│1200 18/18 0/2 10/10 0/5 0 0/2 30 │
│1260 18/18 0/2 10/10 0/5 0 0/2 30 │
│1320 18/18 0/2 10/10 2/5 6 0/2 30 │
│1380 18/18 2/2 10/10 0/5 6 0/2 32 │
│1440 18/18 0/2 10/10 0/5 2 0/2 38 │
│1500 18/18 0/2 10/10 0/5 0 0/2 40 │
│1560 18/18 0/2 10/10 0/5 0 0/2 40 │
│1620 18/18 0/2 10/10 2/5 2 0/2 40 │
│1680 18/18 0/2 10/10 0/5 8 0/2 40 │
│1740 18/18 0/2 10/10 0/5 2 0/2 48 │
└───────────────────────────────────────────────────────────────────────┘
Arrow keys to turn pages
Description of each output column:
- time (mins): elapsed time in minutes. Each row shows the state one step (--calculate_step_mins) after the previous row
- sealing threads (running/total): number of busy sealing threads / total number of sealing threads
- tree_d (running/total): number of running tree_d tasks / maximum tree_d concurrency
- pc1 (running/total): number of running pc1 tasks / maximum pc1 concurrency
- pc2 (running/total): number of running pc2 tasks / maximum pc2 concurrency
- wait seed: number of tasks waiting for the seed
- c2 (running/total): number of running c2 tasks / maximum c2 concurrency
- finished sectors: cumulative number of sectors completed up to this step
We can maximize sealing efficiency by repeatedly adjusting the adjustable parameters above. The resulting values can serve as a reference for configuring damocles-worker.
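Before iterating with sealcalc, a quick steady-state estimate can show which parameter is likely to be the bottleneck. The sketch below is not the algorithm sealcalc uses; it is a simplified upper-bound calculation under two assumptions: each stage can finish at most `concurrency / minutes` sectors per minute, and a sealing thread stays occupied by one sector from tree_d through c2, including the seed wait. The function name is hypothetical, and the inputs are the values passed to the example command above.

```python
def sectors_per_day_bounds(stage_mins, stage_concurrent, sealing_threads, seed_mins=80):
    """Rough per-day upper bounds per limit; not sealcalc's own algorithm."""
    minutes_per_day = 24 * 60
    bounds = {
        stage: stage_concurrent[stage] * minutes_per_day / mins
        for stage, mins in stage_mins.items()
    }
    # Assumption: one sealing thread is held by one sector for the whole pipeline.
    total_mins = sum(stage_mins.values()) + seed_mins
    bounds["sealing_threads"] = sealing_threads * minutes_per_day / total_mins
    return bounds

# Values taken from the example command line above.
mins = {"tree_d": 10, "pc1": 320, "pc2": 1, "c2": 2}
concurrent = {"tree_d": 2, "pc1": 10, "pc2": 5, "c2": 2}

bounds = sectors_per_day_bounds(mins, concurrent, sealing_threads=18)
bottleneck = min(bounds, key=bounds.get)
print(bottleneck, round(bounds[bottleneck], 1))  # pc1 45.0
```

Here pc1 is the limiting stage, which agrees with the pc1 column staying at 10/10 for most of the sealcalc output. The estimate ignores ramp-up and scheduling gaps, so sealcalc's simulated numbers remain the better reference.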