# Configurations of damocles-worker

damocles-worker is the main component that executes data sealing. Let's take a look at its configuration file structure and options.

The configuration file of damocles-worker is in TOML format. Note that in this format, lines starting with # are treated as comments and do not take effect.

Taking a mock instance as an example, a basic configuration might look like this:

[worker]
# name = "worker-#1"
# rpc_server.host = "192.168.1.100"
# rpc_server.port = 17891

[metrics]
#enable = false
#http_listen = "0.0.0.0:9000"

[sector_manager]
rpc_client.addr = "/ip4/127.0.0.1/tcp/1789"
# rpc_client.headers = { User-Agent = "jsonrpc-core-client" }
# piece_token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJuYW1lIjoiMS0xMjUiLCJwZXJtIjoic2lnbiIsImV4dCI6IiJ9.JenwgK0JZcxFDin3cyhBUN41VXNvYw-_0UUT2ZOohM0"

[sealing]
# allowed_miners = [10123, 10124, 10125]
# allowed_sizes = ["32GiB", "64GiB"]
enable_deals = true
# disable_cc = true
# max_deals = 3
# min_deal_space = "8GiB"
max_retries = 3
# seal_interval = "30s"
# recover_interval = "60s"
# rpc_polling_interval = "180s"
# ignore_proof_check = false

[[sealing_thread]]
location = "./mock-tmp/store1"
# plan = "snapup"
# sealing.allowed_miners = [10123, 10124, 10125]
# sealing.allowed_sizes = ["32GiB", "64GiB"]
# sealing.enable_deals = true
# sealing.disable_cc = true
# sealing.max_deals = 3
# sealing.min_deal_space = "8GiB"
# sealing.max_retries = 3
# sealing.seal_interval = "30s"
# sealing.recover_interval = "60s"
# sealing.rpc_polling_interval = "180s"
# sealing.ignore_proof_check = false

[[sealing_thread]]
location = "./mock-tmp/store2"

[[sealing_thread]]
location = "./mock-tmp/store3"

[[attached]]
# name = "persist-store1"
location = "./mock-tmp/remote"

[processors.limitation.concurrent]
# add_pieces = 5
# pc1 = 3
# pc2 = 1
# c2 = 1

[processors.limitation.staggered]
# pc1 = "5min"
# pc2 = "4min"

[processors.ext_locks]
# gpu1 = 1

[processors.static_tree_d]
# 2KiB = "./tmp/2k/sc-02-data-tree-d.dat"

# fields for the add_pieces processor
# [[processors.add_pieces]]

# fields for tree_d processor
[[processors.tree_d]]
# auto_restart = true
# inherit_envs = true

# fields for pc1 processors
[[processors.pc1]]
# bin = "./dist/bin/damocles-worker-plugin-pc1"
# args = ["--args-1", "1", "--args-2", "2"]
numa_preferred = 0
cgroup.cpuset = "4-5"
envs = { RUST_LOG = "info" }
# auto_restart = true
# inherit_envs = true

[[processors.pc1]]
numa_preferred = 0
cgroup.cpuset = "6-7"
# auto_restart = true
# inherit_envs = true

[[processors.pc1]]
numa_preferred = 1
cgroup.cpuset = "12-13"
# auto_restart = true
# inherit_envs = true

# fields for pc2 processors
[[processors.pc2]]
# cgroup.cpuset = "24-27"
# auto_restart = true
# inherit_envs = true

[[processors.pc2]]
cgroup.cpuset = "28-31"
# auto_restart = true
# inherit_envs = true

# fields for c2 processor
[[processors.c2]]
cgroup.cpuset = "32-47"
# auto_restart = true
# inherit_envs = true

Below we will break down the configurable items one by one.

# [worker]

The worker configuration item is used to configure some basic information of this worker instance.

# Basic configuration example

[worker]
# instance name, optional, string type
# By default, the IP address of the network card used to connect to `damocles-manager` is used as the instance name
# name = "worker-#1"

# rpc service listening address, optional, string type
# The default is "0.0.0.0", that is, listening to all addresses of the machine
# rpc_server.host = "192.168.1.100"

# rpc service listening port, optional, number type
# Default is 17890
# rpc_server.port = 17891

# local pieces file directory, optional, string array type
# If set, the worker loads piece files from these local directories
# Otherwise it loads the piece files remotely from damocles-manager
# If the file "/path/to/{your_local_pieces_dir01, your_local_pieces_dir02, ...}/piece_file_name" does not exist in any of the configured directories, the worker also falls back to loading it from damocles-manager
# local_pieces_dirs = ["/path/to/your_local_pieces_dir01", "/path/to/your_local_pieces_dir02"]

In most cases, the fields in this configuration item do not need to be configured manually.

Manual configuration is only needed in special cases, such as:

  • You would like to name each damocles-worker instance according to your own naming scheme
  • You don't want to listen on all network card IPs and only want to allow local rpc requests
  • Multiple damocles-workers are deployed on one machine and you want to avoid port conflicts so the instances can be distinguished
  • damocles-worker can access the piece_store directory directly, so you set local_pieces_dirs to load piece files locally

In these and similar scenarios, configure the options here manually as needed; a sketch of one such case follows.
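
For instance, a hedged sketch of the port-conflict case, running a second worker instance on the same machine (the name, host and port values below are illustrative):

[worker]
# give this instance its own name instead of the default IP-based one
name = "worker-#2"
# only accept local rpc requests
rpc_server.host = "127.0.0.1"
# use a port different from the default 17890 to avoid conflicts with another worker on this machine
rpc_server.port = 17891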

# [metrics]

metrics provides options for the metrics recorder and the Prometheus exporter.

# Basic configuration example

[metrics]
# Switch for the prometheus exporter, optional, boolean
# Default value: false
# When enabled, the "http_listen" field will be used to start a prometheus exporter
#enable = false

# prometheus exporter listen address, optional, socket address
# Default value: "0.0.0.0:9000"
#http_listen = "0.0.0.0:9000"

# [sector_manager]

sector_manager is used to configure damocles-manager related information so that damocles-worker can connect to the corresponding service correctly.

# Basic configuration example

[sector_manager]
# The connection address used when constructing the rpc client, required, string type
# Accepts both `multiaddr` format and url format such as `http://127.0.0.1:1789`, `ws://127.0.0.1:1789`
# Normally, use the `multiaddr` format for consistency with other components
rpc_client.addr = "/ip4/127.0.0.1/tcp/1789"

# The http header information when constructing the rpc client, optional, dictionary type
# default is null
# rpc_client.headers = { User-Agent = "jsonrpc-core-client" }

# The verification token carried when requesting deal piece data, optional, string type
# default is null
# This is usually set when this instance allows sealing of sectors with deal data
# The value of this config item is usually the token value of the Venus service used
# piece_token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJuYW1lIjoiMS0xMjUiLCJwZXJtIjoic2lnbiIsImV4dCI6IiJ9.JenwgK0JZcxFDin3cyhBUN41VXNvYw-_0UUT2ZOohM0"

# [sealing]

sealing is used to configure common parameter options in the sealing process.

# Basic configuration example

[sealing]
# Allowed `SP`, optional, number array format
# Defaults to null, allows tasks from any `SP`
# After configuration, only sealing tasks from `SP` listed in the array can be carried out
# allowed_miners = [10123, 10124, 10125]

# Allowed sector size, optional, string array format
# Defaults to null, allows sector tasks of any size
# After configuration, only tasks that match the sector size listed in the array can be executed
# allowed_sizes = ["32GiB", "64GiB"]

# Whether to allow adding deals to the sector, optional, boolean type
# Default is false
# When set to true, the `piece_token` item in `sector_manager` usually needs to be set at the same time
# enable_deals = true

# Whether to disable cc sector, optional, boolean type
# Default is false
# When enable_deals is true, turning this option on makes the worker keep waiting until a storage deal is obtained, instead of starting to seal cc sectors
# disable_cc = true

# The maximum number of deals allowed to be added to the sector, optional, number type
# default is null
# max_deals = 3

# The minimum size of a deal to be filled in a sector, optional, byte string format
# default is null
# min_deal_space = "8GiB"

# Number of retries when an error of `temp` type is encountered during the sealing process, optional, number format
# default is 5
# max_retries = 3

# Retry interval when a `temp` type error is encountered during the sealing process, optional, time string format
# The default is "30s", which means 30 seconds
# recover_interval = "30s"

# Interval for idle `sealing_thread` to apply for new sealing tasks, optional, time string format
# The default is "30s", which means 30 seconds
# seal_interval = "30s"

# rpc status polling request interval, optional, time string format
# The default is "180s", which means 180 seconds
# During the sealing process, some steps use polling to obtain non-real-time information, such as whether a message has landed on-chain.
# This value helps to avoid overly frequent requests consuming network resources
# rpc_polling_interval = "180s"

# Whether to skip the local verification step of proof, optional, boolean format
# Default is false
# Usually only set this during testing
# ignore_proof_check = false

# Number of retries when unable to request a task from `sector_manager`, optional, number format
# default is 3
# request_task_max_retries = 3

The configuration items in sealing come with defaults based on practical experience, so manual configuration is unnecessary in most cases.

# Special configuration example

# 1. Test network, serving only specific SP

allowed_miners = [2234, 2236, 2238]

# 2. Large-scale clusters to reduce network usage

# A considerable part of recoverable exceptions is caused by network jitter; increase the automatic recovery interval to reduce the request frequency
recover_interval = "90s"

# Increase the polling interval to reduce the request frequency
rpc_polling_interval = "300s"

# 3. Increase the possibility of self-recovery from errors during sealing tasks

# Increase the number of autorecovery attempts
max_retries = 10

# Increase the auto-recovery interval
recover_interval = "60s"

# [[sealing_thread]]

Each sealing_thread is used to configure a worker thread for a sector. Multiple sealing_thread configuration groups can exist in one configuration file.

The sealing_thread will inherit the configuration in [sealing]. Any sealing.* in the sealing_thread configuration will override the configuration of [sealing].

# Basic configuration example

[[sealing_thread]]
# Sector data directory path, required, string type
# It is recommended to use an absolute path; the data directory and the worker thread are bound in a one-to-one relationship
location = "/mnt/nvme1/store"

# task type, optional, string type
# Default value is null
# All options: sealer | snapup | rebuild | unseal | wdpost, when left blank, the default is `sealer`
# plan = "snapup"

# Custom parameters of the sealing process, only applies to the current worker thread
# sealing.allowed_miners = [10123, 10124, 10125]
# sealing.allowed_sizes = ["32GiB", "64GiB"]
# sealing.enable_deals = true
# sealing.disable_cc = true
# sealing.max_retries = 3
# sealing.seal_interval = "30s"
# sealing.recover_interval = "60s"
# sealing.rpc_polling_interval = "180s"
# sealing.ignore_proof_check = false
# sealing.request_task_max_retries = 3

[[sealing_thread]]
location = "/mnt/nvme2/store"


[[sealing_thread]]
location = "/mnt/nvme3/store"

The number of sealing_thread and the corresponding sealing data directory paths need to be planned according to your specific situation.

In order to facilitate combination and matching, each sealing_thread can be configured with an independent sealing sub-item, which satisfies:

  • The naming, type and effect of configurable items are consistent with the common sealing items
  • Only applies to the current worker thread
  • When not configured, the value in the common sealing item is used

# Special configuration example

# 1. Two worker threads, each serving a different SP

[[sealing_thread]]
location = "/mnt/nvme2/store"
sealing.allowed_miners = [1357]


[[sealing_thread]]
location = "/mnt/nvme3/store"
sealing.allowed_miners = [2468]

# 2. Two worker threads, each with different sector sizes

[[sealing_thread]]
location = "/mnt/nvme2/store"
sealing.allowed_sizes = ["32GiB"]


[[sealing_thread]]
location = "/mnt/nvme3/store"
sealing.allowed_sizes = ["64GiB"]

# sealing_thread configuration hot reload

Before damocles v0.5.0, the only way to update the configuration was to modify the sealing_thread configuration and restart damocles-worker. This is very inconvenient in some scenarios; for example, during sector rebuild we would like to be able to reload the configuration after modifying the plan configuration item of a sealing_thread without rebooting damocles-worker.

Hot reloading of the sealing_thread configuration is supported since v0.5.0. Create a hot reload configuration file named config.toml under the location directory of a sealing_thread, with the same layout as a [[sealing_thread]] item. The items in this file override the corresponding [[sealing_thread]] items in damocles-worker, and modifying this file takes effect without restarting damocles-worker.

The sealing_thread in damocles-worker checks the config.toml file in the location directory before starting a new sector task. If the content of config.toml has changed, or the file has been deleted, a reload or removal of the override configuration is triggered.

Notice:

  • Hot reload of configuration file config.toml cannot override the location configuration item in sealing_thread.
  • The damocles-worker main configuration file does not support hot reload.

# Basic configuration example

# /path/to/the_sealing_thread_location/config.toml

# Task type, optional, string type
# Default value is null
# All options: sealer | snapup | rebuild | unseal | wdpost, if not filled, the default is sealer
# plan = "rebuild"

# Custom parameters of the sealing process, only effective on the current worker thread
# sealing.allowed_miners = [10123, 10124, 10125]
# sealing.allowed_sizes = ["32GiB", "64GiB"]
# sealing.enable_deals = true
# sealing.max_retries = 3
# sealing.seal_interval = "30s"
# sealing.recover_interval = "60s"
# sealing.rpc_polling_interval = "180s"
# sealing.ignore_proof_check = false
# sealing.request_task_max_retries = 3

# [[attached]]

attached is used to configure where the sealed sector persistent data is saved. Multiple configuration entries are allowed at the same time.

# Basic configuration example

[[attached]]
# name, optional, string type
# The default is the absolute path corresponding to the location
# name = "remote-store1"

# path, required, string type
# It is recommended to fill in the absolute path directly
location = "/mnt/remote/10.0.0.14/store"

# read only, optional, boolean
# Default is false
# readonly = true

Storage location information needs to be coordinated between damocles-worker and damocles-manager, and in many cases the mount path of the same persistent storage directory differs between the damocles-worker machine and the damocles-manager machine, so we decided to use name as the basis for coordination.

You can also choose not to configure name on either the damocles-worker or the damocles-manager side if the mount path of the persistent storage directory is the same on all machines. In this case, both will use the absolute path as the name.
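
As a hedged illustration of this coordination, the same name appears on both sides. The damocles-manager entry below is an assumption that its persist store configuration uses Name/Path fields under [[Common.PersistStores]]; check the damocles-manager configuration documentation for the exact layout.

# damocles-worker side
[[attached]]
name = "remote-store1"
location = "/mnt/remote/10.0.0.14/store"

# damocles-manager side (assumed layout); Name must match the worker's attached name
# [[Common.PersistStores]]
# Name = "remote-store1"
# Path = "/filecoin/persist/store1"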

# [processors]

processors is used to configure sealing task executors, and some other information during sealing.

This configuration item is divided into several sub-items, which we will analyze one by one.

# [processors.limitation.concurrent]

When both [processors.limit] and [processors.limitation.concurrent] exist in the configuration file, the latter takes precedence.

processors.limitation.concurrent is used to configure the number of parallel tasks for a specific sealing task. This is to reduce the contention of hardware resources at one given sealing stage.

It should be noted that when external processors are configured, the number of external processors and the total allowed concurrency will also affect the number of parallel tasks.

# Basic configuration example

[processors.limitation.concurrent]
# Concurrency limit for the add_pieces task, optional, number type
# add_pieces = 5

# Concurrency limit for the tree_d task, optional, number type
# tree_d = 1

# Concurrency limit for the pc1 task, optional, number type
# pc1 = 3

# Concurrency limit of the pc2 task, optional, number type
# pc2 = 2

# Concurrency limit for task c2, optional, number type
# c2 = 1

For example, if pc2 = 2 is set, then at most two sectors can perform the pc2 task at the same time.

# [processors.limitation.staggered]

processors.limitation.staggered is used to configure the time interval for the staggered startup of parallel tasks in a given sealing phase. With this item configured, when multiple tasks of that phase are scheduled at the same time, damocles-worker starts them in sequence at the configured interval, to avoid resource shortages such as disk IO slowdowns caused by tasks starting simultaneously.

# Basic configuration example

[processors.limitation.staggered]
# The time interval for starting multiple pc1 tasks in sequence, optional, string type (e.g. "1s", "2min")
# pc1 = "5min"
# pc2 = "4min"

For example, if pc1 = "5min" is set, then when two pc1 tasks are scheduled at the same time, the second task will start 5 minutes after the first one starts.

# [processors.ext_locks]

processors.ext_locks is used to configure custom lock restrictions, used in conjunction with the locks configuration item in [[processors.{stage_name}]]. It allows users to define their own restrictions and make different external processors subject to them.

# Basic configuration example

[processors.ext_locks]
# some_name = some_number

# Special configuration example

processors.ext_locks by itself does not apply any lock; it only takes effect when referenced by the locks item of external processors.

# one GPU, shared by pc2 and c2
[processors.ext_locks]
gpu=1

[[processors.pc2]]
locks = ["gpu"]

[[processors.c2]]
locks = ["gpu"]

This way, pc2 and c2 will each start one external processor, and the two will compete for the same lock, which means the two tasks will never run at the same time.

# Two GPUs, shared by pc2 and c2
[processors.ext_locks]
gpu1 = 1
gpu2 = 1

[[processors.pc2]]
locks = ["gpu1"]

[[processors.pc2]]
locks = ["gpu2"]

[[processors.c2]]
locks = ["gpu1"]

[[processors.c2]]
locks = ["gpu2"]

This way, pc2 and c2 will each start two external processors, which compete for the locks in pairs; this limits each GPU to executing only one of the two sealing tasks at a time.

# [processors.static_tree_d]

processors.static_tree_d is a configuration item introduced to improve the sealing efficiency of cc sectors.

When a static file path is configured for the corresponding sector size, this file will be used directly as the tree_d file for cc sector without trying to generate it again.

# Basic configuration example

[processors.static_tree_d]
2KiB = "/var/tmp/2k/sc-02-data-tree-d.dat"
32GiB = "/var/tmp/32g/sc-02-data-tree-d.dat"
64GiB = "/var/tmp/64g/sc-02-data-tree-d.dat"

# [[processors.{stage_name}]]

This is the configuration group used to configure external processors.

Currently {stage_name} could be one of the following.

  • add_pieces: for the add pieces phase
  • tree_d: for the Tree D generation phase
  • pc1: for the PreCommit1 phase
  • pc2: for the PreCommit2 phase
  • synth_proof: for the synthetic proof phase
  • c2: for the Commit2 phase
  • unseal: for the unseal phase
  • transfer: for customizing the transfer method between local data and persistent data storage

Each such configuration group launches an external processor for the corresponding sealing phase. If no [[processors.{stage_name}]] group is present in the configuration file for one of the above {stage_name}s, damocles-worker will not start a child process for that {stage_name}. Instead, it uses the built-in processor code to execute the corresponding {stage_name} task directly inside the sealing_thread, which means the concurrency of {stage_name} tasks then depends on the number of sealing_threads and on the {stage_name} concurrency limit configured in [processors.limitation.concurrent]. Not configuring an external processor saves extra steps such as serializing task parameters and task output, but you lose capabilities such as finer-grained concurrency control, cgroup control, and custom proof algorithms. You are free to choose according to your own needs.
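
As a brief sketch of the trade-off described above (the values are illustrative): a stage without a [[processors.{stage_name}]] group stays in-process, while a stage with one is delegated to an external processor.

# No [[processors.tree_d]] group: tree_d runs via the built-in processor inside each
# sealing_thread, bounded only by the limit below and the number of sealing_threads
[processors.limitation.concurrent]
tree_d = 2

# pc1 is delegated to an external processor because this group exists
[[processors.pc1]]
cgroup.cpuset = "0-7"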

The optional configuration items for [[processors.{stage_name}]] include:

[[processors.pc1]]
# Custom external processor executable file path, optional, string type
# By default, the executable file of the main process itself is used, i.e. the damocles-worker built-in processor is executed
# bin = "./dist/bin/damocles-worker-plugin-pc1"

# Customize the parameters of the external processor, optional item, string array type
# The default value is null, and the default parameters of `damocles-worker`'s own executor are used
# args = ["--args-1", "1", "--args-2", "2"]

# Time to wait for the external child process to become ready, optional, time string
# The default value is 5s
# stable_wait = "5s"

# numa affinity partition id, optional, number type
# Default value is null, no affinity will be set
# Need to fill in according to the host's numa partition
# numa_preferred = 0

# cpu core binding and limit options, optional, string type
# The default value is null, no binding is set
# The format of the value follows the standard cgroup.cpuset format
# cgroup.cpuset = "4-5"

# Additional environment variables for external processors, optional, dictionary type
# Default value is null
# envs = { RUST_LOG = "info" }

# The maximum number of concurrent tasks allowed by this external processor, optional, number type
# The default value is null, i.e. unlimited, but whether tasks actually run concurrently depends on the implementation of the external processor used
# Mainly used for pc1, so that multiple tasks can run in parallel inside one external processor, which effectively saves resources such as shared memory and thread pools
# concurrent = 4

# Custom external limit lock name, optional, string array type
# Default value is null
# locks = ["gpu1"]

# Whether to restart automatically when a child process exits, optional, bool type
# Default is true
# auto_restart = true

# Whether to inherit the environment variables from the worker daemon process, optional, bool type
# Default is true
# inherit_envs = true

# Basic configuration example

[processors.limitation.concurrent]
add_pieces = 8
pc1 = 4
pc2 = 2
c2 = 1

[[processors.pc1]]
numa_preferred = 0
cgroup.cpuset = "0-7"
concurrent = 2
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }
auto_restart = true

[[processors.pc1]]
numa_preferred = 1
cgroup.cpuset = "12-19"
concurrent = 2
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }
auto_restart = true

[[processors.pc2]]
cgroup.cpuset = "8-11,24-27"
envs = { FIL_PROOFS_USE_GPU_COLUMN_BUILDER = "1", FIL_PROOFS_USE_GPU_TREE_BUILDER = "1", CUDA_VISIBLE_DEVICES = "0" }
auto_restart = true

[[processors.pc2]]
cgroup.cpuset = "20-23,36-39"
envs = { FIL_PROOFS_USE_GPU_COLUMN_BUILDER = "1", FIL_PROOFS_USE_GPU_TREE_BUILDER = "1", CUDA_VISIBLE_DEVICES = "1" }
auto_restart = true


[[processors.c2]]
cgroup.cpuset = "28-35"
envs = { CUDA_VISIBLE_DEVICES = "2,3" }
auto_restart = true


[[processors.tree_d]]
cgroup.cpuset = "40-45"
auto_restart = true

The above is an example processors.{stage_name} configuration for a 48C + 4GPU box. Under this configuration, it will start:

  • 2 pc1 external processors, both using MULTICORE_SDR mode, each with 8 cores, allowing 2 concurrent tasks, and memory allocation preferentially uses its configured numa partition
  • 2 pc2 external processors, each with 8 cores, each using one GPU
  • 1 c2 external processor, allocated 8 cores, using one GPU
  • 1 tree_d external processor with 6 cores allocated

# Special configuration example

# 1. Using a user-defined, closed-source, algorithm-optimized c2 external processor
[[processors.c2]]
bin = "/usr/local/bin/damocles-worker-c2-optimized"
cgroup.cpuset = "40-47"
envs = { CUDA_VISIBLE_DEVICES = "2,3" }
# 2. c2 external processor using outsourced mode (remote computation)
[[processors.c2]]
bin = "/usr/local/bin/damocles-worker-c2-outsource"
args = ["--url", "/ip4/apis.filecoin.io/tcp/10086/https", "--timeout", "10s"]
envs = { LICENCE_PATH = "/var/tmp/c2.licence.dev" }
# 3. Use CPU mode to make up for pc2 computing power when GPU is lacking
[[processors.pc2]]
cgroup.cpuset = "8-11,24-27"
envs = { FIL_PROOFS_USE_GPU_COLUMN_BUILDER = "1", FIL_PROOFS_USE_GPU_TREE_BUILDER = "1", CUDA_VISIBLE_DEVICES = "0" }

[[processors.pc2]]
cgroup.cpuset = "20-23,36-45"
# 4. Under optimal planning, the total pc1 concurrency is an odd number and cannot be divided equally
[processors.limitation.concurrent]
pc1 = 29
pc2 = 2
c2 = 1

[[processors.pc1]]
numa_preferred = 0
cgroup.cpuset = "0-41"
concurrent = 14
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }

[[processors.pc1]]
numa_preferred = 1
cgroup.cpuset = "48-92"
concurrent = 15
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }
# 5. Prioritize the NUMA node 0 area for pc1
[processors.limitation.concurrent]
pc1 = 29
pc2 = 2
c2 = 1

[[processors.pc1]]
numa_preferred = 0
cgroup.cpuset = "0-47"
concurrent = 16
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }

[[processors.pc1]]
numa_preferred = 1
cgroup.cpuset = "48-86"
concurrent = 13
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }

# cgroup.cpuset Configuration Notes

  • For PC1 external processors with multicore SDR enabled, cgroup.cpuset should, as far as possible, use all of the CPU cores under one L3 cache as the basic unit of configuration. If you only configure a portion of the CPU cores under an L3 cache, you must make sure that the number of configured cores under each L3 cache is the same.

    According to rust-fil-proofs, when multicore SDR is enabled, the number of configured CPU cores must be an integer multiple of the number of cores sharing a single L3 cache (a single DIE). If the external processor is rust-fil-proofs or a rust-fil-proofs based variant, this rule must be followed; otherwise it can be ignored. The default processor of damocles-worker is based on rust-fil-proofs.

  • The CPU cores in cgroup.cpuset should not span NUMA nodes where possible; spanning NUMA nodes makes CPU access to memory slower. (damocles supports loading NUMA-affinity hugepage memory files since v0.5.0; if this feature is enabled, you can allocate cpusets across NUMA nodes without concern)

    If the configured CPU cores are on the same NUMA node, processors.{stage_name}.numa_preferred can be configured with the corresponding NUMA node id.

Use damocles-worker-util to check CPU information

./damocles-worker-util hwinfo

Output:

CPU topology:
Machine (503.55 GiB)
├── Package (251.57 GiB) (*** *** *** 32-Core Processor)
│ ├── NUMANode (#0 251.57 GiB)
│ ├── L3 (#0 16 MiB)
│ │ └── PU #0 + PU #1 + PU #2 + PU #3
│ ├── L3 (#1 16 MiB)
│ │ └── PU #4 + PU #5 + PU #6 + PU #7
│ ├── L3 (#2 16 MiB)
│ │ └── PU #8 + PU #9 + PU #10 + PU #11
│ ├── L3 (#3 16 MiB)
│ │ └── PU #12 + PU #13 + PU #14 + PU #15
│ ├── L3 (#4 16 MiB)
│ │ └── PU #16 + PU #17 + PU #18 + PU #19
│ ├── L3 (#5 16 MiB)
│ │ └── PU #20 + PU #21 + PU #22 + PU #23
│ ├── L3 (#6 16 MiB)
│ │ └── PU #24 + PU #25 + PU #26 + PU #27
│ └── L3 (#7 16 MiB)
│ └── PU #28 + PU #29 + PU #30 + PU #31
└── Package (251.98 GiB) (*** *** *** 32-Core Processor)
    ├── NUMANode (#1 251.98 GiB)
    ├── L3 (#8 16 MiB)
    │ └── PU #32 + PU #33 + PU #34 + PU #35
    ├── L3 (#9 16 MiB)
    │ └── PU #36 + PU #37 + PU #38 + PU #39
    ├── L3 (#10 16 MiB)
    │ └── PU #40 + PU #41 + PU #42 + PU #43
    ├── L3 (#11 16 MiB)
    │ └── PU #44 + PU #45 + PU #46 + PU #47
    ├── L3 (#12 16 MiB)
    │ └── PU #48 + PU #49 + PU #50 + PU #51
    ├── L3 (#13 16 MiB)
    │ └── PU #52 + PU #53 + PU #54 + PU #55
    ├── L3 (#14 16 MiB)
    │ └── PU #56 + PU #57 + PU #58 + PU #59
    └── L3 (#15 16 MiB)
        └── PU #60 + PU #61 + PU #62 + PU #63
...

From the output information, we can see that this machine has two NUMANodes, each NUMANode has 8 L3 Caches, and each L3 cache has 4 CPU cores.

  • CPU cores on NUMANode #0: 0-31
  • CPU cores on NUMANode #1: 32-63

Using this machine as an example, configuration like processors.pc1.cgroup.cpuset = "0-6" does not satisfy the rust-fil-proofs rule.

  1. PU#0, PU#1, PU#2, PU#3, these 4 CPU cores belong to L3#0
  2. PU#4, PU#5, PU#6, these 3 CPU cores belong to L3#1

The cpuset spans 2 shared caches (L3#0 and L3#1), but the number of configured CPU cores under each is inconsistent (4 vs 3), which does not meet the rust-fil-proofs rule, so the processor cannot start. The correct configuration could be processors.pc1.cgroup.cpuset = "0-7".
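
For reference, a pc1 configuration satisfying the rule on this machine (a sketch based on the topology above) could look like:

[[processors.pc1]]
# covers L3#0 and L3#1 completely: 4 CPU cores under each shared cache
cgroup.cpuset = "0-7"
# all of the cores above sit on NUMANode #0
numa_preferred = 0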

# A minimal working configuration file example

[sector_manager]
rpc_client.addr = "/ip4/{some_ip}/tcp/1789"

# According to actual resource planning
[[sealing_thread]]
location = "{path to sealing store1}"

[[sealing_thread]]
location = "{path to sealing store2}"

[[sealing_thread]]
location = "{path to sealing store3}"

[[sealing_thread]]
location = "{path to sealing store4}"

[[sealing_thread]]
location = "{path to sealing store5}"

[[sealing_thread]]
location = "{path to sealing store6}"

[[sealing_thread]]
location = "{path to sealing store7}"

[[sealing_thread]]
location = "{path to sealing store8}"


[[attached]]
name = "{remote store name}"
location = "{path to remote store}"

[processors.static_tree_d]
32GiB = "{path to static tree_d for 32GiB}"
64GiB = "{path to static tree_d for 64GiB}"

# According to actual resource planning
[processors.limitation.concurrent]
pc1 = 4
pc2 = 2
c2 = 1

[[processors.pc1]]
numa_preferred = 0
cgroup.cpuset = "0-7"
concurrent = 2
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }

[[processors.pc1]]
numa_preferred = 1
cgroup.cpuset = "12-19"
concurrent = 2
envs = { FIL_PROOFS_USE_MULTICORE_SDR = "1" }


[[processors.pc2]]
cgroup.cpuset = "8-11,24-27"
envs = { FIL_PROOFS_USE_GPU_COLUMN_BUILDER = "1", FIL_PROOFS_USE_GPU_TREE_BUILDER = "1", CUDA_VISIBLE_DEVICES = "0" }

[[processors.pc2]]
cgroup.cpuset = "20-23,36-39"
envs = { FIL_PROOFS_USE_GPU_COLUMN_BUILDER = "1", FIL_PROOFS_USE_GPU_TREE_BUILDER = "1", CUDA_VISIBLE_DEVICES = "1" }


[[processors.c2]]
cgroup.cpuset = "28-35"
envs = { CUDA_VISIBLE_DEVICES = "2,3" }


[[processors.tree_d]]
cgroup.cpuset = "40-45"

After planning according to the actual situation and filling in the corresponding information, the above is a minimal working configuration file that...

  • Seals cc sectors only
  • Optimized tree_d generation for 32GiB and 64GiB sectors
  • Integrated resource allocation

# References to the Venus community test cases

Reference Example 1 Takeaways: PC1 is precisely limited to specific cores, and C2 is completed by gpuproxy, which is highly scalable. The disadvantage is that the configuration is complex, and the number of tasks needs to be adjusted according to the actual hardware specs

Reference Example 2 Takeaways: PC2 and C2 share 1 GPU, which may cause some C2 tasks to be backlogged

Reference Example 3 Takeaways: 2 sets of PC2 share GPU resources with 2 sets of C2 respectively

Reference Example 4 Takeaways: Suitable for lower-end machines; creates 96G of swap space on NVMe, but this may cause some tasks to be slower