# Standalone PoSter node
In earlier versions, although `damocles-manager` supports enabling the corresponding modules through the `--poster` and `--miner` parameters of the `daemon run` command, the PoSt proof process is still strongly coupled to sector location information, which makes it limited and hard to scale.

From v0.2.0 onwards, we provide a series of features that make easy-to-use, scalable standalone PoSter nodes an option for SPs with large-scale operations and multiple miner IDs.
Below, we introduce these new features and provide a practical walkthrough for deploying standalone PoSter nodes with them. The following documents use a node with `--poster` enabled as the example; a standalone `--miner` node works in a similar way and is not described separately.
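For reference, a minimal sketch of enabling a single module with the flags mentioned above (all other options are left at their defaults):

```shell
# Enable the WindowPoSt-related module on this node
damocles-manager daemon run --poster

# Enable the mining (block production) module on this node
damocles-manager daemon run --miner
```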
From v0.8.0 onwards, damocles supports three ways to run PoSter nodes independently: worker-prover mode, proxy node mode, and ext-prover mode (external executor mode).
# worker-prover mode
The worker-prover mode is a new feature in v0.8.0. It is simple to set up and can easily support multi-machine wdpost (with coordination and redundancy).
# Fundamentals
In worker-prover mode, `damocles-worker` computes the window post proofs: it obtains window post tasks from `damocles-manager` through RPC and returns the computation results. `damocles-worker` adds a `wdpost` planner for executing window post tasks.
# Architecture
```
                 +-----------------------------------+
                 |      damocles-manager daemon      |
                 |     with --worker-prover flag     |
                 |                                   |
                 |        +-----------------+        |
                 |        |damocles-manager |        |
                 |        |  poster module  |        |
                 |        +-------+-^-------+        |
                 |          send  | |recv            |
                 |                | |                |
                 |        +-------v-+-------+        |
                 |        |  worker-prover  |        |
        +--------+------->|     module      |<-------+--------+
        |        |        +--------^--------+        |        |
        |        |                 |                 |        |
        |        +-----------------+-----------------+        |
        |                          |                          |
--------+--------------------------+--------------------------+------------
        |                          |                          |
        | pull job                 | pull job                 | pull job
        | push res                 | push res                 | push res
        | by rpc                   | by rpc                   | by rpc
        |                          |                          |
 +------+--------+        +-------+-------+          +------+--------+
 |damocles-worker|        |damocles-worker|          |damocles-worker|
 |wdpost planner |        |wdpost planner |   ...    |wdpost planner |
 +---------------+        +---------------+          +---------------+
```
# Damocles-manager Configuration and Startup
New configuration:
```toml
# ~/.damocles-manager/sector-manager.cfg
#...
[Common.Proving.WorkerProver]
# Maximum number of attempts for a WindowPoSt task, optional, number type
# Default value is 2
# A WindowPoSt task whose number of attempts exceeds JobMaxTry can only be re-executed by manually resetting it
JobMaxTry = 2

# WindowPoSt task heartbeat timeout, optional, time string type
# Default value is 15s
# Tasks that have not sent a heartbeat for longer than this will be marked as failed and retried
HeartbeatTimeout = "15s"

# WindowPoSt task lifetime, optional, time string type
# Default value is 25h
# WindowPoSt tasks created longer ago than this will be deleted
JobLifetime = "25h0m0s"
#...
```
Start the `damocles-manager` process:

```shell
# The --miner flag is optional; it starts the miner module, which executes WinningPoSt and produces blocks
# The --poster flag is required; it starts the WindowPoSt module
# The --worker-prover flag is required; it makes the node use the WorkerProver module to execute WindowPoSt
./damocles-manager daemon run --miner --poster --worker-prover
```
# damocles-worker configuration
Configuration walkthrough:
```toml
[[sealing_thread]]
# Use the wdpost plan
plan = "wdpost"
# Limit task execution to the specified miner IDs; if left empty, there is no limit
# sealing.allowed_miners = [6666, 7777]
# Only allow tasks for sectors of the specified sizes to run
# sealing.allowed_sizes = ["32GiB", "64GiB"]

[[attached]]
# Configure the permanent storage that this worker will use while executing window post tasks
name = "miner-6666-store"
location = "/mnt/miner-6666-store"

# Control window_post task concurrency (optional), no limit if not configured
[processors.limitation.concurrent]
window_post = 2

[[processors.window_post]]
# Use a custom wdpost prover (optional); if bin is not configured, the built-in prover is used by default
bin = "~/my_algorithm"
args = ["window_post"]
# Limit the cpuset of this subprocess (optional)
# cgroup.cpuset = "10-19"
# Configure environment variables for the custom prover (optional)
envs = { BELLMAN_GPU_INDEXS="0", CUDA_VISIBLE_DEVICES="0", TMPDIR = "/tmp/worker-prover1/", ... }
# Maximum concurrency of this processor (optional), no limit if not configured
concurrent = 1
```
A simple example configuration that starts just one wdpost `sealing_thread`:
```toml
# /path/to/your-damocles-worker-config.toml
[worker]
name = "damocles-worker-USA-01"

[sector_manager]
rpc_client.addr = "/ip4/your-damocles-manager-address-here/tcp/1789"

[[sealing_thread]]
plan = "wdpost"
# The interval for trying to claim tasks; the default is 60s
# For the wdpost plan, this value can be reduced to pick up new wdpost tasks faster
sealing.recover_interval = "15s"
# sealing.allowed_miners = [6666]
# sealing.allowed_sizes = ["32GiB"]
#...

[[attached]]
name = "miner-6666-store"
location = "/mnt/miner-6666-store"
```
An example configuration for running two wdpost jobs with one GPU:
```toml
# /path/to/your-damocles-worker-config.toml
[worker]
name = "damocles-worker-USA-01"

[sector_manager]
rpc_client.addr = "/ip4/your-damocles-manager-address-here/tcp/1789"

[[sealing_thread]]
plan = "wdpost"
sealing.recover_interval = "15s"
sealing.allowed_miners = [6666]
# ...

[[sealing_thread]]
plan = "wdpost"
sealing.recover_interval = "15s"
sealing.allowed_miners = [7777]
# ...

[[attached]]
name = "miner-6666-store"
location = "/mnt/miner-6666-store"

[[attached]]
name = "miner-7777-store"
location = "/mnt/miner-7777-store"

# -------------------------
[[processors.window_post]]
bin = "~/my_algorithm"
# args = ["window_post", ...]
cgroup.cpuset = "10-19"
envs = { CUDA_VISIBLE_DEVICES="0", TMPDIR = "/tmp/worker-prover1/", ... }
concurrent = 1

[[processors.window_post]]
bin = "~/my_algorithm"
# args = ["window_post", ...]
cgroup.cpuset = "20-29"
envs = { CUDA_VISIBLE_DEVICES="0", TMPDIR = "/tmp/worker-prover1/", ... }
concurrent = 1
```
Running two wdpost jobs with one GPU can greatly improve utilization. Each wdpost job consists of `vanilla_proofs` (CPU computation) and `snark_proof` (GPU computation); GPU utilization improves because `vanilla_proofs` can be computed in parallel while `snark_proof` is computed serially, coordinated through the GPU lock file in `TMPDIR`.
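Because both `window_post` processors in the example above point `TMPDIR` at the same directory, their `snark_proof` stages contend for one GPU lock file there. A minimal preparation sketch (the path is simply the one assumed in the example configuration):

```shell
# Create the shared TMPDIR assumed by both window_post processors above;
# the GPU lock file kept in it serializes their snark_proof stages.
mkdir -p /tmp/worker-prover1
```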
An example for a wdpost machine equipped with two graphics cards:
```toml
# /path/to/your-damocles-worker-config.toml
[worker]
name = "damocles-worker-USA-01"

[sector_manager]
rpc_client.addr = "/ip4/your-damocles-manager-address-here/tcp/1789"

[[sealing_thread]]
plan = "wdpost"
sealing.recover_interval = "15s"
sealing.allowed_miners = [6666]
#...

[[sealing_thread]]
plan = "wdpost"
sealing.recover_interval = "15s"
sealing.allowed_miners = [7777]
#...

[[attached]]
name = "miner-6666-store"
location = "/mnt/miner-6666-store"

[[attached]]
name = "miner-7777-store"
location = "/mnt/miner-7777-store"

# -------------------------
[[processors.window_post]]
# bin = "~/my_algorithm"
# args = ["window_post", ...]
# cgroup.cpuset = "10-19"
envs = { ... }
concurrent = 2

# ----------- or ---------
# [[processors.window_post]]
# bin = "~/my_algorithm"
# args = ["window_post", ...]
# cgroup.cpuset = "10-19"
# envs = { CUDA_VISIBLE_DEVICES="0", TMPDIR = "/tmp/worker-prover1/", ... }
# concurrent = 1

# [[processors.window_post]]
# bin = "~/my_algorithm"
# args = ["window_post"]
# cgroup.cpuset = "20-29"
# envs = { CUDA_VISIBLE_DEVICES="1", TMPDIR = "/tmp/worker-prover2/", ... }
# concurrent = 1
```
When `damocles-worker` runs the wdpost plan, it is not necessary to use the `damocles-worker store sealing-init -l` command to initialize the local data storage directory.
# Manage window post tasks
```shell
# Show the window post task list
# By default, unfinished tasks and failed tasks are displayed. The DDL field is the deadline index of the task, and the Try field is the number of attempts
./damocles-manager util worker wdpost list

JobID        MinerID  DDL  Partitions  Worker        State       Try  CreateAt        Elapsed      Heartbeat  Error
3FgfEnvrub1  1037     3    1,2         10.122.63.30  ReadyToRun  1    07-27 16:37:31  -            -
gbCVH4TUgEf  1037     2    1,2                       ReadyToRun  0    07-27 16:35:56  -            -
CrotWCLaXLa  1037     1    1,2         10.122.63.30  Succeed     1    07-27 17:19:04  6m38s(done)  -

# Show all tasks
./damocles-manager util worker wdpost list --all
#...

# Show window post task details
./damocles-manager util worker wdpost list --detail
#...
```
# Reset a task
When a window post task fails and the number of automatic retries reaches the limit, the task status can be manually reset so that it can be picked up and executed by damocles-worker again.

```shell
./damocles-manager util worker wdpost reset gbCVH4TUgEf 3FgfEnvrub1
```
# Delete a task
Deleting a task has an effect similar to resetting it. When the delete command is executed, the retry mechanism of damocles-manager checks whether a window post task for the current deadline exists in the database; if not, it re-issues the task and records it in the database.
In addition, worker-prover automatically deletes tasks that were created more than a certain amount of time ago (25 hours by default, configurable via `JobLifetime`).

```shell
# Delete a specific task
./damocles-manager util worker wdpost remove gbCVH4TUgEf 3FgfEnvrub1

# Delete all tasks
./damocles-manager util worker wdpost remove-all --really-do-it
```
# Disable GPU usage in damocles-manager
After the worker-prover feature is enabled, winning_post is executed by damocles-manager. If you do not want winning_post to use the GPU, you can compile damocles-manager with the following commands to disable it:

```shell
make dist-clean
FFI_BUILD_FROM_SOURCE=1 FFI_USE_GPU=0 make build-manager
```
# Proxy node mode
We know that the most important capability for PoSter nodes is obtaining real-time and accurate sector location information. In the current `damocles-manager` version, we only provide metadata management based on the local embedded KV database (more options will be supported later).
This allows the data to be managed by only one process, and direct cross-process data sharing is not possible.
Therefore, we designed the proxy node mode to provide some of the metadata to other consumer nodes through network interfaces, thus achieving data sharing.
# How to use the proxy node
We have added the `--proxy` parameter to the `daemon run` command. Its format is `{ip}:{port}`. When the startup command contains a valid `--proxy` parameter, the node at `{ip}:{port}` will be used as the data source of the current `damocles-manager` node, and the necessary (read-only) metadata management modules will be constructed.
In addition to `--proxy`, we also provide switches that control whether proxy mode is enabled for specific data management modules.
For now, only the `--proxy-sector-indexer-off` switch is provided. When `--proxy-sector-indexer-off` is enabled, the node uses the `SectorIndexer` database in its own data directory.
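A hedged sketch of using this switch (the proxy source address, listen port, and home directory below are placeholders reused from the examples in this document):

```shell
# Illustrative only: this proxy PoSter node takes another damocles-manager at
# 127.0.0.1:1789 as its data source, but keeps using the SectorIndexer
# database in its own data directory because of --proxy-sector-indexer-off
damocles-manager --home ~/.damocles-manager2 daemon run \
  --proxy="127.0.0.1:1789" --proxy-sector-indexer-off --listen=":2789" --poster
```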
For example, if started with the `damocles-manager daemon run --miner` command, a `damocles-manager` instance is launched that listens on port `1789`, uses `~/.damocles-manager` as its data directory, and has the mining module enabled.
We can then use the following commands to initialize and start a proxy node on the same machine, with the above instance as its data source. This proxy node uses `~/.damocles-manager2` as its data directory and listens on port `2789`.

```shell
damocles-manager --home ~/.damocles-manager2 daemon init
# maintain the configuration files as needed
damocles-manager --home ~/.damocles-manager2 daemon run --proxy="127.0.0.1:1789" --listen=":2789" --poster
```
The proxy node provides exactly the same real-time sector location information as the source node.
# The proxy node uses existing configuration files
Following the method described in the previous section, we can already start a proxy node, but there is still a problem with this approach: the configuration files of the proxy node need to be written again, or copied from the data directory of the source node. This introduces extra maintenance work, especially when configuration files may change frequently.
For this, we also provide the `--conf-dir` parameter, which takes a directory path. When the startup command includes a valid `--conf-dir` parameter, the node uses the configuration files that already exist in the specified directory as its own configuration.
This saves the work of writing and maintaining separate configuration files for source and proxy nodes that run on the same machine and serve the same cluster.
Based on this feature, the proxy node startup command from the previous section becomes:

```shell
damocles-manager --home ~/.damocles-manager2 daemon run --proxy="127.0.0.1:1789" --listen=":2789" --conf-dir="~/.damocles-manager" --poster
```

At this point, the source node and the proxy node will use the same set of configuration files.
# ext-prover executor
In addition to sharing sector information, another challenge faced by standalone PoSter nodes is the utilization of hardware resources.
Limited by the underlying algorithm libraries, computing nodes can only use GPUs at process granularity. This makes it difficult for PoSter nodes to effectively utilize the computing power of multiple GPUs, and it is also difficult to safely avoid proof timeouts when multiple `SP`s have conflicting `WindowPoSt` proof windows.
For this, we provide an `ext-prover` mechanism similar to the ext processor in `damocles-worker`.
The `ext-prover` mechanism consists of two components:
- The `--ext-prover` parameter of the `daemon run` command
- The `ext-prover.cfg` configuration file in the node data directory
A default `ext-prover.cfg` file looks like this:
```toml
# Default config:
#[[WdPost]]
#Bin = "/path/to/custom/bin"
#Args = ["args1", "args2", "args3"]
#Concurrent = 1
#Weight = 1
#ReadyTimeoutSecs = 5
#[WdPost.Envs]
#ENV_KEY = "ENV_VAL"
#
#[[WinPost]]
#Bin = "/path/to/custom/bin"
#Args = ["args1", "args2", "args3"]
#Concurrent = 1
#Weight = 1
#ReadyTimeoutSecs = 5
#[WinPost.Envs]
#ENV_KEY = "ENV_VAL"
#
```
In recent versions, `daemon init` initializes the `ext-prover.cfg` file.
Users can write their own, or copy the corresponding file from a data directory initialized by the latest version into an existing data directory.
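A minimal sketch of the copy approach (the throwaway home directory used here is just an illustration):

```shell
# Initialize a throwaway data directory with the latest version to obtain a
# default ext-prover.cfg, then copy it into the existing data directory
damocles-manager --home /tmp/damocles-fresh daemon init
cp /tmp/damocles-fresh/ext-prover.cfg ~/.damocles-manager/
```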
The configuration items in `ext-prover.cfg` work very much like the corresponding configuration blocks in `damocles-worker`, and users can refer to the relevant documentation.
When the `--ext-prover` parameter is included in the start command of `damocles-manager`, the node uses the `ext-prover.cfg` file in its configuration directory as the basis for starting child processes. The `--conf-dir` parameter also affects which `ext-prover.cfg` file is used.
If you see logs like the following, it means that `ext-prover` is ready:
```
2022-04-27T19:15:00.441+0800 INFO porver-ext ext/prover.go:122 response loop start {"pid": 24764, "ppid": 24732, "loop": "resp"}
2022-04-27T19:15:00.441+0800 INFO porver-ext ext/prover.go:155 request loop start {"pid": 24764, "ppid": 24732, "loop": "req"}
2022-04-27T19:15:00.468+0800 INFO processor-cmd processor/processor.go:35 ready {"pid": 24764, "ppid": 24732, "proc": "wdpost"}
```
# Deployment Practice
Suppose we have a machine with 8 GPUs. We can then provide stronger PoSt processing capabilities with the following configuration.
Configure and start the source node:

```shell
damocles-manager daemon run --miner
```

At this point, the source node only provides functions and capabilities related to sealing.

Configure the `ext-prover.cfg` file:

```toml
[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "0"
TMPDIR = "/tmp/ext-prover0/"

[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "1"
TMPDIR = "/tmp/ext-prover1/"

[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "2"
TMPDIR = "/tmp/ext-prover2/"

[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "3"
TMPDIR = "/tmp/ext-prover3/"

[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "4"
TMPDIR = "/tmp/ext-prover4/"

[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "5"
TMPDIR = "/tmp/ext-prover5/"

[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "6"
TMPDIR = "/tmp/ext-prover6/"

[[WdPost]]
[WdPost.Envs]
CUDA_VISIBLE_DEVICES = "7"
TMPDIR = "/tmp/ext-prover7/"
```

Initialize and start a standalone PoSter node:

```shell
damocles-manager --home=~/.damocles-individual-poster daemon init
damocles-manager --home=~/.damocles-individual-poster daemon run --proxy="127.0.0.1:1789" --poster --listen=":2789" --conf-dir="~/.damocles-manager" --ext-prover
```
With this deployment:
- The source node provides both sealing and mining support
- The proxy node provides `WindowPoSt` support
- The proxy node enables `ext-prover`, and each child process independently uses one GPU and one computation lock directory
- There is no conflict between `WinningPoSt` and `WindowPoSt` due to device usage
# Limitations
So far, we have described the features that standalone `PoSter` nodes rely on, their principles, and simple usage examples.
However, this mode still has some limitations for very large `SP` clusters, which may manifest as:
- Scheduling of `PoSt` tasks when `PoSt` window periods seriously conflict still relies on operations and maintenance to a certain extent;