Configuration reference

Every option is both a command-line flag and an environment variable: --data-root is CELERIANT_DATA_ROOT, --num-shards is CELERIANT_NUM_SHARDS, and so on. Flags suit a hand-run binary; the env vars suit containers. This page lists the ones you operate with; the binary's --help is the exhaustive list.
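The naming rule implied by the two examples above can be sketched in a few lines of Python. This is only an illustration: it assumes every flag follows the same pattern as --data-root and --num-shards (drop the leading dashes, turn hyphens into underscores, upper-case, prefix with CELERIANT_). The helper names and the /var/lib/celeriant path are hypothetical.

```python
# Sketch of the flag-to-environment-variable naming rule, inferred from the
# two documented examples. Helper names here are illustrative, not part of
# the product.
from typing import Dict

ENV_PREFIX = "CELERIANT_"


def flag_to_env(flag: str) -> str:
    """Map a long flag such as --data-root to its environment variable name."""
    if not flag.startswith("--") or len(flag) <= 2:
        raise ValueError(f"expected a long flag like --data-root, got {flag!r}")
    return ENV_PREFIX + flag[2:].replace("-", "_").upper()


def env_for(settings: Dict[str, str]) -> Dict[str, str]:
    """Translate a {flag: value} mapping into a {ENV_VAR: value} mapping."""
    return {flag_to_env(flag): value for flag, value in settings.items()}


if __name__ == "__main__":
    # Hypothetical values, printed in the KEY=value form a container env file uses.
    example = {"--data-root": "/var/lib/celeriant", "--num-shards": "8"}
    for name, value in env_for(example).items():
        print(f"{name}={value}")
```

If a flag turns out not to follow this pattern, the binary's --help remains the authority.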

Core

| Flag | Default | Notes |
| --- | --- | --- |
| --data-root | data | Data directory. Must be on an O_DIRECT-capable filesystem. |
| --listen-address | 0.0.0.0 | Bind address. |
| --client-port | 10000 | Client connections. |
| --num-shards | CPU count | One core per shard. |
| --routing-rule | aggregate_id | org_id, aggregate_type_id, or aggregate_id. Fixed at cluster init; changing it means re-sharding. Decides which aggregates can be co-committed. |
| --standalone | false | Single node, no replication or S3. |
| --log-level | info | trace, debug, info, warn, error. |

Cluster and failover

| Flag | Default | Notes |
| --- | --- | --- |
| --replication-port | 10001 | Leader-to-follower replication and heartbeats. |
| --advertised-client-address | derived | What clients are told to reach; set behind a proxy or LB. |
| --advertised-replication-address | derived | What the peer is told to reach. |
| --heartbeat-interval-ms | 500 | Leader heartbeat cadence. |
| --heartbeat-timeout-ms | = interval | Per-heartbeat soft timeout. Set below --heartbeat-lease-duration-ms. |
| --heartbeat-hard-timeout-multiplier | 4 | Hard cap for kernel-blocked kTLS sends that ignore the soft timeout. |
| --heartbeat-lease-duration-ms | 1500 | Silence before the follower's heartbeat lease expires. |
| --heartbeat-starve-threshold-ms | 500 | While a heartbeat is in flight longer than this, reject new writes with FollowerHeartbeatStarved so the NIC has bandwidth for the ack. 0 disables. |
| --s3-lease-duration-ms | 30000 | Durable leader-lease TTL. See leader election. |
| --max-clock-drift-ms | 500 | Slack added to lease checks. NTP is required; flapping elections almost always trace to drift. |
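The one ordering constraint stated in this table, that --heartbeat-timeout-ms sits below --heartbeat-lease-duration-ms, is easy to check before a rollout. The Python sketch below is a hypothetical pre-flight check, not something the binary ships. It assumes the "= interval" default means the soft timeout falls back to --heartbeat-interval-ms when unset. The class and function names are invented for illustration.

```python
# Hypothetical pre-flight check for the heartbeat settings in the table above.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeartbeatTiming:
    interval_ms: int = 500             # --heartbeat-interval-ms
    timeout_ms: Optional[int] = None   # --heartbeat-timeout-ms; None = use interval
    lease_duration_ms: int = 1500      # --heartbeat-lease-duration-ms

    @property
    def soft_timeout_ms(self) -> int:
        # Assumption: an unset soft timeout defaults to the heartbeat interval.
        return self.interval_ms if self.timeout_ms is None else self.timeout_ms


def check_heartbeat_timing(t: HeartbeatTiming) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    if t.soft_timeout_ms >= t.lease_duration_ms:
        problems.append(
            f"--heartbeat-timeout-ms ({t.soft_timeout_ms}) should be below "
            f"--heartbeat-lease-duration-ms ({t.lease_duration_ms})"
        )
    return problems


def intervals_per_lease(t: HeartbeatTiming) -> float:
    """How many heartbeat intervals fit into one follower lease."""
    return t.lease_duration_ms / t.interval_ms


if __name__ == "__main__":
    defaults = HeartbeatTiming()
    print(check_heartbeat_timing(defaults))
    print(intervals_per_lease(defaults))
```

With the documented defaults, the lease spans three heartbeat intervals (1500 ms / 500 ms) and the check reports no problems.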

Write pipeline batching

The first three knobs in the table below control how aggressively the leader amortises fsync and replication. Wider windows mean higher throughput but more latency on each individual write. Tune them to your write-latency SLO.

| Flag | Default | Notes |
| --- | --- | --- |
| --fsync-delay-us | 17000 | Window during which incoming writes are coalesced into one fsync. |
| --replication-delay-us | 17000 | Same idea, for the replication send. |
| --s3-replication-delay-us | 500000 | Same idea, while in S3 fallback. Larger to keep S3 cost down and avoid starving lease renewal. |
| --replication-rollback-cooldown-us | 500000 | After a replication rollback, reject new writes with ReplicationBackpressure for this long to drain the pending queue and prevent rollback storms. |
| --s3-max-concurrent-fallback-uploads | 128 | Cap on parallel S3 fallback uploads across all shards. Lower for MinIO or local-LAN where saturation can starve lease renewal. |
| --internode-max-request-size | 64 MiB | Cap on a single replication batch payload. Also bounds a single promotion catch-up batch. |

S3

| Flag | Default | Notes |
| --- | --- | --- |
| --s3-enabled | false | Required for a cluster. Needs region and bucket. |
| --s3-region / --s3-bucket | none | The bucket must support conditional writes. |
| --s3-access-key-id / --s3-secret-access-key | none | Or the usual AWS credential chain. |
| --s3-subfolder | none | Isolate multiple clusters in one bucket. |
| --s3-endpoint-override | none | For MinIO and other S3-compatible stores. |
| --s3-allow-http / --s3-skip-signature | false | Local testing only. |

Security

| Flag | Default | Notes |
| --- | --- | --- |
| --tls-mode | disabled | disabled or strict. See TLS and mTLS. |
| --tls-ca-cert / --tls-node-cert / --tls-node-key | none | The trust root and node identity. |
| --tls-client-auth | require | require, optional, none. |
| --require-client-identity | false | Force the identity handshake. |

Storage and memory

| Flag | Default | Notes |
| --- | --- | --- |
| --memory-consumption-percent | 80 | Share of RAM for caches (1-95). |
| --memory-budget-bytes | auto | Explicit override. |
| --shard-log-preallocate-bytes | 1 GiB | Size of each WAL file. |
| --wal-compression-level | 3 | zstd level for the WAL. |
| --compaction-check-interval-secs | 7200 | How often to scan for reclaimable space. |
| --compaction-temp-dir | shard dir | Must be on the same filesystem as --data-root. |
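The same-filesystem requirement on --compaction-temp-dir can be checked ahead of time. The sketch below uses the common POSIX heuristic of comparing the device IDs that os.stat reports for the two directories. It is an operator-side check written for this page, not part of the binary, and the function name is hypothetical. On some setups (for example btrfs subvolumes or certain bind mounts) device IDs can differ even when the paths share a filesystem, so treat a mismatch as a prompt to investigate rather than a verdict.

```python
# Operator-side sketch: do two directories live on the same filesystem?
# Compares st_dev, the device ID reported by stat(2).
import os
import sys


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True when both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Usage: python check_fs.py <data-root> <compaction-temp-dir>
    if len(sys.argv) == 3:
        data_root, temp_dir = sys.argv[1], sys.argv[2]
        if same_filesystem(data_root, temp_dir):
            print("same filesystem")
        else:
            print("different device IDs; check the mount layout")
    else:
        print("usage: check_fs.py <data-root> <compaction-temp-dir>")
```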

Limits and observability

| Flag | Default | Notes |
| --- | --- | --- |
| --max-request-size | 16 MiB | Per client request. |
| --max-response-size | 64 MiB | Per response. |
| --max-requested-latency-ms | 2000 | Cap on a watch latency request. |
| --metrics-enabled | true | Prometheus /metrics and /health. |
| --metrics-port | 9090 | See Monitoring. |

Tuning notes

  • Write throughput vs latency. --fsync-delay-us and --replication-delay-us are the main knobs. The defaults (17 ms each) target high throughput. If your SLO is single-digit-ms p99, lower both to your latency budget minus your fsync time minus network RTT. You will give up throughput.
  • S3 fallback windows hurt. While the follower is unreachable, every write pays --s3-replication-delay-us (default 500 ms) on the ack. This is intentional — bigger batches mean fewer S3 PUTs. If your follower is flapping briefly, the cluster recovers; if it is down for hours, plan for elevated write latency and budget S3 costs accordingly.
  • NTP is required. --max-clock-drift-ms is slack, not the budget. If your nodes drift more than this between renewals, lease checks fail and you get election flaps. Check chronyc tracking on both nodes before opening a bug.
  • reserve_coordinator_shard isolates shard 0 for heartbeat/schema work on dense-core boxes. Useful if you see one shard hot from coordination traffic alone.
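The arithmetic in the first tuning note (latency budget minus fsync time minus network RTT) can be written out as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, the inputs are measurements you supply yourself, and the result is a starting point for --fsync-delay-us and --replication-delay-us rather than a recommended value.

```python
# Sketch of the batching-window arithmetic from the first tuning note:
#   window = latency budget - fsync time - network RTT
# Inputs are in milliseconds; the result is in microseconds, matching the
# unit of --fsync-delay-us and --replication-delay-us.


def batching_window_us(latency_budget_ms: float, fsync_ms: float, rtt_ms: float) -> int:
    """Return the window left for batching, or 0 if the budget is already spent."""
    window_ms = latency_budget_ms - fsync_ms - rtt_ms
    if window_ms <= 0:
        # fsync and RTT alone consume the budget; no headroom for batching.
        return 0
    return int(round(window_ms * 1000))


if __name__ == "__main__":
    # Hypothetical measurements: 8 ms p99 budget, 2 ms fsync, 0.5 ms RTT.
    print(batching_window_us(8, 2, 0.5))
```

For the hypothetical inputs above, the leftover window is 5.5 ms, i.e. 5500 µs, well under the 17000 µs defaults. A result of 0 means the SLO cannot be met by tuning these windows alone.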