sentinel

A personal control plane, built solo to solve two concrete engineering problems and deployed in production across my home.

Why this exists

Two real problems drove sentinel:

Household resilience. My Home Assistant deployment kept failing on commodity hardware, and my family had stopped finding that funny. Smart-home downtime is a quiet credibility problem; the system should be a quietly reliable utility, not a hobby that breaks Friday night.
Hands-on LLM substrate. I wanted an instantly accessible agent that I controlled end-to-end: not to consume an LLM product, but to learn from the inside what production agentic systems actually require. The fastest way to understand a technology is to operate it, on real workloads, with skin in the game.

One system addressed both, because the substrate they needed overlapped: a resilient distributed-systems control plane with an agentic platform on top. The household became the production deployment; the agent became the daily research vehicle.

What I built

sentinel is a polyglot platform spread across five repositories: a Rust core (protocol adapters, registry, web server, agent, scripting), a Rust plugin host with a polyglot plugin ecosystem, a native SwiftUI iOS/macOS app, ESP32-P4 C/ESP-IDF firmware for the picket edge devices, and git-managed Rhai automation apps. Alongside them sits a Go relay that gives the home cluster a public face without opening a single inbound port. Two-node HA on commodity hardware; multi-provider AI agent; voice interface; the works.

It’s structured around three engineering pillars.

Pillar 1: Resilience-first distributed-systems substrate

The first job is don’t break.

NATS as the nervous system. Config, entity state, and coordination live in NATS KV. NATS JetStream provides the underlying KV consensus; sentinel doesn’t re-implement consensus on top of it. Each single-writer subsystem (agent, scripts, notifications, adapters…) holds a per-subsystem leader lock on that KV (60s TTL, 15s heartbeat, 60s grace period), giving the app layer fencing and auto-failover without another consensus protocol. Two nodes are sufficient: both serve full read/write UI, and a brief node blip resumes without disruption.
Location-independence as a property of the bus. Because every component speaks NATS, directly or through the core’s protocol edge (adapters, plugins, the relay, the picket firmware, the apps), the deployment topology is whatever fits each component’s constraints, not a single monolithic shape. The household runs on six different deployment substrates simultaneously: sentinel core as a two-node k8s deployment; ble-proxy on a pair of bare-metal Pi Zero 2 Ws near the radios they need to hear; the Home Assistant Python integrations as a k8s pod under the plugin host’s host-ha-base image; the HomeKit integration on a pair of Proxmox VMs sitting directly on the household LAN; the picket ESP32-P4 firmware on bare-metal embedded edge devices speaking WebSocket / MQTT into NATS at the core’s edge; and sentinel-relay on a remote VPS as an outbound-only leaf node. The HomeKit placement is the clearest case of a component living where its protocol requires: HomeKit needs mDNS discovery that k8s pod networking doesn’t cleanly allow, and its pairing and accessory state is held in NATS KV, so the two VMs run as a true warm-standby HA pair (either VM can take over because the durable state lives on the bus, not on the host). And because NATS federates (leaf nodes, superclusters, cross-account boundaries), the topology isn’t bounded by “the same network”; it’s bounded by “anywhere with a network path.” A future component on a friend’s home server, a colo edge box, or a remote cabin is the same deployment problem as a component in the kitchen: connect to NATS, declare your subjects, become a citizen of the system. The substrate’s reach is the bus’s reach.
Multi-protocol entity model. The core natively terminates and bridges several external protocols: WebSocket and MQTT for picket and other devices, Matter via matter.js’s JSON protocol, the Home Assistant WebSocket API for HA-ecosystem reach, and NATS for internal coordination, and normalizes all of them into a unified entity model. Automations, agent tools, and history don’t see which wire protocol the signal arrived on.
Public face without inbound exposure (sentinel-relay), defense in depth. A Go binary runs on a VPS with a static IP, hosts its own NATS server, and the home NATS connects outbound to it as a leaf node. The home network accepts zero inbound connections. External clients talk HTTPS to the relay; the relay bridges HTTPS to NATS; the leaf link carries the traffic home. Three independent layers limit what a compromised relay can do: (1) network, no inbound holes; (2) NATS account boundary, the relay runs in a separate account, with cross-account import/export rules controlling exactly which subjects can cross; (3) JWT privileges, narrow per-credential subject permissions inside that account. Even a fully compromised relay can only publish on the specific subjects it’s been explicitly granted. The relay also runs Let’s Encrypt and pushes new certs down the leaf link to the core, solving the “how do I get ACME certs onto a NATted home server” problem in the same substrate.
NATS auth is a pain; sentinel owns it so you don’t. NATS accounts, JWT signing, signing-key hierarchies, cross-account import/export rules, per-credential subject permissions: powerful, and a genuine learning curve. Sentinel’s setup generates the operator, accounts, signing keys, and scoped credentials, pushes account JWTs to the resolver, and wires the relay’s cross-account boundary, all from one command. The capability model (separate accounts, narrow JWT subject permissions) is doing real security work underneath, but a user deploying sentinel never has to author a JWT or reason about resolver preload to get it. The hard part is owned by the platform, not delegated to the operator.
Stable entity identity. Entities are UUIDv7 logical identities decoupled from devices and protocols. Bindings link an entity to one or more protocols with priority-based command routing, so history and automations survive hardware swaps, vendor migrations, and protocol changes.
Turn cancellation without deadlock. The agent holds a session lock for an entire conversation turn, so the cancel token and pending device-call map are lifted outside the lock, letting /stop abort the current generation lock-free, mid-tool-call.
Multi-stage synthetic-entity inference. An isolated runtime runs a multi-stage anomaly and occupancy inference pipeline whose outputs are real entities flowing through the same pipeline as hardware sensors: the same automation, history, and agent surface treats inferred state and observed state uniformly. Currently accumulating real-world signal so the inference results can be evaluated against months of actual household behavior, not synthetic tests.
Graceful degradation. A subsystem failure (one adapter, one inference runtime, even one whole node) leaves the rest of the system running. The family doesn’t notice when a Zigbee coordinator restarts.
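
The per-subsystem leader lock described at the top of this pillar can be illustrated with a small lease table. This is a minimal Python sketch, not sentinel's Rust implementation: the in-memory dictionary stands in for the NATS KV bucket, the names `LeaseTable`, `Lease`, and `try_acquire` are hypothetical, and it assumes the grace period is extra waiting time added after the TTL before a standby may take over. The `epoch` counter is also an assumption, shown as one way to get a fencing token.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TTL_S = 60        # lease lifetime (from the text)
HEARTBEAT_S = 15  # how often the holder is expected to call try_acquire (from the text)
GRACE_S = 60      # assumed meaning: extra wait after TTL before takeover


@dataclass
class Lease:
    holder: str
    renewed_at: float
    epoch: int  # fencing token: increases on every change of holder


class LeaseTable:
    """In-memory stand-in for a KV bucket; this is not the NATS client API."""

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def try_acquire(self, subsystem: str, node: str, now: float) -> Optional[int]:
        """Return the fencing epoch if `node` holds the lock after this call."""
        lease = self._leases.get(subsystem)
        if lease is None:
            # Nobody holds the lock yet: first caller wins.
            self._leases[subsystem] = Lease(node, now, 1)
            return 1
        if lease.holder == node:
            # Heartbeat from the current holder: renew, same epoch.
            lease.renewed_at = now
            return lease.epoch
        if now - lease.renewed_at > TTL_S + GRACE_S:
            # Holder has been silent past TTL + grace: take over, bump epoch.
            self._leases[subsystem] = Lease(node, now, lease.epoch + 1)
            return lease.epoch + 1
        return None  # someone else holds a live lease
```

A writer would attach the returned epoch to its writes so that a stale leader waking up after failover can be rejected. A real version would also need the read-check-write in `try_acquire` to be a single atomic compare-and-set against the KV entry, which this in-process sketch gets for free.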

Pillar 2: Agentic platform with a composition substrate

The second job is be a useful research vehicle for production LLM integration, including the agent-safety story production integration has to answer. After three months of building inside that question, the answer isn’t “more tools” or “better prompting”; it’s substrate the agent composes against, not just tools the agent calls. Most agent platforms answer the safety half with policy (allow-lists, argument parsing, sandboxed shells) enforced around the model. Sentinel answers it structurally: a typed composition language where capability is bound by construction, not by promises. Composing verbs into a pipeline is a live, unprivileged act, same trust posture as running a shell command. Growing the verb catalog is a deliberate one, mediated by code review. The intelligence an LLM spends picking tools under the policy-enforced approach is, here, redirected into framing the question in the language of small typed verbs.

The model is Unix pipes, extended for the AI-contributor case. Verbs are small sharp tools; a Flow pipeline composes them; the substrate handles streaming, typing, error propagation, and capability binding so the verb author can stay focused on one thing done well. Where Unix pipes carried untyped bytes, Flow carries typed data with structured error metadata. Where Unix pipes were ephemeral, Flow runs are addressable: the agent can name a result and re-compose against it later. Where Unix tools acquired capability from their UID, Flow verbs declare it at registration and the substrate enforces it. Same composability ethos, harder guarantees, designed for an author that can’t read the manpage in the same way a human can and so needs the substrate to make structural promises in code.

Concretely, the composition substrate is four properties:

Flow as the universal type. Every verb is Flow → Flow. The pipeline carries out (the data channel, materialized array or StreamHandle over bigger-than-memory sources), status, err, and structured metadata. Composition is just method chaining; error propagation is automatic via a uniform fatal-Flow envelope, with did-you-mean enrichment on every miss.
Streaming vs materialized as a substrate property. The data plane streams by default and materializes when it fits; the author never picks a mode. Streaming-preserving verbs (map, filter, count, online stats via Welford) drain in O(1) memory; blocking verbs (sort_by, median) fatal on a stream with a helpful “reduce it first or window the query” message rather than silently exploding. The fail-loud-and-useful contract holds across the entire verb catalog.
Addressable runs. Every script execution persists with a UUIDv7 run_id. The agent can re-enter a prior result with run("<id>") and branch, re-compose, or join against it: no inflation of large results back into agent context. Persistence is owner-checked and in-substrate.
Composition is live; growing the verb catalog is a PR. Any registered verb can be chained against any other in a Rhai script, instantly, from chat or any other client: no review, no approval, no redeploy. Scripts are first-class CRUDable artifacts (save, call, soft-delete, re-enter via run_id), and the agent composes them live against real data at full fidelity. Adding a new verb to the catalog is the deliberate step that goes through human review: new verbs live in a separate crate (sentinel-verbs-wrapped), registered at build time via Builder::with_wrappers(...), opened as a pull request like any other contribution. The wrapping is ordered by strength of capability claim. Plan A: wrap the native Rust crate, bound by the type system (regex instead of ripgrep, jaq instead of jq). Plan B: a single Sentinel-compiled binary that imports tool libraries and exposes only the subcommands wired up, bound at compile time (the client-go / aws-sdk-go / go-gh pattern: one binary, many tools’ worth of reach, all pre-pruned). Plan C: a sandboxed third-party binary, bound at the flag / sandbox layer (Landlock + seccomp when needed). The deliberate bias is toward A and B precisely so the heavy Plan-C machinery is rarely owed; A and B make capability claims true by construction, C makes them true by attestation. Authorship of new verbs can come from anywhere, a human contributor or a cheap model spawned for the task, because the structural properties are enforced by the substrate (macro-required signatures, adapter ?-propagation, generated tests, deny-lints, Dockerfile validator) and the reviewer audits judgment. The agent gains capability by the substrate gaining capability, never by the agent gaining trust at runtime.
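
The first two properties (Flow as the universal type, and streaming versus materialized data) can be sketched in a few dozen lines. This is a Python illustration only: sentinel's verbs are Rust functions composed from Rhai, and none of that code is shown here. The field names `out`, `status`, and `err` come from the text; `meta`, `fatal`, `verb`, `pipe`, and the `stats` verb name are hypothetical, and a plain Python iterator stands in for StreamHandle.

```python
import difflib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional, Union

Out = Union[List[Any], Iterator[Any]]  # materialized list or lazy stream


@dataclass
class Flow:
    out: Out = field(default_factory=list)
    status: str = "ok"  # "ok" or "fatal"
    err: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)

    def is_stream(self) -> bool:
        return not isinstance(self.out, list)


def fatal(msg: str, **meta: Any) -> Flow:
    """Build the uniform fatal envelope."""
    return Flow(out=[], status="fatal", err=msg, meta=meta)


VERBS: Dict[str, Callable[..., Flow]] = {}


def verb(name: str):
    """Register a Flow -> Flow function under `name`."""
    def register(fn: Callable[..., Flow]) -> Callable[..., Flow]:
        VERBS[name] = fn
        return fn
    return register


@verb("filter")
def v_filter(flow: Flow, pred: Callable[[Any], bool]) -> Flow:
    # Streaming-preserving: returns a lazy generator, nothing is buffered.
    return Flow(out=(x for x in flow.out if pred(x)), meta=flow.meta)


@verb("stats")
def v_stats(flow: Flow) -> Flow:
    # Welford's online algorithm: mean and variance in O(1) memory.
    n, mean, m2 = 0, 0.0, 0.0
    for x in flow.out:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    var = m2 / (n - 1) if n > 1 else 0.0
    return Flow(out=[{"n": n, "mean": mean, "var": var}], meta=flow.meta)


@verb("sort_by")
def v_sort_by(flow: Flow, key: Callable[[Any], Any]) -> Flow:
    # Blocking verb: refuses a stream instead of silently buffering it.
    if flow.is_stream():
        return fatal("sort_by needs materialized input: "
                     "reduce it first or window the query")
    return Flow(out=sorted(flow.out, key=key), meta=flow.meta)


def pipe(flow: Flow, steps: List[tuple]) -> Flow:
    """Apply (verb_name, *args) steps; a fatal Flow short-circuits the rest."""
    for name, *args in steps:
        if flow.status == "fatal":
            return flow
        if name not in VERBS:
            hint = difflib.get_close_matches(name, list(VERBS), n=1)
            return fatal(f"unknown verb '{name}'", did_you_mean=hint)
        flow = VERBS[name](flow, *args)
    return flow
```

For example, `pipe(Flow(out=iter(readings)), [("filter", pred), ("stats",)])` drains the stream once without materializing it, while `("sort_by", key)` on the same stream comes back as a fatal Flow carrying the "reduce it first" message, and a misspelled `"fliter"` comes back with `did_you_mean` set to `["filter"]`.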

The agentic surface the substrate supports:

Multi-provider AI agent. Anthropic Claude, local Ollama, OpenAI-compatible endpoints, all behind a single agent surface. Provider selection is config-driven and per-conversation; the same conversation can route different tool calls to different providers.
Local LLM infrastructure. The pi-controller daemon fans out child-process event streams to multiple clients; combined with mlx-lm and ollama, sentinel runs a real local-model fleet for both routine routing and offline operation when frontier providers are unavailable.
Unified tool plane. Three registries merge at session start: built-in tools (entity query/control, notifications, timers, web search, weather, solar), plugin tools announced over NATS with streaming results, and MCP-server tools (HTTP/SSE/stdio). All normalized to one schema the model sees, with capability approval and per-conversation routing.
Voice interface. Wake-word detection, streaming STT (Whisper), low-latency TTS (Kokoro), and a conversation-mode state machine that handles hands-free continuation, push-to-talk, and clean session boundaries. Tokens and audio stream over WebSocket; the audio path is OggOpus, tuned for the ~800ms-round-trip latency budget that makes voice feel responsive rather than broken.
Mobile app protocol surface. The SwiftUI app talks to the core (via the relay) over three protocols matched to traffic shape: SSE for sensor / entity-state streaming, WebSocket for voice and agent streaming, and HTTPS for everything else. Each protocol is the right choice for its traffic: SSE for one-way state push, WS for bidirectional streaming, HTTPS for request/response. The app also feeds home/away presence detection back to the core.
Prompt caching and context compaction. Live token-budget accounting, with prompt-cache-aware prompt construction and progressive context compaction against a budget, so multi-hour conversations neither overflow nor quietly drop important context.
Persistent memory with an async review loop. sentinel integrates the third-party memory-engine for semantic store/retrieve, and I built the memory/context handling and async review loop on top that populates memory over time: extracting durable facts from conversation, organizing them in a hierarchical tree, and surfacing them on the next session without explicit recall. The agent accumulates knowledge across conversations instead of starting cold every session.
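
The three-registry merge from the unified tool plane item can be pictured as a normalization pass. The sketch below is Python and entirely illustrative: `ToolSpec`, `merge_tool_plane`, the dictionary shape of the raw entries, the source-level approval set, and the rule that a colliding name gets prefixed with its source are all assumptions, not sentinel's actual schema or collision policy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Mapping, Set


@dataclass
class ToolSpec:
    """One normalized tool entry, the single shape the model would see."""
    name: str
    source: str  # "builtin", "plugin", or "mcp"
    description: str = ""
    parameters: Dict[str, Any] = field(default_factory=dict)


def merge_tool_plane(
    builtin: Iterable[Mapping[str, Any]],
    plugin: Iterable[Mapping[str, Any]],
    mcp: Iterable[Mapping[str, Any]],
    approved_sources: Set[str],
) -> Dict[str, ToolSpec]:
    """Merge three registries into one name -> ToolSpec map."""
    merged: Dict[str, ToolSpec] = {}
    ordered = (("builtin", builtin), ("plugin", plugin), ("mcp", mcp))
    for source, registry in ordered:
        if source not in approved_sources:
            continue  # hypothetical, coarse stand-in for capability approval
        for raw in registry:
            name = raw["name"]
            if name in merged:
                # Assumed policy: later registries are namespaced on collision.
                name = f"{source}.{name}"
            merged[name] = ToolSpec(
                name=name,
                source=source,
                description=raw.get("description", ""),
                parameters=dict(raw.get("parameters", {})),
            )
    return merged
```

The point of the shape is that whatever consumes `merged` never needs to know whether a tool arrived from the core, over NATS, or from an MCP server; only the `source` field records it.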

Pillar 3: A polyglot plugin host with a Home Assistant bridge

The third job is don’t throw away the ecosystem, and more generally, make it easy to integrate anything.

sentinel-plugins is a Rust-based plugin host that supervises plugin child processes over a NATS-based wire protocol, with language-specific Docker base images so plugins can be written in whatever language fits the integration: native Rust, Go, or unmodified Python Home Assistant integrations running under the host-ha-base image. The host handles plugin lifecycle, capability approval, NATS subject routing, audio and LLM-provider bridging, and gRPC translation, so a plugin author only has to implement the integration logic, not the substrate.

The deployed plugin ecosystem covers a deliberately diverse set of real home-and-power infrastructure: ble-proxy (BlueZ to NATS bridge for live BLE scanning + GATT), bthome (Bluetooth sensors), esphome (ESP-based devices), homekit (Apple HomeKit), acinfinity (climate control), apsystems (solar microinverters), victron (off-grid power), bambu (3D printers), pitboss (grills), s30 (Lennox HVAC), ollama + openai (LLM providers), and the picket + node internals. Every one of them speaks the same plugin protocol to the host.

The win:

Instant access to HA’s integration library (the long tail of community-built devices and protocols) through the Python plugin path, without forking, vendoring, or rewriting.
Hardware continuity: devices that worked under HA keep working under sentinel, no replacement required.
Best-tool-for-the-job per plugin: performance-sensitive plugins like ble-proxy are native Rust; LLM-provider plugins are Rust; HA integrations run as Python. The substrate doesn’t care.
Clean upgrade path: when an integration matures, it can be reimplemented natively in Rust or Go and swapped in with no user-visible change.

This is the kind of “build versus leverage” trade-off that distinguishes working infrastructure from research-lab toys: the right answer was usually neither “rewrite everything” nor “live with the old platform’s limits,” but “build a clean integration substrate that lets both kinds of code be first-class citizens of it.”

Hard problems worth naming

Two-node HA via per-subsystem leader locks on NATS KV: leaning on JetStream’s own consensus for the KV layer rather than re-implementing a Raft cluster at the app level, with TTL + grace-period failover that gives the single-writer subsystems clean fencing and automatic recovery from brief node blips.
Cross-language, multi-protocol interop at low latency: Python to Rust for the HA plugins, ESP-IDF C to WebSocket / MQTT for picket (the core terminates and bridges into NATS), Matter via matter.js JSON, Swift to Rust for the iOS app over SSE / WS / HTTPS, all staying responsive enough to feel like one system.
Public face without inbound exposure, bounded blast radius if the relay is compromised: leaf-node NATS topology so the home network never accepts an inbound connection, plus a separate NATS account for the relay with cross-account rules and JWT privileges that narrow what a compromised relay can do. Three independent layers: network, account, credential. Cert renewal flows back down the same leaf link.
Voice latency budget: keeping STT → LLM → TTS → playback under ~800ms round-trip across multiple providers and network paths, including local-model fallback.
Memory that compounds: populating long-term memory automatically without polluting it with conversational noise; surfacing the right context on the right next turn.
Substrate plumbing that doesn’t leak: e.g. ble-proxy correctly coordinates with the core on active/passive scan transitions to avoid the BlueZ-known issue where active scanning during heavy GATT load stalls both; surfaces synthetic handles (the only ones BlueZ exposes) as opaque tokens with documented semantics; survives USB dongle yanks and BlueZ daemon crashes via supervisor restart with exponential backoff.
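
The supervisor restart with exponential backoff mentioned in the last item follows a familiar pattern. A minimal Python sketch, assuming a base delay of 1 s, a doubling factor, and a 60 s cap (none of these values come from sentinel), with hypothetical names `backoff_delays` and `supervise`:

```python
from typing import Callable, Iterator


def backoff_delays(
    base_s: float = 1.0, factor: float = 2.0, cap_s: float = 60.0
) -> Iterator[float]:
    """Yield restart delays that grow geometrically and saturate at cap_s."""
    delay = base_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)


def supervise(
    start: Callable[[], bool],
    sleep: Callable[[float], None],
    max_restarts: int,
) -> int:
    """Call `start` until it reports a clean exit (True) or restarts run out.

    Returns the number of restarts performed. `sleep` is injected so the
    loop can be exercised without real waiting.
    """
    delays = backoff_delays()
    restarts = 0
    while not start():
        if restarts == max_restarts:
            break
        sleep(next(delays))
        restarts += 1
    return restarts
```

A production supervisor would usually also reset the delay after the child has stayed healthy for a while, so that one bad hour does not leave every later restart waiting at the cap; that is omitted here for brevity.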

The substrate at work

Four concrete deployments that exercise the full stack end to end:

3D-print failure detection, two-stage LLM cascade. A Rhai app (bambu_print_watcher) tails a Bambu printer’s chamber camera once per cadence interval. Each frame goes through a two-stage model pipeline: a local Ollama VLM as the cheap gate (“does this look like a failure? YES/NO + one-line reason”), and only on a flagged gate response does the app escalate to an Anthropic verdict model (claude-haiku-4-5 by default) for a structured JSON judgement (failed, failure_type, confidence, reasoning). After N consecutive failed verdicts the printer is auto-paused, the failure frame is stored and attached to a critical notification with Resume / Cancel / Stay paused actions. The operational details are the interesting part: an explicit max_anthropic_calls_per_print budget (capped at 50 by default), a cooldown threshold multiplier that raises the trip-count after a user-resumed false alarm, a rolling window of recent gate captions injected into the gate prompt as cheap temporal grounding (text only, no extra image cost), a defensive printer-state re-read on every tick to self-disarm if the printer was powered off mid-print, and a “watcher blind” notification after 5 consecutive snapshot failures.
Smoker, with a derivative-based water-pan alarm. A pair of Raspberry Pi Zero 2 Ws running ble-proxy bridge a Pitboss smoker’s BLE protocol into NATS. A Rhai app monitors the smoker’s temperature probes, notifies me when probes hit configured set points, and (the part I’m most pleased with) runs a derivative calculation on chamber temperature so it can tell me to refill the water pan when the temperature starts rising too quickly. Physics-based predictive maintenance built on top of the same substrate that handles the rest of the household.
Chicken coop, sun-elevation light + adaptive ventilation + water-level inference from heater cycles. The chicken_coop app drives supplemental lighting based on solar elevation with a dark-hours fade boundary, ensures laying hens get their minimum daylight hours in winter via supplemental brightness, and runs a humidity-differential ventilation loop with effectiveness tracking and exponential cooldown: if running the fan isn’t actually dropping indoor humidity meaningfully within the effectiveness window, the loop decides the differential isn’t useful right now (cold outdoor air can’t hold much moisture either) and cools down with a doubling backoff up to a cap. Adaptive control, not on/off rules. A separate water-level inference script estimates how much water is left in the heated bucket from heater cycle period alone: shorter cycles mean less thermal mass means less water, with an ambient-temperature correction to back out heat-loss variation and auto-recalibration on refill detection (and the first cold-water cycle is skipped because it’s an outlier). Same family of physics-based state estimation as the smoker derivative, different domain, different observable. Currently winter-only because the inference depends on the heater actually cycling; the summer port (pulse the heater once a day if it hasn’t fired naturally) is TODO.
Live investigation against the substrate (the composition pillar in action). A new supplemental light was added to the chicken-coop cycle, light.chicks_tent, for a brooder tent with new chicks. Open question: is it actually participating in the cycle, or is it drifting independently? Two minutes of agent-authored Rhai pipelines against the production TimescaleDB answered it: schema discovery on entity_data → both lights’ brightness sample counts and time ranges in the last 24 hours → side-by-side time-series of the morning ramp (the two lights stepped in lockstep, 210 → 254, ~21 seconds apart) → 7-day check for whether brightness ever hits zero (it doesn’t, bottoms out at 1, lights go to state=off instead) → state cross-reference confirming both lights flip on/off within seconds of each other across multiple cycles. Six composed queries, ~30-225ms each, defensible answer with the actual evidence attached. This is the difference between an agent that picks tools from a fixed toolbox and an agent composing against substrate: no tool author had to anticipate “are these two lights in lockstep?”; the question was asked directly, in the language the data lives in.
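
The smoker's derivative-based water-pan check described above comes down to estimating the rate of change of chamber temperature over a recent window and comparing it to a limit. The Python sketch below is one way to do that, not the Rhai app itself: the least-squares slope is a swapped-in estimator chosen because it tolerates noisy samples better than a two-point difference, and the 5-minute window and 2 degrees-per-minute limit are made-up values.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, chamber temperature)


def slope_per_min(samples: Sequence[Sample]) -> float:
    """Least-squares slope of temperature over time, in degrees per minute."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return 60.0 * num / den if den else 0.0


def water_pan_alarm(
    samples: Sequence[Sample],
    window_s: float = 300.0,      # assumed look-back window
    limit_per_min: float = 2.0,   # assumed rate-of-rise threshold
) -> bool:
    """True when the recent rate of rise exceeds the limit."""
    if not samples:
        return False
    latest = samples[-1][0]
    recent: List[Sample] = [s for s in samples if latest - s[0] <= window_s]
    return slope_per_min(recent) > limit_per_min
```

With one sample per minute climbing 3 degrees each time, `slope_per_min` returns 3.0 and the alarm trips; a chamber hovering within half a degree yields a slope near 0.1 and stays quiet.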

Why a pair of Pi Zero 2 Ws? Because Raspberry Pis are legitimately unreliable in the long run: SD card corruption, USB BLE-adapter resets, thermal stress, the works. Single-Pi listening posts in a humid bathroom or a smoky backyard will drop out. The substrate accommodates that explicitly: both proxies announce the same advertisements, the core deduplicates, and whichever Pi is healthy at the moment carries the traffic. The Rhai apps are pure domain logic against the unified entity model; they never see, and never need to see, which Pi is currently talking.
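
The deduplication step described above can be sketched as a small time-windowed filter. This is a Python illustration under stated assumptions: the class name `AdvertDeduper`, the use of (device address, payload) as the identity of an advertisement, and the 2-second window are all hypothetical, and the real core is Rust.

```python
from typing import Dict, Tuple


class AdvertDeduper:
    """Drop an advertisement if the same one was accepted very recently."""

    def __init__(self, window_s: float = 2.0) -> None:
        self.window_s = window_s
        # (address, payload) -> time the advertisement was last accepted.
        # A long-running version would need to evict old keys.
        self._seen: Dict[Tuple[str, bytes], float] = {}
        # address -> proxy that most recently delivered an accepted advert.
        self.last_carrier: Dict[str, str] = {}

    def accept(self, proxy: str, address: str, payload: bytes, now: float) -> bool:
        key = (address, payload)
        last = self._seen.get(key)
        if last is not None and now - last < self.window_s:
            return False  # the other proxy (or this one) already delivered it
        self._seen[key] = now
        self.last_carrier[address] = proxy
        return True
```

Because acceptance depends only on what was heard and when, not on which proxy heard it, a Pi dropping out changes `last_carrier` and nothing else: downstream consumers keep receiving one copy of each advertisement.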

Tech & scale

Five repositories, hand-written code only:

sentinel core: Rust, ~278k LOC, 3,406 commits
sentinel-plugins: Rust plugin host + polyglot plugin ecosystem (Rust/Go/Python), ~97k LOC Go + Rust, 867 commits
sentinel-app: SwiftUI iOS/macOS, ~36k LOC, 448 commits
sentinel-picket: ESP32-P4 C/ESP-IDF firmware, ~41k LOC, 362 commits
sentinel-apps: Rhai automation apps hosted by the core’s scripting runtime, ~5.9k LOC, 260 commits

~5,300 commits across all repos, solo, in roughly three months. Plus a Svelte web UI, the Go sentinel-relay for external HTTPS / leaf-node NATS, and unmodified Python Home Assistant integrations running inside the plugin host.

Substrate: NATS as the internal coordination layer (with leaf-node extension out to the relay and a separate account / JWT-privileges boundary around it); HTTPS / SSE / WebSocket as external client-facing protocols; WebSocket / MQTT / Matter (via matter.js) / HA WebSocket API as device-protocol surfaces; MCP as the agent-tool protocol; pgvector / TimescaleDB for history; Whisper / Kokoro for the voice path; BlueZ / D-Bus for the BLE path; Let’s Encrypt for cert provisioning at the relay.

Closing observation

After three months of building this, the agent built on this substrate is competent without supervision in ways that are hard to fake: knowing context, using tools, navigating long-running conversations, integrating information across sessions, composing live investigations against real data. That competence is the answer to the original question (“what does production LLM integration actually require?”). It requires substrate, and the architectural commitment that holds the whole thing together is this: the agent gains capability by the substrate gaining capability, never by the agent gaining trust at runtime. Composition is free, unprivileged, and live; growing the verb catalog is a deliberate, human-reviewed PR. That asymmetry is what makes the agent useful and safe at the same time, and it’s the property I haven’t found in the other agent platforms I’ve worked with. The household side benefits too: the family stopped noticing when things break, which is the real metric for any home-automation system.

Links

Private repository, access available on request.