Abstract
This blog post presents the architecture and design principles behind a secure IoT gateway written in Rust. The system connects STM32-based edge devices to cloud analytics using MQTT, SQLite WAL persistence, OpenTelemetry traces, and Prometheus metrics. It explores how reliability, observability, and security can be built in from day one - with insights drawn from real-world projects.
Introduction
Building IoT systems that scale securely and reliably is one of the hardest engineering challenges. I’ve seen companies lose critical telemetry, face data gaps, or struggle with software updates because reliability and observability were added too late.
As part of my work at Combotto.io, I set out to build a Rust-based IoT Gateway that demonstrates how to connect STM32 edge devices to the cloud with built-in reliability, observability, and security — the same architectural principles I apply when auditing and hardening production IoT infrastructures for clients.
This post walks through the system design, from edge to cloud, showing concrete implementation strategies, architectural viewpoints, and lessons from real deployments.
Background - The Design Philosophy
Inspired by the concept of architectural perspectives from Rozanski and Woods [1] - such as the security, performance & scalability, and availability & resilience perspectives - I approach IoT gateway design with a similar mindset. To achieve production-grade quality and meet customer expectations, I emphasize three guiding perspectives:
- Reliability: Every message must survive network failures (WAL + retry logic).
- Observability: Traces and metrics should explain what the system is doing.
- Security: From device authentication to encrypted telemetry and hardening.
These perspectives shape how I structure IoT systems end-to-end - from embedded devices at the edge, through the gateway layer, to the analytics and monitoring systems in the cloud.
Architecture Overview — From Edge to Cloud
To give an overview of the architecture for the Rust IoT Gateway (edge-to-cloud), this section describes the context & scope and the functional elements with their responsibilities and interfaces.
Context & Scope
To better understand the context and scope of the Gateway, a Context Viewpoint Diagram is used to show the system's relationships and interactions with external actors when designing for reliability, observability, and security. The diagram shows an edge-to-cloud setup: an STM32 Sensor at the edge publishes its data to an Edge MQTT Broker, and the Rust Gateway's MQTT Client subscribes to that topic. The MQTT Client appends incoming data to a write-ahead log (WAL Queue), where it is persisted before being sent downstream to the cloud.
The Gateway is programmed in Rust and consists of a single gateway instance running in Docker on an edge host (x86/ARM). The gateway includes device-facing ingest, a durable queue, processing, publishing to cloud, and observability.
External services / actors
The system consists of multiple external services and actors that interact with the system.
- Devices (STM32-based sensors): publish telemetry via MQTT and receive control messages.
- Cloud Backend Services:
- Analytics Service (HTTP/gRPC ingestion endpoint)
- S3 (object storage for batch/offload)
- Prometheus (scrapes metrics)
- Tempo/OTLP collector (receives traces)
- Grafana (visualizes metrics/traces)
- Operator/Admin: interacts with the gateway’s Admin API & logs.
These external services and actors are out of scope; I won't go into the internals of the downstream analytics or the device firmware.
Functional Elements & Responsibilities
The functional elements are deployable Rust services/crates within one containerized gateway binary, organized as modules.
A functional viewpoint diagram was created to visually describe the functional elements in more detail with interfaces and responsibilities.
Here is a list of the functional elements for the Rust IoT Gateway - Ingest Plane.
MQTT Telemetry Client
Maintains a resilient MQTT session to ingest device telemetry and handle control messages.
- Connect to the edge MQTT broker (local or embedded) with the configured Client ID and credentials; maintain session & keep-alive.
- Subscribe to telemetry topics and control topics (I1a & I6).
- Parse telemetry payload (JSON/Protobuf), validate basic shape, and hand off to the Ingest Router.
- Apply QoS=1 delivery semantics with automatic reconnection and exponential backoff.
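The reconnect policy above can be sketched as a capped exponential backoff. The base delay and cap below are illustrative assumptions, not values from the gateway:

```rust
use std::time::Duration;

/// Exponential backoff with a hard cap, used between MQTT reconnect attempts.
/// Base delay and cap are illustrative; tune them per deployment.
fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 500;
    const CAP_MS: u64 = 30_000;
    let exp = 2u64.saturating_pow(attempt.min(16));
    Duration::from_millis(BASE_MS.saturating_mul(exp).min(CAP_MS))
}

fn main() {
    for attempt in 0..6 {
        println!("attempt {attempt}: wait {:?}", reconnect_delay(attempt));
    }
}
```

In practice this delay would be slept between reconnect attempts of the MQTT session loop; the cap keeps a long outage from producing multi-minute gaps once the broker comes back.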
Ingest Router
Normalize inbound telemetry and enrich it with gateway metadata before durable persistence.
- Validate/normalize telemetry payload (schema version, size limits, required fields).
- Enrich with gateway_id, receive_ts, correlation_id.
- Produce a canonical record and append to the WAL Queue.
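The validate-then-enrich step can be sketched as follows. The field names mirror the bullets above (gateway_id, receive_ts, correlation_id), but the struct, size limit, and sequence-based correlation id are illustrative assumptions - a real implementation would likely use a UUID:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Canonical record produced by the Ingest Router before the WAL append.
/// Sketch only; the gateway's actual schema may differ.
#[derive(Debug, Clone)]
struct CanonicalRecord {
    gateway_id: String,
    correlation_id: String,
    receive_ts: u64, // unix millis
    payload: Vec<u8>,
}

/// Validate basic shape, then enrich with gateway metadata.
fn enrich(gateway_id: &str, seq: u64, payload: Vec<u8>) -> Result<CanonicalRecord, String> {
    const MAX_PAYLOAD: usize = 64 * 1024; // illustrative size limit
    if payload.is_empty() {
        return Err("empty payload".into());
    }
    if payload.len() > MAX_PAYLOAD {
        return Err("payload exceeds size limit".into());
    }
    let receive_ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|e| e.to_string())?
        .as_millis() as u64;
    Ok(CanonicalRecord {
        gateway_id: gateway_id.to_string(),
        // A per-gateway sequence number keeps this sketch dependency-free.
        correlation_id: format!("{gateway_id}-{seq}"),
        receive_ts,
        payload,
    })
}

fn main() {
    let rec = enrich("gw-01", 1, br#"{"temp":21.5}"#.to_vec()).unwrap();
    println!("{rec:?}");
}
```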
HTTP Server
Provide a simple, broker-independent ingestion path for devices/services that can't speak MQTT.
- POST /ingest/{device_id} accepting JSON payloads; authenticate/authorize request (I1b).
- Reuse the same validation/enrichment pipeline as MQTT (Ingest Router).
- Append accepted records to the WAL and return success with a correlation id.
Config & Secrets
The gateway uses a layered configuration system where defaults can be overridden by files, environment variables, and command-line flags. This approach provides secure secret handling and gives flexibility across environments.
- Load config via env/files → strongly typed config.
- Cert/keys management (I4).
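A minimal sketch of the layered, strongly typed config (env overriding defaults; file and CLI layers omitted). The variable names and defaults are illustrative assumptions:

```rust
use std::env;

/// Strongly typed gateway config with defaults that environment
/// variables can override. Names/defaults are illustrative.
#[derive(Debug)]
struct GatewayConfig {
    mqtt_broker: String,
    wal_path: String,
    http_port: u16,
}

impl GatewayConfig {
    fn load() -> Self {
        // Each field falls back to its default when the env var is unset.
        let get = |key: &str, default: &str| env::var(key).unwrap_or_else(|_| default.to_string());
        GatewayConfig {
            mqtt_broker: get("GW_MQTT_BROKER", "tls://localhost:8883"),
            wal_path: get("GW_WAL_PATH", "/data/wal.db"),
            http_port: get("GW_HTTP_PORT", "8080").parse().unwrap_or(8080),
        }
    }
}

fn main() {
    env::set_var("GW_HTTP_PORT", "9090"); // simulate an env override
    let cfg = GatewayConfig::load();
    println!("{cfg:?}");
}
```

Secrets (certs/keys, I4) would come through the same layering but be loaded from files or a secret store rather than inline env values.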
Admin & Health API
Provide operational insight and lifecycle control to the gateway through a lightweight HTTP interface.
- Expose standardized endpoints for liveness (/healthz), readiness (/readyz), metrics (/metrics), and version (/version) (I2).
- Integrate with Prometheus and external monitoring tools for automated scraping and alerting.
- Allow operators and CI/CD pipelines to verify state before deployment or restart.
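Readiness differs from liveness in that /readyz must aggregate the gateway's critical dependencies (MQTT + WAL + Publisher, per I2). A sketch of that aggregation, with illustrative types:

```rust
/// Readiness aggregates the health of the gateway's critical dependencies;
/// /readyz succeeds only when all of them are up. Illustrative sketch.
#[derive(Debug, Clone, Copy)]
struct ReadinessState {
    mqtt_connected: bool,
    wal_writable: bool,
    publisher_healthy: bool,
}

impl ReadinessState {
    fn is_ready(&self) -> bool {
        self.mqtt_connected && self.wal_writable && self.publisher_healthy
    }

    /// HTTP status code /readyz would return.
    fn status_code(&self) -> u16 {
        if self.is_ready() { 200 } else { 503 }
    }
}

fn main() {
    let state = ReadinessState { mqtt_connected: true, wal_writable: true, publisher_healthy: false };
    println!("/readyz -> {}", state.status_code());
}
```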
Control Plane
To control devices and MQTT at the edge, a control plane is set up to reliably control routing and enforce security policies.
- Handle inbound commands from MQTT topics (configuration updates, test pings, etc.).
- Persist/apply configuration; publish control messages to devices.
Telemetry (Observability)
Telemetry is the process of collecting output data from the system - such as logs, traces, and metrics - while observability is about understanding the internal state of the system by analyzing that telemetry output. To enable observability, the gateway uses OpenTelemetry to instrument the code and emit traces composed of small single units of work called spans. Prometheus is used to collect and query metrics, and structured JSON logging is set up across the code to gain further insight.
- Tracing: create spans for ingest, WAL append, publish, retry; export via OTLP to Tempo/collector.
- Metrics: Prometheus /metrics endpoint (ingest rate, queue depth, publish success %, latency, CPU/mem/IO).
- Logging: structured JSON logs.
WAL Queue (Durable Buffer)
Reliability is built around the principle that every message must survive a network failure. To guarantee at-least-once delivery, the gateway uses an append-only SQLite write-ahead log (WAL). This allows crash-safe recovery and idempotent retries while keeping the footprint small for edge environments.
- Append-only write-ahead log (SQLite WAL) for at-least-once delivery.
- Index messages by status: Enqueued → InFlight → Acked/Dead.
- Compaction & retention (size/time).
- Backpressure thresholds.
- Crash-safe recovery.
- Idempotency keys.
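The status lifecycle above (Enqueued → InFlight → Acked/Dead) can be sketched as a small transition function. In the gateway this status lives in a SQLite column; the enum and event names below are illustrative:

```rust
/// Message lifecycle in the WAL: Enqueued → InFlight → Acked, with Dead for
/// messages that exhaust their retries. Sketch of the transition rules only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WalStatus {
    Enqueued,
    InFlight,
    Acked,
    Dead,
}

#[derive(Debug)]
enum Event {
    Dispatch,         // dispatcher picked the message up
    PublishOk,        // downstream acknowledged
    PublishFailed,    // retry later
    RetriesExhausted, // give up, dead-letter
}

fn transition(status: WalStatus, event: Event) -> Result<WalStatus, &'static str> {
    use Event::*;
    use WalStatus::*;
    match (status, event) {
        (Enqueued, Dispatch) => Ok(InFlight),
        (InFlight, PublishOk) => Ok(Acked),
        (InFlight, PublishFailed) => Ok(Enqueued), // requeue for retry
        (InFlight, RetriesExhausted) => Ok(Dead),  // dead-letter
        _ => Err("invalid transition"),
    }
}

fn main() {
    let s = transition(WalStatus::Enqueued, Event::Dispatch).unwrap();
    println!("{s:?}");
}
```

Making invalid transitions an error is what keeps crash recovery simple: on restart, any lingering InFlight rows can safely be reset to Enqueued and retried, relying on idempotency keys downstream.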
Dispatcher
A dispatcher abstraction is set up in the gateway to drain the WAL Queue in a reliable, automated way. The dispatcher fetches a batch of N events from the WAL Queue, marks them InFlight, and acknowledges events that have been published. Events that fail to publish are scheduled for retry later.
- run_dispatcher: fetch N, mark InFlight.
- Calls publish(); schedules retries.
- Ack WAL on success.
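One dispatcher iteration can be sketched against an in-memory stand-in for the WAL (the real queue is SQLite-backed, and retry scheduling with backoff is omitted here):

```rust
/// One dispatch cycle: fetch a batch of N pending messages, mark them
/// InFlight, publish, then ack or requeue. Illustrative in-memory sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status { Enqueued, InFlight, Acked }

struct Wal {
    messages: Vec<(u64, Status)>, // (id, status)
}

impl Wal {
    /// Select up to `n` Enqueued messages and mark them InFlight.
    fn fetch_batch(&mut self, n: usize) -> Vec<u64> {
        let ids: Vec<u64> = self
            .messages
            .iter()
            .filter(|(_, s)| *s == Status::Enqueued)
            .take(n)
            .map(|(id, _)| *id)
            .collect();
        for (id, s) in self.messages.iter_mut() {
            if ids.contains(id) { *s = Status::InFlight; }
        }
        ids
    }

    fn set(&mut self, id: u64, status: Status) {
        if let Some((_, s)) = self.messages.iter_mut().find(|(i, _)| *i == id) { *s = status; }
    }
}

/// Run one dispatch cycle; `publish` is the downstream call (stubbed by the caller).
fn run_dispatcher(wal: &mut Wal, batch: usize, publish: impl Fn(u64) -> bool) {
    for id in wal.fetch_batch(batch) {
        if publish(id) {
            wal.set(id, Status::Acked);    // success: ack in WAL
        } else {
            wal.set(id, Status::Enqueued); // failure: requeue for retry
        }
    }
}

fn main() {
    let mut wal = Wal { messages: vec![(1, Status::Enqueued), (2, Status::Enqueued)] };
    run_dispatcher(&mut wal, 10, |id| id != 2); // pretend message 2 fails
    println!("{:?}", wal.messages);
}
```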
Fanout
A FanoutSink broadcasts each record to the configured downstream sinks.
- FanoutSink broadcasts to all sinks.
- Ensures per-sink error isolation.
- List of sinks: S3 Sink, MQTT Sink, Kafka Sink
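Per-sink error isolation means one failing sink must not stop delivery to the others. A sketch of that contract, with the trait and stub sinks as illustrative assumptions:

```rust
/// A downstream sink (S3, MQTT, Kafka, ...). Illustrative trait.
trait Sink {
    fn name(&self) -> &str;
    fn publish(&self, record: &[u8]) -> Result<(), String>;
}

struct FanoutSink {
    sinks: Vec<Box<dyn Sink>>,
}

impl FanoutSink {
    /// Broadcast to all sinks; collect failures instead of aborting early,
    /// so one unreachable sink cannot block the others.
    fn broadcast(&self, record: &[u8]) -> Vec<(String, String)> {
        let mut failures = Vec::new();
        for sink in &self.sinks {
            if let Err(e) = sink.publish(record) {
                failures.push((sink.name().to_string(), e));
            }
        }
        failures
    }
}

// Stub sinks for demonstration only.
struct OkSink;
impl Sink for OkSink {
    fn name(&self) -> &str { "s3" }
    fn publish(&self, _r: &[u8]) -> Result<(), String> { Ok(()) }
}

struct FailingSink;
impl Sink for FailingSink {
    fn name(&self) -> &str { "kafka" }
    fn publish(&self, _r: &[u8]) -> Result<(), String> { Err("broker unreachable".into()) }
}

fn main() {
    let fanout = FanoutSink { sinks: vec![Box::new(OkSink), Box::new(FailingSink)] };
    let failures = fanout.broadcast(b"telemetry");
    println!("{failures:?}");
}
```

Returning the failure list (rather than a single Result) lets the dispatcher decide per sink whether to retry or dead-letter.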
Recovery Tools
Dashboards and recovery tools are created to let operators resolve on-call incidents when running the Gateway in production.
- Requeue or export dead-lettered messages; support replay.
- Diagnostics for failed publishes and WAL inconsistencies.
Interfaces
Here is a list of the interfaces for the Rust IoT Gateway - Ingest Plane.
I1a. MQTT Telemetry Ingest
- Protocol: MQTT
- Topics: subscribes to devices/{device_id}/telemetry
- QoS: 1 (at-least-once). Retain = false.
- Payload: JSON/Protobuf (configurable)
- Errors: invalid schema
I1b. HTTP JSON Ingest
- Protocol: HTTP
- Endpoint: POST /ingest/{device_id}
- Payload: JSON
- Errors: invalid schema
I2. Admin HTTP API
- GET /healthz (liveness) → 200/500
- GET /readyz (readiness: MQTT + WAL + Publisher)
- GET /metrics (Prometheus export)
- GET /version (Gateway version)
I3. Observability Export
- Tracing: OTLP/HTTP to collector → Tempo
- Metrics: Prometheus scrape of /metrics
- Logs: stdout
I4. Config & Secrets
- Inputs: env vars
- Contract: typed schema with defaults.
I5. Local persistence (Local WAL)
- DB: SQLite in WAL mode
- Durability: fsync on append; periodic checkpoint.
I6. Control topics
- device commands via MQTT
Deployment Viewpoint — Rust IoT Gateway (Edge → Cloud)
Designing IoT systems that run reliably in the field requires more than functional architecture. The deployment viewpoint describes where each functional element runs, how components communicate, what infrastructure is required, and how security, reliability, and observability are enforced at runtime.
This section explains the operational environment for the Rust IoT Gateway - from STM32 devices at the edge, to the MQTT broker and containerized gateway process, through to cloud-based observability and analytics services.
Runtime Platform & Deployment Topology
At each site, STM32 devices publish telemetry over MQTT to an edge MQTT broker running on a Docker host. A Rust gateway container consumes messages, persists them to a local SQLite WAL, and forwards them securely over TLS to cloud analytics and observability services (Prometheus, Grafana, Tempo, Analytics, S3).
The table below maps the functional elements of the gateway to components, describing containers, storage, and ports, with notes to better explain the runtime platform.
| Component | Container | Storage | Ports | Notes |
|---|---|---|---|---|
| STM32 devices (B-L475E-IOT01A1) | Bare-metal firmware (no container) | On-board flash (firmware + config) | MQTT over TLS via Wi-Fi (client only) | Publishes signed telemetry to edge MQTT broker using mTLS client cert. |
| Edge MQTT Broker | eclipse-mosquitto:2 | /mosquitto (config, certs) | 8883/tcp | mTLS; topic ACLs for devices and gateway. |
| Rust Gateway container | combotto/gateway:TAG | /data (SQLite WAL) | 8080/tcp | /metrics, /healthz, /readyz, /version; MQTT + HTTP ingest |
| Prometheus | Managed service or prom/prometheus (Docker) | Local disk / managed storage | 443/tcp (if managed) or 9090/tcp | Scrapes gateway /metrics; stores time-series metrics. |
| Grafana | Managed service or grafana/grafana (Docker) | Local disk / managed storage | 443/tcp (if managed) or 3000/tcp | Dashboard for Prometheus metrics and Tempo Traces. |
| Tempo | Managed service or grafana/tempo (Docker) | Object storage (S3/MinIO) or local disk | 443/tcp (managed) or 3200/tcp | Distributed trace storage backend; stores and serves traces queried by Grafana. |
| OTLP Collector | Managed service or otel/opentelemetry-collector (Docker) | Stateless (no long-term storage) | 4317/4318/tcp, 443/tcp behind ingress | Receives OTLP telemetry, batches/processes it, and forwards traces to Tempo. |
| Analytics API | Backend service (Kubernetes / VM / Docker) | Application DB / data lake | 443/tcp | Receives telemetry from gateway over HTTPS (idempotent ingest). |
| S3 / Object Storage | Managed object storage (S3-compatible) | Bucket per environment | 443/tcp | Optional batch/offload sink for archival telemetry. |
Runtime Platform Model Diagram
The runtime platform model describes the concrete hardware and execution environments that host the functional elements of the gateway. The model is organized into Edge, Gateway Host, and Cloud nodes, each with their own processing nodes, execution environments, and deployed components/artifacts.
Edge Site Nodes
| Processing node | Execution environment | Deployed artifacts | Notes / sizing |
|---|---|---|---|
| STM32 devices (B-L475E-IOT01A1) | STM32L475VG MCU, Arm® Cortex®-M4 @ 80 MHz, 128 KB SRAM, 1 MB Flash. Bare-metal or RTOS-based firmware. On-board peripherals: ISM43362 Wi-Fi (802.11 b/g/n), BLE 4.1, Sub-GHz RF, NFC tag, IMU, magnetometer, barometer, humidity/temperature, ToF sensor, microphones. | Telemetry + control firmware image: samples sensor data, publishes MQTT over TLS via the ISM43362 Wi-Fi module, subscribes to control topics for configuration/commands. | Ultra-low-power device with multiple radios and sensors on board. The 128 KB RAM and 1 MB Flash constrain firmware size and buffering; MQTT payload size and local queuing must be kept small. OTA update and power management are handled in device firmware (out of scope for the gateway). |
| Edge MQTT Host (industrial PC or SBC) | Linux (e.g. Ubuntu Server 22.04) on ARMv8 or x86-64, 2 vCPU, 2-4 GB RAM, Docker Engine ≥ 24. | eclipse-mosquitto:2 container, TLS CA + server certs, per-device client certs, ACL configuration under /mosquitto. | Typically co-located with the Rust Gateway on the same edge LAN. Must provide stable Wi-Fi connectivity to the STM32 boards (via AP/router) and enough disk to store broker config/certs and logs. |
The choice of B-L475E-IOT01A1 (STM32L475VG @ 80 MHz, 128 KB RAM, 1 MB Flash) strongly influences the gateway architecture: device firmware cannot buffer large backlogs or run heavyweight crypto/protocol stacks, so reliability is achieved primarily by the edge MQTT broker + Rust Gateway WAL, not by deep queues on the device. This is a typical pattern in constrained IoT deployments and one of the key assumptions behind the ingest and durability design.
Gateway Host Node
| Processing node | Execution environment | Deployed artifacts | Notes / sizing |
|---|---|---|---|
| Rust Gateway Host (edge server / SBC) | Same Linux host as MQTT or separate industrial PC, 4 vCPU, 4-8 GB RAM, SSD storage (≥ 50 GB) | combotto/gateway:TAG Docker container, config file/env vars, SQLite DB in WAL mode under /data | Hosts the gateway ingest plane: MQTT Client, HTTP ingest, WAL Queue, Dispatcher, Fanout sinks, Admin & Health API, Telemetry instrumentation. WAL sizing depends on expected outage window and ingest rate (e.g. keep 24-72 h backlog). |
In small deployments, the Edge MQTT Host and Rust Gateway Host are the same physical machine with two containers. In larger deployments, they may be split across two processing nodes on the same edge LAN for isolation.
Cloud / Data Center Nodes
| Processing node | Execution environment | Deployed artifacts | Notes / sizing |
|---|---|---|---|
| Observability Node | Managed Grafana stack or Kubernetes cluster (e.g. 3x nodes, 2-4 vCPU, 8-16 GB RAM each). | Prometheus server, Tempo trace storage, OTLP Collector, Grafana Dashboard | Collects /metrics from the gateway, receives OTLP spans, stores time-series + traces. High availability and retention tuned per customer SLOs. |
| Analytics API Node | Kubernetes Deployment or VM pool behind HTTPS load balancer, 2-8 vCPU, 8-32 GB RAM | analytics-api container exposing HTTP/gRPC ingest endpoint | Receives telemetry from the gateway over TLS. Ingest is idempotent so that WAL replay does not corrupt data. |
| Object Storage Service | Managed S3-compatible service | Buckets: telemetry-offload, telemetry-archive | Optional batch/offload sink for long-term storage, compliance or reprocessing. No CPU sizing; cost driven by throughput and retention. |
This model below ties each functional element from the earlier viewpoint to a concrete runtime location:
- MQTT Telemetry Client, Ingest Router, HTTP Server, WAL Queue, Dispatcher, Fanout, Admin & Health API, Config & Secrets, Telemetry (Obs) → run inside the Rust Gateway container on the Rust Gateway Host.
- Control Plane functions share the same container but interact with the Edge MQTT Broker and STM32 devices across the edge LAN.
- Observability export (traces, metrics, logs) targets the Observability Node in the cloud.
- Kafka/S3/MQTT sinks (if enabled) push to the Analytics API or Object Storage nodes.
Network Nodes & Communication Links
While the runtime platform describes where each component executes, the deployment viewpoint also needs to capture how these components communicate across the edge-cloud boundary. The network nodes and communication links shown below illustrate the secure telemetry path from STM32 devices to the edge MQTT broker, through the Rust Gateway, and onward to the cloud analytics and observability stack. This view highlights the protocols, ports, trust boundaries, and communication paths enforced at runtime.
From Prototype to Consulting Offering
Over the past months, this gateway evolved from a technical prototype into a foundation for my consulting work at Combotto.io — where I help teams design secure, reliable, and observable IoT infrastructures.
By transforming the prototype into a reusable reference architecture, I now use it as a blueprint for audits, hardening sprints, and reliability partnerships with clients who need production-grade edge-to-cloud systems.
If you’re building something similar and want an outside perspective, I offer:
- Audit: Review of your existing IoT dataflows and reliability setup.
- Hardening Sprint: Short engagement to improve observability and security.
- Reliability Partnership: Long-term consulting to scale your system safely.
Conclusion
Building IoT systems that are both reliable and secure requires more than just code — it’s about embedding the right architectural perspectives from day one.
This Rust IoT Gateway is my reference architecture for how to bridge embedded devices and cloud infrastructure safely. It’s open-ended by design, and I continue to evolve it as part of my consulting work at Combotto.io.
If you’re building connected devices and want to audit your data reliability, harden your telemetry paths, or scale with confidence, feel free to reach out — I’d be glad to share insights or help your team.
[1] Rozanski, Nick, and Woods, Eoin. *Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives*, 2nd Edition. Addison-Wesley, 2012.