Building a Secure Rust IoT Gateway: From Edge to Cloud

Published Oct 21, 2025


Abstract

This blog post presents the architecture and design principles behind a secure IoT gateway written in Rust. The system connects STM32-based edge devices to cloud analytics using MQTT, SQLite WAL persistence, OpenTelemetry traces, and Prometheus metrics. It explores how reliability, observability, and security can be built in from day one, with insights drawn from real-world projects.

Introduction

Building IoT systems that scale securely and reliably is one of the hardest engineering challenges. I’ve seen companies lose critical telemetry, face data gaps, or struggle with software updates because reliability and observability were added too late.

As part of my work at Combotto.io, I set out to build a Rust-based IoT Gateway that demonstrates how to connect STM32 edge devices to the cloud with built-in reliability, observability, and security — the same architectural principles I apply when auditing and hardening production IoT infrastructures for clients.

This post walks through the system design, from edge to cloud, showing concrete implementation strategies, architectural viewpoints, and lessons from real deployments.

Background - The Design Philosophy

Inspired by the concept of architectural perspectives from Rozanski and Woods [1] - such as the security, performance & scalability, and availability & resilience perspectives - I approach IoT gateway design with a similar mindset. To achieve production-grade quality and meet customer expectations, I emphasize three guiding perspectives:

  • Reliability: Every message must survive network failures (WAL + retry logic).
  • Observability: Traces and metrics should explain what the system is doing.
  • Security: From device authentication to encrypted telemetry and hardening.

These perspectives shape how I structure IoT systems end-to-end - from embedded devices at the edge, through the gateway layer, to the analytics and monitoring systems in the cloud.

Architecture Overview — From Edge to Cloud

To give an overview of the architecture for the Rust IoT Gateway (edge-to-cloud), this section describes the context & scope and the functional elements with their responsibilities and interfaces.

Context & Scope

To better understand the context and scope of the Gateway, a Context Viewpoint Diagram is used to show the system's relationships and interactions with customers when designing for reliability, observability, and security. The diagram shows an edge-to-cloud setup with an STM32 Sensor at the edge communicating with an Edge MQTT Broker. The Edge MQTT Broker communicates with the Rust Gateway through an MQTT Client in the Gateway that subscribes to the MQTT topic where the STM32 Sensor has published its data. The MQTT Client appends its data to a write-ahead log (WAL Queue), where the data is persisted and then sent downstream to the cloud.

IoT Edge to Gateway to Cloud Context Diagram
Context Viewpoint Diagram for Rust IoT Gateway showing a STM32 Sensor → Edge MQTT Broker → Rust Gateway → Cloud Backend → Customers.

The Gateway is programmed in Rust and consists of a single gateway instance running in Docker on an edge host (x86/ARM). The gateway includes device-facing ingest, a durable queue, processing, publishing to cloud, and observability.

External services / actors

Several external services and actors interact with the system.

  • Devices (STM32-based sensors): publish telemetry via MQTT and receive control messages.
  • Cloud Backend Services:
    • Analytics Service (HTTP/gRPC ingestion endpoint)
    • S3 (object storage for batch/offload)
    • Prometheus (scrapes metrics)
    • Tempo/OTLP collector (receives traces)
    • Grafana (visualizes metrics/traces)
  • Operator/Admin: interacts with the gateway’s Admin API & logs.

These external services and actors are out of scope; I won't go into the details of downstream analytics internals or device firmware internals.

Functional Elements & Responsibilities

The functional elements are Rust services/crates organized as modules within a single containerized gateway binary.

A functional viewpoint diagram was created to visually describe the functional elements in more detail with interfaces and responsibilities.

IoT Edge Gateway Cloud Functional Diagram
Functional Viewpoint diagram showing data flow and key responsibilities for IoT Edge → Gateway → Cloud.

Here is a list of the functional elements for the Rust IoT Gateway - Ingest Plane.

MQTT Telemetry Client

Maintains a resilient MQTT session to ingest device telemetry and handle control messages.

  • Connect to the edge MQTT broker (local or embedded) with configured Client ID and credentials; maintain session & keep-alive
  • Subscribe to telemetry topics and control topics (I1A & I6).
  • Parse telemetry payload (JSON/Protobuf), validate basic shape, and hand off to the Ingest Router.
  • Apply QoS=1 delivery semantics with automatic reconnection and exponential backoff.
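
A minimal sketch of this loop, assuming the rumqttc crate (the post does not name the actual client library); the client ID, broker host, buffer size, and backoff limits are illustrative, and the mTLS transport setup is omitted:

```rust
use std::time::Duration;
use rumqttc::{AsyncClient, Event, MqttOptions, Packet, QoS};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Client ID, host, and port are illustrative; in the gateway they come
    // from the Config & Secrets module. TLS transport setup is omitted.
    let mut opts = MqttOptions::new("rust-gateway", "edge-broker.local", 8883);
    opts.set_keep_alive(Duration::from_secs(30));

    let (client, mut eventloop) = AsyncClient::new(opts, 64);
    // QoS 1 gives at-least-once delivery from broker to gateway.
    client
        .subscribe("devices/+/telemetry", QoS::AtLeastOnce)
        .await?;

    let mut backoff = Duration::from_millis(500);
    loop {
        match eventloop.poll().await {
            Ok(Event::Incoming(Packet::Publish(msg))) => {
                backoff = Duration::from_millis(500); // healthy again, reset
                // Hand the raw payload off to the Ingest Router here.
                println!("telemetry on {}: {} bytes", msg.topic, msg.payload.len());
            }
            Ok(_) => {} // connacks, pings, pub/sub acks
            Err(e) => {
                // rumqttc reconnects on the next poll; back off exponentially
                // first so a dead broker is not hammered.
                eprintln!("mqtt error: {e}, retrying in {backoff:?}");
                tokio::time::sleep(backoff).await;
                backoff = (backoff * 2).min(Duration::from_secs(60));
            }
        }
    }
}
```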

Ingest Router

Normalize inbound telemetry and enrich it with gateway metadata before durable persistence.

  • Validate/normalize telemetry payload (schema version, size limits, required fields).
  • Enrich with gateway_id, receive_ts, correlation_id.
  • Produce a canonical record and append to the WAL Queue.
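
As a sketch of what the canonical record might look like, assuming serde/serde_json and the uuid crate; the struct and field names are illustrative, not the gateway's actual schema:

```rust
use serde::{Deserialize, Serialize};

/// Canonical record appended to the WAL; field names are illustrative.
#[derive(Debug, Serialize, Deserialize)]
struct CanonicalRecord {
    schema_version: u16,
    gateway_id: String,
    device_id: String,
    receive_ts_ms: i64,
    correlation_id: String,
    payload: serde_json::Value,
}

fn enrich(gateway_id: &str, device_id: &str, raw: &[u8])
    -> Result<CanonicalRecord, serde_json::Error>
{
    // Basic shape validation: the payload must at least parse as JSON.
    // Size limits and required-field checks would go here as well.
    let payload: serde_json::Value = serde_json::from_slice(raw)?;
    Ok(CanonicalRecord {
        schema_version: 1,
        gateway_id: gateway_id.to_owned(),
        device_id: device_id.to_owned(),
        receive_ts_ms: std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .expect("system clock before 1970")
            .as_millis() as i64,
        correlation_id: uuid::Uuid::new_v4().to_string(),
        payload,
    })
}
```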

HTTP Server

Provide a simple, broker-independent ingestion path for devices/services that can't speak MQTT.

  • POST /ingest/{device_id} accepting JSON payloads; authenticate/authorize request (I1b).
  • Reuse the same validation/enrichment pipeline as MQTT (Ingest Router).
  • Append accepted records to the WAL and return success with a correlation id.
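
A hedged sketch of the ingest endpoint, assuming axum 0.7 and tokio; the real handler would reuse the Ingest Router and append to the WAL instead of just minting a correlation id:

```rust
use axum::{extract::Path, http::StatusCode, routing::post, Json, Router};
use serde_json::{json, Value};

async fn ingest(
    Path(device_id): Path<String>,
    Json(body): Json<Value>,
) -> (StatusCode, Json<Value>) {
    // The real handler validates/enriches via the Ingest Router and appends
    // to the WAL before answering; here we only mint a correlation id.
    let _ = body;
    let correlation_id = uuid::Uuid::new_v4().to_string();
    (
        StatusCode::ACCEPTED,
        Json(json!({ "device_id": device_id, "correlation_id": correlation_id })),
    )
}

#[tokio::main]
async fn main() {
    // Authentication/authorization middleware (interface I1b) is omitted.
    let app = Router::new().route("/ingest/:device_id", post(ingest));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```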

Config & Secrets

The gateway uses a layered configuration system where defaults can be overridden by files, environment variables, and command-line flags. This approach provides secure secret handling and gives flexibility across development and production environments.

  • Load config via env/files → strongly typed config.
  • Cert/keys management (I4).
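
One way to sketch this layering, assuming the config crate; the keys, defaults, and GATEWAY_ environment prefix are hypothetical:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct GatewayConfig {
    mqtt_host: String,
    mqtt_port: u16,
    wal_path: String,
}

fn load() -> Result<GatewayConfig, config::ConfigError> {
    config::Config::builder()
        // 1. Hard-coded defaults (lowest precedence).
        .set_default("mqtt_host", "localhost")?
        .set_default("mqtt_port", 8883_i64)?
        .set_default("wal_path", "/data/wal.db")?
        // 2. An optional config file overrides the defaults.
        .add_source(config::File::with_name("gateway").required(false))
        // 3. Environment variables (e.g. GATEWAY_MQTT_HOST) override the file,
        //    which keeps secrets out of files baked into the image.
        .add_source(config::Environment::with_prefix("GATEWAY"))
        .build()?
        .try_deserialize()
}
```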

Admin & Health API

Provide operational insight and lifecycle control to the gateway through a lightweight HTTP interface.

  • Expose standardized endpoints for liveness (/healthz), readiness (/readyz), metrics (/metrics), and version (/version) (I2).
  • Integrate with Prometheus and external monitoring tools for automated scraping and alerting.
  • Allow operators and CI/CD pipelines to verify state before deployment or restart.
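
A sketch of the liveness/readiness split, again assuming axum; the single readiness flag standing in for the real MQTT + WAL + publisher checks is hypothetical:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

use axum::{extract::State, http::StatusCode, routing::get, Router};

#[derive(Clone)]
struct AdminState {
    // Flipped to true once the MQTT session, WAL, and publisher are all up.
    ready: Arc<AtomicBool>,
}

// Liveness: the process is running and can answer HTTP at all.
async fn healthz() -> StatusCode {
    StatusCode::OK
}

// Readiness: dependencies are up, so traffic/deploys may proceed.
async fn readyz(State(state): State<AdminState>) -> StatusCode {
    if state.ready.load(Ordering::Relaxed) {
        StatusCode::OK
    } else {
        StatusCode::SERVICE_UNAVAILABLE
    }
}

fn admin_router(state: AdminState) -> Router {
    Router::new()
        .route("/healthz", get(healthz))
        .route("/readyz", get(readyz))
        .with_state(state)
}
```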

Control Plane

To control devices and MQTT at the edge, a control plane is set up to reliably control routing and enforce security policies.

  • Handle inbound commands from MQTT topics (configuration updates, test pings, etc.).
  • Persist/apply configuration; publish control messages to devices.
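
A small sketch of how inbound commands might be decoded, assuming serde; the command variants shown are hypothetical, not the gateway's real control schema:

```rust
use serde::Deserialize;

/// Hypothetical control-message schema; the real topic payloads may differ.
#[derive(Debug, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
enum ControlCommand {
    UpdateConfig { key: String, value: String },
    Ping { correlation_id: String },
}

fn decode_control(payload: &[u8]) -> Result<ControlCommand, serde_json::Error> {
    serde_json::from_slice(payload)
}
```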

Telemetry (Observability)

Telemetry is the process of collecting output data from the system, such as logs, traces, and metrics. Observability is about understanding the internal state of the system by analyzing that telemetry output. To enable observability, the gateway uses OpenTelemetry to instrument the code to emit traces composed of small, single units of work called spans. Prometheus is used to collect and query metrics, and structured JSON logging is set up across the code to gain further insight.

  • Tracing: create spans for ingest, WAL append, publish, retry; export via OTLP to Tempo/collector.
  • Metrics: Prometheus /metrics endpoint (ingest rate, queue depth, publish success %, latency, CPU/memory/IO).
  • Logging: structured JSON logs.
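
To illustrate the span/logging side, a minimal sketch using the tracing and tracing-subscriber crates (with the json feature enabled); the OTLP export wiring to Tempo is omitted:

```rust
use tracing::{info, instrument};

// One span per unit of work; #[instrument] records the arguments
// (minus the skipped payload) as span fields.
#[instrument(skip(payload))]
fn wal_append(device_id: &str, payload: &[u8]) {
    info!(bytes = payload.len(), "appended record to WAL");
}

fn main() {
    // Structured JSON logs to stdout. Exporting spans via OTLP to Tempo
    // would be layered onto this subscriber; that wiring is omitted here.
    tracing_subscriber::fmt().json().init();
    wal_append("sensor-42", br#"{"temp":21.5}"#);
}
```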

WAL Queue (Durable Buffer)

Reliability is built around the principle that every message must survive a network failure. To guarantee at-least-once delivery, the gateway uses an append-only SQLite write-ahead log (WAL). This allows crash-safe recovery and idempotent retries while keeping the footprint small for edge environments.

  • Append-only write-ahead log (SQLite WAL) for at-least-once delivery.
  • Index messages by status: Enqueued → InFlight → Acked/Dead.
  • Compaction & retention (size/time).
  • Backpressure thresholds.
  • Crash-safe recovery.
  • Idempotency keys.
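
A sketch of the append path, assuming rusqlite; the outbox table name and columns are illustrative:

```rust
use rusqlite::{params, Connection};

fn open_wal(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    // Switch SQLite into write-ahead-log mode; the pragma returns the mode.
    let _mode: String = conn.query_row("PRAGMA journal_mode=WAL;", [], |r| r.get(0))?;
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS outbox (
             id              INTEGER PRIMARY KEY AUTOINCREMENT,
             idempotency_key TEXT NOT NULL UNIQUE,
             payload         BLOB NOT NULL,
             status          TEXT NOT NULL DEFAULT 'Enqueued',
             created_at_ms   INTEGER NOT NULL
         );
         CREATE INDEX IF NOT EXISTS idx_outbox_status ON outbox(status);",
    )?;
    Ok(conn)
}

fn append(conn: &Connection, key: &str, payload: &[u8], now_ms: i64) -> rusqlite::Result<()> {
    // INSERT OR IGNORE makes replays idempotent: re-appending the same
    // idempotency key after a crash is a harmless no-op.
    conn.execute(
        "INSERT OR IGNORE INTO outbox (idempotency_key, payload, created_at_ms)
         VALUES (?1, ?2, ?3)",
        params![key, payload, now_ms],
    )?;
    Ok(())
}
```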

Dispatcher

A dispatcher abstraction is set up in the gateway to coordinate delivery of data from the WAL Queue in a reliable, automated way. The dispatcher fetches a batch of N events from the WAL Queue, marks them InFlight, and acknowledges events that have been published successfully. Events that fail to publish are scheduled for a later retry.

  • run_dispatcher: fetch N, mark InFlight.
  • Calls publish(); schedules retries.
  • Ack WAL on success.
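
A simplified, synchronous sketch of one dispatcher pass over the outbox table from the previous sketch; the Publish trait stands in for the real sink layer, and retry scheduling is reduced to re-enqueueing:

```rust
use rusqlite::{params, Connection};

/// Stand-in for the real sink layer (see Fanout below).
trait Publish {
    fn publish(&self, payload: &[u8]) -> Result<(), String>;
}

fn dispatch_once(conn: &Connection, publisher: &dyn Publish, n: usize) -> rusqlite::Result<()> {
    // Fetch a batch of N enqueued records.
    let batch: Vec<(i64, Vec<u8>)> = {
        let mut stmt = conn.prepare(
            "SELECT id, payload FROM outbox
             WHERE status = 'Enqueued' ORDER BY id LIMIT ?1",
        )?;
        stmt.query_map(params![n as i64], |r| Ok((r.get(0)?, r.get(1)?)))?
            .collect::<Result<_, _>>()?
    };
    for (id, payload) in &batch {
        // Mark InFlight so a concurrent pass will not pick the record up.
        conn.execute("UPDATE outbox SET status = 'InFlight' WHERE id = ?1", params![id])?;
        match publisher.publish(payload) {
            // Ack on success; on failure, re-enqueue for a later retry pass
            // (real retry scheduling would add a backoff timestamp).
            Ok(()) => conn.execute("UPDATE outbox SET status = 'Acked' WHERE id = ?1", params![id])?,
            Err(_) => conn.execute("UPDATE outbox SET status = 'Enqueued' WHERE id = ?1", params![id])?,
        };
    }
    Ok(())
}
```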

Fanout

A FanoutSink is added to the dispatcher abstraction to allow broadcasting to multiple downstream sinks. This enables data to flow reliably to multiple systems in the cloud and allows for per-sink error isolation if one sink halts for an unknown reason. The gateway allows fanout to multiple sinks such as S3 object storage, MQTT, or Kafka.

  • FanoutSink broadcasts to sinks.
  • Ensures per-sink error isolation.
  • List of sinks: S3 Sink, MQTT Sink, Kafka Sink
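
A sketch of the per-sink isolation, assuming the async-trait crate; sink construction and the concrete S3/MQTT/Kafka implementations are omitted:

```rust
use async_trait::async_trait;

#[async_trait]
trait Sink: Send + Sync {
    fn name(&self) -> &str;
    async fn send(&self, payload: &[u8]) -> Result<(), String>;
}

struct FanoutSink {
    sinks: Vec<Box<dyn Sink>>,
}

impl FanoutSink {
    /// Broadcast to every sink; one failing sink never blocks the others.
    async fn broadcast(&self, payload: &[u8]) {
        for sink in &self.sinks {
            if let Err(e) = sink.send(payload).await {
                // Per-sink error isolation: record the failure and move on.
                eprintln!("sink {} failed: {e}", sink.name());
            }
        }
    }
}
```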

Recovery Tools

Dashboards and recovery tools are provided so operators can resolve on-call scenarios when running the Gateway in production.

  • Requeue or export dead-lettered messages; support replay.
  • Diagnostics for failed publishes and WAL inconsistencies.
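
For example, requeueing dead-lettered records can be a single UPDATE against the WAL table sketched earlier (a minimal sketch, assuming the same rusqlite schema):

```rust
use rusqlite::Connection;

/// Move dead-lettered records back to Enqueued so the dispatcher
/// replays them; returns how many records were requeued.
fn requeue_dead(conn: &Connection) -> rusqlite::Result<usize> {
    conn.execute("UPDATE outbox SET status = 'Enqueued' WHERE status = 'Dead'", [])
}
```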

Interfaces

Here is a list of the interfaces for the Rust IoT Gateway - Ingest Plane.

I1a. MQTT Telemetry Ingest

  • Protocol: MQTT
  • Topics: Subscribes: devices/{device_id}/telemetry
  • QoS: 1 (at-least-once). Retain = false.
  • Payload: JSON/Protobuf (configurable)
  • Errors: invalid schema

I1b. HTTP JSON Ingest

  • Protocol: HTTP
  • Endpoint: POST /ingest/{device_id}
  • Payload: JSON
  • Errors: invalid schema

I2. Admin HTTP API

  • GET /healthz (liveness) → 200/500
  • GET /readyz (readiness: MQTT + WAL + Publisher)
  • GET /metrics (Prometheus export)
  • GET /version (Gateway version)

I3. Observability Export

  • Tracing: OTLP/HTTP to collector → Tempo
  • Metrics: Prometheus scrape of /metrics
  • Logs: stdout

I4. Config & Secrets

  • Inputs: env vars
  • Contract: typed schema with defaults.

I5. Local persistence (Local WAL)

  • DB: SQLite in WAL mode
  • Durability: fsync on append; periodic checkpoint.

I6. Control topics

  • device commands via MQTT

Deployment Viewpoint — Rust IoT Gateway (Edge → Cloud)

Designing IoT systems that run reliably in the field requires more than functional architecture. The deployment viewpoint describes where each functional element runs, how components communicate, what infrastructure is required, and how security, reliability, and observability are enforced at runtime.

This section explains the operational environment for the Rust IoT Gateway - from STM32 devices at the edge, to the MQTT broker and containerized gateway process, through to cloud-based observability and analytics services.

Runtime Platform & Deployment Topology

At each site, STM32 devices publish telemetry over MQTT to an edge MQTT broker running on a Docker host. A Rust gateway container consumes messages, persists them to a local SQLite WAL, and forwards them securely over TLS to cloud analytics and observability services (Prometheus, Grafana, Tempo, Analytics, S3).

The table below maps the functional elements of the gateway to concrete components, describing containers, storage, and ports, with notes that clarify the runtime platform.

| Component | Container | Storage | Ports | Notes |
| --- | --- | --- | --- | --- |
| STM32 devices (B-L475E-IOT01A1) | Bare-metal firmware (no container) | On-board flash (firmware + config) | MQTT over TLS via Wi-Fi (client only) | Publishes signed telemetry to the edge MQTT broker using an mTLS client cert. |
| Edge MQTT Broker | eclipse-mosquitto:2 | /mosquitto (config, certs) | 8883/tcp | mTLS; topic ACLs for devices and gateway. |
| Rust Gateway | combotto/gateway:TAG | /data (SQLite WAL) | 8080/tcp | /metrics, /healthz, /readyz, /version; MQTT + HTTP ingest. |
| Prometheus | Managed service or prom/prometheus (Docker) | Local disk / managed storage | 443/tcp (managed) or 9090/tcp | Scrapes gateway /metrics; stores time-series metrics. |
| Grafana | Managed service or grafana/grafana (Docker) | Local disk / managed storage | 443/tcp (managed) or 3000/tcp | Dashboards for Prometheus metrics and Tempo traces. |
| Tempo | Managed service or grafana/tempo (Docker) | Object storage (S3/MinIO) or local disk | 443/tcp (managed) or 3200/tcp | Distributed trace storage backend; stores and serves traces queried by Grafana. |
| OTLP Collector | Managed service or otel/opentelemetry-collector (Docker) | Stateless (no long-term storage) | 4317/4318/tcp, 443/tcp behind ingress | Receives OTLP telemetry, batches/processes it, and forwards traces to Tempo. |
| Analytics API | Backend service (Kubernetes / VM / Docker) | Application DB / data lake | 443/tcp | Receives telemetry from the gateway over HTTPS (idempotent ingest). |
| S3 / Object Storage | Managed object storage (S3-compatible) | Bucket per environment | 443/tcp | Optional batch/offload sink for archival telemetry. |

Runtime Platform Model Diagram

The runtime platform model describes the concrete hardware and execution environments that host the functional elements of the gateway. The model is organized into Edge, Gateway Host, and Cloud nodes, each with their own processing nodes, execution environments, and deployed components/artifacts.

Edge Site Nodes

| Processing node | Execution environment | Deployed artifacts | Notes / sizing |
| --- | --- | --- | --- |
| STM32 devices (B-L475E-IOT01A1) | STM32L475VG MCU, Arm® Cortex®-M4 @ 80 MHz, 128 KB SRAM, 1 MB Flash. Bare-metal or RTOS-based firmware. On-board peripherals: ISM43362 Wi-Fi (802.11 b/g/n), BLE 4.1, Sub-GHz RF, NFC tag, IMU, magnetometer, barometer, humidity/temperature, ToF sensor, microphones. | Telemetry + control firmware image: samples sensor data, publishes MQTT over TLS via the ISM43362 Wi-Fi module, subscribes to control topics for configuration/commands. | Ultra-low-power device with multiple radios and sensors on board. The 128 KB RAM and 1 MB Flash constrain firmware size and buffering; MQTT payload size and local queuing must be kept small. OTA updates and power management are handled in device firmware (out of scope for the gateway). |
| Edge MQTT Host (industrial PC or SBC) | Linux (e.g. Ubuntu Server 22.04) on ARMv8 or x86-64, 2 vCPU, 2-4 GB RAM, Docker Engine ≥ 24. | eclipse-mosquitto:2 container, TLS CA + server certs, per-device client certs, ACL configuration under /mosquitto. | Typically co-located with the Rust Gateway on the same edge LAN. Must provide stable Wi-Fi connectivity to the STM32 boards (via AP/router) and enough disk for broker config/certs and logs. |

The choice of B-L475E-IOT01A1 (STM32L475VG @ 80 MHz, 128 KB RAM, 1 MB Flash) strongly influences the gateway architecture: device firmware cannot buffer large backlogs or run heavyweight crypto/protocol stacks, so reliability is achieved primarily by the edge MQTT broker + Rust Gateway WAL, not by deep queues on the device. This is a typical pattern in constrained IoT deployments and one of the key assumptions behind the ingest and durability design.

Gateway Host Node

| Processing node | Execution environment | Deployed artifacts | Notes / sizing |
| --- | --- | --- | --- |
| Rust Gateway Host (edge server / SBC) | Same Linux host as MQTT or a separate industrial PC, 4 vCPU, 4-8 GB RAM, SSD storage (≥ 50 GB). | combotto/gateway:TAG Docker container, config file/env vars, SQLite DB in WAL mode under /data. | Hosts the gateway ingest plane: MQTT Client, HTTP ingest, WAL Queue, Dispatcher, Fanout sinks, Admin & Health API, Telemetry instrumentation. WAL sizing depends on the expected outage window and ingest rate (e.g. keep a 24-72 h backlog). |

In small deployments, the Edge MQTT Host and Rust Gateway Host are the same physical machine with two containers. In larger deployments, they may be split across two processing nodes on the same edge LAN for isolation.


Cloud / Data Center Nodes

| Processing node | Execution environment | Deployed artifacts | Notes / sizing |
| --- | --- | --- | --- |
| Observability Node | Managed Grafana stack or Kubernetes cluster (e.g. 3x nodes, 2-4 vCPU, 8-16 GB RAM each). | Prometheus server, Tempo trace storage, OTLP Collector, Grafana dashboards. | Collects /metrics from the gateway, receives OTLP spans, stores time-series + traces. High availability and retention tuned per customer SLOs. |
| Analytics API Node | Kubernetes Deployment or VM pool behind an HTTPS load balancer, 2-8 vCPU, 8-32 GB RAM. | analytics-api container exposing an HTTP/gRPC ingest endpoint. | Receives telemetry from the gateway over TLS. Ingest is idempotent so that WAL replay does not corrupt data. |
| Object Storage Service | Managed S3-compatible service. | Buckets: telemetry-offload, telemetry-archive. | Optional batch/offload sink for long-term storage, compliance, or reprocessing. No CPU sizing; cost driven by throughput and retention. |

The model below ties each functional element from the earlier viewpoint to a concrete runtime location:

  • MQTT Telemetry Client, Ingest Router, HTTP Server, WAL Queue, Dispatcher, Fanout, Admin & Health API, Config & Secrets, Telemetry (Obs) → run inside the Rust Gateway container on the Rust Gateway Host.
  • Control Plane functions share the same container but interact with the Edge MQTT Broker and STM32 devices across the edge LAN.
  • Observability export (traces, metrics, logs) targets the Observability Node in the cloud.
  • Kafka/S3/MQTT sinks (if enabled) push to the Analytics API or Object Storage nodes.

Runtime Platform Model - Rust IoT Gateway (Edge to Cloud)
Shows the concrete hardware, execution environments, and deployed artifacts that host the Rust IoT Gateway, from edge devices to cloud observability and analytics nodes.

Network Nodes & Communication Links

While the runtime platform describes where each component executes, the deployment viewpoint also needs to capture how these components communicate across the edge-cloud boundary. The network nodes and communication links, shown below, illustrate the secure telemetry path from STM32 devices to the edge MQTT broker, through the Rust Gateway, and onward to the cloud analytics and observability stack. This view highlights the protocols, ports, trust boundaries, and communication paths enforced at runtime.

From Prototype to Consulting Offering

Over the past months, this gateway evolved from a technical prototype into a foundation for my consulting work at Combotto.io — where I help teams design secure, reliable, and observable IoT infrastructures.

By transforming the prototype into a reusable reference architecture, I now use it as a blueprint for audits, hardening sprints, and reliability partnerships with clients who need production-grade edge-to-cloud systems.

If you’re building something similar and want an outside perspective, I offer:

  • Audit: Review of your existing IoT dataflows and reliability setup.
  • Hardening Sprint: Short engagement to improve observability and security.
  • Reliability Partnership: Long-term consulting to scale your system safely.

Conclusion

Building IoT systems that are both reliable and secure requires more than just code — it’s about embedding the right architectural perspectives from day one.

This Rust IoT Gateway is my reference architecture for how to bridge embedded devices and cloud infrastructure safely. It’s open-ended by design, and I continue to evolve it as part of my consulting work at Combotto.io.

If you’re building connected devices and want to audit your data reliability, harden your telemetry paths, or scale with confidence, feel free to reach out — I’d be glad to share insights or help your team.


  1. Rozanski, Nick and Woods, Eoin. *Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives* (2nd Edition). Addison-Wesley, 2012.