Building a Secure Rust IoT Gateway: From Edge to Cloud

Published Oct 21, 2025 · 10 min read

Abstract

This blog post presents the architecture and design principles behind a secure IoT gateway written in Rust. The system connects STM32-based edge devices to a cloud analytics service using MQTT, SQLite WAL persistence, OpenTelemetry traces, and Prometheus metrics. It explores how reliability, observability, and security can be built in from day one, with insights drawn from real-world projects.

Introduction

IoT systems are easy to prototype — but scaling them securely and reliably is a different story. When devices go offline, messages get lost, or firmware updates fail, the cost can be massive. In this post, I’ll share how I built a secure Rust IoT Gateway that connects embedded devices (STM32) to the cloud — with reliability, observability, and security as first-class citizens. Along the way, I’ll also include anonymized examples and lessons learned from real-world projects.

Background - The Design Philosophy

Inspired by the architectural perspectives described by Rozanski and Woods [1], such as security, performance and scalability, and availability and resilience, I approach IoT gateway design with a similar mindset. To achieve production-grade quality and meet customer expectations, I emphasize three guiding perspectives:

  • Reliability: Every message must survive network failures (WAL + retry logic).
  • Observability: Traces and metrics should explain what the system is doing.
  • Security: From device authentication to encrypted telemetry and hardening.

These perspectives shape how I structure IoT systems end-to-end - from embedded devices at the edge, through the gateway layer, to the analytics and monitoring systems in the cloud.

The following context diagram for the Rust IoT Gateway shows the system's relationships and interactions with customers when designing for reliability, observability, and security.

IoT Edge to Gateway to Cloud Context Diagram

Architecture Overview — From Edge to Cloud

To give an overview of the architecture for the Rust IoT Gateway (edge-to-cloud), this document describes the context & scope and the functional elements with their responsibilities and interfaces.

Context & Scope

The Rust IoT Gateway consists of a single gateway instance running in Docker on an edge host (x86/ARM). The gateway includes device-facing ingest, a durable queue, processing, publishing to cloud, and observability.

External services / actors

  • Devices (STM32-based sensors): publish telemetry via MQTT and receive control messages.
  • Cloud Services:
    • Analytics Service (HTTP/gRPC ingestion endpoint)
    • S3 (object storage for batch/offload)
    • Prometheus (scrapes metrics)
    • Tempo/OTLP collector (receives traces)
    • Grafana (visualizes metrics/traces)
  • Operator/Admin: interacts with the gateway’s Admin API & logs.

Out of scope: downstream analytics internals; device firmware internals.

Functional Elements & Responsibilities

The elements are Rust crates/modules compiled into a single containerized gateway binary.

Device Link (MQTT Client)

  • Maintain MQTT session with broker (local or embedded) using configured Client ID and credentials.
  • Subscribe to command/control topics.
  • Receive telemetry payload from device topics (JSON/Protobuf).
  • QoS handling, reconnection, exponential backoff.
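The reconnection behaviour above can be sketched as a capped exponential backoff. This is a minimal illustration, not the gateway's actual code: the function name and the `base` and `max` parameters are assumptions, and production clients usually add random jitter on top so that many devices do not reconnect at the same moment.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based):
/// base * 2^attempt, capped at `max`.
/// `base` and `max` are illustrative parameters, not values from the gateway.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_shl returns None once the shift reaches the width of u32,
    // in which case we fall back to the largest factor.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // saturating_mul avoids overflowing the Duration; min applies the cap.
    base.saturating_mul(factor).min(max)
}
```

With a base of 500 ms and a cap of 30 s, the delays grow as 0.5 s, 1 s, 2 s, 4 s and so on until they reach the cap.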

Ingest Router

  • Validate/normalize telemetry payload (schema/version, max size, required fields).
  • Attach gateway metadata (gateway_id, receive_ts, correlation_id).
  • Route to write-ahead log (WAL Queue).
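To make the validate-then-annotate step concrete, here is a hedged sketch using only the standard library. The size limit, the supported schema version, the error variants, and the way `correlation_id` is built from a sequence number are all assumptions for illustration; only the metadata field names (`gateway_id`, `receive_ts`, `correlation_id`) come from the list above.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Assumed limit for illustration; the real threshold is a config value.
pub const MAX_PAYLOAD_BYTES: usize = 64 * 1024;
/// Assumed single supported schema version.
pub const SUPPORTED_SCHEMA_VERSION: u16 = 1;

#[derive(Debug, PartialEq)]
pub enum IngestError {
    MissingField(&'static str),
    Empty,
    TooLarge { len: usize },
    UnsupportedVersion(u16),
}

/// Telemetry payload plus the gateway metadata attached on ingest.
#[derive(Debug)]
pub struct Envelope {
    pub gateway_id: String,
    pub receive_ts_ms: u128,
    pub correlation_id: String,
    pub device_id: String,
    pub schema_version: u16,
    pub payload: Vec<u8>,
}

/// Validates a raw message and wraps it in an envelope ready for the WAL queue.
pub fn accept(
    gateway_id: &str,
    device_id: &str,
    schema_version: u16,
    seq: u64,
    payload: &[u8],
) -> Result<Envelope, IngestError> {
    if device_id.is_empty() {
        return Err(IngestError::MissingField("device_id"));
    }
    if payload.is_empty() {
        return Err(IngestError::Empty);
    }
    if payload.len() > MAX_PAYLOAD_BYTES {
        return Err(IngestError::TooLarge { len: payload.len() });
    }
    if schema_version != SUPPORTED_SCHEMA_VERSION {
        return Err(IngestError::UnsupportedVersion(schema_version));
    }
    // Milliseconds since the Unix epoch; 0 if the clock is set before 1970.
    let receive_ts_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0);
    Ok(Envelope {
        gateway_id: gateway_id.to_string(),
        receive_ts_ms,
        // Hypothetical scheme: gateway, device, and per-device sequence number.
        correlation_id: format!("{}-{}-{}", gateway_id, device_id, seq),
        device_id: device_id.to_string(),
        schema_version,
        payload: payload.to_vec(),
    })
}
```

Rejecting bad input before it reaches the durable queue keeps invalid messages from consuming WAL space and retry budget.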

WAL Queue (Durable Buffer)

  • Append-only write-ahead log (SQLite WAL) for at-least-once delivery.
  • Index messages by status: Enqueued → InFlight → Acked/Dead.
  • Compaction & retention (size/time).
  • Backpressure thresholds.
  • Crash-safe recovery.
  • Idempotency keys.
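The status lifecycle (Enqueued → InFlight → Acked/Dead) can be written down as a small state machine. The sketch below is one plausible reading of it: the event names, the `MAX_ATTEMPTS` limit, and the rule that a failed publish returns a message to Enqueued until the limit is reached are assumptions, not details confirmed by the list above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Enqueued,
    InFlight,
    Acked,
    Dead,
}

/// Hypothetical events that drive a status change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Claimed,
    PublishOk,
    PublishFailed,
}

/// Assumed retry budget before a message is dead-lettered.
pub const MAX_ATTEMPTS: u32 = 5;

/// Returns the next status, or None if the transition is not allowed.
/// `attempts` is the number of publish attempts made so far, including this one.
pub fn next_status(current: Status, event: Event, attempts: u32) -> Option<Status> {
    match (current, event) {
        (Status::Enqueued, Event::Claimed) => Some(Status::InFlight),
        (Status::InFlight, Event::PublishOk) => Some(Status::Acked),
        (Status::InFlight, Event::PublishFailed) if attempts >= MAX_ATTEMPTS => Some(Status::Dead),
        (Status::InFlight, Event::PublishFailed) => Some(Status::Enqueued),
        // Acked and Dead are terminal here; everything else is rejected.
        _ => None,
    }
}

/// On startup, anything left InFlight has an unknown outcome, so it is re-queued.
pub fn recover_after_crash(status: Status) -> Status {
    match status {
        Status::InFlight => Status::Enqueued,
        other => other,
    }
}
```

The recovery rule is what makes delivery at-least-once rather than exactly-once: a message that was in flight during a crash may already have reached the cloud, so it can be sent twice. Idempotency keys let the receiver discard such duplicates.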

Processor/Enricher

  • Decoding, transformation, device calibration, unit normalization.
  • Envelope creation for cloud publisher (batching, compression).
  • Personally identifiable information (PII) scrubbing/redaction policies.
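As an illustration of the batching part of envelope creation, the sketch below groups encoded messages into batches under a byte budget. The function name and the `max_batch_bytes` parameter are assumptions; compression and the actual envelope format are left out.

```rust
/// Groups messages into batches whose total payload size stays within
/// `max_batch_bytes`. A single message larger than the budget gets a
/// batch of its own rather than being dropped.
pub fn batch_by_size(messages: &[Vec<u8>], max_batch_bytes: usize) -> Vec<Vec<&[u8]>> {
    let mut batches = Vec::new();
    let mut current: Vec<&[u8]> = Vec::new();
    let mut current_bytes = 0usize;

    for msg in messages {
        // Close the current batch if adding this message would exceed the budget.
        if !current.is_empty() && current_bytes + msg.len() > max_batch_bytes {
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current.push(msg.as_slice());
        current_bytes += msg.len();
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Batches borrow the message bytes instead of copying them, so the caller keeps ownership until the publisher confirms delivery.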

Publisher

  • Send to Analytics Service with retries.
  • Fallback to S3 offload when primary is unavailable.
  • Confirm acks → update WAL Queue; handle retry schedules.
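The retry-then-fallback flow can be sketched with a small trait that stands in for both the Analytics Service client and the S3 offload. Everything named here (`Sink`, `Outcome`, `publish`, `FlakySink`) is hypothetical and synchronous for brevity; a real publisher would be async and would wait for a backoff delay between attempts.

```rust
/// Anything a batch can be sent to: the primary endpoint or the offload target.
pub trait Sink {
    fn send(&mut self, batch: &[u8]) -> Result<(), String>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Primary accepted the batch on the given attempt (1-based).
    Delivered { attempts: u32 },
    /// Primary was exhausted; the fallback accepted the batch.
    Offloaded,
    /// Neither target accepted the batch; the message stays in the WAL for a later retry.
    Failed,
}

/// Tries the primary up to `max_attempts` times, then the fallback once.
pub fn publish<P: Sink, F: Sink>(
    primary: &mut P,
    fallback: &mut F,
    batch: &[u8],
    max_attempts: u32,
) -> Outcome {
    for attempt in 1..=max_attempts {
        if primary.send(batch).is_ok() {
            return Outcome::Delivered { attempts: attempt };
        }
        // A real implementation would sleep for a backoff delay here.
    }
    match fallback.send(batch) {
        Ok(()) => Outcome::Offloaded,
        Err(_) => Outcome::Failed,
    }
}

/// Test double: fails a fixed number of times, then succeeds.
pub struct FlakySink {
    pub failures_left: u32,
    pub delivered: usize,
}

impl Sink for FlakySink {
    fn send(&mut self, _batch: &[u8]) -> Result<(), String> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            Err("unavailable".to_string())
        } else {
            self.delivered += 1;
            Ok(())
        }
    }
}
```

Only `Delivered` and `Offloaded` should mark the message as Acked in the WAL queue; `Failed` leaves it eligible for the next retry schedule.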

Control Plane

  • Handle inbound commands from MQTT topics (configuration updates, test pings, etc.).
  • Persist/apply configuration; publish control messages to devices.

Observability Subsystem

  • Tracing: create spans for ingest, WAL append, publish, retry; export via OTLP to Tempo/collector.
  • Metrics: Prometheus /metrics endpoint (ingest rate, queue depth, publish success %, latency, CPU/memory/IO usage).
  • Logging: structured JSON logs.
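To show what the /metrics endpoint serves, here is a standard-library-only sketch that renders a few counters and a gauge in the Prometheus text exposition format. The metric names and the `GatewayMetrics` struct are assumptions for illustration; in practice a metrics crate would normally handle registration and encoding.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical set of gateway metrics, safe to update from multiple threads.
#[derive(Default)]
pub struct GatewayMetrics {
    pub ingested_total: AtomicU64,
    pub publish_failures_total: AtomicU64,
    pub queue_depth: AtomicU64,
}

impl GatewayMetrics {
    /// Renders all metrics as Prometheus text: HELP line, TYPE line, sample line.
    pub fn render(&self) -> String {
        let rows = [
            ("gateway_ingested_messages_total", "counter",
             "Telemetry messages accepted by the ingest router.", &self.ingested_total),
            ("gateway_publish_failures_total", "counter",
             "Publish attempts that failed.", &self.publish_failures_total),
            ("gateway_queue_depth", "gauge",
             "Messages currently waiting in the WAL queue.", &self.queue_depth),
        ];
        let mut out = String::new();
        for (name, kind, help, value) in rows.iter() {
            out.push_str(&format!("# HELP {} {}\n", name, help));
            out.push_str(&format!("# TYPE {} {}\n", name, kind));
            out.push_str(&format!("{} {}\n", name, value.load(Ordering::Relaxed)));
        }
        out
    }
}
```

Rates such as ingest rate and publish success percentage are typically not stored directly. They are derived in Prometheus from monotonically increasing counters like the ones above.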

Admin API & Health

  • HTTP endpoints: /healthz, /readyz, /metrics, /version.

Secrets & Config Manager

  • Load config via env/files → strongly typed config.
  • Manage credentials.

Recovery Tools

  • Requeue or export dead-lettered messages; support replay.
  • Diagnostics for failed publishes and WAL inconsistencies.

Interfaces

I1. MQTT Telemetry Ingest

  • Protocol: MQTT
  • Topics: devices/{device_id}/telemetry
  • QoS: 1 (at-least-once). Retain = false.
  • Payload: JSON/Protobuf (configurable)
  • Errors: invalid schema
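Since the device identity travels in the topic name, the gateway has to extract it before validating the payload. A minimal sketch, assuming the topic layout above; the function name is hypothetical, and rejecting wildcard characters is a defensive choice because `+` and `#` are only valid in subscriptions, not in published topic names.

```rust
/// Extracts `device_id` from a topic of the form `devices/{device_id}/telemetry`.
/// Returns None for any other shape, an empty id, or an id containing MQTT wildcards.
pub fn device_id_from_topic(topic: &str) -> Option<&str> {
    let mut parts = topic.split('/');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some("devices"), Some(id), Some("telemetry"), None)
            if !id.is_empty() && !id.contains(|c| c == '+' || c == '#') =>
        {
            Some(id)
        }
        _ => None,
    }
}
```

For example, `devices/stm32-01/telemetry` yields `stm32-01`, while `devices//telemetry` and `devices/stm32-01/telemetry/extra` are rejected.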

I2. Admin HTTP API

  • GET /healthz (liveness) → 200/500
  • GET /readyz (readiness: MQTT + WAL + Publisher)
  • GET /metrics (Prometheus export)

I3. Observability Export

  • Tracing: OTLP/HTTP to collector → Tempo
  • Metrics: Prometheus scrape of /metrics
  • Logs: stdout

I4. Config & Secrets

  • Inputs: env vars
  • Contract: typed schema with defaults.
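A typed schema with defaults can be sketched as a struct plus a loader that falls back to a default when a variable is absent and fails loudly when a value does not parse. The variable names (`GATEWAY_MQTT_HOST` and so on), the fields, and the default values are all assumptions for illustration. The loader takes a map so it can be tested without touching the process environment; in the gateway it could be fed from `std::env::vars().collect()`.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical subset of the gateway configuration.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub mqtt_host: String,
    pub mqtt_port: u16,
    pub wal_path: String,
    pub max_queue_depth: u64,
}

/// Returns the parsed value for `key`, the default if unset, or an error if invalid.
fn parse_or<T: FromStr>(vars: &HashMap<String, String>, key: &str, default: T) -> Result<T, String> {
    match vars.get(key) {
        None => Ok(default),
        Some(raw) => raw
            .parse::<T>()
            .map_err(|_| format!("{}: invalid value {:?}", key, raw)),
    }
}

/// Builds a Config from environment-style key/value pairs.
pub fn config_from_vars(vars: &HashMap<String, String>) -> Result<Config, String> {
    Ok(Config {
        mqtt_host: parse_or(vars, "GATEWAY_MQTT_HOST", "localhost".to_string())?,
        mqtt_port: parse_or(vars, "GATEWAY_MQTT_PORT", 1883)?,
        wal_path: parse_or(vars, "GATEWAY_WAL_PATH", "gateway.db".to_string())?,
        max_queue_depth: parse_or(vars, "GATEWAY_MAX_QUEUE_DEPTH", 100_000)?,
    })
}
```

Failing at startup on a malformed value is deliberate: a gateway that silently falls back to a default port or queue limit is much harder to diagnose in the field.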

I5. Storage (Local WAL)

  • DB: SQLite in WAL mode
  • Durability: fsync on append; periodic checkpoint.

From Prototype to Consulting Offering

Over the last few months, I’ve built this gateway, starting with a prototype. It is now becoming a consulting offering as part of my work at Combotto.io, where I help teams design secure, reliable, and observable IoT infrastructures.

If you’re building something similar and want an outside perspective, I offer:

  • Audit: Review your existing IoT dataflows and reliability setup.
  • Hardening Sprint: Short engagement to improve observability and security.
  • Reliability Partnership: Long-term consulting to scale your system safely.

References

  1. Rozanski, Nick and Woods, Eoin. *Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives* (2nd Edition). Addison-Wesley, 2012.