Thomas Bonderup

Scalable IoT Data Streaming Platform For The Transportation Industry with Apache Kafka & Confluent Platform

Published Apr 21, 2022 · 5 min read

This project was part of my bachelor's project in Computer Science in January 2020, in which I researched how data can be collected from cars and how that data can be used to increase safety and optimize the transportation industry.

The project included a research and analysis phase covering how data can be collected from cars, a 5 V's big data analysis, a big data lifecycle analysis, scalability considerations based on the AKF Scale Cube, a cost-benefit analysis, a cross-functional team analysis, a risk analysis, and a software testing strategy.

For the project I developed a prototype system for a predictive maintenance use case: anomaly detection on car sensor data to identify defective car parts. The prototype was developed using the Scrum agile methodology over three sprints of 14 days each.
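To make the anomaly detection idea concrete, here is a minimal sketch in plain Java. It is not the prototype's actual algorithm, which this post does not describe. It assumes a simple rolling z-score rule: a reading is flagged when it deviates from the mean of the most recent readings by more than a chosen number of standard deviations. The class name `RollingZScoreDetector`, the window size, and the threshold are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch (not the prototype's code): flags a sensor reading as
// anomalous when it lies more than `threshold` standard deviations away from
// the mean of the last `windowSize` accepted readings.
final class RollingZScoreDetector {
    private final int windowSize;
    private final double threshold;
    private final Deque<Double> window = new ArrayDeque<>();

    RollingZScoreDetector(int windowSize, double threshold) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("windowSize must be at least 2");
        }
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    // Returns true if the reading is anomalous relative to the current window.
    // Nothing is flagged until the window is full, or while all readings in
    // the window are identical (standard deviation of zero).
    boolean isAnomaly(double reading) {
        boolean anomaly = false;
        if (window.size() == windowSize) {
            double mean = window.stream()
                    .mapToDouble(Double::doubleValue)
                    .average()
                    .orElse(0.0);
            double variance = window.stream()
                    .mapToDouble(v -> (v - mean) * (v - mean))
                    .sum() / windowSize;
            double stdDev = Math.sqrt(variance);
            anomaly = stdDev > 0 && Math.abs(reading - mean) / stdDev > threshold;
        }
        // Only accepted readings update the baseline, so a single spike
        // does not distort the statistics used for later readings.
        if (!anomaly) {
            window.addLast(reading);
            if (window.size() > windowSize) {
                window.removeFirst();
            }
        }
        return anomaly;
    }
}
```

With a window of five readings and a threshold of three standard deviations, a series of coolant temperatures around 90 would pass, while a sudden reading of 130 would be flagged. A real deployment would likely keep one detector per vehicle and sensor, and might use a more robust statistical or learned model.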

The final prototype system consisted of a platform built on top of Apache Kafka and Confluent Platform to provide a highly scalable data platform. I used the Spring Framework for Java to develop the functionality, combined with Kafka Streams and ksqlDB to process the streaming workloads. An opt-out feature for connected cars was developed to address data privacy concerns and GDPR compliance.
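As an illustration of the opt-out idea, the sketch below shows one way such a filter could look in plain Java. It is an assumption, not the prototype's implementation: the class `OptOutFilter`, the `SensorReading` type, and the vehicle IDs are hypothetical. The rule it assumes is that readings from vehicles that have opted out are dropped before any further processing.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical sketch (not the prototype's code): drops sensor readings
// from vehicles whose owners have opted out of data collection.
final class OptOutFilter {

    // Minimal stand-in for a sensor event; the field names are assumptions.
    static final class SensorReading {
        final String vehicleId;
        final String sensor;
        final double value;

        SensorReading(String vehicleId, String sensor, double value) {
            this.vehicleId = vehicleId;
            this.sensor = sensor;
            this.value = value;
        }
    }

    // Thread-safe set of vehicle IDs that have opted out.
    private final Set<String> optedOut = ConcurrentHashMap.newKeySet();

    void optOut(String vehicleId) {
        optedOut.add(vehicleId);
    }

    void optIn(String vehicleId) {
        optedOut.remove(vehicleId);
    }

    // True if the reading may be processed, false if its vehicle opted out.
    boolean isAllowed(SensorReading reading) {
        return !optedOut.contains(reading.vehicleId);
    }

    // Returns only the readings from vehicles that have not opted out.
    List<SensorReading> filter(List<SensorReading> readings) {
        return readings.stream()
                .filter(this::isAllowed)
                .collect(Collectors.toList());
    }
}
```

In a Kafka Streams topology, a predicate like `isAllowed` could be passed to a `KStream` `filter` step so that opted-out data never reaches downstream topics. Where the opt-out list is stored and how it is kept in sync across instances is left open here.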