Thomas Bonderup

Kafka Summit 2020

Published Aug 26, 2020 · 10 min read

I just attended Kafka Summit 2020, a virtual conference hosted by Confluent. It was a two-day event with lots of great speakers from the Kafka community. I watched from Denmark, so I was up almost all night drinking coffee, chatting, and watching the talks.

Day 1 recap

I started by watching the Day 1 Morning Keynote Program, which officially kicked off the event. Here Gwen Shapira from Confluent talked about some of the new Kafka Improvement Proposals, with a focus on KIP-405 (Kafka Tiered Storage) and KIP-500 (Replace ZooKeeper with a Metadata Quorum). Both are huge improvements to Kafka.

Kafka Tiered Storage improves the elasticity of a Kafka cluster by introducing a cold storage layer that offloads data from the Kafka brokers. This makes it easier to scale out and add new brokers, because less data has to be copied to them.

The ZooKeeper removal reduces the operational burden of running Kafka clusters, since there is one less software component to manage. Materializing the metadata state in memory also improves scalability, with a target of up to 10,000,000 partitions in the future.

After that I moved on to a great talk about the trade-offs in distributed systems design: Is Kafka the best? I really recommend watching this one. The speakers covered many of the trade-offs in infrastructure design with lots of benchmark comparisons, for example a data throughput comparison between Kafka, Pulsar, and RabbitMQ, plus messaging model basics, contiguous streams vs. fragmented streams, and all the other good stuff!

Then I moved on to Viktor Gamov’s talk about testing stream processing applications. Also a good talk with a very fun and energetic speaker. If you want to learn more about testing Kafka Streams applications, I recommend checking out his livestream videos on Viktor’s YouTube channel and Confluent’s YouTube channel.
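
To give a flavor of the kind of testing the talk was about, here is a minimal sketch of a Kafka Streams topology test using TopologyTestDriver from kafka-streams-test-utils. This is my own example, not code from the talk; the topic names and the uppercase transformation are made up for illustration.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.junit.jupiter.api.Test;

class UppercaseTopologyTest {

    @Test
    void uppercasesEveryValue() {
        // A tiny topology: read from "input", uppercase the value, write to "output".
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> value.toUpperCase())
               .to("output", Produced.with(Serdes.String(), Serdes.String()));
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver

        // TopologyTestDriver runs the topology in-process, so no broker is needed.
        try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
            TestInputTopic<String, String> in = driver.createInputTopic(
                    "input", Serdes.String().serializer(), Serdes.String().serializer());
            TestOutputTopic<String, String> out = driver.createOutputTopic(
                    "output", Serdes.String().deserializer(), Serdes.String().deserializer());

            in.pipeInput("ride-1", "hello kafka summit");
            assertEquals("HELLO KAFKA SUMMIT", out.readValue());
        }
    }
}
```

The nice thing about this style of test is that the whole topology runs in-process, so it is fast and needs no running Kafka cluster.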

The talk Can Kafka Handle a Lyft Ride included a nice demo and some interesting discussion of state machines, pub/sub architecture, message delivery time, and Kafka.

Day 2 recap

I started Day 2 by watching Kai Waehner’s talk: Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning without a Data Lake. A great talk with a predictive maintenance demo, where he showed a full data pipeline for machine learning, for example how to train a machine learning model on streaming data from Kafka and serve model predictions with ksqlDB.

I then moved on to Robin Moffatt’s talk: Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline! A very nice customer data demo with Kafka, ksqlDB, and Kafka Connect, with change data capture from a relational database and integration to Elasticsearch and a Kibana dashboard.
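
To give a rough idea of what the ksqlDB part of such a pipeline can look like, here is a small sketch using the ksqlDB Java client. This is my own example, not Robin’s demo: the host, port, topic, column, and stream names are all assumptions, and it presumes a CDC topic already populated by Kafka Connect, with an Elasticsearch sink connector picking up the derived topic afterwards.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class CustomerPipelineSetup {

    public static void main(String[] args) throws Exception {
        // Connect to a ksqlDB server (host and port are assumptions).
        ClientOptions options = ClientOptions.create()
                .setHost("localhost")
                .setPort(8088);
        Client client = Client.create(options);

        // Register the CDC topic (written by Kafka Connect, e.g. Debezium) as a ksqlDB stream.
        // The topic name and the Avro value format are assumptions.
        client.executeStatement(
                "CREATE STREAM customers WITH ("
                + " KAFKA_TOPIC='db.public.customers',"
                + " VALUE_FORMAT='AVRO'"
                + ");").get();

        // Continuously derive a filtered stream; the 'segment' column is hypothetical.
        // An Elasticsearch sink connector on the resulting topic would complete the
        // path to the Kibana dashboard.
        client.executeStatement(
                "CREATE STREAM vip_customers AS"
                + " SELECT * FROM customers"
                + " WHERE segment = 'VIP'"
                + " EMIT CHANGES;").get();

        client.close();
    }
}
```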

Day 2 was kicked off with the Day 2 Morning Keynote Program by Jay Kreps, co-creator of Kafka and co-founder of Confluent, and Sam Newman, author of Building Microservices. These were the two best talks at the event. Watch them here:

I ended the event by watching A Tale of Two Data Centers: Kafka Streams Resiliency by Anna McDonald. Fun and interesting stuff about resiliency, replication and stretch clusters.

Thank you very much to all the speakers, sponsors, and attendees who made this event possible!