Introduction
Imagine you're booking a ride on Uber. The app instantly matches you with a driver, calculates the ETA, and updates both your screen and the driver's in real time. How does this happen so fast?
The answer lies in Apache Kafka, a high-performance, distributed event streaming platform. Whether it's financial transactions, e-commerce order tracking, or real-time analytics, Kafka is the engine that powers them all.
In this blog, we'll break down what Kafka is, how it works, real-world use cases, and code examples to help you get started.
Apache Kafka is an open-source distributed system for real-time event streaming, data processing, and messaging. It allows different systems to send and receive messages at massive scale and ultra-low latency.
💡 In simple terms: Kafka is like a high-speed message broker that connects applications and helps them communicate efficiently.
✅ Scalable: Handles millions of messages per second.
✅ Fault-Tolerant: Replicates data across multiple nodes.
✅ Real-Time Processing: Enables event-driven architectures.
✅ High Throughput: Optimized for large-scale data pipelines.
✅ Decouples Microservices: Enables loosely coupled applications.
Kafka works in a publish-subscribe model, where Producers send messages, Consumers receive them, and everything is stored in a distributed log.
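To make the publish-subscribe log concrete, here is a minimal in-memory sketch in plain Python. It uses no Kafka client at all; the class and method names are illustrative, not Kafka's API. It shows the core idea: a topic is an append-only log, producers append to it, and each consumer tracks its own read offset.

```python
class MiniTopic:
    """A toy append-only log, loosely mimicking one Kafka topic partition."""

    def __init__(self):
        self.log = []       # messages, in arrival order
        self.offsets = {}   # consumer name -> next offset to read

    def publish(self, message):
        """Producer side: append a message to the end of the log."""
        self.log.append(message)

    def consume(self, consumer):
        """Consumer side: return every message past this consumer's offset,
        then advance (commit) the offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return batch


topic = MiniTopic()
topic.publish("order created")
topic.publish("order shipped")

print(topic.consume("billing"))   # -> ['order created', 'order shipped']
print(topic.consume("billing"))   # -> []  (nothing new since last read)
topic.publish("order delivered")
print(topic.consume("billing"))   # -> ['order delivered']
```

Note how the log itself never changes when a consumer reads; each consumer just moves its own offset forward. That is why many independent consumers can read the same Kafka topic without interfering with one another.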
Kafka Architecture Diagram
┌──────────┐
│ Producer │
└──────────┘
      │
      ▼
┌─────────────────────────────────────────┐
│                  Topic                  │
│ Partition 0 │ Partition 1 │ Partition 2 │
└─────────────────────────────────────────┘
      │              │              │
 ┌────────┐     ┌────────┐     ┌────────┐
 │ Broker │     │ Broker │     │ Broker │
 └────────┘     └────────┘     └────────┘
      │              │              │
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Consumer A │ │ Consumer B │ │ Consumer C │
└────────────┘ └────────────┘ └────────────┘
Producer
Generates data and sends it to Kafka topics.
Example: A sensor sending temperature data.
Topic
A logical group for messages (like an email folder).
Topics are partitioned for parallel processing.
Partition
A topic is split into multiple partitions for efficiency.
Messages are distributed across partitions for load balancing.
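Kafka routes a keyed message to a partition by hashing its key, so all messages with the same key land on the same partition (which preserves per-key ordering). The sketch below illustrates the idea in plain Python; Kafka's default partitioner actually uses murmur2 hashing, and the md5-based `pick_partition` here is just a stable stand-in, not Kafka's real algorithm.

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    """Route a keyed message to a partition by hashing the key.
    (Illustrative only: Kafka really uses murmur2, not md5.)"""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition,
# so all events for one user stay in order on one partition.
for key in ["user-1", "user-2", "user-1"]:
    print(key, "-> partition", pick_partition(key, 3))
```

Messages sent without a key are instead spread across partitions (sticky/round-robin style) purely for load balancing.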
Broker
A Kafka broker stores messages and delivers them to consumers.
Kafka clusters usually have multiple brokers.
Consumer
Reads messages from topics.
Consumer Groups allow multiple consumers to share the workload.
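Within a group, Kafka assigns each partition to exactly one consumer, so the group as a whole reads every partition while splitting the work. Here is a simplified round-robin sketch of that assignment in plain Python; Kafka's real assignor strategies (range, round-robin, sticky) are more involved, and the function name is ours, not Kafka's.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions across the consumers of one group, round-robin,
    so each partition is read by exactly one group member.
    (A simplified stand-in for Kafka's assignor strategies.)"""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

print(assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"]))
# -> {'consumer-a': [0, 2], 'consumer-b': [1]}
```

One consequence worth remembering: with more consumers than partitions, the extra consumers sit idle, so the partition count caps a group's parallelism.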
ZooKeeper
Manages metadata, leader elections, and cluster health.
Newer Kafka versions replace ZooKeeper with KRaft, removing the ZooKeeper dependency entirely.
Kafka requires Java. Check your Java version:
java -version
If Java isn't installed, install OpenJDK:
sudo apt update && sudo apt install openjdk-11-jdk -y
Download Kafka and extract it:
wget https://downloads.apache.org/kafka/3.4.0/kafka_2.13-3.4.0.tgz
tar -xvzf kafka_2.13-3.4.0.tgz
cd kafka_2.13-3.4.0
Start ZooKeeper (this setup runs Kafka in ZooKeeper mode):
bin/zookeeper-server-start.sh config/zookeeper.properties
Start Kafka Broker:
bin/kafka-server-start.sh config/server.properties
Create a topic named test-topic with 3 partitions:
bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Run a producer to send messages:
bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
Type a message and press Enter:
Hello, Kafka!
Run a consumer to receive messages:
bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
You should see:
Hello, Kafka!
🎉 Congrats! You just set up a working Kafka pipeline!