Kafka: Introduction Pt.1

Shristi Katyayani
3 min read · Sep 12, 2022

Kafka describes itself as a distributed streaming platform. Well, who doesn’t love streaming nowadays! Let’s dive deeper.

COMPONENTS:

Broker (Heart of Kafka): The center of the Kafka universe (where your data resides); it manages how data is stored and which partition it’s stored in. When you have more than one broker, the brokers form a Kafka cluster, coordinating together as a single unit.
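As a rough sketch, a broker’s identity and storage behavior come from its server.properties file. The property names are real Kafka broker settings; the values below are purely illustrative:

```properties
broker.id=0                        # unique id for this broker within the cluster
log.dirs=/var/lib/kafka/data      # where partition logs live on disk
zookeeper.connect=localhost:2181  # metadata coordination (pre-KRaft clusters)
```

Each broker in a cluster gets a distinct broker.id but typically shares the same zookeeper.connect string.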

Producer: Data can be unstructured and is sent as byte arrays. You can also attach schemas to verify and type-cast messages; the schema format can be JSON, XML, or Avro (a common standard). Avro is a serialization framework originally built for Hadoop.
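To make the “everything is byte arrays” point concrete, here’s a minimal sketch of the serialization a producer applies before sending (plain JSON here; a real pipeline might use Avro with a schema registry):

```python
import json

def serialize(message: dict) -> bytes:
    # Kafka itself only stores and transports raw bytes; any structure
    # (JSON here, Avro in production) is applied by the client.
    return json.dumps(message).encode("utf-8")

def deserialize(payload: bytes) -> dict:
    # The consumer applies the inverse transformation.
    return json.loads(payload.decode("utf-8"))

event = {"user_id": 42, "action": "login"}
payload = serialize(event)          # this is what actually hits the wire
assert deserialize(payload) == event
```

With a real client you would pass a function like this as the producer’s value serializer instead of calling it by hand.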

Consumer: The end that consumes all this highly available data.

Topic: Each message is published to a specific named category within the Kafka system. It can be equated to a table name in the relational DB world.

Partitions: Splitting the topic into configurable sections. Each can be considered a separate, single log. Within a consumer group, there can be only one reader per partition at a time. Therefore, total partition count = maximum parallelism capability.
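How does a message land in a particular partition? A simplified stand-in for Kafka’s default key-based partitioner is sketched below (real Kafka hashes keys with murmur2; md5 is used here just to make the idea self-contained):

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner: the same
    # key always maps to the same partition, which is what preserves
    # per-key ordering within a partition.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p = pick_partition(b"user-42", 6)
assert p == pick_partition(b"user-42", 6)  # deterministic per key
assert 0 <= p < 6
```

Messages without a key are instead spread across partitions (round-robin or sticky batching, depending on client version).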

CORE IDEA: It is a centralized system for publishing and subscribing to data, offering the capabilities below:

Scalability: It was built from the start to scale up (vertical scaling) and scale out (horizontal scaling), with a durable data store as its backing. It enables many readers and writers to coexist seamlessly.

Failures and reprocessing can be embraced: you can read and re-read data, which lets you reliably rebuild the state of your system from any given point.
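The “rebuild state from any point” idea can be sketched with a plain in-memory event log (standing in for a retained Kafka topic); the account/amount events are invented for illustration:

```python
events = [  # stand-in for a retained topic; index = offset
    {"account": "a", "amount": +100},
    {"account": "a", "amount": -30},
    {"account": "b", "amount": +50},
]

def rebuild_state(log, from_offset=0):
    # Replaying the log from any offset deterministically reconstructs
    # state -- this is what makes reprocessing after a failure safe.
    balances = {}
    for event in log[from_offset:]:
        acct = event["account"]
        balances[acct] = balances.get(acct, 0) + event["amount"]
    return balances

assert rebuild_state(events) == {"a": 70, "b": 50}          # full replay
assert rebuild_state(events, from_offset=2) == {"b": 50}    # partial replay
```

A real consumer does the same thing by seeking to an earlier offset and re-consuming.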

High level of abstraction: Separates the developer from the underlying platform, letting you focus on core business logic.

Handle the time aspect of data: Kafka acts as a buffer between data generation and consumption. You can write and read at a speed suitable for your application. Data can be handled as a batch or as a real-time stream.

CQRS (Command Query Responsibility Segregation): Producers optimize for writes and consumers focus on reading. It offers backward and forward compatibility, and an asynchronous, message-driven architecture.
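The write/read split can be sketched as follows, with an append-only list standing in for a Kafka topic (the product/qty command shape is invented for illustration):

```python
from collections import defaultdict

event_log = []  # write model: an append-only log (Kafka topic stand-in)

def handle_command(cmd):
    # Command side: just append -- optimized for fast writes.
    event_log.append(cmd)

def build_read_model(log):
    # Query side: consume the log into a read-optimized view,
    # asynchronously and independently of the writers.
    view = defaultdict(int)
    for e in log:
        view[e["product"]] += e["qty"]
    return dict(view)

handle_command({"product": "book", "qty": 2})
handle_command({"product": "pen", "qty": 5})
handle_command({"product": "book", "qty": 1})
assert build_read_model(event_log) == {"book": 3, "pen": 5}
```

Because the read model is derived from the log rather than shared with the writers, each side can evolve and scale on its own, which is the CQRS payoff.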

Fault Tolerance: A topic can specify how many full copies of the data (replicas) it would prefer the system to keep as a backup. Zookeeper (aka the failure manager) is the cluster’s metadata manager: a centralized service for maintaining configuration and naming, and for providing distributed synchronization.
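To see what a replication factor means physically, here is a simplified round-robin replica placement: each partition gets one leader copy plus (replication factor − 1) follower copies on distinct brokers. Real Kafka’s placement is rack-aware and more sophisticated; this is only the core idea:

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    # Simplified round-robin placement: partition p's replicas go on
    # replication_factor consecutive brokers starting at offset p, so
    # losing any single broker still leaves a copy of every partition.
    assignment = {}
    n = len(brokers)
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % n] for r in range(replication_factor)]
    return assignment

layout = assign_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
assert layout == {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

The first broker in each list plays the leader role; if it fails, a follower holding the same partition is promoted.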

USE CASES:

Log aggregation, monitoring, and analysis of critical distributed systems. A go-to for microservices: it stores events for an event-driven source system, or serves as an external distributed commit log.

References:

https://kafka.apache.org/documentation/#introduction

https://kafka.apache.org/documentation/#configuration

https://kafka.apache.org/uses
