Kafka: Introduction Pt.1
Self-described as a distributed streaming platform. Well, who doesn’t love streaming nowadays! Let’s dive deeper.
COMPONENTS:
Broker (Heart of Kafka): The center of the Kafka universe (where your data resides); it manages how data is stored and which partition it’s stored in. When you have more than one broker, it’s called a Kafka cluster: brokers coordinating together as a single unit.
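To make the cluster idea concrete, here is a minimal sketch with the official Java AdminClient that lists the brokers currently in a cluster. It assumes a broker reachable at localhost:9092; adjust the address for your setup.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

import java.util.Properties;

public class ClusterInfo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed address; point this at any broker in your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Every broker (node) that is currently part of the cluster.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("Broker %d at %s:%d%n",
                        node.id(), node.host(), node.port());
            }
        }
    }
}
```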
Producer: The end that writes data into Kafka. Data can be unstructured and sent as raw byte arrays, or you can attach schemas to validate and type-cast messages. Common formats are JSON, XML, and Avro (the de facto standard in the Kafka ecosystem). Avro is a serialization framework originally built for Hadoop.
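A sketch of the producer side, using the official Java client with plain string serializers (the broker address, the topic name "events", the key, and the JSON payload are all made up for illustration):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProduceEvent {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        // Serializers turn keys and values into the byte arrays Kafka actually stores.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic, key, and JSON value.
            producer.send(new ProducerRecord<>("events", "user-42", "{\"action\":\"login\"}"));
        } // close() flushes any buffered messages
    }
}
```

For Avro, you would swap the serializers for Avro-aware ones and register a schema; the producer logic itself stays the same.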
Consumer: The end that consumes all this highly available data.
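And a matching consumer sketch under the same assumptions (the group id "demo-group" is arbitrary), subscribing to the topic and polling for records in a loop:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ConsumeEvents {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // arbitrary group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // same hypothetical topic as the producer
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```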
Topic: A named stream within the Kafka system; each message is published to a specific topic. It can be equated to a table name in the relational DB world.
Partitions: A topic is split into a configurable number of sections; each can be considered a separate, single log. Within a consumer group, only one reader can read a partition at a time. Therefore, total partition count = maximum parallelism capability, as the sketch below illustrates.
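Putting topics and partitions together, here is a minimal sketch that creates a topic with a chosen partition count and replication factor via the AdminClient (the name and numbers are illustrative):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions => up to 6 consumers in one group can read in parallel.
            // Replication factor 3 => 3 full copies of the data (needs at least 3 brokers).
            NewTopic topic = new NewTopic("events", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

The replication factor chosen here ties directly into the fault-tolerance point discussed below.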

CORE IDEA: It is a centralized system for publishing and subscribing to data, offering the following capabilities:
Scalability: It was built from the start to scale up (vertical scaling) and scale out (horizontal scaling), with a durable data store as its backing. It enables many readers and writers to coexist seamlessly.
Failures and reprocessing can be embraced: You can read and re-read data, which allows you to reliably rebuild the state of your system from any given point, as sketched below.
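For example, a consumer can rewind to the beginning of a partition and replay every event to rebuild its local state. A sketch with the Java client, under the same assumed broker address and topic name as before:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class Replay {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manually assign partition 0 of the topic and rewind to the earliest offset.
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(List.of(tp));
            consumer.seekToBeginning(List.of(tp));

            // Replay from the start to rebuild state; loop poll() until caught up.
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```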
High level of abstraction: Separates the developer from the underlying platform, letting you focus on core business logic.
Handles the time aspect of data: Kafka acts as a buffer between data generation and consumption. You can write/read at a speed suitable for your application. Data can be handled as a batch or as a real-time stream.
CQRS (Command Query Responsibility Segregation): Producers optimize for writes and consumers focus on reads. It offers backward and forward compatibility, and an asynchronous, message-driven architecture.
Fault Tolerance: A topic can specify how many full copies of the data (replicas) it would prefer the system to keep as backups. ZooKeeper (aka the Failure Manager), the cluster’s metadata manager, is a centralized service for maintaining configuration and naming, and for providing distributed synchronization.

USE CASES:
Log aggregation, plus monitoring and analyzing critical distributed systems. A go-to for microservices: it can store events for an event-sourced system or serve as an external distributed commit log.

References:
https://kafka.apache.org/documentation/#introduction