Intro

Kafka

Kafka is an open source, distributed, partitioned, replicated, commit-log-based publish-subscribe messaging system.

A filesystem or database commit log is designed to provide a durable record of all transactions so that they can be replayed to consistently build the state of a system.

Kafka is often described as a “distributed commit log”. Like such a log, data within Kafka is stored durably, in order, and can be read deterministically. In addition, the data can be distributed within the system to provide additional protection against failures, as well as significant opportunities for scaling performance.
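
The replay idea can be sketched in a few lines of Python. This is only an in-memory illustration, not Kafka code: the names CommitLog and rebuild_balances and the account data are made up for the example. It shows that replaying the same ordered log always rebuilds the same state.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Hypothetical append-only log; entries are never modified once written."""
    entries: list = field(default_factory=list)

    def append(self, record):
        """Add a record at the end and return its position (offset) in the log."""
        self.entries.append(record)
        return len(self.entries) - 1

    def replay(self, from_offset=0):
        """Yield records in the order they were written, starting at from_offset."""
        yield from self.entries[from_offset:]


def rebuild_balances(log):
    """Rebuild account balances by applying every logged transaction in order."""
    balances = {}
    for account, delta in log.replay():
        balances[account] = balances.get(account, 0) + delta
    return balances


log = CommitLog()
log.append(("alice", 100))
log.append(("bob", 50))
log.append(("alice", -30))

# Replaying twice gives the same state, because the log is ordered and immutable.
state_a = rebuild_balances(log)
state_b = rebuild_balances(log)
```

Because the state is derived entirely from the log, a new reader can start from offset 0 at any time and arrive at the same result.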

The key abstraction in Kafka is the Topic. At the implementation level, a Kafka topic is just a sharded write-ahead log. Topics are partitioned, and each partition is an ordered, immutable sequence of messages.
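
A minimal sketch of a topic as a set of append-only partitions, assuming a simple in-memory model. The Topic class, its methods, and the topic name are hypothetical and are not part of any Kafka client API. Each partition keeps its own offsets, so ordering holds within a partition only.

```python
class Topic:
    """In-memory stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append to one partition and return the message's offset within it."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition, offset, max_messages=10):
        """Return up to max_messages from one partition, starting at offset."""
        return tuple(self.partitions[partition][offset:offset + max_messages])


topic = Topic("page-views", num_partitions=2)
first = topic.append(0, "view:/home")
second = topic.append(0, "view:/cart")
other = topic.append(1, "view:/help")  # offsets restart at 0 in each partition
```

Reading partition 0 from offset 0 returns the two messages in the order they were appended; partition 1 is an independent sequence.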

Processes that publish messages to a Kafka topic are referred to as Producers. Producers publish data to a topic by choosing the appropriate partition within the topic.
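
One common way for a producer to choose a partition is to hash the message key, and to rotate through partitions when there is no key. The sketch below assumes that strategy; SketchPartitioner is a hypothetical name, and zlib.crc32 is only a stand-in hash, not the function a real Kafka client uses.

```python
import itertools
import zlib


class SketchPartitioner:
    """Hypothetical partition chooser: hash keyed messages, rotate keyless ones."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(self._round_robin)
        # Same key always maps to the same partition (stand-in hash, an assumption).
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
keyed = [partitioner.partition_for("user-42") for _ in range(3)]
unkeyed = [partitioner.partition_for() for _ in range(4)]
```

Sending all messages with the same key to one partition keeps them in order relative to each other, since ordering holds within a partition.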

Processes that subscribe to topics and process the published messages are called Consumers.
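
A rough sketch of the consumer side, assuming each consumer tracks its own read position (offset) in a partition. The Consumer class here is a made-up in-memory model, not a Kafka client. It shows that reading does not remove messages, so two consumers can move through the same partition at different speeds.

```python
class Consumer:
    """Hypothetical consumer that remembers how far it has read in one partition."""

    def __init__(self, partition_log):
        self.partition_log = partition_log
        self.position = 0  # offset of the next message to read

    def poll(self, max_messages=2):
        """Return the next batch of messages and advance this consumer's position."""
        batch = self.partition_log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch


partition_log = ["m0", "m1", "m2"]  # one partition, shared by both consumers
fast = Consumer(partition_log)
slow = Consumer(partition_log)

fast_first = fast.poll()
fast_second = fast.poll()
slow_first = slow.poll(max_messages=1)
```

After these calls the fast consumer has read all three messages, while the slow one has read only the first; the log itself is unchanged.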

Kafka runs as a cluster of one or more servers, each of which is called a Broker. Topics are created within the context of broker processes.

Publish-subscribe

0.8, as it is not backward compatible. If the existing Kafka cluster is based on 0.7.x, a migration tool is provided for migrating the data from the Kafka 0.7.x-based cluster to the 0.8-based cluster.
