Kafka-PyFlink Getting Started, Part 1: Apache Kafka with KRaft - Install and Configure on Windows 10/11 (no WSL2)

Diptiman Raichaudhuri
10 min read · Mar 11, 2024


Disclosure: All opinions expressed in this article are my own, and represent no one but myself and not those of my current or any previous employers.

I intend to publish a series of articles on setting up a development environment for Kafka in KRaft mode and, subsequently, on stream processing using the PyFlink Table API and Flink SQL. This first article in the series focuses on setting up a development environment on Windows 10/11, without installing WSL2, for the many developers who work on Windows.

Here’s the whole series :

  1. Kafka with KRaft on Windows
  2. Data Engineering with PyFlink Table API
  3. Data Engineering with FlinkSQL
  4. FlinkSQL Tumbling Window aggregation
  5. PyFlink Table API Tumbling Window aggregation
  6. PyFlink Table API UDF (User Defined Functions)

As of Kafka 3.3, KRaft mode of consensus for Kafka clusters is production ready. It greatly simplifies monitoring, administering and supporting Kafka clusters, without the need to run a ZooKeeper cluster for managing cluster metadata.

The Apache Kafka Quickstart explains how to set up and get going with KRaft mode on Linux, Mac and so on.

I wanted to give it a shot and check whether it is possible to set the cluster up and configure my IntelliJ Idea Community Edition for KRaft mode on a Windows 11 laptop. Many builders and developers install and configure a single-node Kafka cluster in their favorite IDE to test and run code locally on Windows laptops (without using WSL2), so setting up a KRaft cluster on my Windows laptop remained a priority for me.

I thought I would jot down what works and what doesn't.

Broadly, these are the steps I am following :

  1. Install OpenJDK (Java 11); set JAVA_HOME, PATH and CLASSPATH
  2. Install IntelliJ Idea Community Edition
  3. Download the Apache Kafka distribution to my Windows disk
  4. Configure Kafka ports and log directories on my Windows disk
  5. Create a Maven Java project with IntelliJ Idea
  6. Run Kafka in KRaft mode (no ZooKeeper!)
  7. Test by creating a topic and consuming from the topic.

Step 1: Install OpenJDK

OpenJDK distributions for Windows are offered by many organizations, such as:

a. OpenLogic — Select JDK 11 and download the zip file

b. Microsoft — Select JDK 11 and download the zip file

Unzip the file into a folder whose path contains no spaces. I have downloaded mine here:

JDK 11

I have also configured the JAVA_HOME environment variable with the full path -> D:\sw\jdk11012

JAVA_HOME

Then, I appended %JAVA_HOME%\bin to the PATH system variable

Then I created another system environment variable CLASSPATH and assigned the following to CLASSPATH : %JAVA_HOME%\bin;%JAVA_HOME%\lib
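For reference, the three variables above can also be set from a Command Prompt with setx (the JDK path is just my example location; substitute your own unzip folder). Note that setx expands %PATH% at the moment you run it, so for PATH edits the System Properties dialog is usually the safer route:

```bat
setx JAVA_HOME "D:\sw\jdk11012"
setx CLASSPATH "%JAVA_HOME%\bin;%JAVA_HOME%\lib"
rem Append the JDK bin folder to the user PATH (beware: setx truncates values over 1024 chars)
setx PATH "%PATH%;%JAVA_HOME%\bin"
```

Open a fresh command prompt afterwards, since setx changes only apply to new sessions.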

All set with the JDK, let's give it a test! Open a command prompt and type java -version or java

java -version

Done with Step 1; OpenJDK 11 is installed and configured on my Windows laptop.

Step 2: Install IntelliJ Idea Community Edition

Download the .exe for IntelliJ Idea Community Edition here.

Once downloaded, install it in a folder.

Also, create a new folder for all IntelliJ Idea projects.

For me, I created D:\testing_Workspace\intellij_workspace.

Now that the installation and configuration of my development environment is complete, let's focus on getting open-source Kafka (OSS Kafka) installed on Windows.

Step 3: Install OSS Kafka on Windows

Head over to the Kafka downloads page. The current stable version is 3.7.0; let's download the binary with Scala 2.13 support, here.

The downloaded file will be a .tgz, which, when extracted using 7-Zip or the Windows extractor, yields a .tar file.

Unzip the .tar file in a folder, I have unzipped here :

Kafka Installation Folder

Let’s also create an environment variable for the Kafka installation directory.

Since I have multiple versions of OSS Kafka on my laptop, I am creating an environment variable that identifies Kafka 3.7.x, but you would ideally have only one version, so you could name it KAFKA_HOME. Do not add the bin folder to PATH yet; we’ll use a trick to get everything configured from within IntelliJ Idea.

KAFKA_HOME

Step 4: Configure Kafka ports and log directories on Windows

Let’s examine the KAFKA37_HOME folder. The bin folder has all the CLI command files (.sh files for Linux and Mac), while bin/windows has the set of .bat files which run most of the CLI commands on Windows.

The config folder contains the all-important set of configurations. config\kraft is the folder we’ll tinker with! Had we used ZooKeeper, most of the configuration changes would have happened in zookeeper.properties and server.properties. But in this exercise we will not use ZooKeeper; we will run Kafka brokers in KRaft consensus mode.

Inside config\kraft, let’s have a look at the server.properties file. The first change to notice is process.roles=broker,controller; this property enables a node to act as a broker as well as a controller and take part in the controller quorum voting process. Unlike with ZooKeeper, in KRaft mode the cluster metadata remains inside the brokers, in a dedicated topic (we’ll see more in an upcoming section). This removes the dependency on an external metadata manager like ZooKeeper.

In server.properties, let’s change the advertised.listeners port to 9098 and the controller.quorum.voters port to 9099. Also change the two listeners entries to 9098 (broker) and 9099 (controller) respectively.

Next, in controller.properties, let’s change the controller.quorum.voters port to 9099 and listeners to 9099.

Next, in broker.properties, change the controller.quorum.voters port to 9099 and change listeners and advertised.listeners to 9098.

We are done configuring the ports of this single-node Kafka cluster in KRaft consensus mode.
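Putting the three files together, the changed lines should look roughly like this. This is a sketch of my single-node setup: it assumes the default node.id=1 (the 1@ prefix in controller.quorum.voters must match the node.id in each file), and everything else stays at its shipped defaults:

```
# config\kraft\server.properties (combined broker + controller)
process.roles=broker,controller
listeners=PLAINTEXT://:9098,CONTROLLER://:9099
advertised.listeners=PLAINTEXT://localhost:9098
controller.quorum.voters=1@localhost:9099

# config\kraft\controller.properties (controller only)
listeners=CONTROLLER://:9099
controller.quorum.voters=1@localhost:9099

# config\kraft\broker.properties (broker only)
listeners=PLAINTEXT://:9098
advertised.listeners=PLAINTEXT://localhost:9098
controller.quorum.voters=1@localhost:9099
```

In this exercise only server.properties is actually used to start the node; the other two files matter when you run dedicated controller and broker processes.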

Step 5: Create a Maven Java project to develop code

Create a new project using File -> New Project in IntelliJ Idea Community Edition. In the advanced settings, fill in your GAV coordinates (a fancy way of saying groupId, artifactId and version!).

Provide the Maven groupId and artifactId, and the new project will show up in the IDE.

Let’s add some dependencies through the pom.xml. (I have uploaded all files to GitHub; the link is down below.)

For this exercise, the following properties are good enough to start with :

Now add the dependencies required to run basic Kafka producer-consumer tests.
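Since the pom.xml screenshots are not reproduced here, a minimal set of properties and dependencies would look something like the sketch below. The version numbers are examples that match the Kafka 3.7.0 download used in this article; the GitHub repo linked at the end has the exact file:

```xml
<properties>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
</properties>

<dependencies>
    <!-- Kafka Java client for producer/consumer tests -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>3.7.0</version>
    </dependency>
    <!-- Route the client's SLF4J logging to log4j 1.x -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.36</version>
    </dependency>
</dependencies>
```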

From the collapsed menu bar on the right, select Maven and click 'Reload All Maven Projects'.

Also, add a log4j.properties file in the src -> main -> resources folder (you'll find the example file in the GitHub repo mentioned below).
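If you don't want to fetch the repo first, a bare-bones log4j.properties along these lines is enough for console logging (an illustrative minimal config, not the repo's exact file):

```
# Send INFO and above to the console
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
```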

Right-click the Main.java file and hit Run to ensure that the setup is working fine.

Step 6: Run KRaft Kafka from IntelliJ

Now that the Maven project is all set up, let us create a new folder cli at the topmost level, as a peer folder to src. This folder will be used to run the KRaft Kafka CLI files.

Create a new file inside the cli folder, name it 01_kafka_kraft_random_uuid_gen.bat, copy the following command into the new file and save it.

%KAFKA37_HOME%\bin\windows\kafka-storage.bat random-uuid

Hopefully you have already set the KAFKA37_HOME environment variable, as explained in the earlier section.

Create another file inside the cli folder, name it 02_kafka_kraft_format_log_storage.bat, copy the following command into the new file and save it.

%KAFKA37_HOME%\bin\windows\kafka-storage.bat format -t <RANDOM_UUID> -c %KAFKA37_HOME%\config\kraft\server.properties

In place of <RANDOM_UUID> we'll use the random UUID generated from the output of the first CLI command. I'll explain in detail when I run through all the steps a little later in this section.

Create another file inside the cli folder, name it 03_kafka_server_start.bat, copy the following command into the new file and save it:

%KAFKA37_HOME%\bin\windows\kafka-server-start.bat %KAFKA37_HOME%\config\kraft\server.properties

Next, create another file inside the cli folder to stop the server in a clean manner, name it 04_kafka_server_stop.bat, copy the following command into the file and save it:

%KAFKA37_HOME%\bin\windows\kafka-server-stop.bat %KAFKA37_HOME%\config\kraft\server.properties

Next, create a fifth file inside the cli folder, name it 05_kraft_kafka_topic_create.bat, copy the following command and save it:

%KAFKA37_HOME%\bin\windows\kafka-topics.bat --create --topic <TOPIC_NAME> --bootstrap-server localhost:9098 --partitions 3 --replication-factor 1

Next, create another file inside the cli folder, name it 06_kafka_kraft_console_producer.bat, and copy the following command:

%KAFKA37_HOME%\bin\windows\kafka-console-producer.bat --topic <TOPIC_NAME> --bootstrap-server localhost:9098

Create the last file of this exercise inside the cli folder, name it 07_kafka_kraft_console_consumer.bat, copy the following command and save it:

%KAFKA37_HOME%\bin\windows\kafka-console-consumer.bat --topic <TOPIC_NAME> --bootstrap-server localhost:9098 --from-beginning

Essentially, all we have done here is wrap the Kafka CLI commands bundled with the open-source distribution for Windows so that they can be run from within the IDE.

We are almost done. The last change needed is to modify server.properties and the other two config files so that the log.dirs folder, which is the folder on disk where Kafka keeps its logs, is created within the IntelliJ Idea project.

Open server.properties inside %KAFKA37_HOME%\config\kraft folder and modify the log.dirs property to ../logs/kraft-combined-logs

server.properties

Make similar changes in controller.properties as well as in broker.properties.
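After the edit, the relevant line in each of the three files should read as below (the relative path is resolved against the working directory the .bat scripts run from, which is why the logs folder will show up inside the project):

```
log.dirs=../logs/kraft-combined-logs
```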

controller.properties
broker.properties

Now, to run kafka broker in kraft mode, we need to do the following :

  1. Run the first script to generate a random UUID
  2. Use the random UUID generated in step 1 to format the log storage using the second script
  3. Then start the Kafka broker

Let’s give it a shot.

From inside IntelliJ, run the first script (right-click -> 'Run'):

See, we have a randomly generated UUID in the console. Now copy the UUID, paste it into the second script, and then run the second script, which will format the Kafka log storage:

Running the second script with the UUID will format the log storage, and you will see the kraft-combined-logs sub-folder within the logs folder (remember, we changed log.dirs to point there).

Now we are ready to run our broker. Run script 3 to start the Kafka broker in KRaft mode:

The server started successfully on listener port 9098, as configured.

Also, notice the new __cluster_metadata-0 sub-folder within logs. This is the major change introduced with KRaft: the cluster metadata now resides within the broker! (Check KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum.)

Fantastic, all our steps succeeded, so far !

Let's quickly create a new topic with script 5; put a topic name of your choice in place of <TOPIC_NAME>. I chose kraft-demo, kept the partition count at 3 and the replication factor at 1 (of course, this is a single-node broker!).

Got my topic kraft-demo created alright!
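To double-check, the kafka-topics CLI can also list and describe topics against the same bootstrap server, following the same pattern as the earlier scripts:

```bat
%KAFKA37_HOME%\bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9098
%KAFKA37_HOME%\bin\windows\kafka-topics.bat --describe --topic kraft-demo --bootstrap-server localhost:9098
```

The describe output shows the partition count, replication factor and the leader for each of the three partitions.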

Let’s run a console producer and publish some hello-world messages.

Run script 6, changing <TOPIC_NAME> to kraft-demo. Running this script opens the console producer within the IDE; let's write some messages:

Finally, run script 7, a console consumer, to test whether our setup works fine; replace <TOPIC_NAME> with kraft-demo:

Perfect! We get to see the messages published using the console producer!

Now, to stop the server cleanly, run script 4.

Also, remember to delete the logs folder if you want to start afresh or clean up after a test run.

Moving from ZooKeeper to KRaft is a significant step for Kafka. Now that our development setup is done, I will continue to use this setup to move into stream data processing using the Apache Flink Table API and FlinkSQL in subsequent articles.

Do check out KIP-500!

Here’s the GitHub repo: kafkakraft01

Happy coding !
