Kafka Streaming Job

Introduction

These Talend jobs 1) create a Kafka topic, 2) wait for messages to appear on that topic (Consumer), and 3) read a list of JSON files, extract their contents, convert each file's contents to a Kafka message, and send the messages to the Kafka topic (Producer).

Setup

Download the data files and the job files.

The files can be downloaded from this project's GitHub repository.

Import the jobs into Talend Studio.

(As of this writing, Talend Open Studio for Data Integration could be downloaded from https://www.talend.com/lp/open-studio-for-data-integration/.)

Copy the data to a location on your machine, then point the Kafka Producer job at that location (the parameter is called "Directory" in the tFileList component).

Ensure the Kafka topic has the same name across all three jobs.
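
Once the broker is running (see the install steps below), you can confirm the topic names from the command line with the listing script that ships with Kafka; the bootstrap address shown assumes the default local broker:

# List the topics known to the broker
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092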

To Install Kafka:

Download Kafka (https://kafka.apache.org/) and install it (installation on Windows is trickier; see the Windows note below):
$ tar -xzf kafka_2.13-3.2.0.tgz
$ cd kafka_2.13-3.2.0
# Start the ZooKeeper service
# Note: Soon, ZooKeeper will no longer be required by Apache Kafka.
$ bin/zookeeper-server-start.sh config/zookeeper.properties
# Start the Kafka broker service (open a second terminal for this)
$ bin/kafka-server-start.sh config/server.properties
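
On Windows, the same distribution ships equivalent batch scripts under bin\windows; a minimal sketch, assuming the archive was unpacked the same way:

# Start the ZooKeeper service (Windows)
> bin\windows\zookeeper-server-start.bat config\zookeeper.properties
# Start the Kafka broker service, in a second terminal (Windows)
> bin\windows\kafka-server-start.bat config\server.properties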

Run the jobs in the following sequence:

Once all services have launched successfully, run the job that creates the topic.
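
If you would rather create (or verify) the topic from the command line, Kafka's own script can do it. The topic name "file_topic" below is only a placeholder; substitute the name your jobs actually use:

# Create the topic with a single partition and no replication
$ bin/kafka-topics.sh --create --topic file_topic --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092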

Next, start the Consumer job so it is listening for Kafka messages.
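
As a sanity check that messages are reaching the topic, you can also attach Kafka's console consumer alongside the Consumer job (again, "file_topic" is a placeholder topic name):

# Print every message on the topic, including ones sent before this consumer started
$ bin/kafka-console-consumer.sh --topic file_topic --from-beginning --bootstrap-server localhost:9092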

Then run the Producer job to publish the Kafka messages. Return to the Consumer job to see the received messages.
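
If the Consumer job shows nothing, you can isolate the problem by sending a hand-typed test message with Kafka's console producer (placeholder topic name again). If that message arrives but the Talend Producer's messages do not, the issue is in the job configuration rather than in Kafka:

# Each line typed at the > prompt is sent as one message; Ctrl-C to exit
$ bin/kafka-console-producer.sh --topic file_topic --bootstrap-server localhost:9092
> {"test": "message"}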