Apache Kafka is a free, open-source stream-processing software platform based on Java. In this article, we will show how to install Kafka on Ubuntu 16.04.
Apache Kafka needs a Java runtime environment, so you need to install Java on your system first. By default, the latest version of Java is not available in the Ubuntu 16.04 repository, so you need to add a Java repository to your system. You can do this by running the following command:
add-apt-repository ppa:webupd8team/java
Next, update the repository and install Java by running the following commands:
apt-get update
apt-get install oracle-java8-installer -y
Once Java is installed, you can check the Java version using the following command:
java -version
Output:
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Then you can install ZooKeeper, which maintains configuration information and provides naming, distributed synchronization, and group services.
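The article does not show the ZooKeeper command, so the following is only a minimal sketch. It assumes the zookeeperd package from the Ubuntu repository and ZooKeeper's default client port 2181; the function names install_zookeeper and zookeeper_ok are hypothetical helpers introduced here for illustration.

```shell
# Install ZooKeeper from the Ubuntu repository (run as root).
# Assumption: the zookeeperd package, which is not named in the article.
install_zookeeper() {
    apt-get install zookeeperd -y
}

# Read the reply to ZooKeeper's "ruok" health probe from stdin
# and succeed only if the server answered "imok".
zookeeper_ok() {
    [ "$(cat)" = "imok" ]
}
```

To use it, run install_zookeeper as root, then probe the service with something like: echo ruok | nc localhost 2181 | zookeeper_ok && echo "ZooKeeper is running"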
Next, you can download the latest version of Kafka from the Apache website and extract the file to the /opt directory.
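The download step could look roughly like the sketch below. The Kafka and Scala versions are assumptions chosen only for illustration (the article does not name a release), and the URL follows the layout of the Apache archive site; check the Apache download page for the release you actually want. The target directory /opt/Kafka matches the path used by the start script in the next step. The helper names kafka_download_url and install_kafka are hypothetical.

```shell
# Assumed release values, not taken from the article.
KAFKA_VERSION="1.0.1"
SCALA_VERSION="2.11"

# Build the Apache archive URL for a given Kafka and Scala version.
kafka_download_url() {
    echo "https://archive.apache.org/dist/kafka/$1/kafka_$2-$1.tgz"
}

# Download the tarball and unpack it into /opt/Kafka (run as root).
# --strip-components=1 drops the top-level kafka_<scala>-<version> folder
# so that bin/ and config/ sit directly under /opt/Kafka.
install_kafka() {
    mkdir -p /opt/Kafka &&
    wget -O /tmp/kafka.tgz "$(kafka_download_url "$KAFKA_VERSION" "$SCALA_VERSION")" &&
    tar -xzf /tmp/kafka.tgz -C /opt/Kafka --strip-components=1
}
```

Nothing is downloaded until you call install_kafka, so you can adjust the two version variables first.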
Next, start the Kafka server by running the following script:
/opt/Kafka/bin/kafka-server-start.sh /opt/Kafka/config/server.properties
You will then see INFO messages in the output. The Kafka server is now up and listening on port 9092.
For details on how to test Kafka, see How to Configure an Apache Kafka Cluster on Ubuntu 16.04.
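If you want to confirm the listener from a second terminal, one possible check is sketched below. It assumes the ss tool is available on your system; has_listener is a hypothetical helper that scans socket listings on stdin for a given port.

```shell
# Succeed if the socket listing on stdin contains a local address
# ending in the given port, for example "*:9092" or ":::9092".
has_listener() {
    grep -q ":$1[[:space:]]"
}
```

For example: ss -ltn | has_listener 9092 && echo "Kafka is listening on port 9092"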
Kafka was originally developed at LinkedIn in Java and Scala. Its source code was opened up in 2011, and it became a top-level project of the Apache Software Foundation in 2012. In 2014, several of Kafka's creators founded a new company named Confluent, which specializes in Kafka.
The purpose of the Kafka project is to provide a unified, high-throughput, and low-latency platform for real-time data processing. Kafka delivers the following three functions:
In this article, we will look at the system implemented by Alibaba's Xianyu team, which can process tens of millions of data records every second in real time.
LogHub can be regarded as a data release and subscription component and has similar functions to Kafka. However, LogHub is a more stable and secure data transmission channel than Kafka.
Kafka provides a collection of metrics that measure the performance of Broker, Consumer, Producer, Stream, and Connect. E-MapReduce collects metrics for the Kafka Broker by using Ganglia to monitor the Broker's running status. A Kafka system consists of two roles: a Kafka Broker and multiple Kafka clients. When a read/write performance issue occurs, you must analyze both the Kafka Broker and the clients. Metrics from Kafka clients are important for this analysis.
This section describes how to use E-MapReduce to collect metrics from a Kafka client to conduct effective performance monitoring.
This section describes two common issues with Kafka.
EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.
Log Service is a complete real-time data logging service that has been developed by Alibaba Group. Log Service supports collection, consumption, shipping, search, and analysis of logs, and improves the capacity of processing and analyzing large amounts of logs.
Log Service fully supports Kafka, elastic scaling, delay alarms, and all streaming computing systems, such as Spark Streaming, Storm, StreamCompute, Flink, and Consumer Library (automatic load balancing).