ksqlDB overview

ksqlDB features

ksqlDB is a database built on Kafka for stream processing. It processes data stored in Kafka and combines Kafka Streams, which handles stream processing, with Kafka Connect, which collects events from various data sources. In ksqlDB, Kafka Streams processing applications are defined as a set of SQL queries.

ksqlDB features

The main functionality of ksqlDB is listed below:

  • Modeling data stored in Kafka into streams or tables via SQL.

  • Executing push queries for data transfer. Using push queries, you can subscribe to changes occurring in the database and track them in real time. A push query can be sent via the command line interface (CLI) or as an HTTP request to the ksqlDB REST API.

  • Executing pull queries to retrieve data. A pull query can retrieve the current value from a materialized view, table, or stream.

  • Creating materialized views from streams and tables. The materialized view in the background collects data from the Kafka table, converts it into the required format, and places it into a pre-created target table. You can bind multiple materialized views to a single Kafka table to store data at different levels of granularity across multiple tables.

  • Creating connectors for integration with external data stores.
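As a rough illustration of the features listed above, the statements below model a topic as a stream, build a materialized view from it, and read it with a push query and a pull query. The topic name pageviews-topic, the stream, table, and column names, and the JSON format are assumptions made for this sketch and are not taken from an actual ADS installation.

```sql
-- Model an existing Kafka topic as a stream (topic and columns are hypothetical).
CREATE STREAM pageviews_raw (
    user_id VARCHAR KEY,
    page    VARCHAR
  ) WITH (
    KAFKA_TOPIC='pageviews-topic',
    VALUE_FORMAT='JSON'
  );

-- Materialized view: a table that ksqlDB keeps up to date as new events arrive.
CREATE TABLE views_per_user AS
  SELECT user_id, COUNT(*) AS view_count
  FROM pageviews_raw
  GROUP BY user_id
  EMIT CHANGES;

-- Push query: stays open and streams every change to the client.
SELECT user_id, view_count FROM views_per_user EMIT CHANGES;

-- Pull query: returns the current value for one key and then ends.
SELECT user_id, view_count FROM views_per_user WHERE user_id = 'user_1';
```

The push query keeps running until it is cancelled, while the pull query behaves like a regular lookup against the current state of the table.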

ksqlDB architecture

The figure below shows the architecture of ksqlDB.

ksqlDB architecture

Each ksqlDB server runs Kafka Streams applications and is essentially a separate instance of a Kafka Streams application. In a cluster, several ksqlDB servers share the load created by the Kafka Streams topology.

The ksqlDB server consists of two components:

  • ksqlDB engine — executes SQL statements and queries. The engine parses SQL statements, converts them into a Kafka Streams topology, and launches the Kafka Streams applications.

  • ksqlDB REST API — provides client access to the ksqlDB engine for sending requests.
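One way to observe the work of the engine is to start a persistent query and inspect the topology generated for it. The sketch below assumes that a stream named pageviews_raw with a page column already exists; the stream name home_views and the query ID passed to EXPLAIN are hypothetical, and the real ID should be taken from the SHOW QUERIES output.

```sql
-- A persistent query: the engine compiles it into a Kafka Streams application.
CREATE STREAM home_views AS
  SELECT * FROM pageviews_raw
  WHERE page = '/home'
  EMIT CHANGES;

-- List running persistent queries together with their IDs.
SHOW QUERIES;

-- Show the execution plan and the Kafka Streams topology of one query.
-- CSAS_HOME_VIEWS_1 is a hypothetical ID; substitute one from SHOW QUERIES.
EXPLAIN CSAS_HOME_VIEWS_1;
```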

 

To connect to the ksqlDB REST interface, the client can use the following types of the ksqlDB interface:

ksqlDB in ADS

Connection

After the ksqlDB service is added and installed as part of an ADS cluster, you can connect to the ksqlDB server from the hosts where the ksqlDB Client component is located by running the following command: ksql http://ksql-server:ksql-server-port.
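Once the CLI session is open, a few read-only statements can confirm that the server is reachable and can see the Kafka cluster. This is only a sketch of a possible check; the output depends on the cluster.

```sql
-- Show the server configuration currently in effect.
SHOW PROPERTIES;

-- List the Kafka topics visible to ksqlDB.
SHOW TOPICS;

-- List the streams and tables that are already registered.
SHOW STREAMS;
SHOW TABLES;
```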

Information about the ksqlDB service in the ADCM interface

Kafka

In ADS, the ksqlDB service can be installed only after the Kafka service is installed. After ksqlDB is installed, the bootstrap.servers parameter in the /etc/ksqldb/ksql-server.properties configuration file is set automatically for communication with the Kafka broker, along with other options for interaction between Kafka and ksqlDB (for example, options for the internal ksqlDB topics created in Kafka).

Schema Registry

When the ksqlDB service is installed in ADS together with the Schema Registry service, it can register and read schemas and serialize data using schemas specified by identifier. This simplifies data serialization, since there is no need to define columns and data types manually in ksqlDB. A schema can be set for both keys and values. For example, when creating a stream based on a Kafka topic, the use of schemas is specified by the KEY_SCHEMA_ID and/or VALUE_SCHEMA_ID properties of the stream, as shown below.

CREATE STREAM pageviews
  WITH (
    KAFKA_TOPIC='avro-topic',
    VALUE_FORMAT='AVRO',
    VALUE_SCHEMA_ID=1
  );
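The statement above sets only the value schema. A similar sketch that sets both identifiers could look as follows. The stream name pageviews_keyed and the schema IDs 1 and 2 are assumptions and must refer to schemas already registered in Schema Registry; the KEY_SCHEMA_ID and VALUE_SCHEMA_ID properties may not be available in older ksqlDB versions.

```sql
-- Both the key and the value are read using schemas referenced by ID.
CREATE STREAM pageviews_keyed
  WITH (
    KAFKA_TOPIC='avro-topic',
    KEY_FORMAT='AVRO',
    VALUE_FORMAT='AVRO',
    KEY_SCHEMA_ID=1,
    VALUE_SCHEMA_ID=2
  );

-- Check which columns ksqlDB derived from the schemas.
DESCRIBE pageviews_keyed;
```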

ksqlDB may contain columns whose contents do not match the schema format. In this case, the columns that do correspond to the schema format can still be displayed.

After installing the ksqlDB service, the key.converter.schema.registry.url and value.converter.schema.registry.url parameters, which are responsible for the interaction between ksqlDB and Schema Registry, are set in the /etc/ksqldb/connect.properties configuration file.

Kafka Connect

When the ksqlDB service is installed in ADS together with the Kafka Connect service, it can be used to manage Kafka Connect connectors. The supported actions are shown in the figure below.

ksqlDB and Kafka Connect

Examples of connectors that can be created using ksqlDB:

  • Debezium PostgreSQL connector — a source connector that takes a snapshot of a PostgreSQL database, then tracks all subsequent row-level changes for each table and writes them to a separate Kafka topic;

  • JDBC connectors — connectors that work with any database that has a JDBC driver, importing data into Kafka topics (JDBC Source connector) or exporting data from Kafka topics (JDBC Sink connector).
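A minimal sketch of managing a JDBC Source connector from ksqlDB is given below. It assumes that the JDBC connector plugin is installed in Kafka Connect; the connector name, connection URL, credentials, table, and column names are hypothetical placeholders.

```sql
-- Create a source connector that imports rows of one table into Kafka.
CREATE SOURCE CONNECTOR jdbc_source WITH (
  'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
  'connection.url'           = 'jdbc:postgresql://db-host:5432/example_db',
  'connection.user'          = 'example_user',
  'connection.password'      = 'example_password',
  'table.whitelist'          = 'users',
  'mode'                     = 'incrementing',
  'incrementing.column.name' = 'id',
  'topic.prefix'             = 'jdbc_'
);

-- List all connectors known to Kafka Connect.
SHOW CONNECTORS;

-- Show the status and tasks of the new connector.
DESCRIBE CONNECTOR jdbc_source;

-- Remove the connector when it is no longer needed.
DROP CONNECTOR jdbc_source;
```

With these settings the connector would write rows from the users table to a topic named jdbc_users, since the topic name is the prefix followed by the table name.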

For interaction between the ksqlDB and Kafka Connect services, configure the ksql.connect.url parameter in the /etc/ksqldb/connect.properties configuration file.

Configure ksqlDB

ksqlDB parameters are configured in the ADCM interface on the configuration page of the ksqlDB service.

To configure the parameters of the /etc/ksqldb/ksql-server.properties and /etc/ksqldb/connect.properties configuration files, activate the Show advanced switch, expand the server.properties or connect.properties node, and enter new values for the parameters. To change ksqlDB parameters that are not available in the ADCM interface, use the Add key,value field: select Add property and enter the name of the parameter and its value.

After changing the parameters in the ADCM interface, restart the ksqlDB service. To do this, apply the Restart action by clicking the actions icon in the Actions column.
