Arenadata Documentation
Our passion is to build efficient, flexible solutions that scale up to dozens of petabytes
Products
Explore our range of solutions in the world of Big Data
Overview
Arenadata Streaming (ADS) is a real-time data streaming platform developed by Arenadata. It is designed to enable businesses to process, analyze, and react to high-volume data streams in real time.
The platform uses Apache Kafka as its core messaging system, which is known for its high throughput and low latency. Arenadata Streaming provides a distributed and fault-tolerant architecture that can handle large volumes of data from various sources, including databases, IoT devices, sensors, and other streaming sources.
Use cases
Real-time data ingestion

Arenadata Streaming can ingest data in real time from various sources, including databases, sensors, and IoT devices.

Data processing

The platform can process and transform data streams in real time using Apache Kafka's stream processing capabilities.

Analytics

Arenadata Streaming provides tools for real-time data analytics, including machine learning, predictive analytics, and anomaly detection.

Integration

The platform offers integration with other data systems, such as Hadoop, Spark, and NoSQL databases.

IoT

Apache MiNiFi, the edge data collection component of ADS, can be integrated with MQTT (Message Queuing Telemetry Transport), a lightweight messaging protocol designed for IoT devices. This integration allows MiNiFi to publish data to and receive data from MQTT brokers, enabling real-time data streaming and processing at the edge.

Enterprise
Community
Cluster management and monitoring
Deploy & upgrade automation
Offline installation
High availability
Advanced security features (encryption, role-based access control)
Technical support 24/7
Corporate training courses
Tailored solutions
Available integrations
ADB
ADB
To read data, Arenadata Database (ADB) provides an extension that implements transactional data loading from ADS. Writing data to ADS is handled by the PXF plugin on the ADB side.
Existing tools for interacting with ADB:
  • Kafka Connect. This is a tool for scalable and reliable data streaming between Kafka and databases in both directions in real time, for processing and analysis.
  • Kafka JDBC Connector. This is an open-source Kafka connector that provides a simple way to connect Kafka with a database using JDBC. The Kafka JDBC Connector can be used to stream data from Kafka topics into a database in real time, or to stream data from a database into Kafka topics.
  • NiFi Database Connection Pooling Service. This is a built-in NiFi service that allows NiFi to connect to a database using JDBC.
  • ExecuteSQL Processor. This is a NiFi processor that can be used to execute SQL statements and queries against a database using JDBC.
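As an illustration of how the Kafka JDBC Connector might be wired up, here is a sketch of a sink configuration streaming a Kafka topic into ADB. The connector class follows the open-source Confluent JDBC connector; the host, database, credentials, and topic name (`orders`) are hypothetical.

```python
import json

# Sketch of a Kafka Connect JDBC sink configuration for streaming a topic
# into ADB (Greenplum-compatible, hence a PostgreSQL JDBC URL).
# Host names, credentials, and the topic name are hypothetical.
connector_config = {
    "name": "ads-to-adb-orders",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://adb-master:5432/warehouse",
        "connection.user": "gpadmin",
        "connection.password": "********",
        "topics": "orders",            # Kafka topic to sink
        "insert.mode": "insert",       # plain INSERTs; "upsert" also needs pk.fields
        "auto.create": "true",         # create the target table if missing
        "tasks.max": "2",
    },
}

# This JSON body would be POSTed to the Connect REST API,
# e.g. http://connect-host:8083/connectors
payload = json.dumps(connector_config, indent=2)
print(payload)
```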
ADH
Arenadata Hadoop
Kafka Connect can be used to move data between Kafka and Arenadata Hadoop (ADH) in both directions, allowing real-time streaming of data into ADH for processing and analysis.
A large set of NiFi connectors for interoperability with ADH services.
ADQM
ADQM
Existing tools for interacting with Arenadata QuickMarts (ADQM):
  • Kafka Connect ClickHouse Sink. This is a Kafka Connect plugin that provides a way to sink data from Kafka to ADQM in near real time. The ClickHouse Sink Connector can be used to stream data from Kafka topics into ADQM tables, either as individual rows or as batches of rows.
  • Kafka JDBC Connector. This is a Kafka Connect plugin that provides a way to connect to a JDBC-compliant database, such as ClickHouse. The JDBC Connector can be used to stream data from Kafka topics into ClickHouse tables, enabling data to be analyzed and processed in real time.
  • NiFi Database Connection Pooling Service. This is a built-in NiFi service that allows NiFi to connect to a database using JDBC.
  • ExecuteSQL Processor. This is a NiFi processor that can be used to execute SQL statements and queries against a database using JDBC.
Oracle
Oracle
Existing tools for interacting with Oracle:
  • Kafka Connect. This is a tool for scalable and reliable data streaming between Kafka and Oracle in both directions, allowing real-time streaming of data into a database for processing and analysis.
  • Kafka JDBC Connector. This is an open-source Kafka connector that provides a simple way to connect Kafka with a database using JDBC. The Kafka JDBC Connector can be used to stream data from Kafka topics into a database in real time, or to stream data from a database into Kafka topics.
  • NiFi Database Connection Pooling Service. This is a built-in NiFi service that allows NiFi to connect to a database using JDBC.
  • ExecuteSQL Processor. This is a NiFi processor that can be used to execute SQL statements and queries against a database using JDBC.
MS SQL
MS SQL
Existing tools for interacting with MS SQL:
  • Kafka Connect. This is a tool for scalable and reliable data streaming between Kafka and MS SQL in both directions, allowing real-time streaming of data into a database for processing and analysis.
  • Kafka JDBC Connector. This is an open-source Kafka connector that provides a simple way to connect Kafka with MS SQL using JDBC. The Kafka JDBC Connector can be used to stream data from Kafka topics into a database in real time, or to stream data from a database into Kafka topics.
  • NiFi Database Connection Pooling Service. This is a built-in NiFi service that allows NiFi to connect to a database using JDBC.
  • ExecuteSQL Processor. This is a NiFi processor that can be used to execute SQL statements and queries against a database using JDBC.
S3
S3
Existing tools for interacting with S3:
  • Kafka Connect S3 Sink. This is a Kafka Connect plugin that provides a way to sink data from Kafka to S3 in near real time. The S3 Sink Connector can be used to stream data from Kafka topics into S3 buckets, either as individual objects or as batches of objects. This integration is particularly useful for long-term storage and archiving of data from Kafka.
  • Kafka Connect S3 Source. This is a Kafka Connect plugin that provides a way to source data from S3 to Kafka. The S3 Source Connector can be used to stream data from S3 objects into Kafka topics, enabling data to be analyzed and processed in real time.
  • S3 Object Processor. This is a NiFi processor that can be used to perform CRUD (create, read, update, delete) operations on S3 objects. It can be configured to interact with S3 using access keys or roles, and can be used to transfer data between NiFi and S3 in real time.
  • Amazon S3 Put/Get Object processors. The PutS3Object processor can be used to write data from NiFi to S3, while the GetS3Object processor can be used to read data from S3 into NiFi.
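A sink configuration for the archiving case might look as follows. Property names follow the Confluent S3 sink connector; the bucket, region, and topic names are hypothetical.

```python
import json

# Sketch of a Kafka Connect S3 sink configuration that archives a topic
# into an S3 bucket as batched JSON objects. Bucket, region, and topic
# names are hypothetical.
s3_sink = {
    "name": "ads-to-s3-archive",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "clickstream",
        "s3.bucket.name": "ads-archive",
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",   # records per S3 object (controls batching)
        "tasks.max": "1",
    },
}
print(json.dumps(s3_sink, indent=2))
```

Raising `flush.size` yields fewer, larger objects, which generally suits long-term archiving better than one object per record.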
MongoDB
MongoDB
Existing tools for interacting with MongoDB:
  • Kafka Connect MongoDB Sink. This is a Kafka Connect plugin that provides a way to sink data from Kafka to MongoDB in near real time. The MongoDB Sink Connector can be used to stream data from Kafka topics into MongoDB collections, either as individual documents or as batches of documents.
  • Kafka MongoDB Source Connector. This is a Kafka Connect plugin that provides a way to source data from a MongoDB replica set into Kafka topics in near real time.
  • PutMongoRecord Processor. This is a built-in NiFi processor that can be used to write data from NiFi to MongoDB in near real time. It can be configured to connect to MongoDB using a MongoDB client and credentials, and can be used to insert data into MongoDB collections from NiFi.
  • GetMongo Processor. This is a built-in NiFi processor that reads data from MongoDB and brings it into NiFi. It can be configured to connect to MongoDB using a MongoDB client and credentials, and can be used to retrieve data from MongoDB collections for further processing in NiFi.
AVRO
AVRO
Avro is a binary data serialization format designed to be compact and fast. It supports schema evolution, which allows data schemas to change over time without requiring existing data to be rewritten or reloaded.
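Avro's compactness comes in part from its integer encoding: values are zigzag-mapped and written as base-128 varints, and field names never appear in the payload (the schema travels separately, e.g. via Schema Registry). A minimal sketch of that encoding:

```python
def avro_encode_long(n: int) -> bytes:
    """Encode an integer the way Avro does: zigzag mapping, then a
    base-128 varint (7 data bits per byte, high bit = continuation)."""
    z = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes -> small codes
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Small values fit in a single byte, positive or negative.
print(avro_encode_long(1).hex())    # zigzag(1) = 2  -> "02"
print(avro_encode_long(-1).hex())   # zigzag(-1) = 1 -> "01"
print(avro_encode_long(64).hex())   # zigzag(64) = 128 -> two bytes "8001"
```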
JSON
JSON
JSON (JavaScript Object Notation) is a lightweight data format that is commonly used for exchanging data between applications.
Operating systems
AltLinux 8.4 SP
Supported
CentOS 7
Supported
RedHat 7
Supported
AstraLinux
Currently in development
Cluster management and monitoring
Deploy & upgrade automation
Offline installation
High availability
Advanced security features (encryption, role-based access control)
Technical support 24/7
Corporate training courses
Tailored solutions
Available integrations
ADB
Available only for Enterprise
ADH, ADQM, Oracle, MS SQL, S3, MongoDB, AVRO, JSON
The same integration tools as in the Enterprise edition are available (see the lists above).
Operating systems
AltLinux 8.4 SP
Available only for Enterprise
CentOS 7
Supported
RedHat 7
Supported
AstraLinux
Currently in development
Components
Apache ZooKeeper

Apache ZooKeeper is a distributed coordination service used by Arenadata Streaming to manage the configuration and coordination of its clusters. It is a crucial component of the system as it helps to ensure high availability and fault tolerance in Arenadata Streaming clusters.

ZooKeeper provides a hierarchical namespace that allows Arenadata Streaming to store configuration data, manage distributed locks, and coordinate distributed processes. It provides a consistent view of the system state across all nodes in the cluster, which helps to prevent data inconsistencies and ensure data integrity.

For example, Arenadata Streaming uses ZooKeeper to manage its Kafka brokers, topics, and partitions. When a new broker is added to the cluster, ZooKeeper is used to assign it a unique identifier and to coordinate the distribution of data across the cluster.

Apache Kafka

Apache Kafka is a distributed streaming platform used by Arenadata Streaming to manage the ingestion, processing, and analysis of real-time data streams. It provides a scalable, fault-tolerant, and highly available infrastructure for processing and storing real-time data.

Arenadata Streaming leverages Kafka's capabilities to handle large volumes of data and support multiple data sources. It provides a real-time data processing platform that enables businesses to analyze data as it flows through the system, providing near-instant insights into business operations.
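Kafka's per-key ordering guarantee follows from how producers assign records to partitions: the record key is hashed modulo the partition count, so every record with a given key lands in the same partition. The Java client uses murmur2 for this; the sketch below substitutes CRC-32 to stay dependency-free, so the partition numbers differ from real Kafka, but the behavior is the same.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Simplified key-based partitioner. Real Kafka clients use murmur2;
    CRC-32 stands in here so the sketch needs only the standard library."""
    return zlib.crc32(key) % num_partitions

# All records with the same key map to the same partition, which is
# what gives Kafka its per-key ordering guarantee.
p1 = pick_partition(b"sensor-42", 6)
p2 = pick_partition(b"sensor-42", 6)
assert p1 == p2
print(f"sensor-42 -> partition {p1}")
```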

Schema Registry

Schema Registry is a centralized repository used by Arenadata Streaming to store and manage schemas for data produced and consumed by Apache Kafka. It allows users to define, evolve, and share schemas across different applications and systems that use Kafka.

In Arenadata Streaming, Schema Registry enables users to ensure data compatibility across different versions of their applications and systems. It provides a way to enforce data validation and to ensure that all data produced and consumed by Kafka conforms to a predefined schema.
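One rule behind such compatibility checks: a field added to a record schema must carry a default, or consumers using the new schema cannot decode data written with the old one. A simplified sketch of just that rule (Schema Registry's real checker covers more cases, such as type changes and field removal):

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Check one Avro record-evolution rule: every field that exists only
    in the new schema must declare a default, or readers on the new
    schema cannot decode data written with the old one."""
    old_fields = {f["name"] for f in old_schema["fields"]}
    return all(
        f["name"] in old_fields or "default" in f
        for f in new_schema["fields"]
    )

v1 = {"type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "long"}]}
v2_ok = {"type": "record", "name": "Order",
         "fields": [{"name": "id", "type": "long"},
                    {"name": "note", "type": "string", "default": ""}]}
v2_bad = {"type": "record", "name": "Order",
          "fields": [{"name": "id", "type": "long"},
                     {"name": "note", "type": "string"}]}

print(is_backward_compatible(v1, v2_ok))   # True: new field has a default
print(is_backward_compatible(v1, v2_bad))  # False: new field lacks a default
```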

KSQL

KSQL is a streaming SQL engine used by Arenadata Streaming to process real-time data streams. It allows users to write SQL queries to transform, aggregate, and analyze data in real time, making it easy to create real-time data processing pipelines without the need for complex programming.

In Arenadata Streaming, KSQL provides a simple yet powerful way to interact with data streams, enabling users to query, join, and filter data as it flows through the system. It supports a wide range of SQL operations, including windowing, aggregations, and joins, allowing users to create complex processing logic without the need for custom code.
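As an example of the windowed aggregations mentioned above, the statement below (with hypothetical stream and column names) counts clicks per page over one-minute tumbling windows. In practice it would be submitted through the ksql CLI or the server's REST endpoint; it is held in a string here only for illustration.

```python
# A ksqlDB-style windowed aggregation; stream and column names are
# hypothetical. "clickstream" is assumed to be a stream already declared
# over a Kafka topic.
ksql_stmt = """
CREATE TABLE clicks_per_minute AS
  SELECT page_id,
         COUNT(*) AS clicks
  FROM clickstream
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY page_id
  EMIT CHANGES;
""".strip()

print(ksql_stmt)
```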

Kafka Connect

Kafka Connect is a data integration framework used by Arenadata Streaming to move data between Apache Kafka and other systems. It provides a scalable and fault-tolerant infrastructure for ingesting and exporting data to and from Kafka, making it easy to integrate different systems and technologies with Kafka.

In Arenadata Streaming, Kafka Connect enables users to integrate data from various sources such as databases, file systems, and messaging systems with Kafka. It provides connectors that can be configured to extract data from different systems and write it to Kafka topics, or to read data from Kafka topics and write it to external systems.
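Connectors are typically registered by POSTing a JSON configuration to a Connect worker's REST API. A sketch of building that request (the host and connector name are hypothetical, and the request is only constructed, not sent):

```python
import json
import urllib.request

def register_connector(connect_url: str, config: dict) -> urllib.request.Request:
    """Build the POST request that registers a connector with the Kafka
    Connect REST API. Sending it would be urllib.request.urlopen(req)
    against a live Connect worker."""
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical worker host and connector definition.
req = register_connector(
    "http://connect-host:8083",
    {"name": "demo-sink", "config": {"connector.class": "...", "topics": "demo"}},
)
print(req.full_url, req.get_method())
```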

Kafka Connect is also the basis for MirrorMaker 2, a tool used by Arenadata Streaming to replicate data between Apache Kafka clusters. MirrorMaker 2 replaces the original MirrorMaker tool and provides several new features and improvements over its predecessor.

Kafka REST Proxy

Kafka REST Proxy is a tool used by Arenadata Streaming to expose Apache Kafka functionality as a RESTful API. It provides a simple and scalable way to integrate Kafka with other systems and technologies that support RESTful APIs.
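For example, the REST Proxy's v2 produce API accepts a JSON body with a `records` array, POSTed to `/topics/<name>` with a versioned content type. A sketch of building such a payload (the host, topic, and record contents are hypothetical):

```python
import json

# Payload shape for the Kafka REST Proxy v2 produce endpoint:
#   POST http://rest-proxy:8082/topics/<topic>
#   Content-Type: application/vnd.kafka.json.v2+json
# Host, topic, and record contents are hypothetical.
payload = {
    "records": [
        {"key": "sensor-42", "value": {"temp": 21.5}},
        {"key": "sensor-43", "value": {"temp": 19.0}},
    ]
}
body = json.dumps(payload)
print(body)
```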

Apache NiFi

Apache NiFi is an open-source data integration tool used by Arenadata Streaming to automate the flow of data between different systems and technologies. It provides a visual drag-and-drop interface for designing and configuring data flows, making it easy for users to build complex data pipelines without writing any code.

In Arenadata Streaming, Apache NiFi enables users to build and manage data flows across different systems and technologies. It provides a wide range of processors and connectors that can be used to integrate with various data sources and destinations, including databases, message queues, and cloud platforms.

Apache MiNiFi

Apache MiNiFi is a lightweight data collection tool used by Arenadata Streaming to collect and preprocess data at the edge of the network. It is designed to run on resource-constrained devices, such as sensors and IoT devices, and enables users to collect and process data in real time, without relying on a central server.

In Arenadata Streaming, Apache MiNiFi enables users to collect and preprocess data at the edge of the network, before sending it to a central server for further processing and analysis. It provides a wide range of processors and connectors that can be used to collect data from various sources, including sensors, cameras, and other IoT devices.

Apache NiFi Registry

Apache NiFi Registry is a version control and management system used by Arenadata Streaming to manage and version data flows and other assets created using Apache NiFi. It provides a central repository for storing and managing NiFi flows, templates, and other artifacts, enabling users to easily version, deploy, and reuse them across different environments.

Kafka Manager

Kafka Manager (also known as CMAK) is a web-based management tool used to manage Apache Kafka clusters. It is designed to simplify the administration of Kafka clusters, providing a user-friendly interface for managing and monitoring Kafka topics, partitions, and brokers.

In Arenadata Streaming, Kafka Manager enables users to easily manage and monitor their Kafka clusters. It provides a web-based interface for performing administrative tasks, such as creating and deleting topics, reassigning partitions, and managing broker configurations. It also provides real-time metrics and monitoring of Kafka clusters, allowing users to easily identify and troubleshoot issues.

Features
Faster deployment
Arenadata Streaming streamlines the installation and configuration process, reducing the time required for setup compared to manual methods
User-friendly
Users can easily deploy and configure their data streaming infrastructure, even without extensive technical knowledge
Consistent installation
Arenadata Streaming ensures standardized deployment across multiple systems, minimizing errors and discrepancies
Improved performance
By optimizing the data streaming setup process, Arenadata Streaming enhances system performance, minimizing downtime and improving efficiency
Community-driven enhancements
Our team evaluates enhancements from the wider data streaming community, ensuring their incorporation into the product for seamless performance
Arenadata Platform Security
Enterprise edition
Arenadata Platform Security (ADPS) is a combination of two security components:
Apache Ranger
Apache Ranger is an open-source security framework that provides centralized policy management for Hadoop and other big data ecosystems. Arenadata Platform integrates with Apache Ranger to provide policy-based access control and fine-grained authorization for data and analytics applications.
Apache Knox
Apache Knox is an open-source gateway that provides secure access to Hadoop clusters and other big data systems. Arenadata Platform integrates with Apache Knox to provide secure access to the platform and its services.
ADPS provides a comprehensive security framework that includes policy-based access control, fine-grained authorization, and secure access to the platform and its services. This helps organizations protect sensitive data and ensure compliance with regulations.
ADS Control
Arenadata Streaming Control is a web-based graphical user interface (GUI) for managing and monitoring Arenadata Streaming clusters. It provides a user-friendly way to manage Kafka Connect instances.
ADS Control allows administrators to manage all aspects of their Kafka Connect clusters, including stream processing and cluster configuration. It also provides monitoring capabilities that enable administrators to view the status of their clusters.
Roadmap
2023
ADS 1.8.1
  • Changed versions:
    • ZooKeeper up to 3.5.10
    • Kafka Connect up to 2.8.1
  • Added actions for Schema Registry service for expand and shrink operations
  • Added installation of the MiNiFi Toolkit
  • Added the ability to use configuration groups for MiNiFi service
  • Reworked log4j templates for the ksqlDB service
ADS 1.8.0
  • Changed versions:
    • NiFi Server and NiFi Registry components up to Apache NiFi 1.18
    • MiNiFi service up to Apache MiNiFi 1.18
  • Added the ability to delete a service from a cluster in the ADCM interface
  • Implemented support for Alt 8 SP in minifi.sh for NiFi service version 1.18
ADS 1.7.2
  • For the NiFi Registry (a component of the NiFi service), Ranger authorization is implemented, providing access protection when storing and managing shared resources in one or more NiFi instances
  • Added the ability to manage all parameters using the ADCM user interface in all configuration files
  • The basic authentication is now available for the following ADS services:
    • Schema Registry
    • Kafka REST Proxy
    • KSQL
ADS 1.7.1
  • Added support of AltLinux 8.4 operating system for ADS
  • For the NiFi service, the Ranger authorization plugin has been added, and the ability to add or remove permissions for processing messages in NiFi has been implemented
  • Added support for Kafka Connect service and Mirror Maker 2 mechanism for ADS
ADS 1.7.0
  • Updating package versions:
    • Kafka 2.8.1
    • Nifi 1.15.0
    • Nifi-Registry 1.15.0
    • Schema-Registry 6.2.1
    • Kafka REST Proxy 6.2.1
    • KSQL 6.2.1
    • MiNiFi 1.15.0
  • Added LDAP/AD authentication for NiFi service
  • For the NiFi service, the ability to work with the "_routing" option using the NiFi Elasticsearch processor has been added
  • Switching of the logging level for ADS services is implemented
  • For ADS in ADCM, the ability to configure channel protection via the SSL protocol has been implemented
ADS 1.6.2
  • The authentication protocol Kerberos is implemented for ADS
  • Added the ability to use Active Directory as a Kerberos store for ADS
ADS 1.6.0
  • Implemented assembly of components with a dependency on ZooKeeper 3.5.8:
    • MiNiFi 0.7.0
    • NiFi-Registry 0.7.0
    • NiFi 1.12.0
  • Updating package versions:
    • Kafka 2.6.0
    • Zookeeper 3.5.8
    • Nifi 1.12.0
    • Nifi-Registry 0.7.0
    • Schema-Registry 6.0.0
    • Kafka REST Proxy 6.0.0
    • KSQL 6.0.0
    • Kafka Manager 3.0.0.5
  • Implemented SASL/PLAIN support for Kafka, KSQL, Schema-Registry, Kafka-Rest, Kafka-Manager services
  • Implemented the ability to add/update users for SASL/PLAIN
  • Implemented ADS integration with ADPS (Arenadata Platform Security)
  • Implemented support for Ranger Kafka Plugin
  • Enterprise version released
ADS 1.5.0
  • Updating package versions:
    • Kafka 2.4.0
    • Zookeeper 3.5.6
    • Nifi 1.10.0
    • Nifi-Registry 0.5.0
    • Schema-Registry 5.4.0
    • Kafka REST Proxy 5.4.0
    • KSQL 5.4.0
    • Kafka Manager 1.3.3.23
  • Implementation of the MiNiFi 0.5.0 service
  • For the MiNiFi service, the following actions have been implemented:
    • Install
    • Start/Stop/Restart
    • Check
    • Expand
    • Shrink
  • Monitoring implemented for MiNiFi service
  • ALT Linux operating system support
  • Added support for Kafka version 2.4.0 in Kafka-Manager
  • Added Analytics Framework support for NiFi service
ADS 1.4.11
  • Added cluster update operation
  • Added operations for adding/removing a host from a running Kafka, Nifi, Zookeeper cluster
  • Added the ability to export/import the connection string to the Zookeeper service for sharing one service instance in different clusters
  • Added the ability to install offline
  • Added the Restart and Check operations, which verify the operability of the Nifi service
  • Added integration of Nifi and Nifi-Registry services
  • Implemented collection, visualization, and automatic sending of Nifi metrics to the Monitoring cluster