Arenadata Hadoop

Arenadata Hadoop is a full-fledged enterprise distribution based on Apache Hadoop, designed for storing and processing semi-structured and unstructured data.

TOP-10 popular articles

Hive provides several ways to work with tables. You can use data manipulation language (DML) queries to import or append data to a table. You can also ingest data into a Hive table directly with HDFS commands.
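A minimal sketch of both approaches, assuming a PyHive client and hypothetical host and table names (the article itself covers the details):

```python
from pyhive import hive

# Connect to HiveServer2; the host and user here are placeholders.
conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="hive")
cursor = conn.cursor()

# DML path: add rows through a HiveQL statement.
cursor.execute("INSERT INTO demo_table VALUES (1, 'first row')")

# HDFS path: load a file previously placed in HDFS,
# e.g. with `hdfs dfs -put data.csv /tmp/data.csv`.
cursor.execute("LOAD DATA INPATH '/tmp/data.csv' INTO TABLE demo_table")
```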

HiveServer2 supports the Beeline command shell, a JDBC client based on the SQLLine CLI.
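For example, a non-interactive Beeline call can be scripted as follows; the JDBC URL and user name are placeholders:

```python
import subprocess

# Run a single HiveQL statement through Beeline and exit.
subprocess.run([
    "beeline",
    "-u", "jdbc:hive2://hiveserver2.example.com:10000/default",  # JDBC URL (placeholder host)
    "-n", "hive",            # user name
    "-e", "SHOW TABLES;",    # statement to execute
], check=True)
```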

Airflow writes text logs that are useful for analyzing errors that can occur while running DAGs. These logs are located in the logs subfolder of the Airflow home directory.
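For a quick look at what is on disk, a sketch like the one below can help; it assumes the default AIRFLOW_HOME of ~/airflow, and the exact subdirectory layout under logs varies between Airflow versions:

```python
import os
from pathlib import Path

# Resolve the Airflow home directory (default: ~/airflow) and list task logs.
log_root = Path(os.environ.get("AIRFLOW_HOME", str(Path.home() / "airflow"))) / "logs"
for log_file in sorted(log_root.rglob("*.log")):
    print(log_file.relative_to(log_root))
```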

A guide on using DBeaver to connect to Hive with Kerberos authentication enabled.

The article shows how to create and run your first DAG to process CSV files.
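As an illustration, a first DAG with a single CSV-processing task might look like the sketch below; it assumes Airflow 2.4+ and a hypothetical input file, while the article walks through a complete example:

```python
import csv
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def count_csv_rows():
    # Hypothetical input file; replace with a real path.
    with open("/tmp/input.csv", newline="") as f:
        print(sum(1 for _ in csv.reader(f)))

with DAG(
    dag_id="process_csv",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # run on manual trigger only
    catchup=False,
):
    PythonOperator(task_id="count_rows", python_callable=count_csv_rows)
```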

Airflow is a platform that allows you to develop, schedule, run, and monitor complex workflows. It fits ETL/ELT processes perfectly and can also be useful when you need to run processes periodically and monitor their execution.

In HDFS, you can restrict access to files and directories using a standard POSIX-based model with some modifications. You can grant permissions on a file to its owner, a specified user group, and all other users.
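In practice, these permissions are managed with hdfs dfs commands; a brief sketch from Python, with placeholder path, owner, and group:

```python
import subprocess

def hdfs(*args):
    # Thin wrapper around the `hdfs dfs` CLI.
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-chown", "alice:analysts", "/data/reports")  # set owner and group
hdfs("-chmod", "750", "/data/reports")             # rwx owner, r-x group, --- others
hdfs("-ls", "/data")                               # verify the result
```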

ADB Spark 3 Connector enables high-speed, parallel data exchange between Spark 3 and Arenadata DB. The article contains a full description of the connector.
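The connector's own API is what the article documents; purely as a baseline for comparison, a generic (single-channel) Spark JDBC read against ADB, which is Greenplum-based, looks like this, with all connection details hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adb-demo").getOrCreate()

# Plain JDBC read: one channel, unlike the connector's parallel exchange.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://adb-master.example.com:5432/demo")  # placeholder host/db
      .option("dbtable", "public.sales")                                    # placeholder table
      .option("user", "gpadmin")
      .option("password", "***")
      .load())
df.show()
```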

Solr is a search server designed to handle large sets of data. Since Solr can also store data, it doubles as a NoSQL, non-relational storage and processing technology.
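For example, querying Solr's HTTP API from Python takes only a few lines; the host and collection name below are placeholders:

```python
import requests

# Ask a Solr collection for the first five documents matching everything.
resp = requests.get(
    "http://solr.example.com:8983/solr/demo_collection/select",
    params={"q": "*:*", "rows": 5},
)
for doc in resp.json()["response"]["docs"]:
    print(doc)
```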

There are two major ways to launch Spark jobs on your cluster: using the spark-submit script and via the interactive spark-shell.
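The spark-submit path can itself be scripted, for example from Python; the master, deploy mode, and application file below are placeholders, and the interactive alternative is simply running spark-shell (or pyspark) directly:

```python
import subprocess

# Submit a batch application to the cluster via spark-submit.
subprocess.run([
    "spark-submit",
    "--master", "yarn",          # placeholder cluster manager
    "--deploy-mode", "cluster",  # run the driver on the cluster
    "my_job.py",                 # placeholder application file
], check=True)
```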
