Arenadata Hadoop

Arenadata Hadoop is a full-fledged enterprise distribution package based on Apache Hadoop and designed for storing and processing semi-structured and unstructured data.

TOP-10 popular articles

A cheatsheet that describes the most common HDFS commands with examples.

The article shows how to create and run your first DAG to process CSV files.

An overview of Apache Iceberg architecture, benefits, and use cases. Iceberg is an open-source table format for data lakes that enables ACID transactions, time travel, schema evolution, partition evolution, and more.

An overview of working with sensors in Airflow: available sensor types and parameters. Examples of sensor use, as well as a description of the process of creating a custom sensor.

The section provides reference information on configuration parameters that can be used to configure ADH services via ADCM.

An overview of HDFS (Hadoop Distributed File System) — a highly fault-tolerant distributed file system designed for deployment on low-cost hardware.

The tables with Arenadata Hadoop network requirements: ADH service ports, JMX ports, ports redefined by Kerberos, client ports.

Information about Airflow logs for analyzing errors that may occur during DAG operation. Working with text logs and using the Airflow web interface for analyzing log files.

Arenadata Hadoop (ADH) is a commercial distribution of the open-source Apache Hadoop software. It is a big data platform designed for storing, processing, and analyzing large volumes of structured and unstructured data.

Apache Iceberg is an open, high-performance format for large analytic tables. The ADH Spark3 service adopts this format allowing you to work with Iceberg tables through Spark.

Found a mistake? Seleсt text and press Ctrl+Enter to report it