Glossary

AD

Active Directory — a directory service for Windows Server family operating systems. It was initially created as an LDAP-compatible implementation of a directory service. However, starting with Windows Server 2008, it includes integration capabilities with other authorization services, performing an integrating and unifying role for them.

It allows administrators to apply group policies to keep the configuration of user work environments consistent, deploy software on multiple computers through group policies or System Center Configuration Manager (formerly Microsoft Systems Management Server), and install operating system, application, and server software updates on all computers in the network using Windows Server Update Services (WSUS). It stores data and environment settings in a centralized database. Active Directory networks can be of various sizes: from several dozen to several million objects.

API

Application programming interface — a set of ready-made classes, procedures, functions, structures, and constants provided by an application (library, service) or operating system for use in external software products.

CLI

Command-line interface — a kind of text user interface (TUI) where users give instructions to a computer by typing text strings (commands) from the keyboard. Other names are console and terminal.

ClickHouse Keeper

A coordination service that provides a ZooKeeper-compatible client-server protocol and can be used for data replication and distributed DDL query execution in ADQM/ClickHouse as an alternative to ZooKeeper.

Cluster

A group of servers and coordinating software that are united logically and capable of processing the same requests and acting as a single resource.

Codec

Defines a compression method applied to ADQM/ClickHouse data.

Database engine

In ClickHouse, it is a data storage mechanism that is responsible for managing data (storing, retrieving, manipulating) in a database. The main database engine that ADQM/ClickHouse uses by default is Atomic.

DataNode

A worker server in HDFS. It is a program that typically runs on a separate machine of the cluster and is responsible for file-level operations (such as writing and reading data) and for executing commands received from a NameNode (create, delete, or replicate blocks, and so on). In addition, a DataNode usually performs:

  • periodic sending of status messages (heartbeats);

  • processing read and write requests received from HDFS clients, since data comes from the rest of the cluster machines to the client, bypassing the NameNode.

Dictionary

A key/value data store that is fully or partially kept in the RAM of a ClickHouse server and can be used as a reference to substitute values for keys in query results. Dictionaries in ADQM/ClickHouse are an easier-to-use alternative to the JOIN operator.
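As a rough illustration of the idea (plain Python, not ClickHouse syntax), substituting values for keys can be sketched as an in-memory lookup. The `regions` mapping, the row layout, and the `with_region_name` helper are hypothetical.

```python
# Hypothetical reference data held in memory: key -> value.
regions = {1: "Europe", 2: "Asia", 3: "Americas"}

# Hypothetical fact rows that store only the key.
rows = [
    {"order_id": 100, "region_id": 2},
    {"order_id": 101, "region_id": 1},
    {"order_id": 102, "region_id": 9},  # key missing from the dictionary
]

def with_region_name(row, default="unknown"):
    """Return a copy of the row with the key replaced by its dictionary value."""
    result = dict(row)
    result["region"] = regions.get(result.pop("region_id"), default)
    return result

resolved = [with_region_name(row) for row in rows]
```

Each lookup is a single in-memory access by key, which is why this pattern can replace a join against a small reference table.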

DNS

Domain Name System — a distributed and hierarchical system used to identify computers, domains, services, and other resources accessible through the Internet or other network protocols. It is most often used to get an IP address by a host name (computer or device), obtain information about mail routing and/or service nodes for protocols in a domain.

A distributed DNS database is maintained using a hierarchy of DNS servers that interact over a specific protocol.

DNS Server

An application designed to respond to DNS queries using the appropriate protocol. This term can also be used to refer to a host where the corresponding application is running.

Firewall

A software package designed to monitor and filter network traffic.

FreeIPA

A free and open-source identity management system for Linux/UNIX networked environments. It is based on Fedora Linux, 389 Directory Server, MIT Kerberos, NTP, DNS, the DogTag certificate system, SSSD, and other free/open-source components. FreeIPA is designed with an intent to provide the same services as Active Directory.

FQDN

Fully Qualified Domain Name — a domain name that has no ambiguities in its definition. Includes the names of all the parent domains in the DNS hierarchy.
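A minimal sketch of how an FQDN spells out every parent domain in the hierarchy. The host name used here is a made-up example, and `parent_domains` is a hypothetical helper.

```python
def parent_domains(fqdn):
    """List the parent domains of a name, from the nearest one up to the top level."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(1, len(labels))]

# The trailing dot denotes the DNS root and is optional in most tools.
parents = parent_domains("host1.dc.example.com.")
# parents == ["dc.example.com", "example.com", "com"]
```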

Gateway

A network device designed to transfer user traffic between two networks that have different characteristics or use different protocols or technologies. One of the most common uses of a gateway is providing access from a local area network (LAN) to an external network (the Internet).

Granule

The smallest indivisible data set in ClickHouse (always contains an integer number of rows — 8192 by default) that is read when selecting data.
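Because a granule is read as a whole, even a single matching row costs a full granule of reading. The sketch below assumes fixed-size granules of the default 8192 rows (adaptive granule sizing is ignored), and `granules_to_read` is a hypothetical helper.

```python
DEFAULT_GRANULE_ROWS = 8192  # default number of rows per granule, as stated above

def granules_to_read(first_row, last_row, granule_rows=DEFAULT_GRANULE_ROWS):
    """Count the granules touched by the inclusive row range [first_row, last_row]."""
    first_granule = first_row // granule_rows
    last_granule = last_row // granule_rows
    return last_granule - first_granule + 1

one_row = granules_to_read(0, 0)          # a single row still costs one granule
boundary = granules_to_read(8191, 8192)   # two adjacent rows that straddle a boundary
rows_scanned = boundary * DEFAULT_GRANULE_ROWS
```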

HDFS

Hadoop Distributed File System — a file system designed to store large data distributed block-by-block across cluster nodes. All blocks in HDFS (except for the last file block) have the same size, and each block can be hosted on multiple nodes. The block size and replication factor (the number of nodes to which each block should be replicated) are defined in the file-level settings. Due to replication, the distributed system is resistant to failures of individual nodes.
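The block layout described above can be sketched as simple arithmetic. The 128 MiB block size and the replication factor of 3 are assumed values for the example only, and both functions are hypothetical helpers.

```python
def split_into_blocks(file_size, block_size):
    """Return the block sizes for a file; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes

def raw_storage(file_size, replication_factor):
    """Total bytes stored across the cluster when every block is replicated."""
    return file_size * replication_factor

MiB = 1024 * 1024
blocks = split_into_blocks(300 * MiB, 128 * MiB)   # assumed block size
total = raw_storage(300 * MiB, 3)                  # assumed replication factor
```

With these values, a 300 MiB file occupies two full blocks and one 44 MiB block, and the cluster stores 900 MiB in total.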

Hive

Apache Hive — a distributed system for execution of SQL queries in the Apache Hadoop ecosystem.

Host

A computer or another device connected to a network. A host can work as a server providing information about resources, services, and applications to users or other hosts. Each host on a network is assigned at least one network address.

Indexes

Indexes in ADQM/ClickHouse are special data structures that provide fast search of requested data rows by values of a key column (or set of columns) without a full table scan.
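One way to avoid a full scan is a sparse index over sorted data: keep the first key of each block of rows and use binary search to pick the only block that can contain the value. The sketch below is an illustration of that technique under simplifying assumptions (unique, sorted keys and a tiny block size), not the actual ADQM/ClickHouse implementation. All names are hypothetical.

```python
import bisect

BLOCK_ROWS = 4                                   # tiny block size, for illustration only
keys = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]       # sorted, unique key column
marks = keys[::BLOCK_ROWS]                       # first key of each block: [1, 9, 17]

def find_rows(value):
    """Return positions of rows whose key equals value, scanning a single block."""
    block = bisect.bisect_right(marks, value) - 1
    if block < 0:
        return []                                # value is smaller than every key
    start = block * BLOCK_ROWS
    candidates = keys[start:start + BLOCK_ROWS]
    return [start + i for i, key in enumerate(candidates) if key == value]

hit = find_rows(11)    # only the block starting at key 9 is scanned
miss = find_rows(4)    # the block starting at key 1 is scanned, nothing matches
```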

Inode

An index node is a data structure in traditional Unix file systems, such as UFS and ext4. It stores metadata about regular files, directories, and other file system objects, but not their data or names.
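The following sketch, which assumes a Unix-like system with hard-link support, reads inode metadata through Python's `os.stat` and shows that a file name is kept outside the inode: two names created with `os.link` resolve to the same inode number. The file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    alias = os.path.join(tmp, "alias.txt")
    with open(path, "w") as f:
        f.write("hello")

    info = os.stat(path)
    inode_number = info.st_ino     # identifier of the inode
    size_bytes = info.st_size      # metadata kept in the inode

    os.link(path, alias)           # second name for the same inode (hard link)
    same_inode = os.stat(alias).st_ino == inode_number
    link_count = os.stat(path).st_nlink
```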

Instance

A single copy of any software running on a single physical or virtual server. In object-oriented programming, this term is also used to refer to a class object.

IP

Internet Protocol address (IP address) is a unique network address of a node in a computer network built on the IP protocol stack.
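For a quick illustration, Python's standard `ipaddress` module can parse an address and check whether it belongs to a network. The address and network below come from a range reserved for documentation and are examples only.

```python
import ipaddress

addr = ipaddress.ip_address("192.0.2.10")        # example address only
network = ipaddress.ip_network("192.0.2.0/24")   # example network only

version = addr.version       # IP protocol version of the address
belongs = addr in network    # whether the address falls inside the network
```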

Kafka

Apache Kafka — an open-source distributed message broker that implements a system for publishing and subscribing to messages.

Kerberos Authentication Server

An authentication server whose main function is to receive a request containing the name of a client requesting authentication and return an encrypted TGT to the client. The user can then use this TGT for further requests. In most Kerberos implementations, the TGT lifetime is 8-10 hours. After that, the client should request a TGT from the authentication server again.

Kerberos KDC

Key Distribution Center is a trusted third-party service that users and services rely on to authenticate each other. It consists of three parts:

  • A database of users and services (known as principals) that the KDC has access to, and the corresponding Kerberos passwords.

  • Authentication Server (AS) that performs the initial authentication and issues a Ticket Granting Ticket (TGT).

  • Ticket Granting Server (TGS) — a server that issues subsequent tickets based on the initial TGT.

Kerberos keytab

A file containing one or more principals and their keys. It is used for authentication in the Kerberos infrastructure and allows users not to enter usernames and passwords manually.

Kerberos principal

A unique name of a user or service.

Kerberos realm

A Kerberos network that includes a KDC and several clients.

Kerberos TGS

Ticket Granting Server is the KDC component that issues subsequent tickets to clients that present a valid TGT.

Kerberos TGT

Ticket Granting Ticket — includes a copy of the session key, user name, and ticket expiration time. TGT is encrypted using the master key of the KDC and can only be decrypted by the KDC service itself.

LDAP

Lightweight Directory Access Protocol — a simple protocol that uses TCP/IP and allows authentication, search and compare operations, as well as operations for adding, modifying, or deleting records.

Materialized view

A materialized view in ADQM/ClickHouse calculates intermediate aggregate states for data from a source table according to the SELECT query specified in the view definition and saves the results to its internal table or to a separate target table. Aggregate states in a view are updated automatically each time new data is inserted into the source table.
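The update-on-insert behavior can be sketched in plain Python as a conceptual model only (this is not how ADQM/ClickHouse is implemented). The table, the `[sum, count]` state layout, and the function names are hypothetical.

```python
from collections import defaultdict

source_table = []                             # hypothetical source table
daily_state = defaultdict(lambda: [0, 0])     # target table: day -> [sum, count] state

def insert(rows):
    """Insert rows into the source table and update the aggregate states."""
    source_table.extend(rows)
    for day, amount in rows:
        state = daily_state[day]
        state[0] += amount
        state[1] += 1

def average(day):
    """Finalize the intermediate state into an average for a day that has data."""
    total, count = daily_state[day]
    return total / count

insert([("2024-01-01", 10), ("2024-01-01", 30)])
insert([("2024-01-01", 20), ("2024-01-02", 5)])
```

Only the newly inserted rows are processed on each insert; the stored state (sum and count) is enough to produce the final average without rereading the source table.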

Metadata

Structured service information about the data in use. It contains characteristics useful for identification, search, evaluation, and management.

MySQL

An open-source relational database management system.

NameNode

The lead server that manages HDFS file system metadata. It is a program that typically runs on a separate machine of the HDFS cluster and is responsible for file operations (such as opening and closing files, creating and deleting directories, and so on). In addition, the NameNode is responsible for:

  • file system namespace management;

  • access control for external clients;

  • providing correspondence between files and blocks replicated on DataNodes.

Node

A device connected to other devices via a network. It has its own IP address and can exchange data. Nodes can be computers, mobile phones, pocket computers, as well as special network devices (such as routers, switches, hubs, etc.).

NTP

Network Time Protocol — a network protocol for synchronizing the internal computer clock using networks with variable latency.

Part

A physical unit of storage on a disk (a directory with files) that holds a portion of the data from a ClickHouse table. Not to be confused with a partition.

Partition

A set of records in a ClickHouse table, logically combined according to a criterion that a partition key defines.
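As an illustration, the sketch below groups rows by a month-based partition key. The key expression, the row layout, and the `partition_key` helper are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

def partition_key(row):
    """Hypothetical partition key: year and month of the event date, e.g. 202401."""
    d = row["event_date"]
    return d.year * 100 + d.month

rows = [
    {"id": 1, "event_date": date(2024, 1, 5)},
    {"id": 2, "event_date": date(2024, 1, 20)},
    {"id": 3, "event_date": date(2024, 2, 3)},
]

partitions = defaultdict(list)
for row in rows:
    partitions[partition_key(row)].append(row["id"])
```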

Postgres

A superuser in PostgreSQL having maximum rights in all databases, including the right to create other users. Global rights can be changed at any time by the current superuser.

PostgreSQL

An open-source relational database management system.

Projection

An additional hidden table that stores data from a source ClickHouse table in an alternative form to be optimal for executing some type of queries.

RAID

Redundant Array of Independent Disks — a data virtualization technology that involves combining multiple disks into a logical element for redundancy and performance improvement.

Replica

A copy of data stored in a ClickHouse database. Also, this term refers to ClickHouse hosts in a cluster/shard that contain the same data.

Replication

A mechanism for synchronizing the contents of multiple copies of the same object (for example, the contents of a database). Duplicating data across multiple replica hosts provides higher data availability and increases system reliability.

Root

Superuser — a special account in Unix-like systems, the owner of which has the right to perform any and all operations.

Script

A set of instructions executed by the system. The difference between programs and scripts is quite blurry: a script is a program dealing with ready-made software components.

In a narrower sense, a scripting language is a specialized language for extending the capabilities of a command shell, a text editor, or operating system administration tools.

Self-signed certificate

A special type of digital certificate signed by its own subject. Technically, such a certificate is no different from a certificate signed by a certification authority (CA), except that its creator signs it with their own key and thus also acts as the certification authority. All root certificates of trusted CAs are self-signed.

Shard

A subset of data. A ClickHouse cluster always has at least one shard — if you do not split the data between multiple servers, it is stored in a single shard. Also, this term can refer to cluster nodes (servers or groups of servers) that store different parts of the same database.

Sharding

A database design principle that suggests locating parts of the same table on different shards. Sharding data across multiple servers allows the load to be distributed in such a way that the capacity of a single server is not exceeded.
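A minimal sketch of one common placement rule, taking the remainder of a numeric key divided by the number of shards. This is an illustration under stated assumptions (equal shard weights, integer keys) rather than a description of a specific ADQM/ClickHouse configuration. The names are hypothetical.

```python
from collections import defaultdict

def shard_for(key, shard_count):
    """Pick a shard index for an integer key using the remainder of division."""
    return key % shard_count

SHARD_COUNT = 3                      # assumed number of shards
placement = defaultdict(list)
for user_id in range(6):             # hypothetical sharding key values
    placement[shard_for(user_id, SHARD_COUNT)].append(user_id)
```

Rows with the same key always land on the same shard, and consecutive keys spread evenly, so no single server has to hold the whole table.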

Source code

A text of a computer program in any programming or markup language that can be read by a human. More generally, it is any input data for an interpreter/compiler.

Snapshot

A copy of files and directories of the file system (or database) at a certain point in time.

SSH

Secure Shell — an application-level network protocol that allows remote control of the operating system and tunneling of TCP connections (for example, to transfer files). It is similar in functionality to the Telnet and rlogin protocols, but, unlike them, it encrypts all traffic, including transmitted passwords. SSH allows you to choose different encryption algorithms. SSH clients and SSH servers are available for most network operating systems.

SSL

Secure Sockets Layer is a cryptographic protocol that provides secure communication. It uses asymmetric cryptography to authenticate the parties and exchange keys, symmetric encryption to maintain confidentiality, and message authentication codes to ensure message integrity.

Sudo

Sudo (substitute user and do) is a system administration program for Unix-like operating systems that allows delegating specific privileged operations to users while keeping a log of their actions. The main idea is to give users as few rights as possible, yet enough to complete their tasks.

Su

Su (switch user, also expanded as substitute user) is a command in Unix-like operating systems that allows a user to start working under a different account without terminating the current session. It is usually used to work temporarily as the superuser when performing administrative tasks.

Table engine

Table engine in ClickHouse — a table type that determines how and where data is stored, what queries are supported and how. The most universal and functional table engine for working with big data in ADQM/ClickHouse is MergeTree.

TTL

ADQM/ClickHouse functionality that allows you to set a time interval (time to live) after which old data will be deleted from a table, moved to another disk/volume, rolled up, or compressed by a specified codec in the background.
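The expiry check behind a TTL rule can be sketched as a date comparison. The 30-day interval, the row layout, and the `ttl_expired` helper are hypothetical, and the current time is fixed to keep the example deterministic.

```python
from datetime import datetime, timedelta

def ttl_expired(event_time, ttl, now):
    """Return True once the time-to-live interval counted from event_time has passed."""
    return event_time + ttl <= now

ttl = timedelta(days=30)            # hypothetical TTL interval
now = datetime(2024, 3, 1)          # fixed "current time" for the example
rows = [
    {"id": 1, "event_time": datetime(2024, 1, 15)},
    {"id": 2, "event_time": datetime(2024, 2, 20)},
]

# Rows past their TTL are candidates for deletion, moving, rollup, or recompression.
kept = [row["id"] for row in rows if not ttl_expired(row["event_time"], ttl, now)]
```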

UDF

User-defined functions — can be used to extend the built-in functionality of ADQM/ClickHouse to perform custom tasks using lambda expressions or by running external executable programs/scripts to process data.

URI

Uniform Resource Identifier — a unified sequence of characters that identifies an abstract or physical resource.

URL

Uniform Resource Locator — a uniform identifier for the location of an abstract or physical resource.
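To see which parts of a URL locate the resource, the sketch below splits an example URL with Python's standard `urllib.parse.urlsplit`. The URL itself is made up for the illustration.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://docs.example.com:8443/adqm/glossary?lang=en#shard")

scheme = parts.scheme        # access protocol
host = parts.hostname        # host that serves the resource
port = parts.port            # network port on that host
path = parts.path            # location of the resource on the host
query = parts.query          # additional parameters
fragment = parts.fragment    # position inside the resource
```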

View

A view in ADQM/ClickHouse reads data from another table on each access by executing the SELECT query specified in the view definition (the view itself does not store any data). In other words, it is a saved query that can be used as a subquery in the FROM clause.

ZooKeeper

An open-source service for synchronization and coordination of distributed systems. In ADQM/ClickHouse, ZooKeeper is used for data replication and execution of distributed DDL queries.
