Glossary
- ACL
-
Access Control List — defines the users or groups that can access a particular object and the operations they are allowed or forbidden to perform on it.
- AD
-
Active Directory — the directory service for the Windows Server family of operating systems. It was initially created as an LDAP-compatible implementation of a directory service. Starting with Windows Server 2008, it also integrates with other authorization services and plays a unifying role for them.
It allows administrators to apply group policies to keep users' work environments consistently configured. It also allows them to deploy software on multiple computers via group policies or System Center Configuration Manager (formerly Microsoft Systems Management Server), and to install OS updates and application and server software on all computers in the network using the Windows Server Update Service. Active Directory stores data and environment settings in a centralized database. Active Directory networks can vary in size from several dozen to several million objects.
- API
-
Application programming interface — a set of ready-made classes, procedures, functions, structures, and constants, provided by an application (library, service) or operating system for use in external software products.
- Arenadata Unified Data Platform
-
An integrated set of enterprise-level components, based on open source solutions.
- Cache Directive
-
Defines a path to be cached. Paths can specify either directories or files. Directories are not cached recursively, i.e. only the files in the first-level listing of the directory are cached. Cache Directives also specify additional parameters, such as the cache replication factor and the expiration time.
- Cache Pool
-
An administrative object used to manage groups of Cache Directives. Cache Pools have Unix-like permissions that restrict which users and groups can access the pool.
- CLI
-
Command Line Interface — a kind of text user interface (TUI) in which instructions are given to the computer mainly by typing text strings (commands) on the keyboard. It is also known as a console or terminal.
- Cluster
-
A group of servers and coordinating software, united logically, capable of processing identical requests, and used as a single resource.
- DataNode
-
A worker server: a program that, as a rule, runs on a separate machine of the HDFS instance and is responsible for file-level operations, such as writing and reading data, and for executing commands received from the NameNode: creating, deleting, and replicating blocks, and so on.
In addition, the DataNode usually performs:
-
Periodic sending of status messages (heartbeats).
-
Processing of read and write requests received from HDFS Clients, because data flows between the cluster machines and the Client directly, bypassing the NameNode.
- Distribution Package
-
A form of software distribution. Usually contains programs for the first initialization of the system.
- DNS
-
Domain Name System — a distributed, hierarchical system used to identify computers, services, and other resources reachable through the Internet or other Internet Protocol networks. It is most often used to obtain an IP address from a host name (of a computer or device) and to obtain information about mail routing, service nodes, etc.
The distributed DNS database is maintained by a hierarchy of DNS servers that interact over a dedicated protocol.
- DNS Server
-
An application designed to respond to DNS queries using the appropriate protocol. The term also refers to the host on which the application runs.
- ECC memory
-
Error-correcting code memory — a type of computer memory that automatically detects and corrects spontaneous changes (errors) in memory bits.
- FD
-
File Descriptor — a non-negative integer through which a process accesses an I/O stream; the stream can be associated with a file, directory, socket, or FIFO. When a process creates or opens an object by name, it receives a descriptor that gives it access to the object. Referencing an object by its descriptor is faster than using its name.
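As a rough illustration of the definition above, the following Python sketch opens a scratch file and then works with it only through the integer descriptor. The function name `roundtrip_through_descriptor` is hypothetical and chosen for this example; the `os` and `tempfile` calls are from the Python standard library.

```python
import os
import tempfile


def roundtrip_through_descriptor(payload: bytes):
    """Write and read back `payload` using only a file descriptor."""
    # mkstemp creates a file by name and returns an open descriptor for it.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)            # access the object via the descriptor
        os.lseek(fd, 0, os.SEEK_SET)     # rewind to the beginning
        data = os.read(fd, len(payload))
    finally:
        os.close(fd)                     # release the descriptor
        os.remove(path)                  # remove the scratch file
    return fd, data


fd, data = roundtrip_through_descriptor(b"hello")
```

After the file is opened, its name is needed again only to delete it; all reads and writes go through the descriptor.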
- Firewall
-
A software package that protects a computer from intrusion and from some kinds of malware, such as viruses and Trojans, that spread over the network. It increases network security by filtering network packets, which repels many attacks on the computer.
- FreeIPA
-
A free and open-source identity management system for Linux/UNIX networked environments. It is based on Fedora Linux, 389 Directory Server, MIT Kerberos, NTP, DNS, the Dogtag certificate system, SSSD, and other free/open-source components. FreeIPA is designed to provide services similar to those of Active Directory.
- FQDN
-
Fully Qualified Domain Name — a domain name that is unambiguous: it includes the names of all the parent domains in the DNS hierarchy. An FQDN ends with a dot (for example, example.com.), i.e. it includes the root domain name, which is empty.
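A minimal sketch of how an FQDN encodes its parent domains, assuming plain string handling and no DNS lookup. The helper name `parent_domains` and the sample name `host.example.com.` are hypothetical.

```python
from typing import List


def parent_domains(fqdn: str) -> List[str]:
    """Return the FQDN followed by each of its parent domains, ending with the root."""
    if not fqdn.endswith("."):
        raise ValueError("an FQDN must end with a dot (the root domain)")
    if fqdn == ".":
        return ["."]
    labels = fqdn[:-1].split(".")
    # Drop one leading label at a time to walk up the hierarchy.
    chain = [".".join(labels[i:]) + "." for i in range(len(labels))]
    chain.append(".")  # the unnamed root domain
    return chain


chain = parent_domains("host.example.com.")
# chain == ["host.example.com.", "example.com.", "com.", "."]
```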
- Gateway
-
A network device designed to transfer user traffic between two networks that have different characteristics or use different protocols or technologies. One of the most common uses of a gateway is to provide access from a local area network (LAN) to an external network, such as the Internet.
- HAR
-
Hadoop archives.
- HBase
-
An open-source, non-relational distributed database written in Java. It is modeled on Google Bigtable.
It is developed by the Apache Software Foundation as part of the Hadoop ecosystem. It runs on top of HDFS and provides Bigtable-like capabilities for Hadoop, i.e. a fault-tolerant way to store large volumes of sparse data.
- HDFS
-
Hadoop Distributed File System — a file system designed to store large files, distributed block by block across the nodes of a computing cluster. All blocks of a file (except the last one) have the same size, and each block can be hosted on multiple nodes. The block size and the replication factor (the number of nodes on which each block should be stored) are configured per file. Thanks to replication, the distributed system tolerates failures of individual nodes.
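The block arithmetic described above can be sketched as follows. This is plain Python, not an HDFS API; the function name `split_into_blocks` is hypothetical, and the 128 MiB block size and replication factor of 3 are assumed values used only for illustration.

```python
MIB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MIB) -> list:
    """Return the sizes of the blocks a file of `file_size` bytes is split into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # only the last block may be smaller
    return sizes


blocks = split_into_blocks(300 * MIB)   # [128 MiB, 128 MiB, 44 MiB]
replication = 3                          # assumed replication factor
raw_storage = sum(blocks) * replication  # bytes occupied across the cluster
```

Under these assumptions, a 300 MiB file occupies three blocks and 900 MiB of raw cluster storage.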
- Heap
-
A dynamically allocated memory area, created at the start of the JVM.
- Host
-
A computer or other device connected to a computer network. A host may work as a server offering information resources, services, and applications to users or other hosts on the network. Hosts are assigned at least one network address.
- Inode
-
Index node — a data structure in traditional Unix file systems, such as UFS and ext4. It stores metadata about regular files, directories, and other file system objects, but not their data or names.
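Because the name is not stored in the inode, one inode can have several names. The sketch below illustrates this on a Unix-like system by creating a hard link and comparing inode numbers. The function name `hard_link_shares_inode` is hypothetical, and the example assumes the temporary directory is on a file system that supports hard links.

```python
import os
import tempfile


def hard_link_shares_inode() -> bool:
    """Create a file and a hard link to it, then compare their inode numbers."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        alias = os.path.join(tmp, "alias.txt")
        with open(original, "w") as f:
            f.write("data")
        os.link(original, alias)  # a second name for the same inode
        return os.stat(original).st_ino == os.stat(alias).st_ino


same_inode = hard_link_shares_inode()
```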
- Instance
-
A single copy of a piece of software running on a single physical or virtual server. In object-oriented programming, the term also denotes an object of a class.
- IP
-
Internet Protocol Address — a unique network address of a node in a computer network built on the TCP/IP protocol stack. An IP address consists of two parts: a network number and a node number.
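The split into a network number and a node number can be sketched with the standard `ipaddress` module. The address `192.168.1.10` and the `/24` prefix length are assumed sample values; the prefix length, which determines where the split falls, is not part of the definition above. The function name `split_address` is hypothetical.

```python
import ipaddress


def split_address(cidr: str):
    """Return (network number, node number) for an address written as 'a.b.c.d/prefix'."""
    iface = ipaddress.ip_interface(cidr)
    network_number = str(iface.network.network_address)
    # Keep only the bits not covered by the network prefix.
    node_number = int(iface.ip) & int(iface.network.hostmask)
    return network_number, node_number


network_number, node_number = split_address("192.168.1.10/24")
# network_number == "192.168.1.0", node_number == 10
```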
- IOPS
-
Input/output operations per second — the number of input/output operations that a data storage system performs in one second.
- JCE
-
Java Cryptography Extension — an officially released standard extension for the Java platform and a part of the Java Cryptography Architecture (JCA). It is a set of packages that implement cryptographic tasks, such as data encryption and decryption, key generation and key agreement, as well as Message Authentication Code (MAC) algorithms.
- JMX
-
Java Management Extensions — a Java technology for monitoring and managing applications, system objects, devices (for example, printers), and computer networks.
- JNI
-
Java Native Interface — a standard mechanism that lets code running in the Java Virtual Machine (JVM) call code written in C/C++ or assembly language and compiled as dynamic libraries. It makes static linking unnecessary.
- Kerberos KDC
-
Key Distribution Center — a trusted third-party authentication service that users and services use to authenticate each other.
It consists of three parts:
-
A database of the users and services (known as principals) that the KDC knows about, along with their Kerberos passwords.
-
The Authentication Server (AS), which performs the initial authentication and issues a Ticket Granting Ticket (TGT).
-
The Ticket Granting Server (TGS), which issues subsequent tickets based on the initial TGT.
- Kerberos Authentication Server
-
An Authentication Server that performs one function: it receives a request containing the name of the Client that requests authentication and returns an encrypted TGT to that Client. The Client can then use this TGT for further requests. In most Kerberos implementations, the TGT lifetime is 8-10 hours. After that, the Client must request a new TGT from the Authentication Server.
- Kerberos Keytab
-
A file containing one or more principals and their keys. It is used for authentication in the Kerberos infrastructure and removes the need to enter usernames and passwords manually.
- Kerberos Principal
-
A unique name of a user or service.
- Kerberos Realm
-
A Kerberos network that includes a KDC and a number of Clients.
- Kerberos TGS
-
Ticket Granting Server — a server that issues service tickets to Clients on the basis of a valid TGT.
- Kerberos TGT
-
Ticket Granting Ticket — a ticket that includes a copy of the session key, the user name, and the ticket expiration time. The TGT is encrypted with the KDC's own master key, i.e. it can be decrypted only by the KDC service itself.
- LDAP
-
Lightweight Directory Access Protocol — a lightweight protocol that runs over TCP/IP and supports authentication, search, and compare operations, as well as adding, modifying, and deleting entries.
- MapReduce
-
A service for programming distributed computations within the MapReduce paradigm. The developer of a Hadoop MapReduce application needs to implement a basic handler that transforms the source key/value pairs on each computing node of the cluster into an intermediate set of key/value pairs (a class implementing the Mapper interface), and a handler that reduces the intermediate set of pairs to a final, reduced set (a class implementing the Reducer interface).
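The two handlers can be illustrated with a single-process word count. This is a pure-Python sketch of the paradigm, not Hadoop's Java Mapper and Reducer interfaces; the names `mapper`, `reducer`, and `run_job` are hypothetical, and the grouping step stands in for the shuffle that the framework performs between the two phases.

```python
from collections import defaultdict


def mapper(_key, line):
    """Turn one source pair (offset, line) into intermediate (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word, counts):
    """Reduce all intermediate values for one key to a single final pair."""
    yield word, sum(counts)


def run_job(records):
    """Run map, group by key (shuffle), then reduce, all in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    result = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result


counts = run_job([(0, "to be or not to be"), (1, "To do")])
# counts == {"be": 2, "do": 1, "not": 1, "or": 1, "to": 3}
```

In a real cluster the map calls run on many nodes in parallel and the framework moves the intermediate pairs between them; the sketch only shows the data flow.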
- Metadata
-
Structured auxiliary information about data. It describes characteristics that help to identify, search, evaluate, and manage the data.
- MTBF
-
Mean time between failures — the average time from the moment a system is restored to working order after a failure until the next failure occurs.
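As a worked example of this definition, the sketch below averages the lengths of the periods between each restoration and the next failure. The function name `mtbf` and the sample figures are hypothetical.

```python
def mtbf(uptimes_hours):
    """Average the operating periods, each measured from restoration to the next failure."""
    if not uptimes_hours:
        raise ValueError("at least one operating period is required")
    return sum(uptimes_hours) / len(uptimes_hours)


# Three operating periods of 100, 150, and 200 hours between failures.
result = mtbf([100, 150, 200])  # 150.0 hours
```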
- MySQL
-
An open-source relational database management system.
- NameNode
-
A master server that manages the file system metadata. It is a program that, in general, runs on a separate machine of the HDFS instance and is responsible for file operations, such as opening and closing files, creating and deleting directories, etc.
In addition, the NameNode is responsible for:
-
File system namespace management.
-
Access control for external Clients.
-
Mapping files to the blocks replicated on DataNodes.
- Node
-
A device connected to other devices via a network. It has its own IP address and is able to exchange data. Nodes can be computers, mobile phones, handheld computers, or special network devices, such as routers, switches, hubs, and so on.
- NSCD
-
Name Service Caching Daemon — a daemon (service) that caches the most common name service requests.
- NTP
-
Network Time Protocol — a network protocol for synchronizing computer clocks over networks with variable latency.
- OpenJDK
-
A project that develops a fully compatible Java Development Kit consisting exclusively of free and open-source code.
- Over-Provisioning
-
A technique used in solid-state drives that reserves spare capacity for internal controller operations.
- PostgreSQL
-
A free object-relational database management system.
- Postgres
-
A superuser in PostgreSQL that has all rights in all databases, including the right to create other users. Global rights can be changed at any time by the current superuser.
- PSU
-
Power supply unit.
- RAID
-
Redundant Array of Independent Disks — a data storage virtualization technology that combines several disks into one logical unit for redundancy and improved performance.
- Replication
-
A mechanism for synchronizing the contents of multiple copies of the same object (for example, the contents of a database).
- REST
-
Representational State Transfer — an architectural style for interaction between the components of a distributed application over a network, typically via HTTP. It is a consistent set of constraints that are applied when designing a distributed hypermedia system.
- Root
-
Superuser — a special account in Unix-like systems whose owner has the right to perform all operations without exception.
- RPM Package Manager
-
The term denotes two things: the software package format and the program created to manage such packages. The program allows users to install, remove, and update software.
- Script
-
A brief description of the actions to be performed by the system. The difference between programs and scripts is quite blurry: a script is a program that works with ready-made software components.
In a narrower sense, a scripting language is a specialized language for extending the capabilities of a command shell, a text editor, or operating system administration tools.
- Self-signed certificate
-
A special type of certificate that is signed by its own subject. Technically, it is no different from a certificate signed by a certificate authority (CA), except that the user creates the signature. In this case the creator of the certificate also acts as the certificate authority. All root certificates of trusted CAs are self-signed.
- Secondary NameNode
-
An HDFS node that periodically saves the namespace and keeps the size of the HDFS modification log file on the NameNode within certain limits.
It performs the following functions:
-
Copies the HDFS image (located in the FsImage file) and the transaction log of operations with file blocks (Edit Log) to a temporary folder.
-
Applies the changes accumulated in the transaction log to the HDFS image.
-
Writes a new FsImage to the NameNode, after which the Edit Log is cleared.
- Smoke Test
-
A minimal set of tests that checks for obvious errors. It is usually run by a programmer.
- Source code
-
The text of a computer program, written in any programming or markup language, that can be read by a human. More generally, any input data for a translator.
- Snapshot
-
A copy of files and directories of the file system (or database) at a certain point in time.
- SSH
-
Secure Shell — an application-layer network protocol that allows remote control of an operating system and tunneling of TCP connections (for example, for file transfer). It is similar in functionality to the Telnet and rlogin protocols but, unlike them, encrypts all traffic, including transmitted passwords. SSH allows a choice of different encryption algorithms. SSH clients and SSH servers are available for most network operating systems.
- SSL
-
Secure Sockets Layer — a cryptographic protocol for secure communication. It uses asymmetric cryptography for authentication and key exchange, symmetric encryption for confidentiality, and message authentication codes for message integrity.
- Stack
-
An abstract data type representing a list of elements organized according to the LIFO principle (last in, first out).
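A minimal sketch of LIFO behavior, using a Python list as the stack. The helper name `pop_all` is hypothetical.

```python
def pop_all(items):
    """Push every item onto a stack, then pop until it is empty."""
    stack = []
    for item in items:
        stack.append(item)        # push onto the top of the stack
    popped = []
    while stack:
        popped.append(stack.pop())  # pop removes the most recently pushed item
    return popped


order = pop_all([1, 2, 3])
# order == [3, 2, 1]: the last element pushed is the first one popped
```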
- Sticky bit
-
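An additional attribute of files or directories in Unix-like operating systems. Originally, it was used to reduce the loading time of the most frequently used programs. Now, sticky bits are used mainly on directories to protect the files in them: a file in such a directory can be deleted or renamed only by the file owner, the directory owner, or the superuser.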
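The sketch below sets and checks the sticky bit on a scratch directory, assuming a Unix-like system. The function names `make_sticky_dir` and `has_sticky_bit` are hypothetical; `stat.S_ISVTX` is the standard library constant for the sticky bit.

```python
import os
import stat
import tempfile


def make_sticky_dir() -> str:
    """Create a world-writable scratch directory with the sticky bit set."""
    path = tempfile.mkdtemp()
    os.chmod(path, 0o777 | stat.S_ISVTX)  # equivalent to mode 1777
    return path


def has_sticky_bit(path: str) -> bool:
    """Report whether the sticky bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)


sticky_dir = make_sticky_dir()
is_sticky = has_sticky_bit(sticky_dir)
os.rmdir(sticky_dir)  # clean up the scratch directory
```

Mode 1777 is the combination typically found on shared directories such as /tmp.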
- Sudo
-
Substitute user and do — a system administration program for Unix-like operating systems that delegates selected privileged operations to users and keeps a log of their use. The main idea is to give users as few rights as possible while still enough to do their tasks.
- Su
-
Switch user — a command in Unix-like operating systems that allows a user to log in under a different name without terminating the current session. It is usually used to log in temporarily as the superuser to perform administrative work.
- TCO
-
Total Cost of Ownership — the sum of the costs that the owner has to bear from the moment a system starts to be brought into ownership until it is withdrawn from ownership.
- URI
-
Uniform Resource Identifier — a standardized sequence of characters that identifies an abstract or physical resource.
- URL
-
Uniform Resource Locator — a URI that, in addition to identifying an abstract or physical resource, specifies its location.
- View
-
A virtual (logical) table that represents a named query; the query is substituted as a subquery wherever the view is used.
Unlike regular relational database tables, a view is not an independent part of the data set stored in the database. The content of a view is calculated dynamically from the data in the real tables. Any data change in a real database table is immediately reflected in the content of all views built on that table.
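The dynamic nature of a view can be sketched with an in-memory SQLite database from the Python standard library. The table `employees`, the view `engineers`, and the function name `view_demo` are hypothetical; other database systems may differ in details.

```python
import sqlite3


def view_demo():
    """Query a view before and after the underlying table changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
    # The view stores only the query, not a copy of the data.
    conn.execute(
        "CREATE VIEW engineers AS SELECT name FROM employees WHERE dept = 'eng'"
    )
    conn.execute("INSERT INTO employees VALUES ('Ann', 'eng')")
    query = "SELECT name FROM engineers ORDER BY name"
    before = [row[0] for row in conn.execute(query)]
    conn.execute("INSERT INTO employees VALUES ('Bob', 'eng')")
    after = [row[0] for row in conn.execute(query)]
    conn.close()
    return before, after


before, after = view_demo()
# before == ["Ann"], after == ["Ann", "Bob"]
```

The second query returns the new row even though the view itself was never modified.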
- YARN
-
Yet Another Resource Negotiator — a module introduced in Hadoop 2.0 that is responsible for cluster resource management and job scheduling. In previous releases this function was built into the MapReduce module, where it was implemented as a single component (JobTracker). YARN instead has a logically independent daemon, the resource scheduler (ResourceManager), which abstracts all computing resources of the cluster and manages their allocation to distributed processing applications.
YARN can manage both MapReduce programs and any other distributed applications that support the appropriate programming interfaces. YARN makes it possible to run several different tasks in parallel within the cluster and to isolate them from one another (according to the principles of multi-tenancy).