Home
Arenadata DB
Tutorials
Arenadata DB Control
Configuration
Job policies

Configure job policies

Daria Barysheva

Contents

Data cleanup
Metrics offload
- Configuration
- Structure of offloaded data

The Configuration → Job policy page in the ADB Control web interface allows you to configure schedules for different system jobs. This page includes two tabs, each of which is described in detail below.

Data cleanup

On the Configuration → Job policy → Data cleanup tab, you can configure a schedule for removing old metrics from ADB Control.

The "Configuration → Job policy → Data cleanup" tab

To configure the job, follow the steps:

Activate the Data cleanup switcher.

Fill in the following parameters in the Main parameters section.

Field Description Default value

Batch size

A maximum number of rows in the data batch

500

Expire duration

A maximum time period that is used to store metrics in ADB Control. If this period expires, the metrics will be removed during the next job launch. Use the following format — <value><unit> (without spaces), where <unit> can have the following values:

ms — milliseconds;
sec — seconds;
min — minutes;
hr — hours;
d — days;
w — weeks.

Examples: 5min, 1hr, 1d. The minimal value is 100ms; the maximum is 1w

Data cleanup schedule

A schedule for automatic data cleanup (cron expression). The cron dark cron light icon allows you to edit the schedule in the separate form Cron expression generator

0 0 23 * * ? *

Click Apply. The Revert button is designed to undo the changes that have not been yet saved by clicking Apply.

If all steps are completed successfully, the next launch time of the Data cleanup job is updated on the Jobs → ADB Control → Schedule page.

CAUTION

The data cleanup job is enabled in ADB Control by default. It is not recommended to disable the job, since it can cause a lack of disk space.
For long-term storage of historical data, you can configure loading metrics to an external database.

Metrics offload

Configuration

CAUTION

Metrics offload to ADB as a target database cannot be configured if the gp_default_storage_options configuration parameter in the source ADB cluster has the appendonly=true value.
Ensure that you do not set up for metrics offload a database of the same ADB cluster that is monitored by ADB Control (where metrics are collected). It is necessary to avoid monitoring and exporting records about export itself.

On the Configuration → Job policy → Metrics offload tab, you can configure a schedule for uploading ADB Control metrics and events to an external database (PostgreSQL/ADB). To configure the job, follow the steps:

On the target host, ensure you have a database to store the uploaded metrics. Its name will be used when configuring the JDBC connection later.
Provide the ADB Control access to the selected external database. PostgreSQL requires the following record in the pg_hba.conf file.

Without SSL
With SSL

host    <database_name>  <user_name>       <adbc_address>      trust

where:

<database_name> — a database name in PostgreSQL. For example, adbc_metrics.
<user_name> — a user name in PostgreSQL. For example, postgres.
<adbc_address> — an IP address of the host where ADB Control is deployed (with a subnet number). For example, 10.92.43.139/32.

hostssl    all  all       0.0.0.0/0      md5

For more information on how to configure PostgreSQL for using SSL, see SSL encryption.

Go to the Configuration → Job policy → Metrics offload tab in the ADB Control interface.

The "Configuration → Job policy → Metrics offload" tab

The "Configuration → Job policy → Metrics offload" tab
Activate the Metrics offload switcher.
In the JDBC URL field, enter a URL of the JDBC connection to the host where the target database is located. For PostgreSQL, use the following format:
jdbc:postgresql://<external_ip>:5432/<db_name>
where:
- <external_ip> — an IP address of the host where the target database is located. For example, 10.92.41.7.
- <db_name> — a target database name. For example, postgres.
JDBC connection string

JDBC connection string
Click Connect. In the window that opens, fill in the following fields:
- User — a user name. In the following example, the default user postgres is used.
- Password — a user password.
  
  JDBC connection parameters
  
  JDBC connection parameters
Click Connect.

Fill in the following fields in the Main parameters section.

Field

Description

Default value

Export resource groups

The flag that indicates whether to export metrics that are related to resource groups (see resgroup_config_audit, resgroup_status, resgroup_status_per_segment)

False

Export sessions

The flag that indicates whether to export information on sessions (see session)

False

Batch size

A maximum number of rows in the data batch that is sent to an external database

1000

Interval

A maximum date range for metrics offload (in days)

Delay

The time delay that is used to look for new metrics to export (in minutes)

10min

Offload metrics job schedule

A schedule for the automatic data export to an external database (cron expression). The cron dark cron light icon allows you to edit the schedule in the separate form Cron expression generator

0 0 * * * ? *

Click Apply. The Revert button is designed to undo the changes that have not been yet saved by clicking Apply.

Data is filled

Data is filled
Get the following message on the successful result.

The successful result

The successful result

As a result, the adcc schema should be created in the external database. The tables in this schema will store metrics and events collected from ADB Control. To check whether all necessary objects are created, you can run the following query in PostgreSQL:

SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'adcc' ORDER BY  table_name;

Result:

         table_name
-----------------------------
 audit_auth
 audit_auth_adb
 audit_operation
 cluster
 cluster_query_metric
 query
 query_error
 relation_statistic
 resgroup_config_audit
 resgroup_status
 resgroup_status_per_segment
 session
 transaction
(13 rows)

If all steps are completed successfully, the new scheduled job Export job should be added to the Jobs → ADB Control → Schedule page.

IMPORTANT

In addition to the job schedule, you should specify the clusters and databases which metrics should be exported to the external database. For more information, see Cluster management.

Structure of offloaded data

The following tables are used in external databases to store metrics and events offloaded from ADB Control. Note that the conditions for offloading data to the listed tables are different:

audit_auth, audit_auth_adb, audit_operation, relation_statistic — it is enough to configure Export job on the Configuration → Job policy → Metrics offload tab.
cluster, cluster_query_metric, query, query_error, transaction — besides configuring Export job on the Configuration → Job policy → Metrics offload tab, it is necessary to specify the databases which metrics should be exported. You should do it on the Configuration → Clusters tab.
resgroup_config_audit, resgroup_status, resgroup_status_per_segment — when configuring Export job on the Configuration → Job policy → Metrics offload tab, you should set the Export resource groups flag.
session — besides configuring Export job on the Configuration → Job policy → Metrics offload tab (with the Export sessions checkbox selected), it is necessary to set the Sessions flag on the Configuration → General → Sessions tab.

audit_auth

This table stores user login attempts to ADB Control, as well as failed ADB authorizations. For more information on the below mentioned parameters, see the Authorizations section of the Audit article.

Column Description

A unique identifier of the authorization event

time

An event timestamp

username

A user name

session_time

A start timestamp of the session within which the current event took place

session

An identifier of the session within which the current event took place

host

IP address of the host from which the event was initiated

auth_type

An authorization type:

ADCC — ADB Control login attempt via basic authentication;
ADCC LDAP — ADB Control login attempt via LDAP authentication;
ADB — failed ADB login attempt;
ADB LDAP — failed ADB login attempt via LDAP authentication.

type

An event type. Possible values:

Auth
Logout
Session expired

result

An event result. Possible values:

Success
Failed

message

An error message

audit_auth_adb

This table contains additional information about failed login attempts to ADB.

Column Description

auth_id

An identifier of the authorization event. Refers to audit_auth.id (see audit_auth)

pid

An identifier of the process within which the current event took place

database

A name of the database to which the connection was made

port

The port number that was used for the current connection

segment

The content identifier of a segment (or master) instance. Corresponds to gp_segment_configuration.content. For more information, see the System catalog tables → gp_segment_configuration section of the Tables article

transaction

A transaction identifier

cmd_count

A command identifier

cluster

A name of the ADB cluster to which the connection was made

audit_operation

This table contains information about user operations in ADB Control. For more information on the below mentioned parameters, see the Operations section of the Audit article.

Column Description

A unique operation identifier

object_name

A name of the object on which the operation was performed

object_type

A type of the object on which the operation was performed. Possible values:

User — user;
User forbidden DB — assigning advanced permissions to a specific database (actual for users with the Advanced User role);
LDAP mapping — role mapping with LDAP group;
Transaction — transaction;
Resource group — resource group;
Cluster — cluster;
Job — ADB Control job;
System settings — system configuration parameters;
System user — system user;
Session — ADB session;
User session — ADB Control user session;
Cluster guc — cluster GUC.

operation_type

A type of the operation that was applied to the object_name object. Possible values:

Cancel
Create
Delete
Resource group change
Terminate
Update
Archive
Activate

operation_result

An operation result. Possible values:

Success
Failed
Denied

operation_time

An operation end timestamp

username

A name of the user who performed the operation

session

An identifier of the ADB Control user session

hostname

IP address of the host from which the operation was initiated

diffs

Information about changed attributes of the object_name object in the JSON format. For each changed attribute, the following data is stored:

name — an attribute name;
oldValue — a previous value;
newValue — a new value.

The example:

[
    {
        "name":"CLUSTER_ARCHIVED",
        "oldValue":true,
        "newValue":false
    }
]

cluster

This table contains information on ADB clusters selected in ADB Control for export of metrics and events to the external database.

Column

Description

A cluster unique identifier

name

A cluster name

cluster_query_metric

This table stores statistics of system resource consumption by commands at the cluster level. For more information, see the System metrics collected for commands section of the Monitoring of commands article.

Column Description

query_id

A command identifier. Refers to query.id (see query)

cpu_usage_total

The total CPU consumption during the command execution (in seconds)

cpu_usage_percent_sum

The CPU amount consumed at the end of the command execution (percentage)

cpu_usage_percent_avg

The average amount of CPU consumed across cluster segments at the end of the command execution (percentage)

cpu_usage_percent_skew

The indicator that displays the CPU consumption skew across cluster segments at the end of the command execution (percentage)

mem_sum

The RAM amount consumed at the end of the command execution (in bytes)

mem_max

The maximum amount of RAM consumed at the end of the command execution (in bytes)

mem_avg

The average amount of RAM consumed across cluster segments at the end of the command execution (in bytes)

mem_skew

The indicator that displays the RAM consumption skew across cluster segments at the end of the command execution (percentage)

vmem_sum

The virtual memory amount allocated at the end of the command execution (in bytes)

vmem_max

The maximum amount of virtual memory allocated at the end of the command execution (in bytes)

vmem_avg

The average amount of virtual memory allocated across cluster segments at the end of the command execution (in bytes)

vmem_skew

The indicator that displays the virtual memory consumption skew across cluster segments at the end of the command execution (percentage)

spill

The amount of spill files on cluster segment hosts (excluding master) at the end of the command execution (in bytes)

spill_hosts

The amount of spill files on cluster hosts (including master) at the end of the command execution (in bytes)

spill_max

The maximum amount of spill files produced on cluster segments at the end of the command execution (in bytes)

spill_skew

The indicator that displays the spill file amount skew at the end of the command execution (percentage)

read_bytes_total

The total amount of data read during the command execution (in bytes)

read_bytes_sum

The amount of data read at the end of the command execution (in bytes)

read_bytes_max

The maximum amount of data read at the end of the command execution (in bytes)

read_bytes_avg

The average amount of data read across cluster segments per second at the end of the command execution (in bytes)

read_bytes_skew

The indicator that displays the data reading skew across cluster segments at the end of the command execution (percentage)

write_bytes_total

The total amount of data written during the command execution (in bytes)

write_bytes_sum

The amount of data written at the end of the command execution (in bytes)

write_bytes_max

The maximum amount of data written at the end of the command execution (in bytes)

write_bytes_avg

The average amount of data written across cluster segments per second at the end of the command execution (in bytes)

write_bytes_skew

The indicator that displays the data writing skew across cluster segments at the end of the command execution (percentage)

query

This table contains information on completed SQL commands that were registered in ADB Control.

Column Description

query_id

A unique command identifier

status

A command status. Possible values:

Done — successfully completed commands.
Cancelled — cancelled commands (as a part of the transaction cancellation or termination).
Error — the commands during which some errors occurred.
Unknown — the commands, the final status of which is undefined. It is set on a timeout for queries in the Queued, Running, Cancelling statuses, for which there are no corresponding rows in the pg_stat_activity view.

pid

An identifier of the process, within which the command was started

submitted

The timestamp when the user submitted the command

run_at

A command start timestamp

finish_at

A command end timestamp

username

A name of the user who started the command

database

A name of the database where the command was started

resource_group

A name of the resource group that was used for the command

planner

A name of the query optimizer that was used to produce the query execution plan. Possible value:

GPORCA — GPORCA is used (optimizer = on).
Fallbacked — after the attempt to produce a plan via GPORCA (optimizer = on) failed, Postgres query optimizer is used.
Legacy — Postgres query optimizer is used (optimizer = off).

querytext

A text of the SQL query

plantext

A text of the command execution plan

session_id

An identifier of the session, within which the command was started

cluster_id

An identifier of the ADB cluster where the command was started. Refers to cluster.id (see cluster)

queued_time

The total time that the command was queued (in hours, minutes, seconds)

run_time

The total command duration (in hours, minutes, seconds)

sql_id

A common identifier for SQL commands with the same structure