SSM configuration parameters

To configure the service, use the following configuration parameters in ADCM.

NOTE
  • Some of the parameters become visible in the ADCM UI after the Advanced flag has been set.

  • The parameters that are set in the Custom group will overwrite the existing parameters even if they are read-only.

Credentials Encryption
Parameter Description Default value

Encryption enable

Set to true to enable credentials encryption

false

Credential provider path

Path to a keystore file used to encrypt credentials

jceks://file/etc/ssm/conf/ssm.jceks

Custom jceks

Set to true to use a custom JCEKS file. Set to false to use the auto-generated JCEKS keystore

false

smart-site.xml
Parameter Description Default value

smart.hadoop.conf.path

Path to the Hadoop configuration directory

/etc/hadoop/conf

smart.conf.dir

Path to the SSM configuration directory

/etc/ssm/conf

smart.server.rpc.address

RPC address of the SSM Server

0.0.0.0:7042

smart.file.access.count.aggregator.failover

Failover strategy for the file access event aggregator. Possible values:

  • FAIL: throws an exception, no failover.

  • SAVE_FAILED_WITH_RETRY: saves all file access events that caused the exception.

SAVE_FAILED_WITH_RETRY

smart.agent.master.address

Address of the active SSM Server

<hostname>

smart.agent.address

Defines the address of SSM Agent components on each host

0.0.0.0

smart.agent.port

Port number used by SSM agents to communicate with the SSM Server

7048

smart.agent.master.port

Port number used by the SSM Server to communicate with SSM agents

7051

smart.rest.server.port

Port of the SSM REST server

7045

smart.rest.server.security.enabled

Enables or disables the SSM REST server security

false

smart.rest.server.auth.spnego.enabled

Enables or disables the SPNEGO authentication for the SSM REST server

false

smart.rest.server.auth.predefined.enabled

Enables or disables basic authentication for the users listed in the smart.rest.server.auth.predefined.users parameter

false

smart.rest.server.auth.predefined.users

List of users and their credentials that have access to the SSM REST server if the smart.rest.server.auth.predefined.enabled parameter is set to true

 — 

smart.ignore.dirs

Comma-separated list of HDFS directories to ignore. SSM ignores all files under these directories

 — 

smart.cover.dirs

Comma-separated list of HDFS directories in which SSM scans for files. If not set, all HDFS files are covered

 — 
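The interplay between smart.ignore.dirs and smart.cover.dirs can be sketched in Python as follows. This is a simplified model, not SSM code. It assumes that directories are matched by path prefix and that an ignored directory takes precedence over a covered one. The helper names parse_dirs and is_tracked are hypothetical.

```python
from __future__ import annotations


def parse_dirs(value: str) -> list[str]:
    """Split a comma-separated directory list; end each entry with '/' for prefix checks."""
    return [d.strip().rstrip("/") + "/" for d in value.split(",") if d.strip()]


def is_tracked(path: str, cover_dirs: str = "", ignore_dirs: str = "") -> bool:
    """Hypothetical model: is `path` handled by SSM under the given settings?"""
    covered = parse_dirs(cover_dirs)
    ignored = parse_dirs(ignore_dirs)
    # Assumption: an ignored directory always wins over a covered one.
    if any(path.startswith(d) for d in ignored):
        return False
    # An empty cover list means that all HDFS files are covered.
    return not covered or any(path.startswith(d) for d in covered)
```

For example, is_tracked("/data/tmp/part-0", cover_dirs="/data", ignore_dirs="/data/tmp") returns False, while a file under /data/logs is tracked. Appending a trailing slash before the prefix check prevents /database from matching a cover entry of /data.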

smart.work.dir

HDFS directory used by SSM as a working directory to store temporary files. SSM will ignore HDFS inotify events for all files under the working directory. Only one directory can be set

/system/ssm

smart.client.concurrent.report.enabled

Enables or disables concurrent reports for Smart Client. If enabled, Smart Client tries to connect to all configured Smart Servers concurrently to find the active one, which speeds up the search. Only the active Smart Server responds and establishes the connection. Once the report has been delivered to the active Smart Server, connection attempts to the other Smart Servers are canceled

 — 

smart.server.rpc.handler.count

Number of RPC handlers on the server

80

smart.namespace.fetcher.batch

Batch size of the namespace fetcher. SSM fetches the namespace from the NameNode during startup, so a large namespace can lead to a long startup time. A larger batch size can make the fetcher more efficient and reduce the startup time

500

smart.namespace.fetcher.producers.num

Number of producers in the namespace fetcher

3

smart.namespace.fetcher.consumers.num

Number of consumers in the namespace fetcher

6
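The producer/consumer split controlled by smart.namespace.fetcher.batch, smart.namespace.fetcher.producers.num, and smart.namespace.fetcher.consumers.num can be illustrated with the Python sketch below. It is a hypothetical model that assumes producers cut the fetched namespace into fixed-size batches and consumers store them. fetch_namespace is not an SSM function, and the real fetcher reads from the NameNode rather than from an in-memory list.

```python
from __future__ import annotations

import queue
import threading

BATCH_SIZE = 500  # smart.namespace.fetcher.batch
PRODUCERS = 3     # smart.namespace.fetcher.producers.num
CONSUMERS = 6     # smart.namespace.fetcher.consumers.num


def fetch_namespace(paths: list[str], batch_size: int = BATCH_SIZE,
                    producers: int = PRODUCERS, consumers: int = CONSUMERS) -> list[str]:
    """Hypothetical model: producers enqueue batches of paths, consumers store them."""
    batches: queue.Queue = queue.Queue()
    stored: list[str] = []
    lock = threading.Lock()

    def produce(shard: list[str]) -> None:
        # Cut this producer's share of the namespace into fixed-size batches.
        for i in range(0, len(shard), batch_size):
            batches.put(shard[i:i + batch_size])

    def consume() -> None:
        while True:
            batch = batches.get()
            if batch is None:  # stop marker
                break
            with lock:
                stored.extend(batch)

    shards = [paths[i::producers] for i in range(producers)]
    producer_threads = [threading.Thread(target=produce, args=(s,)) for s in shards]
    consumer_threads = [threading.Thread(target=consume) for _ in range(consumers)]
    for t in producer_threads + consumer_threads:
        t.start()
    for t in producer_threads:
        t.join()
    # The queue is FIFO, so every batch is taken before any stop marker.
    for _ in range(consumers):
        batches.put(None)
    for t in consumer_threads:
        t.join()
    return stored
```

In this model, raising the batch size reduces the number of queue hand-offs per file. That is one plausible reason a larger batch can shorten the startup time.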

smart.rule.executors

Maximum number of rules that can be executed in parallel

5

smart.cmdlet.executors

Maximum number of cmdlets that can be executed in parallel

10

smart.dispatch.cmdlets.extra.num

Number of extra cmdlets dispatched by Smart Server

10

smart.cmdlet.dispatchers

Maximum number of cmdlet dispatchers that work in parallel

3

smart.cmdlet.mover.max.concurrent.blocks.per.srv.inst

Maximum number of file mover cmdlets that can be executed in parallel per SSM service. A value of 0 removes the limit

0

smart.action.move.throttle.mb

The throughput limit (in MB) for the SSM move operation

0

smart.action.copy.throttle.mb

The throughput limit (in MB) for the SSM copy operation

0

smart.action.ec.throttle.mb

The throughput limit (in MB) for the SSM erasure coding (EC) operation

0

smart.action.local.execution.disabled

Defines whether the active Smart Server can also execute actions like an agent. If set to true, the active SSM Server will NOT be able to execute actions. This configuration has no impact on a standby Smart Server

false

smart.cmdlet.max.num.pending

Maximum number of pending cmdlets in an SSM Server

20000

smart.cmdlet.hist.max.num.records

Maximum number of historic cmdlet records kept in an SSM server. SSM deletes the oldest cmdlets when this threshold is exceeded

100000

smart.cmdlet.hist.max.record.lifetime

Maximum lifetime of historic cmdlet records kept in an SSM server. The SSM Server deletes cmdlet records after the specified interval. Valid time units are day, hour, min, sec. The minimum update granularity is 5sec

30day
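The two cmdlet history limits can be pictured as an age cutoff plus a count cap. The Python sketch below is a hypothetical in-memory model that uses the default values (100000 records and 30day). prune_history is not part of SSM, and records are assumed to be kept oldest first.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_RECORDS = 100_000              # smart.cmdlet.hist.max.num.records
MAX_LIFETIME = timedelta(days=30)  # smart.cmdlet.hist.max.record.lifetime = 30day


def prune_history(history: deque, now: datetime,
                  max_records: int = MAX_RECORDS,
                  max_lifetime: timedelta = MAX_LIFETIME) -> None:
    """Drop expired and excess records; `history` holds (finish_time, cmdlet_id), oldest first."""
    # Age limit: remove records older than the configured lifetime.
    while history and now - history[0][0] > max_lifetime:
        history.popleft()
    # Count limit: remove the oldest records until the threshold is respected.
    while len(history) > max_records:
        history.popleft()
```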

smart.cmdlet.cache.batch

Maximum batch size of the cmdlet batch insert

600

smart.copy.scheduler.base.sync.batch

Maximum batch size of the Copy Scheduler base sync batch insert

500

smart.file.diff.max.num.records

Maximum number of file diff records in the useless state

10000

smart.status.report.period

The status report period for actions in milliseconds

10

smart.status.report.period.multiplier

The report period multiplied by this value defines the largest report interval

50

smart.status.report.ratio

If the ratio of finished actions equals or exceeds this value, a status report is triggered

0.2
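The three smart.status.report.* parameters might combine as shown in the following simplified Python model. A report is sent when enough actions have finished or when the largest report interval has elapsed. With the defaults, that interval is 10 ms × 50 = 500 ms. The exact scheduling in SSM may differ, and should_report is a hypothetical helper.

```python
PERIOD_MS = 10   # smart.status.report.period
MULTIPLIER = 50  # smart.status.report.period.multiplier
RATIO = 0.2      # smart.status.report.ratio

# Largest report interval: report period multiplied by the multiplier (500 ms).
MAX_INTERVAL_MS = PERIOD_MS * MULTIPLIER


def should_report(ms_since_last_report: int, finished: int, total: int) -> bool:
    """Hypothetical trigger check for an action status report."""
    if total > 0 and finished / total >= RATIO:
        return True  # enough actions have finished since the last report
    # Otherwise, do not wait longer than the largest report interval.
    return ms_since_last_report >= MAX_INTERVAL_MS
```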

smart.top.hot.files.num

Number of top hot files displayed in the web UI

200

smart.cmdlet.dispatcher.log.disp.result

Defines whether to log dispatch results for each cmdlet dispatched

false

smart.cmdlet.dispatcher.log.disp.metrics.interval

Time interval in milliseconds for logging statistic metrics of the cmdlet dispatcher. If no cmdlets were dispatched within this interval, nothing is logged for it. A value of 0 disables the logger

5000

smart.compression.codec

Default codec for SSM compression (Zlib, Lz4, Bzip2, snappy). Codecs specified as action arguments override this setting

Zlib

smart.compression.max.split

Maximum number of chunks into which a file can be split for compression

1000

smart.compact.batch.size

Maximum number of small files to be compacted by the compact action

200

smart.compact.container.file.threshold.mb

Maximum size of a container file in MB

1024

smart.access.count.day.tables.num

Maximum number of tables that can be created in the Metastore database to store the file access count per day

30

smart.access.count.hour.tables.num

Maximum number of tables that can be created in the Metastore database to store the file access count per hour

48

smart.access.count.minute.tables.num

Maximum number of tables that can be created in the Metastore database to store the file access count per minute

120

smart.access.count.second.tables.num

Maximum number of tables that can be created in the Metastore database to store the file access count per second

30

smart.access.event.fetch.interval.ms

The interval in milliseconds between access event fetches

1000

smart.cached.file.fetch.interval.ms

The interval in milliseconds between fetches of cached files from HDFS

5000

smart.namespace.fetch.interval.ms

The interval in milliseconds between namespace fetches from HDFS

1

smart.mover.scheduler.storage.report.fetch.interval.ms

The interval in milliseconds between fetches of storage reports from HDFS DataNodes in the mover scheduler

120000

smart.metastore.small-file.insert.batch.size

Maximum size of the Metastore insert batch with information about small files

200

smart.agent.master.ask.timeout.ms

Maximum time in milliseconds that a Smart Agent waits for a response from the Smart Server when submitting an action

5000

smart.ignore.path.templates

Comma-separated list of regex templates for HDFS paths that SSM ignores completely

 — 

smart.internal.path.templates

Comma-separated list of regex templates for internal files that SSM ignores completely

.*/\..*,.*/__.*,.*_COPYING_.*
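The following Python sketch shows how the default smart.internal.path.templates value matches typical internal files. It covers paths with a component starting with a dot or a double underscore, and files still being copied (_COPYING_). The sketch assumes that each template must match the whole path, as Java's String.matches() does. Whether SSM applies the templates exactly this way is an assumption, and the helper names are hypothetical. The same approach applies to smart.ignore.path.templates.

```python
from __future__ import annotations

import re

# Default value of smart.internal.path.templates
DEFAULT_TEMPLATES = r".*/\..*,.*/__.*,.*_COPYING_.*"


def compile_templates(value: str) -> list[re.Pattern]:
    """Split a comma-separated template list into compiled regular expressions."""
    # Limitation of this sketch: a template cannot itself contain a comma.
    return [re.compile(t.strip()) for t in value.split(",") if t.strip()]


def is_ignored(path: str, templates: list[re.Pattern]) -> bool:
    """Assumption: a template must match the whole path, like Java's String.matches()."""
    return any(t.fullmatch(path) for t in templates)
```

With these defaults, /data/.hidden, /data/__tmp/file, and /data/file.txt._COPYING_ are ignored, while /data/report.csv is not.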

smart.security.enable

Enables Kerberos authentication for SSM

false

smart.server.keytab.file

Path to the SSM Server’s keytab file

 — 

smart.server.kerberos.principal

The SSM Server’s Kerberos principal

 — 

smart.agent.keytab.file

Path to the SSM Agent’s keytab file

 — 

smart.agent.kerberos.principal

The SSM Agent’s Kerberos principal

 — 

smart.rest.server.auth.spnego.principal

SSM REST server Kerberos principal

 — 

smart.rest.server.auth.spnego.keytab

SSM REST server keytab

 — 

smart.proxy.user.strategy

Defines which user SSM impersonates when performing actions. Possible values:

  • DISABLED — impersonation is disabled, all actions are performed by the SSM node user (either the Kerberos principal or the user who started SSM).

  • NODE_SCOPE — impersonation is enabled at the node level, all actions are performed by the user specified in the smart.proxy.user option.

  • CMDLET_SCOPE — impersonation is enabled at the cmdlet level, all actions are performed by the cmdlet owner (currently, the cmdlet creator).

DISABLED
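The three strategies reduce to a choice of the effective user, as this small Python sketch shows. effective_user and ProxyUserStrategy are hypothetical names that mirror the list above. They do not reproduce SSM's internal API.

```python
from enum import Enum


class ProxyUserStrategy(Enum):
    DISABLED = "DISABLED"
    NODE_SCOPE = "NODE_SCOPE"
    CMDLET_SCOPE = "CMDLET_SCOPE"


def effective_user(strategy: str, node_user: str, proxy_user: str, cmdlet_owner: str) -> str:
    """Hypothetical helper: pick the user that an action is performed by."""
    mode = ProxyUserStrategy(strategy)  # raises ValueError for unknown values
    if mode is ProxyUserStrategy.NODE_SCOPE:
        return proxy_user    # the user from the smart.proxy.user option
    if mode is ProxyUserStrategy.CMDLET_SCOPE:
        return cmdlet_owner  # currently, the cmdlet creator
    return node_user         # Kerberos principal or the user who started SSM
```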

smart.proxy.users.cache.ttl

Minimum amount of time that must pass after the last access to a proxy users cache entry before it is evicted. The value must be specified in the [Amount][TimeUnit] format, where Amount is a number and TimeUnit is one of the following:

  • day or d — for days;

  • hour or h — for hours;

  • min or m — for minutes;

  • sec or s — for seconds.

2m
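The [Amount][TimeUnit] format can be parsed with a short Python function. This is a sketch based on the unit list above and not SSM's own parser. parse_ttl is a hypothetical name.

```python
import re
from datetime import timedelta

# Unit spellings accepted by smart.proxy.users.cache.ttl, mapped to timedelta arguments
_UNITS = {
    "day": "days", "d": "days",
    "hour": "hours", "h": "hours",
    "min": "minutes", "m": "minutes",
    "sec": "seconds", "s": "seconds",
}
_TTL_RE = re.compile(r"(\d+)(day|d|hour|h|min|m|sec|s)")


def parse_ttl(value: str) -> timedelta:
    """Convert an [Amount][TimeUnit] string such as '2m' into a timedelta."""
    match = _TTL_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

The default value 2m therefore corresponds to timedelta(minutes=2).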

smart.proxy.users.cache.size

Maximum size of the proxy users cache

20

smart-env.sh
Parameter Description Default value

LD_LIBRARY_PATH

Path to extra native libraries for SSM

/usr/lib/hadoop/lib/native

HADOOP_HOME

Path to the Hadoop home directory

/usr/lib/hadoop

Other
Parameter Description Default value

Enable SmartFileSystem for Hadoop

When enabled, requests from different clients (Spark, HDFS, Hive, etc.) are taken into account when calculating AccessCount for files. Otherwise, AccessCount is incremented only when a file is accessed through SSM

false

log4j.properties

The contents of the log4j.properties configuration file

 — 

Custom smart-site.xml

In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the smart-site.xml configuration file

 — 
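Custom parameters end up as entries in smart-site.xml. The Python sketch below assumes that the file uses the usual Hadoop <configuration>/<property> layout and renders a set of name/value pairs in that format. The parameter name smart.example.custom.option in the usage example is a placeholder, not a real SSM setting.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def to_site_xml(params: dict[str, str]) -> str:
    """Render name/value pairs in the Hadoop-style <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in params.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, to_site_xml({"smart.example.custom.option": "42"}) returns `<configuration><property><name>smart.example.custom.option</name><value>42</value></property></configuration>`.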

Custom smart-env.sh

In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the smart-env.sh configuration file

 — 

SSM Server component
Druid configuration
Parameter Description Default value

db_url

JDBC URL of the Metastore database

jdbc:postgresql://{{ groups['adpg.adpg'][0] | d(omit) }}:5432/ssm

db_user

The user name to connect to the database

ssm

db_password

The user password to connect to the database

 — 

initialSize

The initial number of connections created when the pool is started

10

minIdle

Minimum number of established connections that should be kept in the pool at all times. The connection pool can shrink below this number if validation queries fail

4

maxActive

Maximum number of active connections that can be allocated from this pool at the same time

50

maxWait

Maximum time in milliseconds the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception

60000

timeBetweenEvictionRunsMillis

Time in milliseconds to sleep between runs of the idle connection validation/cleaner thread. This value should not be less than 1 second. It defines how often idle and abandoned connections are checked and how often idle connections are validated

90000

minEvictableIdleTimeMillis

Minimum amount of time in milliseconds that a connection may remain idle in the pool before it becomes eligible for eviction

300000
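The interaction of minIdle, minEvictableIdleTimeMillis, and timeBetweenEvictionRunsMillis can be illustrated with a simplified Python model of one eviction run. This is not Druid's implementation. The model assumes three rules:

  • The evictor removes the longest-idle connections first.

  • Only connections idle for at least minEvictableIdleTimeMillis are evicted.

  • The idle set never shrinks below minIdle.

IdleConnection and eviction_run are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_IDLE = 4                     # minIdle
MIN_EVICTABLE_IDLE_MS = 300_000  # minEvictableIdleTimeMillis


@dataclass
class IdleConnection:
    conn_id: int
    idle_ms: int  # time since the connection was last returned to the pool


def eviction_run(idle: list[IdleConnection], min_idle: int = MIN_IDLE,
                 min_evictable_ms: int = MIN_EVICTABLE_IDLE_MS) -> list[IdleConnection]:
    """One pass of a simplified evictor; returns the connections that stay in the pool."""
    # Longest-idle connections are the first candidates for eviction.
    candidates = sorted(idle, key=lambda c: c.idle_ms, reverse=True)
    evictable = max(len(idle) - min_idle, 0)  # never shrink below minIdle
    kept = []
    for conn in candidates:
        if evictable > 0 and conn.idle_ms >= min_evictable_ms:
            evictable -= 1  # this connection is closed and removed
        else:
            kept.append(conn)
    return kept
```

The evictor runs only every timeBetweenEvictionRunsMillis. With the defaults, an idle connection becomes evictable after 300 s but can remain in the pool until the next run. That is up to roughly 300 s + 90 s = 390 s after it became idle.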

validationQuery

The SQL query used to validate connections from the pool before returning them to the caller

SELECT 1

testWhileIdle

Indicates whether connection objects are validated by the idle object evictor (if any)

true

testOnBorrow

Indicates whether objects are validated before being borrowed from the pool

false

testOnReturn

Indicates whether objects are validated before being returned to the pool

false

poolPreparedStatements

Enables the prepared statement pooling

true

maxPoolPreparedStatementPerConnectionSize

Maximum number of prepared statements that can be pooled per connection

30

removeAbandoned

A flag to remove abandoned connections if they exceed removeAbandonedTimeout

true

removeAbandonedTimeout

Timeout in seconds before an abandoned (in use) connection can be removed

180

logAbandoned

A flag to log stack traces for application code which abandoned a connection. Logging of abandoned connections adds extra overhead for every borrowed connection

true

filters

Sets the filters that are applied to the data source

stat
