SSM configuration parameters
To configure the service, use the following configuration parameters in ADCM.

| Parameter | Description | Default value |
|---|---|---|
| Encryption enable | Set to | false |
| Credential provider path | Path to a keystore file used to encrypt credentials | jceks://file/etc/ssm/conf/ssm.jceks |
| Custom jceks | Set to | false |

| Parameter | Description | Default value |
|---|---|---|
| smart.hadoop.conf.path | Path to the Hadoop configuration directory | /etc/hadoop/conf |
| smart.conf.dir | Path to the SSM configuration directory | /etc/ssm/conf |
| smart.server.rpc.address | RPC address of the SSM Server | 0.0.0.0:7042 |
| smart.file.access.count.aggregator.failover | Failover strategy for the file access event aggregator. Possible values: | SAVE_FAILED_WITH_RETRY |
| smart.agent.master.address | Address of the active SSM Server | `<hostname>` |
| smart.agent.address | Address of the SSM Agent components on each host | 0.0.0.0 |
| smart.agent.port | Port number used by SSM agents to communicate with the SSM Server | 7048 |
| smart.agent.master.port | Port number used by the SSM Server to communicate with SSM agents | 7051 |
| smart.rest.server.port | Port of the SSM REST server | 7045 |
| smart.rest.server.security.enabled | Enables or disables the SSM REST server security | false |
| smart.rest.server.auth.spnego.enabled | Enables or disables the SPNEGO authentication for the SSM REST server | false |
| smart.rest.server.auth.predefined.enabled | Enables or disables the basic authentication for users listed in the | false |
| smart.rest.server.auth.predefined.users | List of users and their credentials that have access to the SSM REST server if the | — |
| smart.ignore.dirs | A comma-separated list of HDFS directories to ignore. SSM ignores all files under the given HDFS directories | — |
| smart.cover.dirs | A comma-separated list of HDFS directories where SSM scans for files. By default, all HDFS files are covered | — |
| smart.work.dir | HDFS directory used by SSM as a working directory to store temporary files. SSM will ignore HDFS | /system/ssm |
| smart.client.concurrent.report.enabled | Enables or disables concurrent reports for Smart Client. If enabled, Smart Client concurrently attempts to connect to multiple configured Smart Servers to find the active Smart Server, which is an optimization. Only the active Smart Server responds to establish the connection. Once the report has been successfully delivered to the active Smart Server, connection attempts to other Smart Servers are canceled | — |
| smart.server.rpc.handler.count | Number of RPC handlers on the server | 80 |
| smart.namespace.fetcher.batch | Batch size of the namespace fetcher. SSM fetches namespaces from the NameNode during startup. Large namespaces may lead to a long startup time. A larger batch size can improve fetcher efficiency and reduce the startup time | 500 |
| smart.namespace.fetcher.producers.num | Number of producers in the namespace fetcher | 3 |
| smart.namespace.fetcher.consumers.num | Number of consumers in the namespace fetcher | 6 |
| smart.rule.executors | Maximum number of rules that can be executed in parallel | 5 |
| smart.cmdlet.executors | Maximum number of cmdlets that can be executed in parallel | 10 |
| smart.dispatch.cmdlets.extra.num | Number of extra cmdlets dispatched by Smart Server | 10 |
| smart.cmdlet.dispatchers | Maximum number of cmdlet dispatchers that work in parallel | 3 |
| smart.cmdlet.mover.max.concurrent.blocks.per.srv.inst | Maximum number of file mover cmdlets that can be executed in parallel per SSM service. The | 0 |
| smart.action.move.throttle.mb | The throughput limit (in MB) for the SSM move operation | 0 |
| smart.action.copy.throttle.mb | The throughput limit (in MB) for the SSM copy operation | 0 |
| smart.action.ec.throttle.mb | The throughput limit (in MB) for the SSM EC operation | 0 |
| smart.action.local.execution.disabled | Defines whether the active Smart Server can also execute actions like an agent. If set to | false |
| smart.cmdlet.max.num.pending | Maximum number of pending cmdlets in an SSM Server | 20000 |
| smart.cmdlet.hist.max.num.records | Maximum number of historic cmdlet records kept in an SSM Server. SSM deletes the oldest cmdlets when this threshold is exceeded | 100000 |
| smart.cmdlet.hist.max.record.lifetime | Maximum lifetime of historic cmdlet records kept in an SSM Server. The SSM Server deletes cmdlet records after the specified interval. Valid time units are | 30day |
| smart.cmdlet.cache.batch | Maximum batch size of the cmdlet batch insert | 600 |
| smart.copy.scheduler.base.sync.batch | Maximum batch size of the Copy Scheduler base sync batch insert | 500 |
| smart.file.diff.max.num.records | Maximum number of file diff records with useless state | 10000 |
| smart.status.report.period | The status report period for actions in milliseconds | 10 |
| smart.status.report.period.multiplier | The report period multiplied by this value defines the largest report interval | 50 |
| smart.status.report.ratio | If the finished actions ratio equals or exceeds this value, a status report is triggered | 0.2 |
| smart.top.hot.files.num | Number of top hot files displayed in the web UI | 200 |
| smart.cmdlet.dispatcher.log.disp.result | Defines whether to log dispatch results for each dispatched cmdlet | false |
| smart.cmdlet.dispatcher.log.disp.metrics.interval | Time interval in milliseconds to log statistic metrics of the cmdlet dispatcher. If no cmdlets were dispatched within this interval, no output is generated for this interval. The | 5000 |
| smart.compression.codec | The default compression codec for SSM compression (Zlib, Lz4, Bzip2, snappy). You can also specify a codec as an action argument, which overrides this setting | Zlib |
| smart.compression.max.split | Maximum number of chunks split for compression | 1000 |
| smart.compact.batch.size | Maximum number of small files to be compacted by the compact action | 200 |
| smart.compact.container.file.threshold.mb | Maximum size of a container file in MB | 1024 |
| smart.access.count.day.tables.num | Maximum number of tables that can be created in the Metastore database to store the file access count per day | 30 |
| smart.access.count.hour.tables.num | Maximum number of tables that can be created in the Metastore database to store the file access count per hour | 48 |
| smart.access.count.minute.tables.num | Maximum number of tables that can be created in the Metastore database to store the file access count per minute | 120 |
| smart.access.count.second.tables.num | Maximum number of tables that can be created in the Metastore database to store the file access count per second | 30 |
| smart.access.event.fetch.interval.ms | The interval in milliseconds between access event fetches | 1000 |
| smart.cached.file.fetch.interval.ms | The interval in milliseconds between fetches of cached files from HDFS | 5000 |
| smart.namespace.fetch.interval.ms | The interval in milliseconds between namespace fetches from HDFS | 1 |
| smart.mover.scheduler.storage.report.fetch.interval.ms | The interval in milliseconds between fetches of storage reports from HDFS DataNodes in the mover scheduler | 120000 |
| smart.metastore.small-file.insert.batch.size | Maximum size of the Metastore insert batch with information about small files | 200 |
| smart.agent.master.ask.timeout.ms | Maximum time in milliseconds for a Smart Agent to wait for a response from the Smart Server during the submission action | 5000 |
| smart.ignore.path.templates | A comma-separated list of regex templates of HDFS paths to be completely ignored by SSM | — |
| smart.internal.path.templates | A comma-separated list of regex templates of internal files to be completely ignored by SSM | `.*/\..*,.*/__.*,.*_COPYING_.*` |
| smart.security.enable | Enables Kerberos authentication for SSM | false |
| smart.server.keytab.file | Path to the SSM Server’s keytab file | — |
| smart.server.kerberos.principal | The SSM Server’s Kerberos principal | — |
| smart.agent.keytab.file | Path to the SSM Agent’s keytab file | — |
| smart.agent.kerberos.principal | The SSM Agent’s Kerberos principal | — |
| smart.rest.server.auth.spnego.principal | SSM REST server Kerberos principal | — |
| smart.rest.server.auth.spnego.keytab | SSM REST server keytab | — |
| smart.proxy.user.strategy | The scope of the LDAP user search. Possible values: | DISABLED |
| smart.proxy.users.cache.ttl | Minimum amount of time that must pass after the last access to a proxy users cache entry before it is evicted. The value must be specified in the | 2m |
| smart.proxy.users.cache.size | Maximum size of the proxy users cache | 20 |

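
The three smart.status.report.* parameters interact, so a small worked example may help. The sketch below only restates the arithmetic from the descriptions above: the largest report interval is the period multiplied by the multiplier, and a report is triggered once the finished actions ratio reaches the threshold. The function names are hypothetical, and treating the ratio as finished actions divided by all tracked actions is an assumption, not something the table states.

```python
def largest_report_interval_ms(period_ms: int, multiplier: int) -> int:
    """smart.status.report.period multiplied by smart.status.report.period.multiplier."""
    return period_ms * multiplier


def should_trigger_report(finished: int, total: int, ratio: float) -> bool:
    """True when the finished actions ratio equals or exceeds smart.status.report.ratio.

    Assumption: the ratio is finished actions divided by all tracked actions.
    """
    if total == 0:
        return False
    return finished / total >= ratio


# Default values taken from the table above.
print(largest_report_interval_ms(10, 50))
print(should_trigger_report(2, 10, 0.2))
```

With the defaults (10 ms and 50), the largest report interval works out to 500 ms.
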
| Parameter | Description | Default value |
|---|---|---|
| LD_LIBRARY_PATH | Path to extra native libraries for SSM | /usr/lib/hadoop/lib/native |
| HADOOP_HOME | Path to the Hadoop home directory | /usr/lib/hadoop |

| Parameter | Description | Default value |
|---|---|---|
| Enable SmartFileSystem for Hadoop | When enabled, requests from different clients (Spark, HDFS, Hive, etc.) are taken into account when calculating | false |
| log4j.properties | The contents of the log4j.properties configuration file | — |
| Custom smart-site.xml | In this section you can define values for custom parameters that are not displayed in the ADCM UI, but are allowed in the configuration file smart-site.xml | — |
| Custom smart-env.sh | In this section you can define values for custom parameters that are not displayed in the ADCM UI, but are allowed in the configuration file smart-env.sh | — |

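
To illustrate what a custom smart-site.xml entry might look like on disk, the sketch below renders a few name/value pairs as XML. It assumes that smart-site.xml follows the Hadoop-style layout (a `configuration` root with `property`, `name`, and `value` elements); that layout is not stated on this page, and ADCM generates the actual file itself. The helper name `render_site_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_site_xml(props: dict) -> str:
    """Render name/value pairs in the assumed Hadoop-style property layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Parameter names and defaults taken from the tables above.
print(render_site_xml({
    "smart.rule.executors": 5,
    "smart.cmdlet.executors": 10,
}))
```

The output is for illustration only; custom values are entered through the Custom smart-site.xml section in ADCM.
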
| Parameter | Description | Default value |
|---|---|---|
| db_url | The URL to the Metastore database | `jdbc:postgresql://{{ groups['adpg.adpg'][0] \| d(omit) }}:5432/ssm` |
| db_user | The user name to connect to the database | ssm |
| db_password | The user password to connect to the database | — |
| initialSize | The initial number of connections created when the pool is started | 10 |
| minIdle | Minimum number of established connections that should be kept in the pool at all times. The connection pool can shrink below this number if validation queries fail | 4 |
| maxActive | Maximum number of active connections that can be allocated from this pool at the same time | 50 |
| maxWait | Maximum time in milliseconds the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception | 60000 |
| timeBetweenEvictionRunsMillis | Time in milliseconds to sleep between the runs of the idle connection validation/cleaner thread. This value should not be set less than 1 second. It specifies how often to check for idle and abandoned connections, and how often to validate idle connections | 90000 |
| minEvictableIdleTimeMillis | Minimum amount of time an object may remain idle in the pool before it is eligible for eviction | 300000 |
| validationQuery | The SQL query used to validate connections from the pool before returning them to the caller | SELECT 1 |
| testWhileIdle | Indicates whether connection objects are validated by the idle object evictor (if any) | true |
| testOnBorrow | Indicates whether objects are validated before being borrowed from the pool | false |
| testOnReturn | Indicates whether objects are validated before being returned to the pool | false |
| poolPreparedStatements | Enables the prepared statement pooling | true |
| maxPoolPreparedStatementPerConnectionSize | Maximum number of prepared statements that can be pooled per connection | 30 |
| removeAbandoned | A flag to remove abandoned connections if they exceed | true |
| removeAbandonedTimeout | Timeout in seconds before an abandoned (in use) connection can be removed | 180 |
| logAbandoned | A flag to log stack traces for application code which abandoned a connection. Logging of abandoned connections adds extra overhead for every borrowed connection | true |
| filters | Sets the filters that are applied to the data source | stat |

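
Several of the connection pool parameters constrain each other, so it can be useful to sanity-check a set of values before applying it. The sketch below is one possible check, not part of SSM or ADCM. Only the 1-second lower bound on timeBetweenEvictionRunsMillis comes from the table above; the other rules (minIdle and initialSize not exceeding maxActive, a positive removeAbandonedTimeout) are common-sense assumptions, and `check_pool_settings` is a hypothetical helper.

```python
# Default values copied from the table above.
POOL_DEFAULTS = {
    "initialSize": 10,
    "minIdle": 4,
    "maxActive": 50,
    "maxWait": 60000,
    "timeBetweenEvictionRunsMillis": 90000,
    "minEvictableIdleTimeMillis": 300000,
    "removeAbandoned": True,
    "removeAbandonedTimeout": 180,
}


def check_pool_settings(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    # Stated in the table: should not be set to less than 1 second.
    if cfg["timeBetweenEvictionRunsMillis"] < 1000:
        problems.append("timeBetweenEvictionRunsMillis is below 1000 ms")
    # Assumption: idle and initial connection counts should fit within maxActive.
    if cfg["minIdle"] > cfg["maxActive"]:
        problems.append("minIdle exceeds maxActive")
    if cfg["initialSize"] > cfg["maxActive"]:
        problems.append("initialSize exceeds maxActive")
    # Assumption: a non-positive timeout makes removeAbandoned meaningless.
    if cfg["removeAbandoned"] and cfg["removeAbandonedTimeout"] <= 0:
        problems.append("removeAbandonedTimeout must be positive")
    return problems


print(check_pool_settings(POOL_DEFAULTS))
print(check_pool_settings({**POOL_DEFAULTS, "timeBetweenEvictionRunsMillis": 500}))
```

The default values pass all four checks; lowering timeBetweenEvictionRunsMillis to 500 produces a single reported problem.
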