Configure processors and process groups

Configure a processor

To configure a processor, right-click it and select Configure from the context menu. The configuration dialog opens with five tabs, each of which is described below. When you finish, click APPLY to save the changes or CANCEL to discard them.

CAUTION
  • After the processor is started, its context menu shows the View configuration option instead of Configure. The configuration cannot be changed while the processor is running. To reconfigure the processor, first stop it and wait for all of its active tasks to complete.

  • Certain characters are not supported and are filtered out automatically as you type. The following characters and any unpaired Unicode surrogate codes are not preserved in any configuration:

[#x0], [#x1], [#x2], [#x3], [#x4], [#x5], [#x6], [#x7], [#x8], [#xB], [#xC], [#xE], [#xF], [#x10], [#x11], [#x12], [#x13], [#x14], [#x15], [#x16], [#x17], [#x18], [#x19], [#x1A], [#x1B], [#x1C], [#x1D], [#x1E], [#x1F], [#xFFFE], [#xFFFF]
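
If configuration values are prepared outside the user interface (for example, pasted from another system), it can help to strip the same characters in advance. The following is a minimal sketch, not the product's own filter: the function name strip_unsupported is hypothetical, and any surrogate code point found in a Python string is treated as unpaired.

```python
import re

# Characters from the list above: control codes 0x0-0x8, 0xB, 0xC, 0xE-0x1F,
# plus U+FFFE and U+FFFF. Tab (0x9), line feed (0xA) and carriage return (0xD)
# are not in the list and are kept.
# Assumption: a Python str holds code points, so any surrogate (U+D800-U+DFFF)
# that appears in it is unpaired and is removed as well.
_UNSUPPORTED = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff\ud800-\udfff]"
)


def strip_unsupported(value: str) -> str:
    """Return the value without the characters that are not preserved."""
    return _UNSUPPORTED.sub("", value)


cleaned = strip_unsupported("host\x00name\x0b")
print(cleaned)
```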

SETTINGS tab

[Figure: SETTINGS tab]

Name — processor name. The default processor name is the same as the processor type. Next to the processor name is an Enabled flag indicating whether the processor is enabled.

Id — the unique processor identifier.

Type — the processor type.

Bundle — the NAR bundle.

Yield Duration — the period of time during which the processor is not scheduled to run after it yields. A processor yields when it cannot make progress, for example because a remote service does not respond.

Penalty Duration — the period of time during which a penalized FlowFile is not processed. A FlowFile can be penalized when it cannot be processed now but may be processable later, for example after an error in receiving or sending files to a remote service.

Bulletin Level — the lowest bulletin level that should be displayed in the user interface.

SCHEDULING tab

[Figure: SCHEDULING tab]

Scheduling Strategy — how the processor is scheduled to run. The following options are available:

  • Timer driven — the processor runs at regular intervals. The interval is set by the Run Schedule parameter.

  • CRON driven — the processor is scheduled to run periodically, as in the timer driven mode, but the CRON driven mode provides much more flexibility at the cost of increased configuration complexity. The scheduling value is a string of six required fields and one optional field, separated by spaces.
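
To illustrate the structure of a CRON driven scheduling value, the sketch below splits an expression into its fields and labels them. The field order (seconds, minutes, hours, day of month, month, day of week, optional year) follows the Quartz-style cron format and is an assumption not stated in this article. The helper name label_cron_fields is hypothetical, and the sketch checks only the number of fields, not their contents.

```python
# Assumed field order (Quartz-style); "year" is the optional seventh field.
CRON_FIELDS = [
    "seconds", "minutes", "hours", "day of month",
    "month", "day of week", "year",
]


def label_cron_fields(expression):
    """Map each space-separated field of a cron expression to its name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 space-separated fields, got {len(parts)}"
        )
    # zip stops at the shorter sequence, so "year" is omitted for 6 fields.
    return dict(zip(CRON_FIELDS, parts))


fields = label_cron_fields("0 0 13 * * ?")
print(fields["hours"])  # the hours field of this expression is "13"
```

Under the assumed field order, the expression 0 0 13 * * ? describes a run at 13:00:00 every day.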

Concurrent Tasks — determines how many threads the processor uses. In other words, it controls how many FlowFiles this processor can process at the same time.

Run Schedule — determines how often the processor is scheduled to run. Valid values for this field depend on the selected scheduling strategy.

Execution — determines which node(s) the processor will be scheduled to execute on.

PROPERTIES tab

[Figure: PROPERTIES tab]

The PROPERTIES tab provides a mechanism for customizing the behavior of a particular processor. There are no default properties: each processor type defines the properties that make sense for its use case.

Setting processor properties is described in the article Work with attributes.

RELATIONSHIP tab

[Figure: RELATIONSHIP tab]

Automatically Terminate / Retry Relationships:

  • Automatically Terminate — for a processor to be considered valid and able to run, each relationship defined by the processor must either be connected to a downstream component or be automatically terminated. If a relationship is automatically terminated, any FlowFile routed to that relationship is removed from the flow and its processing is considered complete. A relationship that is already connected to a downstream component cannot be automatically terminated. The relationship must first be removed from every connection that uses it. Also, if a relationship selected for automatic termination is added to a connection, its auto-terminate status is cleared (disabled).

  • Automatically Retry — users can also configure whether FlowFiles routed to this relationship are retried.

Number of Retry Attempts — for relationships configured to retry, this number specifies how many times a FlowFile is retried before it is routed elsewhere.

Retry Back Off Policy — when a FlowFile needs to be retried, the user can configure a back off policy with two options:

  • Penalize — the FlowFile is penalized and retried after the penalty period, while the processor continues to process other FlowFiles.

  • Yield — the processor yields and does not process any other FlowFiles until all retry attempts have been made.

Retry Maximum Back Off Period — initial retries are based on the Yield Duration and Penalty Duration specified in the SETTINGS tab. The duration is doubled for each successive retry attempt. This value specifies the maximum amount of time allowed before the next retry attempt.
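
The doubling with an upper limit described above can be sketched as follows. This is an illustration under stated assumptions, not the actual scheduling code: the function name retry_delays is hypothetical, the first retry is assumed to wait exactly the base duration, and the values 30 and 300 seconds are example inputs only.

```python
def retry_delays(base_seconds, attempts, max_seconds):
    """Return the wait before each retry: doubled every time, capped at the maximum."""
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        # Never wait longer than the Retry Maximum Back Off Period.
        delays.append(min(delay, max_seconds))
        delay *= 2
    return delays


# Example inputs (assumed): 30-second base duration, 5 retries, 300-second cap.
print(retry_delays(30, 5, 300))  # [30, 60, 120, 240, 300]
```

The fifth delay would be 480 seconds without the cap, so it is limited to the configured maximum.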

COMMENTS tab

[Figure: COMMENTS tab]

The COMMENTS tab provides an area for any comments that are relevant to the component.

NOTE

For more information on configuring NiFi processors, see Configuring a processor.

Configure process groups

GENERAL tab

[Figure: GENERAL tab]

Process Group Name — the process group name. This name appears at the top of the process group on the canvas, as well as in the breadcrumbs at the bottom of the user interface. For the root process group (i.e., the highest level group), this is also the name that appears as the title of the browser tab. Note that this information is visible to any other NiFi instance that remotely connects to that instance (using remote process groups, also known as Site-to-Site).

Process Group Parameter Context — the parameter context that provides parameters to the flow components of the process group. From this drop-down list, the user can select which parameter context is bound to this process group, or create a new parameter context to bind to it.

Process Group Comments — process group comments. This provides a mechanism for adding any useful information about the process group.

Process Group FlowFile Concurrency — used to control how data is passed to the process group.

Three options are available:

  • Unbounded — the input ports in a process group will receive data as fast as possible, provided that backpressure does not prevent them from doing so.

  • Single FlowFile Per Node — the input ports pass only one FlowFile at a time. Once this FlowFile enters the process group, no additional FlowFiles are admitted until all FlowFiles have left the process group (either by being removed/auto-terminated or by exiting through an output port).

  • Single Batch Per Node — the input ports behave the same as in Single FlowFile Per Node mode, except that once a FlowFile is received, the input ports continue to receive data until all queues feeding the input ports have been emptied. At that point, they stop passing data into the process group until all data has finished processing and has left the process group.
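
The three concurrency options can be compared with a small model of a single input port. This is a simplified sketch, not the product's implementation: the function name admit and the mode strings are hypothetical, backpressure is ignored, and only one feeding queue is modeled.

```python
from collections import deque


def admit(mode, input_queue, flowfiles_in_group):
    """Return the FlowFiles the input port lets into the group in one pass."""
    if mode == "unbounded":
        # Take everything that is waiting (backpressure is not modeled).
        count = len(input_queue)
    elif mode == "single_flowfile_per_node":
        # Admit one FlowFile, and only when the group is empty.
        count = 1 if flowfiles_in_group == 0 and input_queue else 0
    elif mode == "single_batch_per_node":
        # When the group is empty, drain the whole feeding queue as one batch.
        count = len(input_queue) if flowfiles_in_group == 0 else 0
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [input_queue.popleft() for _ in range(count)]


queue = deque(["a", "b", "c"])
first = admit("single_flowfile_per_node", queue, 0)    # group empty: one FlowFile
blocked = admit("single_flowfile_per_node", queue, 1)  # group busy: nothing
batch = admit("single_batch_per_node", queue, 0)       # group empty: the rest
print(first, blocked, batch)
```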

Process Group Outbound Policy — while the FlowFile Concurrency dictates how data should be delivered to the process group, the outbound policy controls the flow of data out of the process group. There are two options available:

  • Stream When Available — data arriving at the output port is immediately streamed from the process group, provided no backpressure is applied.

  • Batch Output — the output ports do not transmit data from the process group until all data in the process group has been queued at an output port (that is, no data leaves the process group until all data has finished processing). It does not matter whether all data is queued at the same output port, or whether some data is queued at output port A and other data at output port B. Both cases are treated the same when determining whether FlowFile processing is complete.
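
The difference between the two outbound policies comes down to one condition, which can be sketched as follows. The function name releasable_count and the policy strings are hypothetical, and backpressure is not modeled.

```python
def releasable_count(policy, queued_at_output_ports, still_in_progress):
    """Return how many queued FlowFiles the output ports may transmit now.

    queued_at_output_ports: FlowFiles queued at any output port of the group.
    still_in_progress: FlowFiles in the group that have not reached an output port.
    """
    if policy == "stream_when_available":
        # Data leaves as soon as it reaches an output port.
        return queued_at_output_ports
    if policy == "batch_output":
        # Nothing leaves until every FlowFile in the group is queued at some
        # output port; which port it is queued at does not matter.
        return queued_at_output_ports if still_in_progress == 0 else 0
    raise ValueError(f"unknown policy: {policy}")


print(releasable_count("batch_output", 3, 2))  # 0
print(releasable_count("batch_output", 5, 0))  # 5
```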

Default Settings for Connections — default values applied to connections created in the process group:

Default FlowFile Expiration — the default FlowFile expiration.

Default Back Pressure Object Threshold — the default backpressure object threshold.

Default Back Pressure Data Size Threshold — the default backpressure data size threshold.
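
The two backpressure thresholds act independently: reaching either one is enough for backpressure to apply to a connection. The sketch below shows this check. The function name and the example values (10000 objects, 1 GB) are illustrative assumptions, as is the use of "greater than or equal" for the comparison.

```python
def back_pressure_applied(queued_objects, queued_bytes,
                          object_threshold, size_threshold_bytes):
    """Return True when either threshold of a connection has been reached."""
    return (queued_objects >= object_threshold
            or queued_bytes >= size_threshold_bytes)


ONE_GB = 1024 ** 3  # example size threshold (assumed value)

# Few FlowFiles, but they are large: the size threshold triggers.
print(back_pressure_applied(50, 2 * ONE_GB, 10000, ONE_GB))       # True
# Both values are below their thresholds.
print(back_pressure_applied(50, 10 * 1024 ** 2, 10000, ONE_GB))   # False
```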

CONTROLLER SERVICES tab

[Figure: CONTROLLER SERVICES tab]

The tab displays controller services for data flows.

NOTE

For more information on configuring NiFi process groups, refer to Configuring a Process Group.
