Processor types in NiFi

This article describes the existing types of NiFi processors.

NOTE

For information about the architecture of the core NiFi objects and how they interact, see the NiFi objects article.

Data transformation

Processors belonging to the Data Transformation type transform FlowFile contents.

The processor reads the content from the Content Repository, transforms the data according to its own logic, writes the new data to the Content Repository, and stores a reference to the changed data in the FlowFile.

FlowFiles whose contents were successfully transformed are sent to the connection queue that has For Relationships set to success.

Data transformation

Examples of processors and their uses:

  • replacing all or part of the text using a regular expression — ReplaceText;

  • compressing or decompressing content — CompressContent;

  • converting characters from one character set to another — ConvertCharacterSet.
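The three transformations listed above can be sketched in plain Python. This is only an illustration of the idea, not NiFi code: the function names are hypothetical, the content is treated as an in-memory byte string rather than a Content Repository claim, and gzip is assumed as the compression format.

```python
import gzip
import re


def replace_text(content: bytes, pattern: str, replacement: str) -> bytes:
    """Regex replacement over UTF-8 text, in the spirit of ReplaceText."""
    text = content.decode("utf-8")
    return re.sub(pattern, replacement, text).encode("utf-8")


def convert_character_set(content: bytes, source: str, target: str) -> bytes:
    """Re-encode text from one character set to another, in the spirit of ConvertCharacterSet."""
    return content.decode(source).encode(target)


def compress_content(content: bytes) -> bytes:
    """Gzip compression, in the spirit of CompressContent."""
    return gzip.compress(content)


# Each step produces new content; the original bytes are never modified in place.
original = "id=42;name=caf\u00e9".encode("utf-8")
masked = replace_text(original, r"id=\d+", "id=***")
latin1 = convert_character_set(masked, "utf-8", "latin-1")
packed = compress_content(latin1)
```

Each function returns a new byte string, which mirrors the way a processor writes new content and leaves the previous content untouched.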

Routing and mediation

Processors belonging to the Routing and Mediation type are used for routing FlowFiles to different processors according to information in the attributes or contents of those FlowFiles.

The processor uses user-specified conditions and its own logic (based on attributes or content, data rate, duplicate tracking, etc.) to control the movement of FlowFiles.

FlowFiles that match the given condition are sent to the connection queue that has For Relationships set to success (or matched, duplicate, depending on the processor).
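As a rough illustration of attribute-based routing, the sketch below picks a relationship name for a FlowFile from user-defined rules. The rule names, the attribute names, and the first-match behavior are assumptions made for this example; real routing processors offer several strategies.

```python
from typing import Callable, Dict

Attributes = Dict[str, str]


def route_on_attribute(
    attributes: Attributes,
    rules: Dict[str, Callable[[Attributes], bool]],
) -> str:
    """Return the name of the first rule that matches, or 'unmatched'."""
    for relationship, predicate in rules.items():
        if predicate(attributes):
            return relationship
    return "unmatched"


# Hypothetical rules: each key plays the role of a relationship name.
rules = {
    "large": lambda a: int(a.get("fileSize", "0")) > 1_000_000,
    "csv": lambda a: a.get("filename", "").endswith(".csv"),
}

small_csv = route_on_attribute({"filename": "a.csv", "fileSize": "10"}, rules)
big_binary = route_on_attribute({"filename": "b.bin", "fileSize": "5000000"}, rules)
no_attributes = route_on_attribute({}, rules)
```

The returned name corresponds to the For Relationships value of the connection that would receive the FlowFile.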

Routing and mediation

Examples of processors and their uses:

Database access

Processors belonging to the Database Access type are designed to operate with databases.

Depending on its own logic, the processor executes an SQL statement. It can take the statement from the contents of the incoming FlowFile, or write the query result to the contents of the outgoing FlowFile.

Some processors also prepare statements by converting data from other formats (for example, ConvertJSONToSQL).

When a statement is successfully executed, FlowFiles are sent to the connection queue that has For Relationships set to success.

Database access

Examples of processors and their uses:

  • converting a JSON document into an SQL INSERT or UPDATE command, which can then be passed to the PutSQL processor — ConvertJSONToSQL;

  • updating the database by executing the SQL command contained in the FlowFile — PutSQL;

  • executing an SQL SELECT command and writing the results to the contents of a FlowFile in Avro format — ExecuteSQL.
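The chain described in this list (JSON to SQL, then an update, then a query) can be approximated with the standard sqlite3 module. This is a sketch under several assumptions: the table `users`, the helper `json_to_insert`, and the in-memory database are hypothetical, the table and column names are trusted input, and the query result is serialized as JSON rather than Avro to keep the example dependency-free.

```python
import json
import sqlite3


def json_to_insert(table: str, document: str):
    """Build a parameterized INSERT from a flat JSON object (ConvertJSONToSQL-like)."""
    record = json.loads(document)
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, list(record.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# PutSQL-like step: execute the generated statement with its parameters.
statement, params = json_to_insert("users", '{"id": 1, "name": "alice"}')
conn.execute(statement, params)
conn.commit()

# ExecuteSQL-like step: run a SELECT and serialize the rows as new content.
rows = conn.execute("SELECT id, name FROM users").fetchall()
result_content = json.dumps([{"id": r[0], "name": r[1]} for r in rows]).encode("utf-8")
```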

Attribute extraction

Processors belonging to the Attribute Extraction type create new FlowFile attributes or change existing ones.

The processor uses a user-specified condition and its own logic (regular expression, hash function calculation, etc.) to change existing attributes or extract attributes from the contents of the FlowFile.

FlowFiles whose attributes were successfully updated are sent to the connection queue that has For Relationships set to success (or matched, depending on the processor).

Attribute extraction

Examples of processors and their uses:

  • extracting attributes from the FlowFile content text according to regular expressions specified in user-defined processor properties — ExtractText;

  • updating FlowFile attributes using NiFi Expression Language or based on user-specified regular expressions — UpdateAttribute;

  • calculating a cryptographic hash value for the FlowFile content using the specified algorithm and writing the value to the output attribute — CryptographicHashContent.
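The following sketch shows the two extraction styles from the list: pulling values out of content with regular expressions, and computing a content hash. The function names and the attribute names (`order.id`, `order.status`) are hypothetical, and only the first capture group of each expression is kept, which is a simplification.

```python
import hashlib
import re
from typing import Dict


def extract_text(content: bytes, properties: Dict[str, str]) -> Dict[str, str]:
    """Map each property name to the first capture group of its regex, if it matches."""
    text = content.decode("utf-8")
    attributes = {}
    for name, pattern in properties.items():
        match = re.search(pattern, text)
        if match:
            attributes[name] = match.group(1)
    return attributes


def hash_content(content: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of the content, in the spirit of CryptographicHashContent."""
    return hashlib.new(algorithm, content).hexdigest()


content = b"order=1001 status=shipped"
attributes = extract_text(
    content,
    {"order.id": r"order=(\d+)", "order.status": r"status=(\w+)"},
)
digest = hash_content(b"abc")
```

In both cases the content itself is left unchanged; only the attribute map grows.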

System interaction

Processors belonging to the System Interaction type run processes in the operating system or scripts in various development environments.

The processor runs an operating system command or a script, optionally passing it the contents of the FlowFile. It then creates or modifies a FlowFile whose contents record the result of the command or script.

Upon successful command processing, the created FlowFiles are sent to the connection queue that has the For Relationships set to success.

System interaction

Examples of processors and their uses:

  • running a user-specified operating system command and creating a FlowFile as a result of running the command — ExecuteProcess;

  • running a user-specified operating system command using the contents of the incoming FlowFile as StdIn and writing StdOut to the outgoing FlowFile — ExecuteStreamCommand.
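The StdIn/StdOut pattern from the last item can be sketched with the standard subprocess module. The helper name is hypothetical, and the child command is a small Python one-liner chosen only so that the example runs on any platform; the `execution.status` attribute name is used here for illustration.

```python
import subprocess
import sys
from typing import Dict, List, Tuple


def execute_stream_command(command: List[str], content: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Feed content to the command's StdIn and return its StdOut as new content."""
    completed = subprocess.run(
        command,
        input=content,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    # Record the exit code as an attribute of the outgoing FlowFile.
    attributes = {"execution.status": str(completed.returncode)}
    return completed.stdout, attributes


# The child process upper-cases whatever it reads from StdIn.
command = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
output, attributes = execute_stream_command(command, b"hello nifi")
```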

Data ingestion

Processors belonging to the Data Ingestion type accept data into the NiFi data flow.

The processor creates a new FlowFile, writes the contents of the original file (or other source) to the Content Repository and creates the FlowFile attributes. Typically, such processors are the starting point of the data flow in Apache NiFi.

If contents are successfully written and attributes are created, FlowFiles are sent to the connection queue that has For Relationships set to success.

Data ingestion

Examples of processors and their uses:

  • transferring the contents of a file from a local drive (or network drive) into the contents of the created FlowFile and then deleting the original file — GetFile;

  • subscribing to one or more Apache Kafka topics and creating a FlowFile whose contents are a received Apache Kafka message — ConsumeKafka.
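A file-based ingestion step can be sketched as follows. The helper name, the `keep_source` flag, and the attribute names are assumptions for this example, and the content is returned as bytes instead of being written to a Content Repository.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def get_file(path: Path, keep_source: bool = False) -> Tuple[bytes, Dict[str, str]]:
    """Read a file into new FlowFile content, build attributes, then remove the source."""
    content = path.read_bytes()
    attributes = {
        "filename": path.name,
        "path": str(path.parent),
        "fileSize": str(len(content)),
    }
    if not keep_source:
        path.unlink()
    return content, attributes


with tempfile.TemporaryDirectory() as directory:
    source = Path(directory) / "input.txt"
    source.write_bytes(b"first line\n")
    content, attributes = get_file(source)
    source_still_exists = source.exists()
```

After the call, the data exists only inside the flow, which is why such a processor is usually the starting point of a data flow.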

Data egress/Sending data

Processors belonging to the Data Egress/Sending Data type send data to an external destination, such as a server, a message broker, or a file system.

The processor passes the contents of the FlowFile to the destination, and also sends attributes as additional headers if specified by the user in the processor settings. Typically, such processors are the endpoint of the data flow in Apache NiFi.

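If the data is sent successfully, the FlowFiles are routed to success. Because these processors usually end the flow, this relationship is typically auto-terminated and the FlowFiles are deleted.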

Data egress

Examples of processors and their uses:

  • sending the contents of the incoming FlowFile by email to configured recipients — PutEmail;

  • sending the contents of the FlowFile as a message to Apache Kafka — PublishKafka;

  • writing the contents of the FlowFile to a file on the local file system — PutFile.
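Writing content to a local directory can be sketched like this. The helper name is hypothetical, the target file name is taken from an assumed `filename` attribute, and an existing file is treated as a failure, which is only one of several possible conflict policies.

```python
import tempfile
from pathlib import Path
from typing import Dict


def put_file(content: bytes, attributes: Dict[str, str], directory: Path) -> str:
    """Write content under the FlowFile's filename and return a relationship name."""
    target = directory / attributes["filename"]
    if target.exists():
        # Refuse to overwrite: report failure instead of replacing the file.
        return "failure"
    directory.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return "success"


with tempfile.TemporaryDirectory() as directory:
    out_dir = Path(directory) / "out"
    first = put_file(b"payload", {"filename": "report.txt"}, out_dir)
    second = put_file(b"payload", {"filename": "report.txt"}, out_dir)
    written = (out_dir / "report.txt").read_bytes()
```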

Splitting and aggregation

Processors belonging to the Splitting and Aggregation type split or aggregate the contents of FlowFiles.

When splitting, the processor creates new FlowFiles whose contents are the data obtained from the split. The division follows a user-specified condition and the processor's own logic. The attributes of each new FlowFile usually describe the resulting segment of content: size, ID, number of lines, etc.

During aggregation, the processor creates a new FlowFile whose contents combine the contents of all incoming FlowFiles. The attributes of the new FlowFile describe the result of the merge: file name, number of merged FlowFiles, packet lifetime before sending, etc.

The processor writes the contents of new FlowFiles to the Content Repository and stores a reference to them in each FlowFile.

When splitting or aggregating content, new FlowFiles are sent to the connection queue that has For Relationships set to success (splits, segments, merged — depending on the processor).

Splitting

Examples of processors and their uses:

  • Unpacking archives of various types, such as ZIP and TAR, with each file after unpacking being transferred as a separate FlowFile — UnpackContent.

  • Combining multiple FlowFiles into one FlowFile in accordance with user-specified conditions, for example based on a common attribute. In this case, the minimum and maximum size of each packet can be set, as well as a timeout for waiting for the packet to be filled — MergeContent.

  • Splitting a FlowFile into potentially multiple FlowFiles based on a specific byte sequence by which to split the contents — SplitContent.
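Splitting on a byte sequence and merging the pieces back can be sketched as below. The helper names are hypothetical, the attribute names (`fragment.index`, `fragment.count`, `merge.count`) are used for illustration, and size limits and timeouts for filling a packet are left out.

```python
from typing import Dict, List, Tuple

FlowFile = Tuple[bytes, Dict[str, str]]


def split_content(content: bytes, separator: bytes) -> List[FlowFile]:
    """Create one FlowFile per non-empty segment, with attributes describing the segment."""
    parts = [part for part in content.split(separator) if part]
    return [
        (part, {"fragment.index": str(index), "fragment.count": str(len(parts))})
        for index, part in enumerate(parts, start=1)
    ]


def merge_content(flowfiles: List[FlowFile], delimiter: bytes = b"") -> FlowFile:
    """Concatenate the contents of all incoming FlowFiles into a single FlowFile."""
    merged = delimiter.join(content for content, _ in flowfiles)
    return merged, {"merge.count": str(len(flowfiles))}


splits = split_content(b"a,b,c", b",")
merged, merge_attributes = merge_content(splits, delimiter=b"\n")
```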

HTTP

Processors belonging to the HTTP type process HTTP and HTTPS requests.

For incoming requests, the processor creates a FlowFile and writes the contents of the request to the contents of the FlowFile. If the processor makes an outgoing request (PUT, POST, or PATCH), the contents of the FlowFile are sent as the body of the message.

Successfully created FlowFiles from incoming requests are sent to the connection queue that has For Relationships set to success. For the universal processor that both sends requests and receives responses (InvokeHTTP), connections can be configured with several For Relationships values (original, failure, response, etc.), and FlowFiles are distributed among them depending on the HTTP response status code.

HTTP requests

Examples of processors and their uses:

  • running an HTTP (or HTTPS) server and listening for incoming connections — ListenHTTP;

  • interacting with a custom HTTP endpoint using the GET method, where the response body becomes the content of the generated FlowFile — InvokeHTTP.
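The distribution by status code can be pictured with a small lookup function. The mapping below is an assumption made for this sketch (2xx to original and response, 5xx to retry, other codes to no retry, no response at all to failure); check the documentation of your NiFi version for the exact relationship names and rules.

```python
from typing import List, Optional


def relationships_for_response(status_code: Optional[int]) -> List[str]:
    """Pick relationship names for an outgoing request's result (assumed mapping)."""
    if status_code is None:
        # No HTTP response at all, for example a connection error or a timeout.
        return ["failure"]
    if 200 <= status_code < 300:
        # The request FlowFile and the response FlowFile go to separate relationships.
        return ["original", "response"]
    if 500 <= status_code < 600:
        return ["retry"]
    return ["no retry"]


ok = relationships_for_response(200)
server_error = relationships_for_response(503)
client_error = relationships_for_response(404)
unreachable = relationships_for_response(None)
```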

Amazon Web Services

Processors belonging to the Amazon Web Services type are responsible for interacting with Amazon Web Services.

The processor performs actions in accordance with its own logic: it retrieves the contents of AWS objects or messages, or sends the contents of a FlowFile as the contents of an AWS object or notification. When data from AWS is saved to the contents of a FlowFile, parameters that describe the data (identifier, hash sum, etc.) are saved as attributes.

When a request is successfully completed, FlowFiles are sent to the connection queue that has For Relationships set to success.

AWS

Examples of processors and their uses:

  • retrieving the contents of an object stored in Amazon Simple Storage Service (Amazon S3) — FetchS3Object;

  • writing the contents of the FlowFile to an Amazon S3 object — PutS3Object.
