<p>Bryan Bende · 2022-10-06T23:31:05+00:00 · http://bbende.github.io · bbende@gmail.com</p>
<p>Apache NiFi 1.18.0 - Using the HashiCorp Vault Parameter Provider · 2022-09-25T00:00:00+00:00 · http://bbende.github.io/development/2022/09/25/apache-nifi-1-18-0-hashicorp-vault-parameter-provider</p>
<h4 id="guest-author-joe-gresock">(Guest author: Joe Gresock)</h4>
<p>This post walks through setting up HashiCorp Vault Key/Value Secrets for use with the HashiCorpVaultParameterProvider in Apache NiFi 1.18.0.</p>
<h3 id="introduction">Introduction</h3>
<p>In a <a href="https://bryanbende.com/development/2021/11/08/apache-nifi-1-15-0-hashicorp-vault-secrets">previous post</a>, we walked through protecting NiFi configuration properties using HashiCorp Vault’s Key/Value Secrets Engine. With Apache NiFi 1.18.0, we can now derive Parameter Contexts from HashiCorp Vault
Key/Value secrets using the new HashiCorp Vault <a href="https://bryanbende.com/development/2022/09/24/apache-nifi-1-18-0-parameter-providers">Parameter Provider</a>.</p>
<h3 id="setup">Setup</h3>
<p>To get started, we’ll need to download and install HashiCorp Vault and NiFi 1.18.0.</p>
<ul>
<li>First, follow the relevant Vault installation guide for your system: <a href="https://www.vaultproject.io/downloads">https://www.vaultproject.io/downloads</a></li>
<li>Then head over to the Apache NiFi downloads page: <a href="https://nifi.apache.org/download.html">https://nifi.apache.org/download.html</a>
<ul>
<li>Download <code class="language-plaintext highlighter-rouge">nifi-1.18.0-bin.zip</code></li>
</ul>
</li>
</ul>
<p>Unzip <code class="language-plaintext highlighter-rouge">nifi-1.18.0-bin.zip</code>, and then start NiFi:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unzip nifi-1.18.0-bin.zip
cd nifi-1.18.0
./bin/nifi.sh start
</code></pre></div></div>
<p>We’ll start Vault using development mode, which allows us to easily interact with the Vault server without having to <a href="https://www.vaultproject.io/docs/concepts/seal">unseal</a> the server. Run the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault server -dev > init.log
</code></pre></div></div>
<p>This will launch Vault in development mode, and will write something like the following to init.log:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WARNING! dev mode is enabled! In this mode, Vault runs entirely in-memory
and starts unsealed with a single unseal key. The root token is already
authenticated to the CLI, so you can immediately begin using Vault.
You may need to set the following environment variable:
$ export VAULT_ADDR='http://127.0.0.1:8200'
The unseal key and root token are displayed below in case you want to
seal/unseal the Vault or re-authenticate.
Unseal Key: fUJ6zxqlo+vsWFlCxMK9MKrhgpsmxdNla727mu+5nuY=
Root Token: s.3qsioJdA3YXptvEQURDCijUw
Development mode should NOT be used in production installations!
</code></pre></div></div>
<p>In another terminal, create a file containing this root token, which we’ll use later:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>TOKEN=$(grep "Root Token" init.log | cut -d ":" -f2 | xargs)
echo "vault.token=$TOKEN" > /tmp/nifi-vault.properties
echo "vault.uri=http://localhost:8200" >> /tmp/nifi-vault.properties
</code></pre></div></div>
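<p>Because this file holds a root token, it may be worth restricting who can read it. The following is a minimal sketch, not part of the original walkthrough: it repeats the extraction above against a made-up sample log in a scratch directory (the token <code class="language-plaintext highlighter-rouge">s.EXAMPLETOKEN</code> and the <code class="language-plaintext highlighter-rouge">workdir</code> variable are placeholders), so it can be tried without a Vault server. It sets <code class="language-plaintext highlighter-rouge">umask 077</code> so the properties file is created readable only by the current user, which assumes NiFi runs as that same user.</p>

```shell
# Scratch directory and sample log (placeholder values, not real Vault output).
workdir=$(mktemp -d)
cat > "$workdir/init.log" <<'EOF'
Unseal Key: EXAMPLE-UNSEAL-KEY
Root Token: s.EXAMPLETOKEN
EOF

# Same extraction as above: keep the text after the colon and trim whitespace.
TOKEN=$(grep "Root Token" "$workdir/init.log" | cut -d ":" -f2 | xargs)

# Files created from here on are readable and writable by the owner only.
umask 077
{
  echo "vault.token=$TOKEN"
  echo "vault.uri=http://localhost:8200"
} > "$workdir/nifi-vault.properties"

cat "$workdir/nifi-vault.properties"
```

<p>For the real setup, the same <code class="language-plaintext highlighter-rouge">umask 077</code> line could be run before the two <code class="language-plaintext highlighter-rouge">echo</code> commands that write <code class="language-plaintext highlighter-rouge">/tmp/nifi-vault.properties</code>.</p>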
<p>Run the following to enable the K/V (version 1) Secrets Engine, which actually stores sensitive values in the Vault server:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault secrets enable kv
</code></pre></div></div>
<p>Here the Vault path for this secrets engine will be <code class="language-plaintext highlighter-rouge">kv</code>. We will see this used later in our NiFi configuration. If we had wanted to
use a different Vault path, for example, <code class="language-plaintext highlighter-rouge">nifi-kv</code>, we could have used the command <code class="language-plaintext highlighter-rouge">vault secrets enable -path nifi-kv kv</code>.</p>
<h3 id="configuring-hashicorp-vault-parameter-provider-and-controller-service">Configuring HashiCorp Vault Parameter Provider and Controller Service</h3>
<p>Now that we have Vault running and configured, we’ll add a Parameter Provider in NiFi to connect to our Vault instance.</p>
<p>First, visit NiFi’s UI: <code class="language-plaintext highlighter-rouge">https://localhost:8443/nifi</code>. Copy the generated username and password from <code class="language-plaintext highlighter-rouge">logs/nifi-app.log</code> in order to log in.</p>
<p>Then, select Controller Settings from the top-right hamburger menu in NiFi:</p>
<p><img src="/assets/images/nifi-parameter-providers/01-controller-settings-menu.png" class="img-responsive" width="30%" height="30%" /></p>
<p>Then go to the Parameter Providers tab and click the ‘+’ button to add a new provider. Select <code class="language-plaintext highlighter-rouge">HashiCorpVaultParameterProvider</code> and then ‘Add’:</p>
<p><img src="/assets/images/nifi-parameter-providers/02-create-vault-provider.png" class="img-responsive" width="80%" height="80%" /></p>
<p>Click the pencil icon to edit the provider, and then go to the ‘Properties’ tab. In the ‘HashiCorp Vault Client Service’ property,
create a new controller service:</p>
<p><img src="/assets/images/nifi-parameter-providers/03-create-new-service.png" class="img-responsive" width="80%" height="80%" /></p>
<p>Click the ‘Create’ button:</p>
<p><img src="/assets/images/nifi-parameter-providers/04-create-controller-service.png" class="img-responsive" width="50%" height="50%" /></p>
<p>Click the right arrow button to go to the new service, and say ‘Yes’ to save the changes so far. Configure the controller service and then go
to the Properties tab.</p>
<p>We can easily test a couple of different configurations using the Configuration Verification button.</p>
<p>First, we’ll use the ‘Direct Properties’ configuration strategy, entering our configuration directly in the service. To do this, set
the Vault URI to <code class="language-plaintext highlighter-rouge">http://localhost:8200</code>. Then add a new dynamic property named <code class="language-plaintext highlighter-rouge">vault.token</code>, specifying ‘Yes’ to indicate it is a sensitive property:</p>
<p><img src="/assets/images/nifi-parameter-providers/05-add-vault-token.png" class="img-responsive" width="80%" height="80%" /></p>
<p>For the value, paste in the Root Token from the Vault installation above. At this point, you can verify the configuration with the check mark button:</p>
<p><img src="/assets/images/nifi-parameter-providers/06-config-verification.png" class="img-responsive" width="80%" height="80%" /></p>
<p>This demonstrates that we can connect to Vault! To demonstrate the other configuration strategy, switch the Configuration Strategy value to
<code class="language-plaintext highlighter-rouge">Properties Files</code>. Then set ‘Vault Properties Files’ to <code class="language-plaintext highlighter-rouge">/tmp/nifi-vault.properties</code>. Notice from above that we had specified the
<code class="language-plaintext highlighter-rouge">vault.token</code> and <code class="language-plaintext highlighter-rouge">vault.uri</code> properties in this properties file. These properties are named after the Spring Data Vault configuration property keys.
Any of the property keys described in <a href="https://docs.spring.io/spring-vault/docs/2.3.x/reference/html/#vault.core.environment-vault-configuration">this section</a> of the Spring documentation can be used in this properties file.</p>
<p>Here it would also be good to note that if you already have the
<a href="https://bryanbende.com/development/2021/11/08/apache-nifi-1-15-0-hashicorp-vault-secrets">HashiCorp Vault sensitive properties provider</a> integrated and
wish to use the same Vault instance, you can specify a ‘Vault Properties Files’ value of <code class="language-plaintext highlighter-rouge">./conf/bootstrap-hashicorp-vault.conf</code>. Note that if you are using
the <code class="language-plaintext highlighter-rouge">vault.authentication.properties.file</code> property in this bootstrap configuration in order to reference another properties file that contains your
Vault authentication properties, you will also need to list that file in ‘Vault Properties Files’ (e.g., <code class="language-plaintext highlighter-rouge">./conf/bootstrap-hashicorp-vault.conf, /path/to/vault-auth.properties</code>). See the Additional Details section of the <code class="language-plaintext highlighter-rouge">StandardHashiCorpVaultClientService</code> usage documentation for further information regarding configuration
of this service.</p>
<p>Now run the verification again, and we should still be able to connect to Vault:</p>
<p><img src="/assets/images/nifi-parameter-providers/07-config-verification-2.png" class="img-responsive" width="80%" height="80%" /></p>
<p>Apply the changes, and then click the lightning bolt button to enable the service:</p>
<p><img src="/assets/images/nifi-parameter-providers/08-enable-service.png" class="img-responsive" width="80%" height="80%" /></p>
<p>Close this dialog and then return to the Parameter Providers tab. Edit the provider so we can inspect its other
properties. Notice that the ‘Key/Value Path’ defaults to <code class="language-plaintext highlighter-rouge">kv</code>, which is the same path we used above when enabling the secrets engine in Vault.
This is where you would change the path if you used a different one. Also notice the ‘Secret Name Pattern’ property. This allows us to
limit which secrets in this Vault path are pulled in as parameters, which can be useful if more than just the ones we want are present. To demonstrate
this in action, set this pattern to <code class="language-plaintext highlighter-rouge">VaultContext.*</code>, allowing us to select only secrets whose names start with ‘VaultContext’.</p>
<p>Click ‘Apply’.</p>
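<p>The effect of a Secret Name Pattern such as <code class="language-plaintext highlighter-rouge">VaultContext.*</code> can be previewed outside NiFi. The sketch below is only an approximation: it uses <code class="language-plaintext highlighter-rouge">grep -E</code> (POSIX extended regular expressions) rather than the regular expression engine NiFi uses, and it assumes the pattern has to match the whole secret name. The names <code class="language-plaintext highlighter-rouge">OtherSecret</code> and <code class="language-plaintext highlighter-rouge">MyVaultContext</code> are hypothetical.</p>

```shell
# Hypothetical secret names under the kv path; this post only creates the first two.
names='VaultContext
VaultContext2
OtherSecret
MyVaultContext'

pattern='VaultContext.*'

# -E: extended regular expressions; -x: the pattern must match the whole name.
matched=$(printf '%s\n' "$names" | grep -Ex "$pattern")

printf '%s\n' "$matched"
```

<p>Under the whole-name assumption this prints <code class="language-plaintext highlighter-rouge">VaultContext</code> and <code class="language-plaintext highlighter-rouge">VaultContext2</code>, and leaves out the other two names.</p>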
<h3 id="adding-vault-secrets">Adding Vault secrets</h3>
<p>Now we need to actually create some secrets to pull in as parameters. Back in the command line, run the following commands in the same
terminal where you set the <code class="language-plaintext highlighter-rouge">$TOKEN</code> variable:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>export VAULT_TOKEN=$TOKEN
export VAULT_ADDR=http://localhost:8200
vault kv put kv/VaultContext my-param1=value my-param2=value2
vault kv put kv/VaultContext2 my-param3=value3 my-param4=value4
</code></pre></div></div>
<p>We have just created two secrets, one named <code class="language-plaintext highlighter-rouge">VaultContext</code> and the other named <code class="language-plaintext highlighter-rouge">VaultContext2</code>. Each secret will represent a
parameter group, and each key/value pair (e.g., <code class="language-plaintext highlighter-rouge">my-param1=value</code>) will represent a parameter in the respective groups.</p>
<h3 id="fetching-the-parameters">Fetching the Parameters</h3>
<p>Finally, we return to NiFi and click the Fetch Parameters button (down arrow):</p>
<p><img src="/assets/images/nifi-parameter-providers/10-fetched-parameters.png" class="img-responsive" width="80%" height="80%" /></p>
<p>There are our two parameter groups, and since we start out with the first group selected, we see its two parameters. So far, we have
just told NiFi to see what is available. In order to actually create a Parameter Context, we can check the ‘Create Parameter Context’ box.
Do this for the ‘VaultContext’ group:</p>
<p><img src="/assets/images/nifi-parameter-providers/11-create-parameter-context.png" class="img-responsive" width="80%" height="80%" /></p>
<p>Now we can select whether each of these parameters is sensitive or not. Since we’re using HashiCorp Vault, let’s assume we want them both
to be sensitive, though this is not required. Here we could also change the name of the Parameter Context that we create, but we will leave it
as <code class="language-plaintext highlighter-rouge">VaultContext</code>, matching the secret name. Notice that the <code class="language-plaintext highlighter-rouge">VaultContext</code> group now has a star, indicating that it will have a Parameter Context
associated with it. We will not create a parameter context for <code class="language-plaintext highlighter-rouge">VaultContext2</code>, just to demonstrate that we can have groups available that
we choose not to map to Parameter Contexts. Click the ‘Apply’ button, and then Close.</p>
<p>To view the new Parameter Context, edit the provider again and go to the ‘Settings’ tab:</p>
<p><img src="/assets/images/nifi-parameter-providers/12-referenced-parameter-context.png" class="img-responsive" width="80%" height="80%" /></p>
<p>Click on its name, and then click the pencil icon to view its parameters:</p>
<p><img src="/assets/images/nifi-parameter-providers/13-new-parameter-context.png" class="img-responsive" width="80%" height="80%" /></p>
<p>There are our parameters! Notice that we cannot edit them – only the parameter provider can do that now. If we update their values in
Vault, or add or remove keys in these secrets, we have to Fetch Parameters again in order for NiFi to pick up the changes. Each time we add
a new key, we must specify the sensitivity as we did before. Sensitivities can be changed, but only if the parameter is not actively
referenced by a component in the flow.</p>
<p>For more information on the basics of Parameter Provider interaction, see <a href="https://bryanbende.com/development/2022/09/24/apache-nifi-1-18-0-parameter-providers">this previous post</a>.</p>
<h3 id="conclusion">Conclusion</h3>
<p>Now we have seen how to create Parameter Contexts from HashiCorp Vault Key/Value secrets. Enjoy!</p>
<p>Apache NiFi 1.18.0 - Parameter Providers · 2022-09-24T00:00:00+00:00 · http://bbende.github.io/development/2022/09/24/apache-nifi-1-18-0-parameter-providers</p>
<h4 id="guest-author-joe-gresock">(Guest author: Joe Gresock)</h4>
<p>A rundown of the new Parameter Providers extension point in Apache NiFi 1.18.0.</p>
<h3 id="introduction">Introduction</h3>
<p>Apache NiFi 1.18.0 introduces a new extension point: the Parameter Provider. This powerful feature allows automatic creation of
Parameter Contexts from external sources (e.g., file-based Kubernetes secrets, environment variables, HashiCorp Vault secrets engines).
Paired with Parameter Context inheritance, flow parameters are now more flexible than ever. This post goes over some of the basics
of Parameter Providers, but first let’s see what Parameter Providers do and don’t do:</p>
<p><strong>What Parameter Providers Do:</strong></p>
<ul>
<li>Provide a new feature in the Controller Settings that lets you generate Parameter Contexts from external sources</li>
<li>Let you manually keep your provided Parameter Contexts up to date with the external source by running a “Fetch Parameters” operation</li>
<li>Provide an extension point for developing new custom Parameter Providers that can be deployed in a NAR.</li>
<li>Provide a CLI command, <code class="language-plaintext highlighter-rouge">nifi fetch-params</code>, to fetch and apply parameters, allowing a scripted approach to keep Parameter Contexts up to date.</li>
</ul>
<p><strong>What Parameter Providers Don’t Do:</strong></p>
<ul>
<li>Parameter Providers don’t pull parameters directly from the external source at the time of usage in the flow. Parameter values are still stored (encrypted, if sensitive) inside the flow itself. Think of Parameter Providers as a mechanism that automates the creation of Parameter Contexts and facilitates keeping them updated, not as something that replaces the framework mechanism to resolve parameter values during Processor/Controller Service execution.</li>
<li>Parameter Providers don’t have an automatic mechanism to refresh parameter values from the NiFi UI. Fetching and applying parameters is a potentially disruptive operation, since it can involve stopping and starting large portions of the flow. This kind of activity could be scripted using the CLI command <code class="language-plaintext highlighter-rouge">fetch-params</code>, but
should be done with the potential for flow disruption in mind.</li>
</ul>
<p>Now, let’s take a look at Parameter Providers and how they fit into the flow.</p>
<h3 id="creating-and-configuring-a-parameter-provider">Creating and Configuring a Parameter Provider</h3>
<p>First, select Controller Settings from the top-right hamburger menu in NiFi:</p>
<p><img src="/assets/images/nifi-parameter-providers/01-controller-settings-menu.png" class="img-responsive" width="30%" height="30%" /></p>
<p>Then navigate to the new Parameter Providers tab and click the ‘+’ button on the top-right of the Parameter Providers listing to add a new Parameter Provider.</p>
<p><img src="/assets/images/nifi-parameter-providers/02-create-parameter-provider.png" class="img-responsive" /></p>
<p>For this example, select FileParameterProvider, and then Add. The Parameter Provider is created, and now you can click the Pencil button to edit it.</p>
<p><img src="/assets/images/nifi-parameter-providers/03-configure-parameter-provider.png" class="img-responsive" /></p>
<p>The FileParameterProvider lets you supply parameters in key-value files inside a Parameter Group directory. We will discuss what a Parameter Group is
below, but for now, create a directory named “parameters” somewhere on your filesystem, and type the absolute path to this directory in the
Parameter Group Directories property. Then, to make testing easier, select Plain Text from the Parameter Value Encoding property, and Apply.
If the specified directory is readable, you should now see a Fetch Parameters icon available.</p>
<p><img src="/assets/images/nifi-parameter-providers/04-fetch-parameters-icon.png" class="img-responsive" /></p>
<h3 id="fetching-the-parameters">Fetching the Parameters</h3>
<p>Click the Fetch Parameters (down arrow) icon, and a new dialog will pop up:</p>
<p><img src="/assets/images/nifi-parameter-providers/05-fetch-parameters-empty.png" class="img-responsive" /></p>
<p>We see a Parameter Group named “parameters”, which is named after the directory we specified earlier. This group can be used to create
a Parameter Context. However, we need to create some files in order to add some parameters to this group.</p>
<p>Create three files inside the “parameters” directory, and give them some basic contents. For example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo admin > /tmp/parameters/sys.admin.username && \
echo password > /tmp/parameters/sys.admin.password && \
echo value > /tmp/parameters/sys.other
</code></pre></div></div>
<p>Each file represents a Parameter in the “parameters” group, and the contents of each file represent the parameter value.</p>
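<p>To make the file-to-parameter mapping concrete, here is a small sketch that lists a directory the way the text describes it: the directory name as the group, each file name as a parameter name, and each file's contents as its value. It works on a scratch copy (the <code class="language-plaintext highlighter-rouge">group_dir</code> variable is a placeholder) and only illustrates the mapping; it is not how NiFi itself reads the files.</p>

```shell
# Scratch copy of the "parameters" directory, so /tmp/parameters is left alone.
group_dir="$(mktemp -d)/parameters"
mkdir -p "$group_dir"
echo admin > "$group_dir/sys.admin.username"
echo password > "$group_dir/sys.admin.password"
echo value > "$group_dir/sys.other"

# The directory name stands for the Parameter Group name.
group=$(basename "$group_dir")

# One line per file: file name as parameter name, file contents as value.
listing=$(
  for f in "$group_dir"/*; do
    printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
  done
)

printf 'group: %s\n%s\n' "$group" "$listing"
```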
<p>Close the Fetch Parameters dialog, and click the Fetch Parameters button again. Now we see some parameters!</p>
<p><img src="/assets/images/nifi-parameter-providers/06-fetch-parameters-start.png" class="img-responsive" /></p>
<p>Notice the three filenames are now listed under Fetched Parameters in the middle column. This just shows what parameters are available in
the group – nothing has been applied to the flow yet.</p>
<h3 id="creating-a-parameter-context-from-a-parameter-group">Creating a Parameter Context from a Parameter Group</h3>
<p>To create our first provided Parameter Context, check the “Create Parameter Context” box. This updates the middle column to be editable,
allowing us to select whether each parameter should be considered “sensitive” when the Parameter Context is created. Since we only intended
the <code class="language-plaintext highlighter-rouge">sys.admin.password</code> parameter to be sensitive, we’ll leave this checked, but uncheck the other two parameters. Perhaps we also want
to name the Parameter Context something other than “parameters”, so we’ll update its name to “My Parameters”.</p>
<p><img src="/assets/images/nifi-parameter-providers/07-fetched-parameter-sensitivities.png" class="img-responsive" /></p>
<p>Also notice that there is now a star next to the “parameters” group. This means the group will have an associated Parameter Context once we apply
the change. In the future, if we have multiple parameter groups (possible in the FileParameterProvider if we specify multiple directories), the stars
show at a glance which groups have associated Parameter Contexts.</p>
<p>To create the Parameter Context, click Apply.</p>
<p>If we edit the Parameter Provider and go to the Settings tab, we’ll now see a reference to the newly created Parameter Context.</p>
<p><img src="/assets/images/nifi-parameter-providers/08-referenced-parameter-context.png" class="img-responsive" /></p>
<p>We can visit this Parameter Context in its listing by clicking on its name. Here, notice that there is a “Go To” button in this listing, which takes us
back to the Parameter Provider.</p>
<p><img src="/assets/images/nifi-parameter-providers/09-go-to-parameter-provider.png" class="img-responsive" /></p>
<p>However, for now we’ll look at the Parameter Context by clicking the pencil button.</p>
<p><img src="/assets/images/nifi-parameter-providers/10-parameter-context-view.png" class="img-responsive" /></p>
<p>We see the three parameters we created, along with the values of the non-sensitive ones. Notice that we cannot edit these
parameters: only the Parameter Provider can now add, remove, or update parameters in this Parameter Context. To see which
Parameter Provider is providing the values, go to the Settings tab:</p>
<p><img src="/assets/images/nifi-parameter-providers/11-linked-parameter-provider.png" class="img-responsive" /></p>
<p>To continue with the tutorial, click the provider name and then click the Fetch Parameters button again.</p>
<h3 id="updating-parameter-sensitivities">Updating Parameter sensitivities</h3>
<p>Now, let’s say we realized we needed to make <code class="language-plaintext highlighter-rouge">sys.other</code> a sensitive parameter. We can check the box next to this parameter, and
now we are able to click the Apply button. Do this, and the parameter will be updated.</p>
<p><img src="/assets/images/nifi-parameter-providers/12-updated-sensitivity.png" class="img-responsive" height="50%" width="50%" /></p>
<p>Note that if the “My Parameters” context is assigned to a group, and that group has a component that references the <code class="language-plaintext highlighter-rouge">sys.other</code> parameter,
we will not be able to update its sensitivity. In order to do so, we would have to first remove any references to it.</p>
<h3 id="fetching-to-pull-in-changes-from-the-external-source">Fetching to pull in changes from the external source</h3>
<p>Parameters are not automatically synchronized with the external source. To update any linked Parameter Contexts, we can
fetch the parameters again, and specify the sensitivity of any new parameters.</p>
<p>To simulate a few types of changes, do the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ rm /tmp/parameters/sys.other && \
echo admin2 > /tmp/parameters/sys.admin.username && \
echo test > /tmp/parameters/new-parameter
</code></pre></div></div>
<p>This deletes the <code class="language-plaintext highlighter-rouge">sys.other</code> parameter, updates the value of the <code class="language-plaintext highlighter-rouge">sys.admin.username</code> parameter, and adds a new
parameter <code class="language-plaintext highlighter-rouge">new-parameter</code>. Now Fetch Parameters again, and observe the changes:</p>
<p><img src="/assets/images/nifi-parameter-providers/13-parameter-update.png" class="img-responsive" /></p>
<p>We get a few “dirty” parameter indicators (the asterisk), indicating changes from the source of the Parameter Provider. Hovering
over these asterisks gives useful information, such as “Value has changed”, as we see for <code class="language-plaintext highlighter-rouge">sys.admin.username</code>. We also see that
<code class="language-plaintext highlighter-rouge">new-parameter</code> is marked as a newly discovered parameter, indicating to us that we need to specify the sensitivity.
As before, the default sensitivity of the new parameter is “sensitive”, so we’ll have to adjust that if we wanted it to
be non-sensitive. Apply the changes, and the Parameter Context will be updated to add the new parameter, remove
the <code class="language-plaintext highlighter-rouge">sys.other</code> parameter, and change the value of <code class="language-plaintext highlighter-rouge">sys.admin.username</code>.</p>
<p><img src="/assets/images/nifi-parameter-providers/14-updated-parameter-context.png" class="img-responsive" /></p>
<p>Note that if <code class="language-plaintext highlighter-rouge">sys.other</code> was actually referenced by a component, fetching would not remove this parameter, but would
flag it as “missing”. Applying the fetched parameters would leave the parameter in place, but if at any point the
parameter was no longer referenced in the flow, the next Fetch would not return <code class="language-plaintext highlighter-rouge">sys.other</code>. This helps to essentially
deprecate any parameters that are removed from the source without invalidating the flow when the parameters are applied.</p>
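<p>The kind of comparison a fetch reports can be sketched with two scratch directories, one holding the values as last applied and one holding the values as they are now. The directory names <code class="language-plaintext highlighter-rouge">applied</code> and <code class="language-plaintext highlighter-rouge">fetched</code> are hypothetical, and the sketch covers only new, removed, and changed parameters; it does not model the “missing” handling for parameters that are still referenced in the flow.</p>

```shell
work=$(mktemp -d)
mkdir "$work/applied" "$work/fetched"

# State that was applied to the Parameter Context earlier.
echo admin > "$work/applied/sys.admin.username"
echo password > "$work/applied/sys.admin.password"
echo value > "$work/applied/sys.other"

# State of the source after the changes made above.
echo admin2 > "$work/fetched/sys.admin.username"
echo password > "$work/fetched/sys.admin.password"
echo test > "$work/fetched/new-parameter"

# Walk the union of parameter names and classify each one.
report=$(
  { ls "$work/applied"; ls "$work/fetched"; } | sort -u | while read -r name; do
    if [ ! -e "$work/applied/$name" ]; then
      echo "$name: new"
    elif [ ! -e "$work/fetched/$name" ]; then
      echo "$name: removed"
    elif ! cmp -s "$work/applied/$name" "$work/fetched/$name"; then
      echo "$name: value changed"
    fi
  done
)

printf '%s\n' "$report"
```

<p>Unchanged parameters, such as <code class="language-plaintext highlighter-rouge">sys.admin.password</code> here, produce no line in the report.</p>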
<h3 id="composed-provided-parameter-contexts">Composed Provided Parameter Contexts</h3>
<p>A powerful pattern, then, is to create separate Parameter Providers for different sources, and then simply compose their created Parameter Contexts through
inheritance. This allows sensitive parameters to originate from a secrets manager, like HashiCorp Vault, and
non-sensitive parameters to originate from other sources like Environment Variables or the filesystem. In the following
simple example, I created an EnvironmentVariableParameterProvider, applied the parameters as non-sensitive, and then
created a FileParameterProvider and applied its parameters as sensitive. I then created a new Parameter Context, “Parameters”,
and added both of the above as inherited contexts.</p>
<p><img src="/assets/images/nifi-parameter-providers/15-composed-parameter-contexts.png" class="img-responsive" /></p>
<p>This new Parameter Context will inherit parameters from both sources:</p>
<p><img src="/assets/images/nifi-parameter-providers/16-composed-parameters.png" class="img-responsive" /></p>
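<p>As a rough illustration of composing two provided contexts, the sketch below merges two hypothetical sets of <code class="language-plaintext highlighter-rouge">name=value</code> pairs. It assumes that when both contexts define the same name, the context listed first wins; that precedence rule is an assumption of this sketch, so check the NiFi documentation on Parameter Context inheritance before relying on it. All names and values are made up.</p>

```shell
# Hypothetical parameters from two provided contexts, one name=value pair per line.
env_params='APP_MODE=dev
sys.other=from-env'
file_params='sys.admin.password=password
sys.other=from-file'

# Concatenate in inheritance order and keep the first occurrence of each name.
merged=$(printf '%s\n%s\n' "$env_params" "$file_params" | awk -F= '!seen[$1]++')

printf '%s\n' "$merged"
```

<p>With these inputs the merged set has three parameters, and <code class="language-plaintext highlighter-rouge">sys.other</code> keeps the value from the first context.</p>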
<h3 id="wrapping-up">Wrapping up</h3>
<p>In conclusion, the new Parameter Providers framework is a powerful addition to NiFi’s Parameter Context feature,
introducing a new extension point that can help populate flow parameters on demand.</p>
<p>Apache NiFi - Stateless · 2021-11-10T00:00:00+00:00 · http://bbende.github.io/development/2021/11/10/apache-nifi-stateless</p>
<p>The past several releases of Apache NiFi have made significant improvements to the Stateless
NiFi engine. If you are not familiar with Stateless NiFi, then I would recommend reading this
<a href="https://github.com/apache/nifi/blob/main/nifi-stateless/nifi-stateless-assembly/README.md">overview</a> first.</p>
<p>This post will examine the differences between running a flow in traditional NiFi vs. Stateless NiFi.</p>
<h2 id="traditional-nifi">Traditional NiFi</h2>
<p>As an example, let’s assume there is a Kafka topic with CDC events and we want to consume the
events and apply them to another relational database. This can be achieved with a simple flow
containing <code class="language-plaintext highlighter-rouge">ConsumeKafka_2_6</code> connected to <code class="language-plaintext highlighter-rouge">PutDatabaseRecord</code>.</p>
<p><img src="/assets/images/nifi-stateless/01-traditional-flow.png" class="img-responsive" /></p>
<p>In traditional NiFi, each node has a set of internal repositories that are stored on local disk. The <em>Flow File Repository</em>
contains the state of each flow file, including its attributes and location in the flow, and the <em>Content Repository</em>
stores the content of each flow file.</p>
<p>Each execution of a processor is given a reference to a session that acts like a transaction for operating on
flow files. If all operations complete successfully and the session is committed, then all updates are persisted to
NiFi’s repositories. In the event that NiFi is restarted, all data is preserved in the repositories and the flow will
start processing from the last committed state.</p>
<p>Let’s consider how the example flow will execute in traditional NiFi…</p>
<p>First, <code class="language-plaintext highlighter-rouge">ConsumeKafka_2_6</code> will poll Kafka for available records. Then it will use the session to create a flow file
and write the content of the records to the output stream of the flow file. The processor will then commit the NiFi
session, followed by committing the Kafka offsets. The flow file will then be transferred to <code class="language-plaintext highlighter-rouge">PutDatabaseRecord</code>. The
overall sequence is summarized in the following diagram.</p>
<p><img src="/assets/images/nifi-stateless/02-traditional-sequence.png" class="img-responsive" /></p>
<p>A key point here is the ordering of committing the NiFi session before committing the Kafka offsets. This provides an
<em>“at least once”</em> guarantee by ensuring the data is persisted in NiFi before acknowledging the offsets. If committing the
offsets fails, possibly due to a consumer rebalance, NiFi will consume those same offsets again and receive duplicate data.
If the ordering was reversed, it would be possible for the offsets to be successfully committed, followed by a failure to
commit the NiFi session, which would create data loss and be considered <em>“at most once”</em>.</p>
<p>A second key point is that there is purposely no coordination across processors, meaning that each processor succeeds or fails
independently of the other processors in the flow. Once <code class="language-plaintext highlighter-rouge">ConsumeKafka_2_6</code> successfully executes, the consumed data is now persisted
in NiFi, regardless of whether <code class="language-plaintext highlighter-rouge">PutDatabaseRecord</code> succeeds.</p>
<p>Let’s look at how this same flow would execute in Stateless NiFi.</p>
<h2 id="stateless-nifi">Stateless NiFi</h2>
<p>Stateless NiFi adheres to the same NiFi API as traditional NiFi, which means it can run the same processors and flow definitions;
it just provides a different implementation of the underlying engine.</p>
<p><img src="/assets/images/nifi-stateless/03-stateless-flow.png" class="img-responsive" /></p>
<p>The primary abstraction is a <code class="language-plaintext highlighter-rouge">StatelessDataFlow</code> which can be triggered to execute. Each execution of the flow
produces a result that can be considered a success or failure. A failure can occur from a processor throwing an exception,
or from explicitly routing flow files to a named “failure port”.</p>
<p>A key difference in Stateless NiFi is around committing the NiFi session. A new commit method was introduced
to <code class="language-plaintext highlighter-rouge">ProcessSession</code> with the following signature:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void commitAsync(Runnable onSuccess);
</code></pre></div></div>
<p>This gives the session implementation control over when to execute the given callback. In traditional NiFi, the session can execute the
callback as the last step of <code class="language-plaintext highlighter-rouge">commitAsync</code>, which produces the same behavior we looked at earlier. The Stateless NiFi session can
instead hold the callback and execute it only when the entire flow has completed successfully.</p>
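<p>To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is a toy model, not NiFi code: the class names <code class="language-plaintext highlighter-rouge">TraditionalSession</code> and <code class="language-plaintext highlighter-rouge">StatelessSession</code> and the <code class="language-plaintext highlighter-rouge">acknowledge</code> method are hypothetical stand-ins, and the real <code class="language-plaintext highlighter-rouge">ProcessSession</code> does far more work during a commit.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class TraditionalSession:
    """Models a session that runs the callback as the last step of commit_async."""

    def commit_async(self, on_success):
        # The session's own commit work would happen here, then the callback runs.
        on_success()


class StatelessSession:
    """Models a session that holds callbacks until the whole flow succeeds."""

    def __init__(self):
        self.held = []

    def commit_async(self, on_success):
        # Nothing runs yet; the callback is only remembered.
        self.held.append(on_success)

    def acknowledge(self):
        # Called by the (hypothetical) engine once the entire flow has completed.
        callbacks, self.held = self.held, []
        for callback in callbacks:
            callback()


events = []

TraditionalSession().commit_async(lambda: events.append("traditional: offsets committed"))

stateless = StatelessSession()
stateless.commit_async(lambda: events.append("stateless: offsets committed"))
events.append("rest of flow runs")
stateless.acknowledge()

for event in events:
    print(event)
</code></pre></div></div>
<p>In this model the traditional session runs its callback immediately, while the stateless session runs it only after <code class="language-plaintext highlighter-rouge">acknowledge</code> is called at the end of the flow.</p>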
<p>Let’s consider how the example flow will execute in Stateless NiFi…</p>
<p>When the <code class="language-plaintext highlighter-rouge">StatelessDataFlow</code> is triggered, <code class="language-plaintext highlighter-rouge">ConsumeKafka_2_6</code> begins executing the same as it would in traditional NiFi, by polling
Kafka for records, creating a flow file, and writing the records to the output stream of the flow file. It then calls <code class="language-plaintext highlighter-rouge">commitAsync</code> to
commit the NiFi session and passes in a callback for committing the offsets to Kafka, which in this case will be held until later.</p>
<p>The flow file is then transferred to <code class="language-plaintext highlighter-rouge">PutDatabaseRecord</code>, which attempts to apply the event to the database. Assuming
<code class="language-plaintext highlighter-rouge">PutDatabaseRecord</code> succeeds, the overall execution of the flow completes successfully. The stateless engine then
acknowledges the result, which executes any held callbacks and thus commits the offsets to Kafka. The overall sequence is
summarized in the following diagram.</p>
<p><img src="/assets/images/nifi-stateless/04-stateless-sequence.png" class="img-responsive" /></p>
<p>A key point here is that the execution of the entire flow is being treated like a single transaction. If a failure were to occur at
<code class="language-plaintext highlighter-rouge">PutDatabaseRecord</code>, the overall execution would be considered a failure, and the <code class="language-plaintext highlighter-rouge">onSuccess</code> callbacks from <code class="language-plaintext highlighter-rouge">commitAsync</code>
would never get executed. In this case, that would mean the offsets were never committed to Kafka, and the entire flow can be tried
again for the same records.</p>
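<p>The transactional behavior can be sketched in a few lines of Python. This is an illustrative model only: <code class="language-plaintext highlighter-rouge">FakeKafka</code>, <code class="language-plaintext highlighter-rouge">trigger_flow</code>, and the <code class="language-plaintext highlighter-rouge">fail_on_write</code> flag are hypothetical names, and a plain list stands in for the database.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class FakeKafka:
    """Stand-in for a topic plus one consumer group's committed offset."""

    def __init__(self, records):
        self.records = records
        self.committed = 0

    def poll(self):
        # Return everything after the last committed offset.
        return self.records[self.committed:]

    def commit(self, offset):
        self.committed = offset


def trigger_flow(kafka, database, fail_on_write):
    """Runs consume then write as one unit; returns True on success."""
    held_callbacks = []

    # Source step: poll, then hand the offset commit to the session as a callback.
    records = kafka.poll()
    next_offset = kafka.committed + len(records)
    held_callbacks.append(lambda: kafka.commit(next_offset))

    # Destination step: a failure here fails the whole execution.
    if fail_on_write:
        return False  # held callbacks are dropped, so the offsets stay put
    database.extend(records)

    # Success: acknowledge the result, which runs the held callbacks.
    for callback in held_callbacks:
        callback()
    return True


kafka = FakeKafka(["r1", "r2", "r3"])
database = []

first = trigger_flow(kafka, database, fail_on_write=True)
offset_after_failure = kafka.committed
second = trigger_flow(kafka, database, fail_on_write=False)

print(first, offset_after_failure, second, kafka.committed, database)
</code></pre></div></div>
<p>Because the failed execution never runs its callback, the second execution polls the same three records again, which is the retry behavior described above.</p>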
<p>Another type of failure scenario would be if Stateless NiFi crashed in the middle of executing the flow. Since Stateless NiFi generally
uses in-memory repositories, any data that was in the middle of processing would be gone. However, since the source processor had not yet
acknowledged receiving that data (i.e. the onSuccess callback never got executed), it would pull the same data again on the next execution.</p>
<h2 id="executestateless">ExecuteStateless</h2>
<p>Previously, the primary mechanism to use Stateless NiFi was through the <code class="language-plaintext highlighter-rouge">nifi-stateless</code> binary which launches a standalone process to
execute the flow.</p>
<p>The 1.15.0 release introduced a new processor called <code class="language-plaintext highlighter-rouge">ExecuteStateless</code> which can be used to run the Stateless engine from within traditional
NiFi. This allows you to manage the execution of the Stateless flow using the traditional NiFi UI, as well as connect the output of the Stateless flow to follow-on processing in the traditional NiFi flow.</p>
<p><img src="/assets/images/nifi-stateless/05-execute-stateless.png" class="img-responsive" /></p>
<p>In order to use the <code class="language-plaintext highlighter-rouge">ExecuteStateless</code> processor, you would first use traditional NiFi to create a process group containing the flow
you want to execute with the Stateless engine. You would then download the flow definition, or commit the flow to a NiFi Registry instance.
From there, you would configure <code class="language-plaintext highlighter-rouge">ExecuteStateless</code> with the location of the flow definition.</p>
<p>For a more in-depth look at <code class="language-plaintext highlighter-rouge">ExecuteStateless</code>, check out <a href="https://www.youtube.com/watch?v=VyzoD8eh-t0">Mark Payne’s YouTube Video on “Kafka Exactly Once with NiFi”</a>.</p>
Apache NiFi 1.15.0 - Parameter Context Inheritance2021-11-08T00:00:00+00:00http://bbende.github.io/development/2021/11/08/apache-nifi-1-15-0-parameter-context-inheritance
<h4 id="guest-author-joe-gresock">(Guest author: Joe Gresock)</h4>
<p>Here we discuss the benefits and intricacies of the new Parameter Context Inheritance in Apache NiFi 1.15.0.</p>
<h3 id="introduction">Introduction</h3>
<p>The Parameter Context is a powerful way to make flows more portable in Apache NiFi. Paired with Process Groups imported from Flow Definition JSON or NiFi Registry, a Parameter Context allows different values to be supplied to the same basic flow in multiple NiFi instances or even within the same overall NiFi flow in a single instance.</p>
<p>However, as the number of parameters grows, Parameter Contexts can become more difficult to maintain, resulting in the inevitable Monolithic Parameter Context. Let’s see how this happens:</p>
<ol>
<li>You have a Process Group called “Source ABC Kafka to S3” with parameters for your Kafka brokers and Source ABC’s topic, and for the S3 bucket.</li>
<li>You have another Process Group called “Source DEF Kafka to GCS” with parameters for the same Kafka brokers but Source DEF’s topic, and for the GCP Cloud Storage bucket.</li>
<li>You have a third Process Group called “Source GHI Kafka to Elasticsearch” with parameters for the same Kafka brokers but Source GHI’s topic, and for the Elasticsearch hosts, username, and password.</li>
<li>In all of your Process Groups, you use some common Parameters tagging the flowfiles with site-specific information.</li>
<li>Now, where do you put all those parameters: in multiple separate Parameter Contexts, or in one Monolithic Parameter Context? You don’t want to keep repeating the Kafka brokers or the common site-specific parameters, and you also don’t want to break up these Process Groups because these are the units that you deploy atomically across your organization in different sites. So, you go with the Monolithic approach.</li>
</ol>
<p>However, the Monolithic Parameter Context has its own problems:</p>
<ul>
<li>You often end up with multiple slightly different parameters, like “Source ABC Kafka Topic” and “Source DEF Kafka Topic”. Didn’t you make the Parameter Context to parameterize the topic in the first place?</li>
<li>Parameterizing a Process Group with more Parameters than it needs is a bad practice, resulting in confusing extra parameters.</li>
</ul>
<p>What if you could avoid the Parameter duplication and yet still compose groups of Parameters as needed?</p>
<h3 id="introducing-parameter-context-inheritance">Introducing Parameter Context Inheritance</h3>
<p>With Apache NiFi 1.15.0 comes the ability to add Inherited Parameter Contexts to an existing Parameter Context. Any inherited Parameters are also available in the Context that inherits them, down to as many levels of inheritance as you desire. This structure allows Parameter Contexts to contain smaller groups of related Parameters, leaving the inheritance to group disparate sets of Parameter Contexts when needed.</p>
<p>Let’s see what it looks like!</p>
<p>First, select Parameter Contexts from the top-right hamburger menu in NiFi:</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/01-param-context-menu.png" class="img-responsive" width="30%" height="30%" /></p>
<p>Then click the ‘+’ button on the top-right of the NiFi Parameter Contexts view to add a new Parameter Context.</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/02-param-context-view.png" class="img-responsive" /></p>
<p>You’ll notice a new tab, “INHERITANCE”. We’ll get to that in a moment, but first let’s simulate the example above to make it more concrete.</p>
<h3 id="designing-with-smaller-composable-contexts">Designing with Smaller, Composable Contexts</h3>
<p>The ability to compose Parameter Contexts through inheritance suggests an overall preference for smaller, logically grouped Parameter Contexts. Let’s see how this works in the example from the Introduction.</p>
<p>Note: the entire flow discussed below, along with Parameter Contexts, can be downloaded here to save time in constructing it: <a href="/assets/attachments/nifi-parameter-context-inheritance/Parameter_Context_Inheritance_Demo.json" target="_blank">Parameter_Context_Inheritance_Demo.json</a></p>
<p>First, create a context for our common Kafka settings:</p>
<ol>
<li>Name: <code class="language-plaintext highlighter-rouge">Kafka</code></li>
<li>Parameters:
<ul>
<li>Kafka Brokers: <code class="language-plaintext highlighter-rouge">localhost:9092</code></li>
<li>Kafka Group ID: <code class="language-plaintext highlighter-rouge">MyGroup</code></li>
</ul>
</li>
</ol>
<p>Next, create a context for each of the Kafka topics:</p>
<ol>
<li>Name: <code class="language-plaintext highlighter-rouge">Source ABC</code>, with Parameter “Kafka Topic”: <code class="language-plaintext highlighter-rouge">source-abc-topic</code></li>
<li>Name: <code class="language-plaintext highlighter-rouge">Source DEF</code>, with Parameter “Kafka Topic”: <code class="language-plaintext highlighter-rouge">source-def-topic</code></li>
<li>Name: <code class="language-plaintext highlighter-rouge">Source GHI</code>, with Parameter “Kafka Topic”: <code class="language-plaintext highlighter-rouge">source-ghi-topic</code></li>
</ol>
<p>Perhaps you would prefer to name these like “Source ABC Kafka”, but this depends on your overall use case.</p>
<p>Then we’ll add a Parameter Context for S3 settings:</p>
<ol>
<li>Name: <code class="language-plaintext highlighter-rouge">S3</code></li>
<li>Parameters:
<ul>
<li>S3 Bucket: <code class="language-plaintext highlighter-rouge">my-bucket</code></li>
<li>S3 Region: <code class="language-plaintext highlighter-rouge">us-west-2</code></li>
</ul>
</li>
</ol>
<p>Then we’ll add a Parameter Context for GCP Cloud Storage settings:</p>
<ol>
<li>Name: <code class="language-plaintext highlighter-rouge">GCP Cloud Storage</code></li>
<li>Parameters:
<ul>
<li>GCP Project ID: <code class="language-plaintext highlighter-rouge">my-project</code></li>
<li>GCS Bucket: <code class="language-plaintext highlighter-rouge">my-bucket</code></li>
</ul>
</li>
</ol>
<p>Then we’ll add a Parameter Context for Elasticsearch settings:</p>
<ol>
<li>Name: <code class="language-plaintext highlighter-rouge">Elasticsearch</code></li>
<li>Parameters:
<ul>
<li>Elasticsearch URL: <code class="language-plaintext highlighter-rouge">http://localhost:9200</code></li>
<li>Elasticsearch Username: <code class="language-plaintext highlighter-rouge">elastic</code></li>
<li>Elasticsearch Password: <code class="language-plaintext highlighter-rouge">password</code></li>
</ul>
</li>
</ol>
<p>Finally, we’ll add a Parameter Context for the common site-specific parameters:</p>
<ol>
<li>Name: <code class="language-plaintext highlighter-rouge">Site Properties</code></li>
<li>Parameters:
<ul>
<li>Site Identifier: <code class="language-plaintext highlighter-rouge">1234</code></li>
<li>Site Data Manager: <code class="language-plaintext highlighter-rouge">MyDM</code></li>
</ul>
</li>
</ol>
<p>When we’re all done, our Parameter Contexts should look like this:</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/03-param-context-list.png" class="img-responsive" /></p>
<h3 id="adding-the-inheritance">Adding the Inheritance</h3>
<p>Now that we have our Parameter Contexts set up, we can begin composing them. Our goal here is to have one Parameter Context per Process Group, composed of the smaller ones.</p>
<p>Notice in our example that we will need both the <code class="language-plaintext highlighter-rouge">Site Properties</code> and the <code class="language-plaintext highlighter-rouge">Kafka</code> Parameters in all of our Process Groups. Then, each Process Group needs one of the <code class="language-plaintext highlighter-rouge">Source</code> Contexts and one of the destination-related Contexts. Further, let’s add the detail that we know we’ll need additional Process Groups with the same Kafka Brokers and Topics but with different destinations in the future. This suggests the following composition:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">Kafka and Site</code> - inherits from <code class="language-plaintext highlighter-rouge">Kafka</code> and <code class="language-plaintext highlighter-rouge">Site Properties</code></li>
<li><code class="language-plaintext highlighter-rouge">ABC Kafka and Site</code> - inherits from <code class="language-plaintext highlighter-rouge">Kafka and Site</code> and <code class="language-plaintext highlighter-rouge">Source ABC</code></li>
<li><code class="language-plaintext highlighter-rouge">DEF Kafka and Site</code> - inherits from <code class="language-plaintext highlighter-rouge">Kafka and Site</code> and <code class="language-plaintext highlighter-rouge">Source DEF</code></li>
<li><code class="language-plaintext highlighter-rouge">GHI Kafka and Site</code> - inherits from <code class="language-plaintext highlighter-rouge">Kafka and Site</code> and <code class="language-plaintext highlighter-rouge">Source GHI</code></li>
<li><code class="language-plaintext highlighter-rouge">ABC Kafka to S3</code> - inherits from <code class="language-plaintext highlighter-rouge">ABC Kafka and Site</code> and <code class="language-plaintext highlighter-rouge">S3</code></li>
<li><code class="language-plaintext highlighter-rouge">DEF Kafka to GCS</code> - inherits from <code class="language-plaintext highlighter-rouge">DEF Kafka and Site</code> and <code class="language-plaintext highlighter-rouge">GCP Cloud Storage</code></li>
<li><code class="language-plaintext highlighter-rouge">GHI Kafka to Elasticsearch</code> - inherits from <code class="language-plaintext highlighter-rouge">GHI Kafka and Site</code> and <code class="language-plaintext highlighter-rouge">Elasticsearch</code></li>
</ul>
<p>The hierarchy for one of these top-level Parameter Contexts would look like this:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">ABC Kafka to S3</code> Context
<ul>
<li><code class="language-plaintext highlighter-rouge">ABC Kafka and Site</code> Context
<ul>
<li><code class="language-plaintext highlighter-rouge">Kafka and Site</code> Context
<ul>
<li><code class="language-plaintext highlighter-rouge">Kafka</code> Context
<ul>
<li><code class="language-plaintext highlighter-rouge">Kafka Brokers</code> Parameter with value <code class="language-plaintext highlighter-rouge">localhost:9092</code></li>
<li><code class="language-plaintext highlighter-rouge">Kafka Group ID</code> Parameter with value <code class="language-plaintext highlighter-rouge">MyGroup</code></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">Site Properties</code> Context
<ul>
<li><code class="language-plaintext highlighter-rouge">Site Identifier</code> Parameter with value <code class="language-plaintext highlighter-rouge">1234</code></li>
<li><code class="language-plaintext highlighter-rouge">Site Data Manager</code> Parameter with value <code class="language-plaintext highlighter-rouge">MyDM</code></li>
</ul>
</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">Source ABC</code> Context
<ul>
<li><code class="language-plaintext highlighter-rouge">Kafka Topic</code> Parameter with value <code class="language-plaintext highlighter-rouge">source-abc-topic</code></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
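<p>As a rough illustration of this hierarchy, the Python sketch below flattens <code class="language-plaintext highlighter-rouge">ABC Kafka to S3</code> into its effective Parameters and records which Context defines each one. The dictionary layout and the <code class="language-plaintext highlighter-rouge">effective_parameters</code> helper are assumptions made for this example, not NiFi’s internal representation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Each context: its own (direct) parameters plus an ordered list of inherited contexts.
contexts = {
    "Kafka": {"params": {"Kafka Brokers": "localhost:9092", "Kafka Group ID": "MyGroup"}, "inherits": []},
    "Site Properties": {"params": {"Site Identifier": "1234", "Site Data Manager": "MyDM"}, "inherits": []},
    "Source ABC": {"params": {"Kafka Topic": "source-abc-topic"}, "inherits": []},
    "S3": {"params": {"S3 Bucket": "my-bucket", "S3 Region": "us-west-2"}, "inherits": []},
    "Kafka and Site": {"params": {}, "inherits": ["Kafka", "Site Properties"]},
    "ABC Kafka and Site": {"params": {}, "inherits": ["Kafka and Site", "Source ABC"]},
    "ABC Kafka to S3": {"params": {}, "inherits": ["ABC Kafka and Site", "S3"]},
}


def effective_parameters(name):
    """Maps each parameter name to (value, name of the context that defines it)."""
    context = contexts[name]
    result = {key: (value, name) for key, value in context["params"].items()}
    for inherited in context["inherits"]:
        for key, entry in effective_parameters(inherited).items():
            # Keep the first definition found; later ones do not replace it.
            result.setdefault(key, entry)
    return result


abc_to_s3 = effective_parameters("ABC Kafka to S3")
for key in sorted(abc_to_s3):
    value, source = abc_to_s3[key]
    print(key, "=", value, "(defined in " + source + ")")
</code></pre></div></div>
<p>The top-level Context defines nothing itself, yet it ends up with all seven Parameters from the four leaf Contexts.</p>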
<p>Let’s compose one of these examples, and the rest can be seen by downloading the flow above.</p>
<p>First, create a new Parameter Context named <code class="language-plaintext highlighter-rouge">Kafka and Site</code> and go to the Inheritance tab:</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/04-inheritance-example-1.png" class="img-responsive" /></p>
<p>There are all of our individual Parameter Contexts! Let’s drag <code class="language-plaintext highlighter-rouge">Kafka</code> to the right:</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/05-inheritance-example-2.png" class="img-responsive" /></p>
<p>Then drag <code class="language-plaintext highlighter-rouge">Site Properties</code> to the right (doesn’t matter if it’s on the top or bottom for now):</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/06-inheritance-example-3.png" class="img-responsive" /></p>
<p>Click Apply, and then Edit <code class="language-plaintext highlighter-rouge">Kafka and Site</code> to see the new Parameters:</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/07-inherited-parameters-1.png" class="img-responsive" /></p>
<p>Notice that all four Parameters from these two Contexts are available in the view, and that each of them has an arrow icon. You can click on one of these to “Go To” the Parameter Context in which the Parameter is actually defined. This allows you to easily navigate to the Parameter in order to edit the actual value.</p>
<p>Skipping ahead, we’ll show what <code class="language-plaintext highlighter-rouge">ABC Kafka to S3</code> looks like after the full setup:</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/08-inherited-parameters-2.png" class="img-responsive" /></p>
<h3 id="parameter-overriding">Parameter Overriding</h3>
<p>Now, let’s say we want to add another Process Group that uses all the same Parameters as <code class="language-plaintext highlighter-rouge">ABC Kafka to S3</code>, but with a <code class="language-plaintext highlighter-rouge">Kafka Group ID</code> of <code class="language-plaintext highlighter-rouge">MyOtherGroup</code>. Here we can use Parameter overriding by creating a new Parameter Context that inherits from <code class="language-plaintext highlighter-rouge">ABC Kafka to S3</code> and then adding a <code class="language-plaintext highlighter-rouge">Kafka Group ID</code> Parameter directly to that new context:</p>
<p><img src="/assets/images/nifi-parameter-context-inheritance/09-overridden-parameter.png" class="img-responsive" /></p>
<p>Notice that <code class="language-plaintext highlighter-rouge">Kafka Group ID</code> appears at the top of the list, even though it’s not in alphabetical order. Direct Parameters always appear at the top of the list so you can see them grouped together. Also notice that you can edit this Parameter directly from this context. You are editing the one on the new Context, rather than the inherited Parameter. And finally, notice that the value displayed is the one you just provided: <code class="language-plaintext highlighter-rouge">MyOtherGroup</code>.</p>
<p>What is the Parameter overriding order, then?</p>
<h3 id="parameter-overriding-order">Parameter Overriding Order</h3>
<ul>
<li>Direct Parameters always take precedence</li>
<li>Next, Parameters inside directly inherited Parameter Contexts take precedence, from top to bottom in the <code class="language-plaintext highlighter-rouge">Selected Parameter Context</code> portion of the Inheritance tab</li>
<li>This recursively repeats in a <em>depth-first</em> manner for as many layers of inherited Parameter Contexts as exist</li>
</ul>
<p>So, for example, consider the following Parameter Context hierarchy:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">A</code> Context
<ul>
<li><code class="language-plaintext highlighter-rouge">foo</code> Parameter with value <code class="language-plaintext highlighter-rouge">A.foo</code></li>
<li><code class="language-plaintext highlighter-rouge">B</code> Inherited Context
<ul>
<li><code class="language-plaintext highlighter-rouge">foo</code> Parameter with value <code class="language-plaintext highlighter-rouge">B.foo</code></li>
<li><code class="language-plaintext highlighter-rouge">bar</code> Parameter with value <code class="language-plaintext highlighter-rouge">B.bar</code></li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">C</code> Inherited Context
<ul>
<li><code class="language-plaintext highlighter-rouge">foo</code> Parameter with value <code class="language-plaintext highlighter-rouge">C.foo</code></li>
<li><code class="language-plaintext highlighter-rouge">bar</code> Parameter with value <code class="language-plaintext highlighter-rouge">C.bar</code></li>
<li><code class="language-plaintext highlighter-rouge">grandchild</code> Inherited Context
<ul>
<li><code class="language-plaintext highlighter-rouge">foo</code> Parameter with value <code class="language-plaintext highlighter-rouge">grandchild.foo</code></li>
<li><code class="language-plaintext highlighter-rouge">bar</code> Parameter with value <code class="language-plaintext highlighter-rouge">grandchild.bar</code></li>
<li><code class="language-plaintext highlighter-rouge">baz</code> Parameter with value <code class="language-plaintext highlighter-rouge">grandchild.baz</code></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>Then, <code class="language-plaintext highlighter-rouge">A</code> effectively has the following Parameters:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">A.foo</code> (overrides all others since it is at the top level)</li>
<li><code class="language-plaintext highlighter-rouge">B.bar</code> (overrides <code class="language-plaintext highlighter-rouge">C.bar</code> because it is first in the list of A’s inherited contexts)</li>
<li><code class="language-plaintext highlighter-rouge">grandchild.baz</code> (because nothing overrides this)</li>
</ul>
<p>If we rearranged Contexts <code class="language-plaintext highlighter-rouge">B</code> and <code class="language-plaintext highlighter-rouge">C</code> so that <code class="language-plaintext highlighter-rouge">C</code> was at the top of the list of A’s inherited Parameter Contexts, the new effective list would be:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">A.foo</code></li>
<li><code class="language-plaintext highlighter-rouge">C.bar</code></li>
<li><code class="language-plaintext highlighter-rouge">grandchild.baz</code></li>
</ul>
<p>And if we then deleted <code class="language-plaintext highlighter-rouge">C.bar</code>, the effective list would be:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">A.foo</code></li>
<li><code class="language-plaintext highlighter-rouge">grandchild.bar</code> (because <code class="language-plaintext highlighter-rouge">grandchild.bar</code> is in the effective list of the parameters of <code class="language-plaintext highlighter-rouge">C</code>, which appears above <code class="language-plaintext highlighter-rouge">B</code> in the inherited contexts of <code class="language-plaintext highlighter-rouge">A</code>)</li>
<li><code class="language-plaintext highlighter-rouge">grandchild.baz</code></li>
</ul>
<p>Finally, if we deleted <code class="language-plaintext highlighter-rouge">grandchild.bar</code>, we’d end up with:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">A.foo</code></li>
<li><code class="language-plaintext highlighter-rouge">B.bar</code></li>
<li><code class="language-plaintext highlighter-rouge">grandchild.baz</code></li>
</ul>
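<p>The overriding order can be checked with a small Python model that replays the four scenarios above. It is a sketch of the rules as described in this post, not NiFi’s actual implementation; <code class="language-plaintext highlighter-rouge">make_context</code> and <code class="language-plaintext highlighter-rouge">resolve</code> are hypothetical helper names.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def make_context(params, inherited=()):
    return {"params": dict(params), "inherited": list(inherited)}


def resolve(context):
    """Direct parameters win; then inherited contexts, top to bottom, depth-first."""
    effective = dict(context["params"])
    for child in context["inherited"]:
        for name, value in resolve(child).items():
            if name not in effective:
                effective[name] = value
    return effective


grandchild = make_context({"foo": "grandchild.foo", "bar": "grandchild.bar", "baz": "grandchild.baz"})
b = make_context({"foo": "B.foo", "bar": "B.bar"})
c = make_context({"foo": "C.foo", "bar": "C.bar"}, [grandchild])
a = make_context({"foo": "A.foo"}, [b, c])

original = resolve(a)

a["inherited"] = [c, b]          # move C above B
reordered = resolve(a)

del c["params"]["bar"]           # delete C.bar
without_c_bar = resolve(a)

del grandchild["params"]["bar"]  # delete grandchild.bar
without_grandchild_bar = resolve(a)

scenarios = [
    ("original", original),
    ("C above B", reordered),
    ("C.bar deleted", without_c_bar),
    ("grandchild.bar deleted", without_grandchild_bar),
]
for label, result in scenarios:
    print(label, result)
</code></pre></div></div>
<p>Each scenario keeps <code class="language-plaintext highlighter-rouge">A.foo</code> and <code class="language-plaintext highlighter-rouge">grandchild.baz</code>; only the source of <code class="language-plaintext highlighter-rouge">bar</code> changes as the Contexts are reordered or edited.</p>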
<p>Notice that as we move these inherited Parameter Contexts around, we are actually changing the effective values of the parameters. Once we Apply these changes, all referencing components are stopped and restarted just as if we were changing these Parameter values like usual.</p>
<p>Now, as far as best practices go, as we increase the number of overridden Parameters, we likely increase the potential for confusion, so it is probably best to use overriding sparingly if possible.</p>
<h3 id="wrapping-up">Wrapping Up</h3>
<p>With our example from the Introduction implemented using Inherited Parameter Contexts, we can now make granular updates to our Parameter Contexts and have the changes propagate to all Contexts that inherit from them. If we need to change the <code class="language-plaintext highlighter-rouge">Site Identifier</code>, we only have to change it in one location: the <code class="language-plaintext highlighter-rouge">Site Properties</code> Parameter Context. If we need to change our <code class="language-plaintext highlighter-rouge">Kafka Brokers</code>, we need only change them in the <code class="language-plaintext highlighter-rouge">Kafka</code> Context. We can also add additional Parameter Contexts through inheritance as our flows evolve. And each Process Group now has exactly the Parameters it needs, now that we’ve done away with the Monolithic Parameter Context.</p>
<p>In conclusion, the new Inherited Parameter Context feature of Apache NiFi 1.15.0 brings the next level of flexibility and maintainability to NiFi’s already powerful Parameter Context framework.</p>
Apache NiFi 1.15.0 - Protecting NiFi Configuration using HashiCorp Vault K/V Secrets2021-11-08T00:00:00+00:00http://bbende.github.io/development/2021/11/08/apache-nifi-1-15-0-hashicorp-vault-secrets
<h4 id="guest-author-joe-gresock">(Guest author: Joe Gresock)</h4>
<p>This is a follow-on post to <a href="https://bryanbende.com/development/2021/07/20/apache-nifi-1-14-0-hashicorp-vault">Apache NiFi 1.14.0 - HashiCorp Vault Integration</a>, demonstrating protecting Apache NiFi 1.15.0 sensitive configuration properties by storing them as HashiCorp Vault Key/Value Secrets.</p>
<h3 id="introduction">Introduction</h3>
<p>In the <a href="https://bryanbende.com/development/2021/07/20/apache-nifi-1-14-0-hashicorp-vault">last post</a>, we walked through protecting NiFi configuration properties using HashiCorp Vault’s Transit Secrets Engine. This process outsourced the encryption of sensitive configuration properties to Vault. Here we will slightly alter the procedure in order to store these sensitive properties as actual Key/Value Secrets in a Vault server.</p>
<p>Rather than simply pointing out the differences, we’ll walk through the entire procedure, which will be necessary in order to pick up the latest Apache NiFi distribution anyway.</p>
<h3 id="setup">Setup</h3>
<p>To get started, we’ll need to download and install HashiCorp Vault, NiFi 1.15.0, and nifi-toolkit.</p>
<ul>
<li>First, follow the relevant Vault installation guide for your system: <a href="https://www.vaultproject.io/downloads">https://www.vaultproject.io/downloads</a></li>
<li>Then head over to the Apache NiFi downloads page: <a href="https://nifi.apache.org/download.html">https://nifi.apache.org/download.html</a>
<ul>
<li>Download <code class="language-plaintext highlighter-rouge">nifi-toolkit-1.15.0-bin.zip</code></li>
<li>Download <code class="language-plaintext highlighter-rouge">nifi-1.15.0-bin.zip</code></li>
</ul>
</li>
</ul>
<p>For the purposes of this guide, we’ll unzip both <code class="language-plaintext highlighter-rouge">nifi-toolkit-1.15.0-bin.zip</code> and <code class="language-plaintext highlighter-rouge">nifi-1.15.0-bin.zip</code> in the same parent directory, so that you should see this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls
nifi-1.15.0 nifi-toolkit-1.15.0
</code></pre></div></div>
<p>We’ll start Vault using development mode, which allows us to easily interact with the Vault server without having to <a href="https://www.vaultproject.io/docs/concepts/seal">unseal</a> the server. Run the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault server -dev > init.log
</code></pre></div></div>
<p>This will launch Vault in development mode, and will write something like the following to init.log:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WARNING! dev mode is enabled! In this mode, Vault runs entirely in-memory
and starts unsealed with a single unseal key. The root token is already
authenticated to the CLI, so you can immediately begin using Vault.
You may need to set the following environment variable:
$ export VAULT_ADDR='http://127.0.0.1:8200'
The unseal key and root token are displayed below in case you want to
seal/unseal the Vault or re-authenticate.
Unseal Key: fUJ6zxqlo+vsWFlCxMK9MKrhgpsmxdNla727mu+5nuY=
Root Token: s.3qsioJdA3YXptvEQURDCijUw
Development mode should NOT be used in production installations!
</code></pre></div></div>
<p>In another terminal, create a file containing this root token, which we’ll use later:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>TOKEN=$(grep "Root Token" init.log | cut -d ":" -f2 | xargs)
echo "vault.token=$TOKEN" > ~/vault-auth.properties
</code></pre></div></div>
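<p>If you would rather script this step outside a shell, the same extraction can be sketched in Python. The helper name <code class="language-plaintext highlighter-rouge">extract_root_token</code> is hypothetical, and the sample text simply reuses the two relevant lines from the example output above; a real script would read <code class="language-plaintext highlighter-rouge">init.log</code> instead.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def extract_root_token(log_text):
    """Returns the text after 'Root Token:' on the first matching line, or None."""
    for line in log_text.splitlines():
        if "Root Token" in line:
            # Same idea as cut -d ":" -f2 followed by xargs: take field 2, trim it.
            return line.split(":")[1].strip()
    return None


sample_log = """Unseal Key: fUJ6zxqlo+vsWFlCxMK9MKrhgpsmxdNla727mu+5nuY=
Root Token: s.3qsioJdA3YXptvEQURDCijUw
"""

token = extract_root_token(sample_log)
print("vault.token=" + token)
</code></pre></div></div>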
<p>Run the following to enable the K/V (version 1) Secrets Engine, which actually stores sensitive values in the Vault server:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault secrets enable -path nifi-kv kv
</code></pre></div></div>
<p>This last command created a Vault path called <code class="language-plaintext highlighter-rouge">"nifi-kv"</code>. Take note of this value, because we’ll be seeing it again later. Note that we could have simply used the command <code class="language-plaintext highlighter-rouge">vault secrets enable kv</code>, which would have enabled the K/V secrets engine at the path <code class="language-plaintext highlighter-rouge">kv</code>, but here we use a specific path in order to see how it is customized.</p>
<h3 id="integrating-vault-into-nifi-and-nifi-tookit">Integrating Vault into NiFi and nifi-toolkit</h3>
<p>Now that we have Vault running and configured, we’ll configure nifi-toolkit to talk to Vault. Since the toolkit operates on actual NiFi configuration files, that’s where we’ll start.</p>
<p>First, open up <code class="language-plaintext highlighter-rouge">nifi-1.15.0/conf/bootstrap-hashicorp-vault.conf</code> in your preferred editor, and configure it as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># HTTP or HTTPS URI for HashiCorp Vault is required to enable the Sensitive Properties Provider
vault.uri=http://127.0.0.1:8200
# Transit Path is required to enable the Sensitive Properties Provider Protection Scheme 'hashicorp/vault/transit/{path}'
vault.transit.path=
# Key/Value Path is required to enable the Sensitive Properties Provider Protection Scheme 'hashicorp/vault/kv/{path}'
vault.kv.path=nifi-kv
# Token Authentication example properties
# vault.authentication=TOKEN
# vault.token=<token value>
# Optional file supports authentication properties described in the Spring Vault Environment Configuration
# https://docs.spring.io/spring-vault/docs/2.3.x/reference/html/#vault.core.environment-vault-configuration
#
# All authentication properties must be included in bootstrap-hashicorp-vault.conf when this property is not specified.
# Properties in bootstrap-hashicorp-vault.conf take precedence when the same values are defined in both files.
# Token Authentication is the default when the 'vault.authentication' property is not specified.
vault.authentication.properties.file=<full/path/to/vault-auth.properties>
</code></pre></div></div>
<p>Make sure you’ve specified the full path to the <code class="language-plaintext highlighter-rouge">vault-auth.properties</code> file that you created when installing Vault. Notice that we could have actually stored the <code class="language-plaintext highlighter-rouge">vault.token</code> directly in the <code class="language-plaintext highlighter-rouge">bootstrap-hashicorp-vault.conf</code> file, but it’s considered better practice to keep your authentication properties separate, in case you decide to rotate your Vault authentication token or other credentials.</p>
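<p>Before moving on, it can help to sanity-check the two files together. The Python sketch below is a simplified model of the precedence rule stated in the configuration comments (values in <code class="language-plaintext highlighter-rouge">bootstrap-hashicorp-vault.conf</code> win over the separate authentication file). The parser handles only plain <code class="language-plaintext highlighter-rouge">key=value</code> lines, and the file path, token value, and duplicate <code class="language-plaintext highlighter-rouge">vault.uri</code> entry are made up for illustration.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def parse_properties(text):
    """Parses simple key=value lines, skipping blanks and # comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        properties[key.strip()] = value.strip()
    return properties


# Hypothetical file contents; a real check would read the two files from disk.
bootstrap_conf = """
vault.uri=http://127.0.0.1:8200
vault.kv.path=nifi-kv
vault.authentication.properties.file=/hypothetical/path/vault-auth.properties
"""
auth_properties = """
vault.token=s.example-token
# Duplicate on purpose, to show which file wins.
vault.uri=http://ignored.example:8200
"""

# Start from the separate auth file, then let bootstrap values override duplicates.
merged = parse_properties(auth_properties)
merged.update(parse_properties(bootstrap_conf))

required = ["vault.uri", "vault.kv.path", "vault.token"]
missing = [key for key in required if not merged.get(key)]

print("vault.uri:", merged["vault.uri"])
print("missing:", missing)
</code></pre></div></div>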
<p>At this point, we’ve integrated Vault into NiFi. Now copy this configuration file into nifi-toolkit and we’ll be ready to start encrypting files:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp nifi-1.15.0/conf/bootstrap-hashicorp-vault.conf nifi-toolkit-1.15.0/conf/
</code></pre></div></div>
<h3 id="encrypting-files-using-the-hashicorp_vault_kv-protection-scheme">Encrypting files using the HASHICORP_VAULT_KV protection scheme</h3>
<p>We will have some sensitive values to protect in <code class="language-plaintext highlighter-rouge">nifi-1.15.0/conf/nifi.properties</code> once we start NiFi:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd nifi-1.15.0
./bin/nifi.sh start
tail -f logs/nifi-app.log
</code></pre></div></div>
<p>Once you see the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2021-10-29 15:12:32,167 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2021-10-29 15:12:32,167 INFO [main] org.apache.nifi.web.server.JettyServer https://127.0.0.1:8443/nifi
</code></pre></div></div>
<p>Then <code class="language-plaintext highlighter-rouge">Ctrl-C</code> from tailing the log and stop NiFi:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./bin/nifi.sh stop
</code></pre></div></div>
<p>Now there should be some generated passwords in our <code class="language-plaintext highlighter-rouge">nifi-1.15.0/conf/nifi.properties</code> file. We can encrypt them using the following commands:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ../nifi-toolkit-1.15.0
# The breakdown of this command is as follows:
# -b specifies the NiFi bootstrap.conf, which specifies nifi.bootstrap.protection.hashicorp.vault.conf=./conf/bootstrap-hashicorp-vault.conf
# -n specifies the nifi.properties file to encrypt
# -o specifies the location to output the encrypted nifi.properties file. If we left out this argument, it would simply encrypt nifi.properties in place.
# -S specifies the protection scheme. If we left out this argument, it would use the default AES_GCM protection scheme.
./bin/encrypt-config.sh -b ../nifi-1.15.0/conf/bootstrap.conf \
-n ../nifi-1.15.0/conf/nifi.properties \
-o nifi.properties.encrypted \
-S HASHICORP_VAULT_KV
</code></pre></div></div>
<p>The tool prompts for a password (must be at least 12 characters), which is used to configure <code class="language-plaintext highlighter-rouge">nifi-1.15.0/conf/bootstrap.conf</code> with a <code class="language-plaintext highlighter-rouge">nifi.bootstrap.sensitive.key</code>. The HashiCorp Vault protection scheme does not use this key, but other protection schemes require it. Because different properties may be protected by different schemes at the same time, the key is still generated from the password you enter.</p>
<p>After you enter a password, you should see output like the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2021/10/28 16:02:33 INFO [main] org.apache.nifi.properties.NiFiPropertiesLoader: Loaded 202 properties from ../nifi-1.15.0/conf/nifi.properties
2021/10/28 16:02:33 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Loaded NiFiProperties instance with 202 properties
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Protected nifi.security.keyPasswd with hashicorp/vault/kv/nifi-kv -> nifi-kv/default/nifi.security.keyPasswd
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Updated protection key nifi.security.keyPasswd.protected
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Protected nifi.security.keystorePasswd with hashicorp/vault/kv/nifi-kv -> nifi-kv/default/nifi.security.keystorePasswd
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Updated protection key nifi.security.keystorePasswd.protected
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Protected nifi.security.truststorePasswd with hashicorp/vault/kv/nifi-kv -> nifi-kv/default/nifi.security.truststorePasswd
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Updated protection key nifi.security.truststorePasswd.protected
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Protected nifi.sensitive.props.key with hashicorp/vault/kv/nifi-kv -> nifi-kv/default/nifi.sensitive.props.key
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Updated protection key nifi.sensitive.props.key.protected
2021/10/28 16:02:35 INFO [main] org.apache.nifi.properties.ConfigEncryptionTool: Final result: 205 keys including 4 protected keys
</code></pre></div></div>
<p>Our new protected properties can be seen in <code class="language-plaintext highlighter-rouge">nifi.properties.encrypted</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nifi.security.keystore=./conf/keystore.p12
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=nifi-kv/default/nifi.security.keystorePasswd
nifi.security.keystorePasswd.protected=hashicorp/vault/kv/nifi-kv
nifi.security.keyPasswd=nifi-kv/default/nifi.security.keyPasswd
nifi.security.keyPasswd.protected=hashicorp/vault/kv/nifi-kv
nifi.security.truststore=./conf/truststore.p12
nifi.security.truststoreType=PKCS12
nifi.security.truststorePasswd=nifi-kv/default/nifi.security.truststorePasswd
nifi.security.truststorePasswd.protected=hashicorp/vault/kv/nifi-kv
</code></pre></div></div>
<p>Notice that the <code class="language-plaintext highlighter-rouge">.protected</code> properties indicate <code class="language-plaintext highlighter-rouge">hashicorp/vault/kv/nifi-kv</code>, which tells NiFi to use the HashiCorp Vault K/V Sensitive Property Provider using a Vault path prefix of <code class="language-plaintext highlighter-rouge">"nifi-kv"</code>.</p>
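<p>To see at a glance which properties are protected and by which scheme, the <code class="language-plaintext highlighter-rouge">.protected</code> entries can be listed directly. This is a minimal sketch, assuming a POSIX shell with <code class="language-plaintext highlighter-rouge">awk</code>; the function name <code class="language-plaintext highlighter-rouge">list_protected</code> is hypothetical:</p>

```shell
# Hypothetical helper (not part of nifi-toolkit): print each protected
# property followed by its protection scheme, one pair per line.
# Commented-out lines and empty ".protected" entries are skipped.
list_protected() {
    # $1 = path to a nifi.properties file
    awk -F= '
        /^[^#]/ && $1 ~ /\.protected$/ && $2 != "" {
            name = $1
            sub(/\.protected$/, "", name)
            print name, $2
        }
    ' "$1"
}
```

<p>Running <code class="language-plaintext highlighter-rouge">list_protected nifi.properties.encrypted</code> against the file above should print one line per protected property, each ending in <code class="language-plaintext highlighter-rouge">hashicorp/vault/kv/nifi-kv</code>.</p>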
<p>Also observe that the actual property values are simply the Vault paths of the relevant secrets. For example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nifi.security.keystorePasswd=nifi-kv/default/nifi.security.keystorePasswd
</code></pre></div></div>
<p>This tells us that the keystore password is now stored in a Vault Secret named <code class="language-plaintext highlighter-rouge">nifi-kv/default/nifi.security.keystorePasswd</code>. Well, let’s test it out!</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault kv get nifi-kv/default/nifi.security.keystorePasswd
</code></pre></div></div>
<p>This produces output like the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>==== Data ====
Key Value
--- -----
value [REDACTED]
</code></pre></div></div>
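<p>The same lookup can be driven from the properties file itself rather than by typing the path by hand. The sketch below assumes a POSIX shell with <code class="language-plaintext highlighter-rouge">awk</code>, the <code class="language-plaintext highlighter-rouge">vault</code> CLI on the path, and a Vault address and token already set in the environment; <code class="language-plaintext highlighter-rouge">secret_path</code> and <code class="language-plaintext highlighter-rouge">read_protected</code> are hypothetical names:</p>

```shell
# Hypothetical helper: print the Vault secret path that the encrypted
# nifi.properties file stores as the value of a protected property.
secret_path() {
    # $1 = properties file, $2 = property name
    awk -v key="$2" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2) }
    ' "$1"
}

# Hypothetical helper: fetch only the "value" field of that secret.
# Requires the vault CLI and valid credentials in the environment.
read_protected() {
    # $1 = properties file, $2 = property name
    vault kv get -field=value "$(secret_path "$1" "$2")"
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">read_protected nifi.properties.encrypted nifi.security.keystorePasswd</code> would print the stored keystore password, so treat its output with the same care as the secret itself.</p>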
<h3 id="starting-nifi">Starting NiFi</h3>
<p>Now that we’ve successfully protected our <code class="language-plaintext highlighter-rouge">nifi.properties</code> using HashiCorp Vault, let’s start NiFi and watch it retrieve the values.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp nifi.properties.encrypted ../nifi-1.15.0/conf/nifi.properties
cd ../nifi-1.15.0
./bin/nifi.sh start
tail -f logs/nifi-app.log
</code></pre></div></div>
<p>Since you’ve already configured <code class="language-plaintext highlighter-rouge">nifi-1.15.0/conf/bootstrap-hashicorp-vault.conf</code>, NiFi should use that configuration to connect to Vault when it encounters the <code class="language-plaintext highlighter-rouge">hashicorp/vault/kv/nifi-kv</code> values in <code class="language-plaintext highlighter-rouge">nifi.properties</code>, allowing it to retrieve the protected properties from the Vault server. If everything works as planned, you should be able to view NiFi’s UI at <a href="https://localhost:8443/nifi">https://localhost:8443/nifi</a>, this time with your properties pulled from the Vault Server at startup.</p>
<h3 id="other-secrets-managers">Other Secrets Managers</h3>
<p>The Apache NiFi 1.15.0 release also adds sensitive property providers for Azure Key Vault Secrets (<code class="language-plaintext highlighter-rouge">AZURE_KEYVAULT_SECRET</code> nifi-toolkit protection scheme) and AWS Secrets Manager (<code class="language-plaintext highlighter-rouge">AWS_SECRETSMANAGER</code> nifi-toolkit protection scheme). These providers function very similarly to the one we just saw, storing secrets in their respective cloud provider secrets managers.</p>