Data Collection

Node Run and Audit Data Collection

Nodes can send their run data to Chef Automate. You can also use the Audit Cookbook to send audit information about your nodes to Compliance. There are two steps to getting data collection running in Chef Automate:

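  1. You must first have an API token. You have two options: generate a new API token in Chef Automate, or set up an existing Chef Automate 1 data collector token in Chef Automate 2.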

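  2. Once you have an API token, you can either configure your Chef Server to send data to Chef Automate, or configure your Chef Clients to send data to Chef Automate without a Chef Server.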

Set Up an Existing Chef Automate 1 Data Collector Token in Chef Automate 2

Porting the Existing Chef Automate 1 Data Collector Token to Chef Automate 2

If you are migrating from Chef Automate 1, you have probably already deployed a data collector token on your Chef Servers or your Chef Clients. To reuse that token from your Chef Automate 1 installation, make the configuration change described below.

For this process, you need the existing token (let’s call it A1_DC_TOKEN), and access to the machine running the chef-automate CLI client.

Create a file (in this example, data-collector-token.toml) containing your existing token:

[auth_n.v1.sys.service]
a1_data_collector_token = "<A1_DC_TOKEN>"
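If you would rather not paste the token into the file by hand, the patch file can also be generated. The following is a minimal Ruby sketch, assuming the token is available as a string (for example from an environment variable; the variable name A1_DC_TOKEN in the usage comment is an assumption). It escapes backslashes and double quotes because the value is written as a TOML basic string.

```ruby
# Renders the TOML patch shown above for a given Chef Automate 1 token.
# Sketch only: the helper name a1_token_patch is hypothetical.
def a1_token_patch(token)
  raise ArgumentError, 'token must not be empty' if token.nil? || token.empty?
  # TOML basic strings are double-quoted; escape backslashes first, then quotes.
  escaped = token.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
  <<~TOML
    [auth_n.v1.sys.service]
    a1_data_collector_token = "#{escaped}"
  TOML
end

# Usage (assumes the token is exported as A1_DC_TOKEN):
#   File.write('data-collector-token.toml', a1_token_patch(ENV.fetch('A1_DC_TOKEN')))
```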

Now apply that configuration to your Chef Automate 2 deployment:

# chef-automate config patch data-collector-token.toml

[...output omitted...]

Success: Configuration patched

Chef Automate picks up the configuration change after a short interval. From then on, requests that carry the header x-data-collector-token: <A1_DC_TOKEN> are accepted. When logged in with admin permissions, you can also find the ported token at https://automate.example.com/admin/tokens, listed under the name

Legacy data collector token ported from A1
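To see what an authenticated request looks like, the Ruby sketch below builds (but does not send) a POST request to the data collector endpoint used throughout this guide, carrying the x-data-collector-token header. The helper name is hypothetical, and the JSON body is a placeholder, not a valid data collector message; the sketch only illustrates where the token goes.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical helper: builds a POST request that authenticates with the
# x-data-collector-token header. The body is a placeholder payload.
def data_collector_request(token, url: 'https://automate.example.com/data-collector/v0/', body: {})
  uri = URI(url)
  req = Net::HTTP::Post.new(uri)          # path and Host header come from the URI
  req['Content-Type'] = 'application/json'
  req['x-data-collector-token'] = token   # the header Chef Automate checks
  req.body = JSON.generate(body)
  req
end

# Sending it requires network access to Chef Automate:
#   uri = URI('https://automate.example.com/data-collector/v0/')
#   req = data_collector_request(ENV.fetch('A1_DC_TOKEN'))
#   res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
#   puts res.code
```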

Now that you have a valid API token, you’ll need to update your Chef Server data collector configuration if you are using a Chef Server. Otherwise, you must configure your Chef Clients to send data directly to Chef Automate.

Configure your Chef Server to Send Data to Chef Automate

Note: Multiple Chef Servers can send data to a single Chef Automate server.

In addition to forwarding Chef run data to Chef Automate, Chef Server will send messages to Chef Automate whenever an action is taken on a Chef Server object, such as when a cookbook is uploaded to the Chef Server or when a user edits a role.

To have Chef Server forward run data from connected Chef Clients, set the data_collector['proxy'] attribute to true.

Setting Up Data Collection on Chef Server Versions 12.14 and Higher

Instead of setting the token directly in /etc/opscode/chef-server.rb, as older versions of Chef Server required, use the set-secret command so that your API token is not stored in plaintext in a file:

sudo chef-server-ctl set-secret data_collector token '<API_TOKEN_FROM_STEP_1>'
sudo chef-server-ctl restart nginx
sudo chef-server-ctl restart opscode-erchef

Next, configure the Chef Server for data collection forwarding by adding the following settings to /etc/opscode/chef-server.rb:

data_collector['root_url'] = 'https://automate.example.com/data-collector/v0/'
# Add for chef client run forwarding
data_collector['proxy'] = true
# Add for compliance scanning
profiles['root_url'] = 'https://automate.example.com'
# Save and close the file

To apply the changes, run:

sudo chef-server-ctl reconfigure

Setting Up Data Collection on Chef Server Versions 12.13 and Lower

On versions 12.13 and lower, add the root_url and token values directly to /etc/opscode/chef-server.rb:

data_collector['root_url'] = 'https://automate.example.com/data-collector/v0/'
data_collector['token'] = '<API_TOKEN_FROM_STEP_1>'
# Add for chef client run forwarding
data_collector['proxy'] = true
# Add for compliance scanning
profiles['root_url'] = 'https://automate.example.com'
# Save and close the file

To apply the changes, run:

sudo chef-server-ctl reconfigure
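The two versions differ only in whether the token lives in chef-server.rb. The Ruby sketch below renders the settings for either case from a Chef Automate host name and a Chef Server version string, which can help keep several Chef Servers consistent. The helper name is hypothetical, and it assumes Chef Server versions compare as ordinary dotted version numbers.

```ruby
require 'rubygems' # Gem::Version

# Hypothetical helper: renders the chef-server.rb data collection settings.
# 12.14 and higher: token is stored with set-secret, so it is omitted here.
# 12.13 and lower: token must be written into the file.
def chef_server_rb_settings(automate_host, chef_server_version, token: nil)
  modern = Gem::Version.new(chef_server_version) >= Gem::Version.new('12.14')
  lines = ["data_collector['root_url'] = 'https://#{automate_host}/data-collector/v0/'"]
  unless modern
    raise ArgumentError, 'token is required for Chef Server 12.13 and lower' if token.nil?
    # String#inspect yields a safely quoted Ruby string literal.
    lines << "data_collector['token'] = #{token.inspect}"
  end
  lines << "data_collector['proxy'] = true"                      # Chef Client run forwarding
  lines << "profiles['root_url'] = 'https://#{automate_host}'"   # compliance scanning
  lines.join("\n") + "\n"
end

# Usage:
#   puts chef_server_rb_settings('automate.example.com', '12.13.0', token: ENV.fetch('AUTOMATE_TOKEN'))
```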

Setting Up Chef Client to Send Compliance Scan Data Through the Chef Server to Chef Automate

Now that the Chef Server is configured for data collection, you can also enable Compliance Scanning on your Chef Clients via the Audit Cookbook.

  • Set the following attributes for the audit cookbook:
default['audit']['reporter'] = 'chef-server-automate'
default['audit']['fetcher'] = 'chef-server'
default['audit']['profiles'].push(
  'name': 'cis-centos7-level2',
  'compliance': 'user-name/cis-centos7-level2' # in the ui for automate, this value is the identifier for the profile
)
default['audit']['interval'] = {
  'enabled': true,
  'time': 1440  # once a day, the default value
}

Now, any node with audit::default in its run list will fetch compliance profiles from, and report scan results to, Chef Automate via the Chef Server. See the audit cookbook documentation for the full list of configuration options.
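The interval time of 1440 corresponds to 1440 minutes, that is 24 × 60, or once a day. The Ruby sketch below models how such an interval decides whether a scan is due. It is an illustration, not the audit cookbook's code: it assumes the time is in minutes and that a disabled interval means scanning on every Chef Client run, and last_scan_at is a hypothetical input such as the time of the previous report.

```ruby
# Illustrative model of the audit interval, not the audit cookbook's implementation.
# Accepts the attribute hash above (symbol keys) or string-keyed node attributes.
def scan_due?(interval, last_scan_at, now = Time.now)
  iv = interval.transform_keys(&:to_s)
  return true unless iv['enabled']   # assumption: disabled interval => scan every run
  return true if last_scan_at.nil?   # never scanned before
  (now - last_scan_at) >= iv.fetch('time') * 60 # Time#- returns seconds; time is minutes
end
```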

Additional Chef Server Data Collection Configuration Options

Option | Description | Default
data_collector['proxy'] | If set to true, Chef Server proxies all requests sent to /data-collector to the configured Chef Automate data_collector['root_url']. Note that this route neither checks the request signature nor adds the data_collector token; it forwards requests to the Chef Automate endpoint as-is. | nil
data_collector['timeout'] | Timeout, in milliseconds, after which an attempt to send a message to the Chef Automate server is aborted. | 30000
data_collector['http_init_count'] | Number of Chef Automate HTTP workers Chef Server should start. | 25
data_collector['http_max_count'] | Maximum number of Chef Automate HTTP workers Chef Server should allow to exist at any time. | 100
data_collector['http_max_age'] | Maximum age a Chef Automate HTTP worker is allowed to live, specified as an Erlang tuple. | {70, sec}
data_collector['http_cull_interval'] | How often Chef Server culls Chef Automate HTTP workers that have exceeded their http_max_age, specified as an Erlang tuple. | {1, min}
data_collector['http_max_connection_duration'] | Maximum duration an HTTP connection is allowed to exist before it is terminated, specified as an Erlang tuple. | {70, sec}
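The options above mix units: data_collector['timeout'] is in milliseconds, while the http_* durations are Erlang tuples. The Ruby sketch below converts both to seconds so they can be compared: 30000 ms is 30 s, {70, sec} is 70 s, and {1, min} is 60 s. It assumes only the sec and min units that appear in this table and rejects anything else rather than guessing.

```ruby
# Seconds per unit, limited to the units shown in the table above.
UNIT_SECONDS = { 'sec' => 1, 'min' => 60 }.freeze

# Converts an Erlang-style duration such as "{70, sec}" to seconds.
def erlang_duration_seconds(spec)
  m = spec.strip.match(/\A\{\s*(\d+)\s*,\s*(\w+)\s*\}\z/)
  raise ArgumentError, "unrecognized duration: #{spec}" unless m
  factor = UNIT_SECONDS.fetch(m[2]) { raise ArgumentError, "unsupported unit: #{m[2]}" }
  m[1].to_i * factor
end

# data_collector['timeout'] is given in milliseconds.
def timeout_ms_to_seconds(ms)
  ms / 1000.0
end
```

With the defaults, a single send attempt is abandoned after 30 seconds, well within the 70-second maximum connection duration.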

Configure your Chef Client to Send Data to Chef Automate without Chef Server

If you do not use a Chef Server in your environment (if you only use chef-solo, for example), you can configure your Chef clients to send their run data to Chef Automate directly by performing the following:

  1. Add Chef Automate SSL certificate to trusted_certs directory.

  2. Configure Chef Client to use the Data Collector endpoint and API token in Chef Automate.

Add Chef Automate Certificate to trusted_certs Directory

Note: This step only applies to self-signed SSL certificates. If you are using an SSL certificate signed by a valid certificate authority, you may skip this step.

Chef requires that the self-signed Chef Automate SSL certificate (HOSTNAME.crt) is located in the /etc/chef/trusted_certs directory on any node that wants to send data to Chef Automate. This directory is the location into which SSL certificates are placed when a node has been bootstrapped with chef-client.

To fetch the certificate onto your workstation, use knife ssl fetch and pass in the URL of the Chef Automate server. You can then use utilities such as scp or rsync to copy the downloaded cert files from your .chef/trusted_certs directory to the /etc/chef/trusted_certs directory on the nodes in your infrastructure that will be sending data directly to the Chef Automate server.
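Since this step only applies to self-signed certificates, it can help to check the certificate that knife ssl fetch downloaded. The Ruby sketch below uses the standard openssl library: a certificate is treated as self-signed when its issuer equals its subject and its signature verifies with its own public key. The file path in the usage comment is a placeholder.

```ruby
require 'openssl'

# Returns true if the PEM-encoded certificate is self-signed:
# issuer == subject, and the signature verifies with the certificate's own key.
def self_signed?(pem)
  cert = OpenSSL::X509::Certificate.new(pem)
  cert.issuer.to_s == cert.subject.to_s && cert.verify(cert.public_key)
end

# Usage (placeholder path; use the file knife ssl fetch wrote):
#   puts self_signed?(File.read(File.expand_path('~/.chef/trusted_certs/HOSTNAME.crt')))
```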

Configure Chef Client to Use the Data Collector Endpoint in Chef Automate

Note: Chef version 12.12.15 or greater is required.

The data collector functionality is used by the Chef Client to send node and converge data to Chef Automate. This feature works for Chef Client, as well as both the default and legacy modes of Chef solo.

To send node, converge, and compliance data to Chef Automate, add the following configuration to your Chef config (client.rb or solo.rb, or an additional config file in an appropriate directory such as client.d):

data_collector.server_url "https://automate.example.com/data-collector/v0/"
data_collector.token '<API_TOKEN_FROM_STEP_1>'
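When the same two settings go onto many nodes, rendering them from a host name and token avoids typos in the endpoint path. The Ruby sketch below is a hypothetical helper that produces the content of a drop-in file (the name client.d/data_collector.rb and the AUTOMATE_TOKEN environment variable in the usage comment are assumptions). It uses String#inspect, so the values come out as double-quoted Ruby literals, which client.rb-style config files accept.

```ruby
# Hypothetical helper: renders the data collector settings for a Chef config file.
def data_collector_client_config(automate_host, token)
  url = "https://#{automate_host}/data-collector/v0/"
  <<~RUBY
    data_collector.server_url #{url.inspect}
    data_collector.token #{token.inspect}
  RUBY
end

# Usage (writing under /etc/chef usually requires root):
#   File.write('/etc/chef/client.d/data_collector.rb',
#              data_collector_client_config('automate.example.com', ENV.fetch('AUTOMATE_TOKEN')))
```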

Setting Up Chef Client to Send Compliance Scan Data Directly to Chef Automate

Now that the Chef Client is configured for data collection, you can also enable Compliance Scanning via the Audit Cookbook.

  • Set the following attributes for the audit cookbook:
default['audit']['reporter'] = 'chef-automate'
default['audit']['fetcher'] = 'chef-automate'
default['audit']['token'] = '<API_TOKEN_FROM_STEP_1>'
default['audit']['profiles'].push(
  'name': 'cis-centos7-level2',
  'compliance': 'user-name/cis-centos7-level2' # in the ui for automate, this value is the identifier for the profile
)
default['audit']['interval'] = {
  'enabled': true,
  'time': 1440  # once a day, the default value
}

Now, any node with audit::default in its run list will fetch compliance profiles directly from, and report scan results directly to, Chef Automate. See the audit cookbook documentation for the full list of configuration options.
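The two audit setups in this guide differ only in the reporter, the fetcher, and whether the audit cookbook needs its own token. The Ruby sketch below summarizes that difference; the setup labels :via_chef_server and :direct and the helper name are hypothetical, while the attribute values are the ones shown in the two attribute listings.

```ruby
# Maps a (hypothetical) setup label to the audit cookbook attribute values
# used in this guide.
def audit_attributes(setup, token: nil)
  case setup
  when :via_chef_server
    { 'reporter' => 'chef-server-automate', 'fetcher' => 'chef-server' }
  when :direct
    raise ArgumentError, 'a Chef Automate API token is required' if token.nil?
    { 'reporter' => 'chef-automate', 'fetcher' => 'chef-automate', 'token' => token }
  else
    raise ArgumentError, "unknown setup: #{setup.inspect}"
  end
end

# Usage inside an attributes file (sketch):
#   audit_attributes(:direct, token: '<API_TOKEN_FROM_STEP_1>').each { |k, v| default['audit'][k] = v }
```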

Additional Chef Client Data Collection Configuration Options

Configuration | Description | Options | Default
data_collector.mode | The mode in which the data collector is allowed to operate. This can be used to run the data collector only when running as Chef solo but not when using Chef Client. | :solo, :client, or :both | :both
data_collector.raise_on_failure | When the data collector cannot send the “starting a run” message to the data collector server, the data collector is disabled for that run. In some situations, such as highly regulated environments, it may be preferable to stop the Chef run instead. In these situations, setting this value to true causes the Chef run to raise an exception before starting any converge activities. | true, false | false
data_collector.organization | A user-supplied organization string that can be sent in payloads generated by the data collector when Chef is run in solo mode. This allows users to associate their solo nodes with faux organizations without the nodes being connected to an actual Chef Server. | string | none
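The interaction of data_collector.mode and data_collector.raise_on_failure can be summarized as a small decision model. The Ruby sketch below is an illustration of the behavior described in the table, not Chef's implementation; the method names and the DataCollectorUnreachable error class are hypothetical.

```ruby
# Hypothetical error raised when the start message cannot be delivered
# and raise_on_failure is true.
class DataCollectorUnreachable < StandardError; end

# mode: :solo, :client, or :both; solo_run: true when running as Chef solo.
def data_collector_enabled?(mode, solo_run)
  case mode
  when :both   then true
  when :solo   then solo_run
  when :client then !solo_run
  else raise ArgumentError, "mode must be :solo, :client, or :both, got #{mode.inspect}"
  end
end

# What happens after trying to send the "starting a run" message.
def on_start_message(delivered, raise_on_failure: false)
  return :collect if delivered
  raise DataCollectorUnreachable, 'aborting before converge' if raise_on_failure
  :disable_collector # the run continues without data collection
end
```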

Troubleshooting: My Data Does Not Show Up in the User Interface

Organizations without associated nodes do not show up on the Chef Automate Nodes page. A node is not associated with Chef Automate until a Chef Client run has completed. The same is true for roles, cookbooks, recipes, attributes, resources, node names, and environments: the UI does not highlight them until a Chef Client run that uses them has completed. This is designed to keep the UI focused on the nodes in your cluster.