This post will be in the context of running FluentD on a VM using the td-agent and filebeat packages.

Background

I've been looking into how to optimize FluentD. I want Aggregators that can handle high throughput and utilize every CPU core. FluentD is written in Ruby, and is thus subject to the constraints of a Global Interpreter Lock, like Python.

The multi-process approach seems to work well. As long as you break out each <process> block in your /etc/td-agent/td-agent.conf to handle one input plugin (although that isn't strictly necessary), it'll work for you. You can split the configuration into many sub-configurations; it's just a bit harder to maintain and scale.
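As a sketch, that layout (using the fluent-plugin-multiprocess plugin) might look like the following; the child config paths here are hypothetical, and each child config would hold a single input plugin:

# /etc/td-agent/td-agent.conf -- parent process
<source>
  @type multiprocess
  <process>
    cmdline -c /etc/td-agent/conf.d/beats.conf
  </process>
  <process>
    cmdline -c /etc/td-agent/conf.d/syslog.conf
  </process>
</source>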

I tried using the more native workers configuration property, but plugins must explicitly support it.
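For reference, the built-in approach looks roughly like this: set workers in the <system> block, and pin any plugin that doesn't support multiple workers to a single worker with a <worker N> directive. The tail source and its path below are just illustrations:

<system>
  workers 4
</system>

# plugins that don't support multi-worker can be pinned to one worker
<worker 0>
  <source>
    @type tail
    path /var/log/example.log   # hypothetical path
    tag example
    <parse>
      @type none
    </parse>
  </source>
</worker>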

Beats is a common way to ship logs. I was surprised to find that the FluentD Beats Plugin doesn't support multiple workers. To test some changes to the plugin, I needed to be able to build it from source. And off we go with my forked repo...

The Process

Download, Build, Compile

  1. Prep the file system and clone the plugin repo:
mkdir -p /tmp/fluent-plugin-beats && \
cd /tmp/fluent-plugin-beats && \
git clone --single-branch -b multi-workers https://github.com/chicken231/fluent-plugin-beats
  2. Use td-agent's gem wrapper to build the gem:
td-agent-gem build fluent-plugin-beats.gemspec
  3. The build command generates a gem and adds a version number to the file name. Now you can install the gem to make it available to FluentD:
td-agent-gem install fluent-plugin-beats-0.1.4.gem

Configure, Install, Test

Assumptions:

  • td-agent is installed and configured. Here's an excerpt from my /etc/td-agent/td-agent.conf:
# general system config. Note the log format for later.
<system>
  workers 4
  suppress_config_dump
  <log>
    format json
  </log>
</system>

# beats input
<source>
  @type beats
  metadata_as_tag
  port 5044
  bind 0.0.0.0
</source>

# use this when testing to print to stdout and thus the log file
<match **>
  @type stdout
</match>
  • Filebeat is installed, configured, and enabled, pointing its Logstash output at localhost:5044 and capturing files matching /var/log/*.log (the default).
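For completeness, the assumed Filebeat side is roughly this minimal /etc/filebeat/filebeat.yml sketch; Filebeat speaks the same protocol to the beats input that it speaks to Logstash, which is why the logstash output works here:

filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]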
  1. Start td-agent and filebeat:
systemctl start td-agent filebeat
  2. Anything that comes in as an input to FluentD will be written to /var/log/td-agent/td-agent.log. So tail it:
tail -f /var/log/td-agent/td-agent.log
  3. Echo some text and append it to a file in a directory that filebeat is monitoring:
echo WOOOO $(date) >> /var/log/temp.log
  4. Observe a log printed to td-agent's log. Note the logs are in JSON. An unflattened example of a message from filebeat:
{
  "@timestamp": "2018-10-11T02:26:50.313Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.4.2"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "name": "centos-beats",
    "hostname": "centos-beats",
    "version": "6.4.2"
  },
  "host": {
    "name": "centos-beats"
  },
  "source": "/var/log/temp.log",
  "offset": 754,
  "message": "WOOOO Thu Oct 11 02:26:45 UTC 2018",
  "prospector": {
    "type": "log"
  }
}

And there we are. We downloaded a fork of a plugin with changes that enable multiple workers, built and installed it, tested it, and watched the messages from Filebeat stream through the FluentD logs.

References