This article was originally posted here on September 10, 2018.
With the growth of the public cloud and the adoption of cloud-native and DevOps principles, application logging has gone through a maturation phase. Thanks to open source projects like Logstash and Fluentd, the opportunities to improve logging while maintaining security and operations have expanded.
This article guides us through the benefits of using Fluentd as a node and aggregator for an application deployed on Amazon EC2. The same approach also applies to multi-cloud operations and hybrid-cloud deployments. Think about having a decentralized but standardized method for forwarding logs from nodes (application servers) to aggregators (a jumpbox, management server, etc.) and ultimately into a central repository. Elasticsearch, Amazon S3, Google Stackdriver, Hadoop, and VMware Log Intelligence are a few examples of centralized log collection targets.
Fluentd is an open source project with the backing of the Cloud Native Computing Foundation (CNCF). It is an open source data collector for a unified logging layer, allowing data collection and consumption to be unified for better use and understanding of data. For this example, Fluentd will act as both a log collector and an aggregator. Fluentd is utilized to maintain security segmentation while forwarding logs (application and operating system) from nine servers associated with the Fit Cycle Application to four separate locations through a single management/jump box! Rather than cover all of the components of the application, I will provide a high-level overview and highlight how Fluentd is set up, as follows:
Fluentd Node Configuration
The first step is understanding how each application server or Fluentd Node is configured.
Input Configuration
# Input from Syslog
<source>
  @type syslog
  port 42185
  bind 127.0.0.1
  tag syslog
</source>
Output Configuration
# Log Forwarding and Local Copy
<match **>
  @type forward
  send_timeout 60s
  recover_wait 10s
  hard_timeout 60s
  <server>
    name mgmt1
    host 172.100.2.41
    port 24224
  </server>
  <secondary>
    @type file
    path /tmp/collectedm
  </secondary>
</match>
Notice the IP address listed under the <server> section; this is the local IP address of the management/jumpbox that acts as the Fluentd aggregator within the VPC construct.
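This deployment uses a single aggregator, but for reference, the forward output plugin also accepts multiple <server> entries and a standby flag, so a second management box could serve as a failover target. A minimal sketch, assuming a hypothetical second aggregator at 172.100.2.42 that is not part of this deployment:

<match **>
  @type forward
  <server>
    name mgmt1
    host 172.100.2.41
    port 24224
  </server>
  <server>
    # hypothetical standby aggregator, used only if mgmt1 is unreachable
    name mgmt2
    host 172.100.2.42
    port 24224
    standby
  </server>
</match>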
Fluentd Aggregator Configuration
Input Configuration
# Input from local Syslog
<source>
  @type syslog
  port 42185
  bind 127.0.0.1
  tag syslog
</source>
# Input from Nodes
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

Note the input from Syslog locally and the input forwarded from the existing nodes running Fluentd within the VPC.
Output Configuration
The following configuration is broken down into sections based on the forwarding location.
Amazon S3
This is a simple addition to any Fluentd configuration, and the documentation can be found here. The forwarded logs end up as objects in the designated S3 bucket.
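The S3 <store> block itself is part of the complete aggregator configuration shown later in this article; it is excerpted here for reference (the key, bucket, region, and path values are specific to this environment):

<store>
  @type s3
  aws_key_id AKIAJGD3JBHWE2IFX65Q
  aws_sec_key FvjlY91mFWfCkbAtMpD301mYZfAdllS3aW8p/LcA
  s3_bucket fit-b-a-us-w1-00-m
  s3_region us-west-1
  path vpc-5726398/logs
  <buffer tag,time>
    # buffer chunks to local disk, then flush to S3 hourly
    @type file
    path /tmp/fluentd/s3
    timekey 3600 # 1 hour partition
    timekey_wait 10m
    timekey_use_utc true # use utc
    chunk_limit_size 256m
  </buffer>
</store>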
VMware Log Intelligence
In the following section I utilized the Fluentd out-http-ext plugin found on GitHub. It is also listed on the Fluentd plugin page found here. My peers published a blog a few months ago entitled “Using Fluentd to Send Logs from Any Cloud to VMware Log Intelligence,” which will give you a base understanding of using Fluentd with application servers. I won't go too far into detail on this forwarder, but my configuration forwards logs to two separate instances of VMware Log Intelligence.
<store>
  @type http_ext
  endpoint_url https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
  http_method post
  serializer json
  rate_limit_msec 200
  raise_on_error true
  raise_on_http_failure true
  authentication none
  use_ssl true
  verify_ssl false
  <headers>
    Authorization Bearer lZXGxe2hURIDXMlPvvryMlAA2aMzNtU8
    Content-Type application/json
    format syslog
    structure default
  </headers>
</store>

<store>
  @type http_ext
  endpoint_url https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
  http_method post
  serializer json
  rate_limit_msec 200
  raise_on_error true
  raise_on_http_failure true
  authentication none
  use_ssl true
  verify_ssl false
  <headers>
    Authorization Bearer 2sIvyJ76Imh9dlHWYq98ol4CRe2ZC3vU
    Content-Type application/json
    format syslog
    structure default
  </headers>
</store>
Below is an example of the output as displayed in Log Intelligence:
Local File
The following example provides little to no value in my environment except for my own sanity! Notice I am writing to /tmp, and because I am a good systems administrator, that directory gets cleared on each reboot! Check out the Fluentd documentation for additional detail.
<store>
  @type file
  path /tmp/fluentd/local
  compress gzip
  <buffer>
    timekey 1d
    timekey_use_utc true
    timekey_wait 10m
  </buffer>
</store>

Below is an example of the /tmp directory after the output of logs to file:
Output (Complete) Configuration — Aggregator
Fluentd supports copying logs to multiple destinations in one simple process. The configuration example below includes the copy output option along with the S3, VMware Log Intelligence, and file methods. Read more about the copy output plugin here.
# Output to S3, VMware Log Intelligence (2x) and Local File
<match **>
  @type copy
  <store>
    @type file
    path /tmp/fluentd/local
    compress gzip
    <buffer>
      timekey 1d
      timekey_use_utc true
      timekey_wait 10m
    </buffer>
  </store>
  <store>
    @type s3
    aws_key_id AKIAJGD3JBHWE2IFX65Q
    aws_sec_key FvjlY91mFWfCkbAtMpD301mYZfAdllS3aW8p/LcA
    s3_bucket fit-b-a-us-w1-00-m
    s3_region us-west-1
    path vpc-5726398/logs
    <buffer tag,time>
      @type file
      path /tmp/fluentd/s3
      timekey 3600 # 1 hour partition
      timekey_wait 10m
      timekey_use_utc true # use utc
      chunk_limit_size 256m
    </buffer>
  </store>
  <store>
    @type http_ext
    endpoint_url https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
    http_method post
    serializer json
    rate_limit_msec 200
    raise_on_error true
    raise_on_http_failure true
    authentication none
    use_ssl true
    verify_ssl false
    <headers>
      Authorization Bearer lZXGxe2hIMIDXMlPvvryMlFF2aMzNtU8
      Content-Type application/json
      format syslog
      structure default
    </headers>
  </store>
  <store>
    @type http_ext
    endpoint_url https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
    http_method post
    serializer json
    rate_limit_msec 200
    raise_on_error true
    raise_on_http_failure true
    authentication none
    use_ssl true
    verify_ssl false
    <headers>
      Authorization Bearer 2sIvyN76Imh9dlHWYqO5ol4LRe2ZC3vU
      Content-Type application/json
      format syslog
      structure default
    </headers>
  </store>
</match>
Fluentd is a powerful open source solution! In the previous example, Fluentd is utilized to maintain security segmentation while forwarding logs (application and operating system) from nine servers associated with the Fit Cycle App to four separate locations through a single management/jump box!
If you have interest in logging for Kubernetes based applications, take a look at Bill Shetti’s blog found here.