Parsing Logs with Regular Expressions using Fluentd

Overview

Fluentd is a powerful tool for log collection and processing. One of its most useful features is the ability to parse logs using regular expressions (regex). This allows you to extract specific information from your logs and structure them in a way that makes them easier to analyze.

In this post, we'll go through some examples of how to use regex with Fluentd to parse logs.

System Environments for this Exercise

The system environment used in this exercise is as follows.

  • Rocky Linux release 8.6

  • td-agent 4.5.0 (fluentd 1.16.1)

Basic Regex Parsing

Let's start with a basic example. Suppose you have the following log message.

[2023/07/20 15:30:45] [ info] [input:tail:tail.0] inotify_fs_add(): inode=12345678 watch_fd=1 name=/var/log/example.log

You can use the following Fluentd configuration to parse this log.

<source>
  @type tail
  path /var/log/sample.log
  pos_file /var/log/td-agent/sample.pos
  tag sample-topic
  read_from_head true
  <parse>
    @type regexp
    expression /(?<message>\[\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}:\d{2}\]\s+\[[\s\w]+\]\s+.*)/
  </parse>
</source>

<match **>
  @type stdout
</match>

The output will look something like this.

2023-07-20 23:47:56.810795577 +0000 sample-topic: {"message":"[2023/07/20 15:30:45] [ info] [input:tail:tail.0] inotify_fs_add(): inode=12345678 watch_fd=1 name=/var/log/example.log"}

This configuration uses the regexp parser to match the log message against the provided regex pattern. The (?<message>...) part of the pattern is a named capture group that extracts the entire log message and assigns it to the message field in the resulting record.
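
Because Fluentd patterns use Ruby's regular expression syntax, it can help to try a pattern in plain Ruby before putting it into a configuration. The sketch below is only an illustration, not Fluentd code: `parse_line` is a hypothetical helper, and the `FIELDS` pattern with its `logtime`, `level`, and `message` groups is an assumed variation showing how additional named groups would become additional fields.

```ruby
# The pattern from the <parse> section, copied as a Ruby regex literal.
WHOLE_LINE = /(?<message>\[\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}:\d{2}\]\s+\[[\s\w]+\]\s+.*)/

# Assumed variation (not from the configuration above): one named group per field.
FIELDS = /^\[(?<logtime>\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}:\d{2})\]\s+\[\s*(?<level>\w+)\]\s+(?<message>.*)$/

SAMPLE = "[2023/07/20 15:30:45] [ info] [input:tail:tail.0] inotify_fs_add(): inode=12345678 watch_fd=1 name=/var/log/example.log"

# Hypothetical helper: returns the named captures as a Hash,
# or nil when the line does not match the pattern.
def parse_line(pattern, line)
  match = pattern.match(line)
  match && match.named_captures
end

whole = parse_line(WHOLE_LINE, SAMPLE)
puts whole["message"]

fields = parse_line(FIELDS, SAMPLE)
fields.each { |name, value| puts "#{name}: #{value}" }
```

With the single `message` group the record has one field holding the entire line. With one group per field, the same line yields a separate value for each group name.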

Handling Unmatched Logs

Sometimes your logs may contain messages that don't match your regex pattern. By default, Fluentd will display a warning when this happens. For example, if your logs contain the message "Hello World", Fluentd will issue the following warning.

2023-07-21 00:14:57.301616862 +0000 fluent.warn: {"message":"pattern not matched: \"Hello World\""}

You can suppress these warnings by adding the emit_invalid_record_to_error false option to your configuration.

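Note: emit_invalid_record_to_error is a parameter of the parser filter plugin, so it cannot be set in the source directive. Set it in a filter directive instead: the source below reads each line unparsed into the prod field with the none parser, and the filter then applies the regex to that field.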

<source>
  @type tail
  path /var/log/sample.log
  pos_file /var/log/td-agent/sample.pos
  tag sample-topic
  read_from_head true
  <parse>
    @type none
    message_key prod
  </parse>
</source>

<filter sample-topic*>
  @type parser
  key_name prod
  emit_invalid_record_to_error false
  <parse>
    @type regexp
    expression /(?<message>\[\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}:\d{2}\]\s+\[[\s\w]+\]\s+.*)/
  </parse>
</filter>

<match **>
  @type stdout
</match>
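
The two-stage flow in this configuration can be modeled in a few lines of Ruby. This is a rough sketch of the idea, not how Fluentd is implemented: `wrap_raw` and `parse_record` are hypothetical helpers, and the sketch assumes that a record which fails to match is simply dropped without a warning.

```ruby
PATTERN = /(?<message>\[\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}:\d{2}\]\s+\[[\s\w]+\]\s+.*)/

INPUT_LINES = [
  "[2023/07/20 15:30:45] [ info] [input:tail:tail.0] inotify_fs_add(): inode=12345678 watch_fd=1 name=/var/log/example.log",
  "Hello World"
].freeze

# Stage 1, modeled on "@type none" with "message_key prod":
# keep the raw line unparsed under the "prod" key.
def wrap_raw(line)
  { "prod" => line }
end

# Stage 2, modeled on the parser filter with "key_name prod":
# return the parsed record, or nil for a line that does not match.
def parse_record(record, pattern)
  match = pattern.match(record["prod"])
  match && match.named_captures
end

# Unmatched lines become nil and are removed by compact (assumed behavior).
def run_pipeline(lines, pattern)
  lines.map { |line| parse_record(wrap_raw(line), pattern) }.compact
end

run_pipeline(INPUT_LINES, PATTERN).each { |record| puts record["message"] }
```

Only the first line reaches the output here. "Hello World" is filtered out in the second stage.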

Parsing Multiline Logs

Fluentd also supports parsing multiline logs. This is useful when your logs contain messages that span multiple lines. For example, consider the following log message.

[2021/12/07 21:49:04] [ info] Hello
from
Fluentd
!!

You can use the multiline parser to handle this kind of log.

<source>
  @type tail
  path /var/log/sample.log
  pos_file /var/log/td-agent/sample.pos
  tag sample_log
  read_from_head true
  <parse>
    @type multiline
    format_firstline /\[\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}:\d{2}\]\s+\[[\s\w]+\]\s+.*/
    format1 /^(?<message>.*)/
  </parse>
</source>

<match **>
  @type stdout
</match>

The output will look like this.

2023-07-21 23:16:53.222715003 +0000 sample_log: {"message":"[2021/12/07 21:49:04] [ info] Hello\nfrom\nFluentd\n!!"}

This configuration uses the multiline parser to match the first line of each log message against the format_firstline pattern. It then uses the format1 pattern to extract the entire message, including any additional lines.
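
The grouping logic can be sketched in Ruby to show why the message field ends up containing embedded newlines. This is an illustrative model under stated assumptions, not Fluentd's implementation: `group_multiline` is a hypothetical helper, the second log entry in `LOG_LINES` is invented for the example, and the `m` flag on `FORMAT1` is added here so that `.` matches newlines in plain Ruby.

```ruby
FIRSTLINE = /\[\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}:\d{2}\]\s+\[[\s\w]+\]\s+.*/
# The m flag lets "." match newlines, so the capture spans the whole block.
FORMAT1 = /^(?<message>.*)/m

LOG_LINES = [
  "[2021/12/07 21:49:04] [ info] Hello",
  "from",
  "Fluentd",
  "!!",
  "[2021/12/07 21:49:05] [ info] Next entry" # invented second entry
].freeze

# Hypothetical helper: start a new block whenever a line matches the
# first-line pattern, otherwise append the line to the current block.
# Lines seen before any first line start their own block (a choice made
# for this sketch only).
def group_multiline(lines, firstline)
  blocks = []
  lines.each do |line|
    if blocks.empty? || firstline.match?(line)
      blocks << [line]
    else
      blocks.last << line
    end
  end
  blocks.map { |block| block.join("\n") }
end

records = group_multiline(LOG_LINES, FIRSTLINE).map do |block|
  FORMAT1.match(block).named_captures
end
records.each { |record| puts record["message"].inspect }
```

The first record's message is the four original lines joined by newline characters, which corresponds to the \n sequences in the output shown above.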

Conclusion

Fluentd's regex parsing capabilities make it a powerful tool for processing logs. Whether you're dealing with simple single line messages or complex multiline logs, Fluentd can help you extract the information you need.

Happy logging!
