

Parsing in Fluent Bit using Regular Expression

Background and Overview

Parsing data in Fluent Bit is important because it allows the user to extract relevant information from unstructured or semi-structured log data. Without proper parsing, log data can be difficult to interpret and use effectively, leading to issues such as increased troubleshooting time or missed opportunities to detect important events. The extracted data can then be further processed, analyzed, and sent to various outputs, turning raw log data into valuable insights.

When parsing data, regular expressions allow users to define complex patterns that match specific parts of the log data, such as timestamps, error messages, or IP addresses. This provides a powerful and flexible way to extract and organize relevant information from log data, making it easier to understand, analyze, and act upon.

In this blog, the second exercise in our use case of building a log flow with Fluent Bit and Fluentd, we will parse the collected log data using regular expressions. If you want to follow this use case from the beginning, where we covered the basics of the ‘tail’ plugin in Fluent Bit, check out the first blog from the link below.

  1. ‘tail’ in Fluent Bit - Standard Configuration

  2. Parsing in Fluent Bit using Regular Expression
    ↑ This blog will cover this section!

  3. Multiline Parsing with Fluent Bit

System Environments for this Exercise

The system environment used in the exercise below is as follows:

  • CentOS 8

  • Fluent Bit v2.0.6

  • VM specs: 2 CPU cores / 2GB memory

Directory Structure and Log file

Just like in our first blog, we will use the ‘tail’ plugin in Fluent Bit to obtain data from a typical Linux log file. The directory structure and the log file remain the same as well.

Here is the directory structure of this case:

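A minimal sketch of the layout, assuming the paths referenced later in this exercise (the exact location of the sample log file under /fluentbit/etc/ is an assumption):

    /fluentbit
    ├── conf
    │   ├── custom_parsers.conf
    │   └── sample02_flb_tail_custom_parser.conf
    ├── etc
    │   └── sample01_linux_messages.txt
    └── tail_linux_messages.db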

Here is the sample Linux log we’ll try to parse.

  • sample01_linux_messages.txt:

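A couple of lines in the standard /var/log/messages format; only the ‘Started dnf makecache.’ entry is dissected below, and the first line is purely illustrative:

    Oct 27 16:14:31 fluent01 systemd[1]: Starting dnf makecache...
    Oct 27 16:14:31 fluent01 systemd[1]: Started dnf makecache.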

Now let’s move on to the exercise to parse this data!

Exercise

In the previous exercise, Fluent Bit simply read the log messages from the target file and flushed them to stdout. The whole message is nested under the ‘log’ key by default, but in some cases you may want to capture specific pieces of information. The following log line contains information such as:

  • Timestamp : Oct 27 16:14:31

  • Hostname : fluent01

  • Ident : systemd

  • Process ID : 1

  • Message Body : Started dnf makecache.

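Putting those fields together, the log line being dissected is:

    Oct 27 16:14:31 fluent01 systemd[1]: Started dnf makecache.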

Step 1: Create Regular Expression

Here is a way to define custom parsing rules with regular expressions (“regex”). A sample regex for this log pattern and the output it produces are shown below.

  • Regex:

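    One possible pattern, modeled on Fluent Bit’s built-in syslog parsers (the regex used in the original exercise may differ in detail):

    ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$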
  • Output:

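    Applied to the sample line above, the named captures come out as:

    time     Oct 27 16:14:31
    host     fluent01
    ident    systemd
    pid      1
    message  Started dnf makecache.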

Rubular and Regex101 are useful tools for trying out your regular expressions.

Step 2: Define Custom Parser

Once your regex is ready, the next step is to define a custom parser for Fluent Bit. We typically prepare ‘custom_parsers.conf’ and specify it in the ‘[SERVICE]’ section.

Here is a sample custom parser definition for Linux OS log messages. You can find the custom regex in the ‘Regex’ option.

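A sketch of such a definition, assuming the regex from Step 1 and the ‘syslog-messages’ parser name used later in this exercise:

    [PARSER]
        Name        syslog-messages
        Format      regex
        Regex       ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
        Time_Keep   On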

As described in our first blog, Fluent Bit assigns each record a timestamp based on the time it read the log line, which can cause a mismatch with the timestamp in the raw message. The time settings ‘Time_Key,’ ‘Time_Format,’ and ‘Time_Keep’ are useful to avoid this mismatch.

  • ‘Time_Key’ : Specify the name of the field which provides time information.

  • ‘Time_Format’ : Specify the format of the time field so it can be recognized and analyzed properly.

  • ‘Time_Keep’ : By default, when a time key is recognized and parsed, the parser drops the original time field from the record. Enabling this option keeps the original time field in the parsed record instead of dropping it.

In the sample parser, we capture the timestamp under the ‘time’ key, and its format, ‘Oct 27 16:14:31,’ can be normalized into ‘%b %d %H:%M:%S.’ For ways to normalize the time format, try checking out this page.

Step 3: Run Fluent Bit with Custom Parser

Let’s run Fluent Bit with the custom parser!

  • custom_parsers.conf (the parser file containing the ‘syslog-messages’ definition shown in Step 2)

  • sample02_flb_tail_custom_parser.conf

    • Add ‘Parser syslog-messages’ in the ‘tail’ input section

    • Make sure that the custom parser file is specified properly, such as:
      Parsers_File /fluentbit/conf/custom_parsers.conf

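A sketch of what ‘sample02_flb_tail_custom_parser.conf’ could look like, reusing the ‘tail’ input from the first exercise; the Path, Tag, and other values not mentioned in this post are assumptions:

    [SERVICE]
        Flush        1
        Log_Level    info
        Parsers_File /fluentbit/conf/custom_parsers.conf

    [INPUT]
        Name           tail
        # Path and Tag below are assumptions for illustration
        Path           /fluentbit/etc/sample01_linux_messages.txt
        Tag            linux.messages
        Read_from_Head true
        DB             /fluentbit/tail_linux_messages.db
        Parser         syslog-messages

    [OUTPUT]
        Name   stdout
        Match  *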
  • Let’s run Fluent Bit with the sample config

    • You might need to remove the ‘DB’ file /fluentbit/tail_linux_messages.db to make Fluent Bit read the file from the beginning.

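Something along these lines, assuming Fluent Bit was installed from the official package (the binary and config paths are assumptions):

    rm -f /fluentbit/tail_linux_messages.db
    /opt/fluent-bit/bin/fluent-bit -c /fluentbit/conf/sample02_flb_tail_custom_parser.conf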
  • You can see parsed messages in stdout.

    • There are ‘time’, ‘host’, ‘ident’, ‘pid’ and ‘message’ keys as expected.

    • Also, the timestamp written by Fluent Bit is the same as the timestamp in the raw messages.

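The stdout records would look roughly like the line below; the tag comes from the (assumed) ‘tail’ configuration, and the epoch timestamp depends on the year and timezone Fluent Bit applies to the parsed time, so treat the exact values as illustrative:

    [0] linux.messages: [1666887271.000000000, {"time"=>"Oct 27 16:14:31", "host"=>"fluent01", "ident"=>"systemd", "pid"=>"1", "message"=>"Started dnf makecache."}]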

Great, you finished the second exercise! You now know how to parse log messages using the ‘tail’ plugin and regex. In the next blog, you will learn how to parse log messages that span multiple lines.

Need more help? - We are here for you.

Through the Fluentd Subscription Network, we provide consultancy and professional services to help you run Fluentd and Fluent Bit with confidence by solving your pain points. A service desk is also available for your operations, and the team is equipped with Diagtool and practical knowledge of running Fluent Bit/Fluentd in production. Contact us anytime if you would like to learn more about our service offerings.