Parsing in Fluent Bit using Regular Expression

Background and Overview

Parsing data in Fluent Bit is important because it allows the user to extract relevant information from unstructured or semi-structured log data. Without proper parsing, log data can be difficult to interpret and use effectively, leading to issues such as longer troubleshooting time or missed detection of important events. The extracted data can then be further processed, analyzed, and sent to various outputs, turning raw log lines into valuable insights.

When parsing data, regular expressions allow users to define complex patterns that match specific parts of the log data, such as timestamps, error messages, or IP addresses. This provides a powerful and flexible way to extract and organize relevant information from log data, making it easier to understand, analyze, and act upon.

In this blog, the second exercise in our use case of creating a log flow with Fluent Bit and Fluentd, we will parse the collected log data using regular expressions. If you want to start this use case from the beginning, where we covered the basics of the ‘tail’ plugin in Fluent Bit, check out the first blog from the link below.

  1. ’tail’ in Fluent Bit - Standard Configuration

  2. Parsing in Fluent Bit using Regular Expression
    ↑ This blog will cover this section!

  3. Multiline Parsing with Fluent Bit

System Environments for this Exercise

The system environment used in the exercise below is as follows:

  • CentOS 8

  • Fluent Bit v2.0.6

  • VM specs: 2 CPU cores / 2GB memory

Directory Structure and Log file

Just like in our first blog, we will use the ‘tail’ plugin in Fluent Bit to obtain data from a typical Linux log file. The directory structure and log file will remain the same as well.

Here is the directory structure of this case:

/fluentbit : root directory
  |--- conf
    |--- custom_parsers.conf
    |--- Lab01
      |-- (Lab01 configuration files)
      |-- sample
        |-- (Sample log files for exercise)
  |--- log
  |--- buffer

Here is the sample Linux log we’ll try to parse.

  • sample01_linux_messages.txt:

Oct 27 16:14:31 fluent01 systemd[1]: Started dnf makecache.
Oct 27 16:20:29 fluent01 systemd[1]: Starting system activity accounting tool...
Oct 27 16:20:29 fluent01 systemd[1]: Started system activity accounting tool.
Oct 27 16:40:29 fluent01 kubelet[896]: W1027 16:40:29.280967     896 watcher.go:95] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/sysstat-collect.service: no such file or directory
Oct 27 16:40:29 fluent01 kubelet[896]: W1027 16:40:29.281027     896 watcher.go:95] Error while processing event ("/sys/fs/cgroup/blkio/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/system.slice/sysstat-collect.service: no such file or directory
Oct 27 16:40:29 fluent01 kubelet[896]: W1027 16:40:29.281048     896 watcher.go:95] Error while processing event ("/sys/fs/cgroup/memory/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/system.slice/sysstat-collect.service: no such file or directory

Now let’s move on to the exercise to parse this data!

Exercise

In the previous exercise, Fluent Bit just read the log messages from the target file and flushed them to stdout. The whole message is nested under the ‘log’ key by default, but in some cases you may want to capture specific pieces of information. The following log line contains information such as:

  • Timestamp : Oct 27 16:14:31

  • Hostname : fluent01

  • Ident : systemd

  • Process ID : 1

  • Message Body : Started dnf makecache.

Oct 27 16:14:31 fluent01 systemd[1]: Started dnf makecache.

Step 1: Create Regular Expression

Here is a way to define custom parsing rules with a regular expression (“regex”). A sample regex and its output for the log line above are shown below.

  • Regex:

/^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
  • Output:

{
    "time" : "Oct 27 16:14:31",
    "host" : "fluent01",
    "ident" : "systemd"
    "pid" : "1"
    "message" : "Started dnf makecache."
}

Rubular and Regex101 are useful to try out your regular expressions.
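
If you would like to check the pattern locally before wiring it into Fluent Bit, here is a minimal Python sketch (an illustration only, not part of the Fluent Bit setup). Note that Python’s ‘re’ module writes named groups as (?P<name>...), while Fluent Bit’s Onigmo engine accepts the (?<name>...) syntax used above, so the pattern is adapted accordingly.

import re

# Same pattern as above, with named groups rewritten as (?P<name>...)
# because Python's 're' module requires that syntax.
pattern = re.compile(
    r'^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*) (?P<host>[^ ]*) '
    r'(?P<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?P<pid>[0-9]+)\])?'
    r'(?:[^\:]*\:)? *(?P<message>.*)$'
)

line = "Oct 27 16:14:31 fluent01 systemd[1]: Started dnf makecache."
match = pattern.match(line)
if match:
    print(match.groupdict())
    # {'time': 'Oct 27 16:14:31', 'host': 'fluent01', 'ident': 'systemd',
    #  'pid': '1', 'message': 'Started dnf makecache.'}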

Step 2: Define Custom Parser

Once your regex is ready, the next step is to define a custom parser for Fluent Bit. We typically prepare ‘custom_parsers.conf’ and specify it in the ‘[SERVICE]’ section with the ‘Parsers_File’ option.

Here is a sample custom parser definition for Linux OS log messages. You can find the custom regex in the option ‘Regex.’

[PARSER]
   Name        syslog-messages
   Format      regex
   Regex       /^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
   Time_Key    time
   Time_Format %b %d %H:%M:%S
   Time_Keep   On

As described in our first blog, Fluent Bit assigns a timestamp based on the time it read the log file, which can cause a mismatch with the timestamps in the raw messages. The time settings ‘Time_Key,’ ‘Time_Format,’ and ‘Time_Keep’ are useful to avoid this mismatch.

  • ‘Time_Key’ : Specify the name of the field which provides time information.

  • ‘Time_Format’ : Specify the format of the time field so it can be recognized and analyzed properly.

  • ‘Time_Keep’ : By default, once the time key is recognized and parsed, the parser drops the original time field from the record. Enabling this option keeps the original time field and its value in the parsed record, which is why the ‘time’ key still appears in the output later in this exercise.

In the sample parser, we capture the timestamp under the ‘time’ key, and its format, e.g. ‘Oct 27 16:14:31,’ can be expressed as ‘%b %d %H:%M:%S.’ For ways to describe the time format, try checking out this page.
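
As a quick sanity check of the format string (a side illustration; Fluent Bit performs its own time parsing internally), the following Python sketch confirms that ‘%b %d %H:%M:%S’ matches the sample timestamp. The syslog-style timestamp carries no year, so strptime falls back to 1900, whereas Fluent Bit fills in the current year when the format has no year field, which is why the stdout records later in this exercise carry 2022-based epoch timestamps.

from datetime import datetime

# '%b' = abbreviated month, '%d' = day of month, '%H:%M:%S' = time of day.
ts = datetime.strptime("Oct 27 16:14:31", "%b %d %H:%M:%S")
print(ts)  # 1900-10-27 16:14:31 (no year in the source, so strptime defaults to 1900)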

Step 3: Run Fluent Bit with Custom Parser

Let’s run Fluent Bit with the custom parser!

  • custom_parsers.conf

[PARSER]
   Name        syslog-messages
   Format      regex
   Regex       /^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
   Time_Key    time
   Time_Format %b %d %H:%M:%S
   Time_Keep   On
  • sample02_flb_tail_custom_parser.conf

    • Add ‘Parser syslog-messages’ to the ‘tail’ input section

    • Make sure that the custom parser file is specified properly, such as:
      Parsers_File /fluentbit/conf/custom_parsers.conf

[SERVICE]
    ## General settings
    Flush                     5
    Log_Level                 Info
    Daemon                    off
    Log_File                  /fluentbit/log/fluentbit.log
    Parsers_File              /fluentbit/conf/custom_parsers.conf

    ## Buffering and Storage
    Storage.path              /fluentbit/buffer/
    Storage.sync              normal
    Storage.checksum          Off
    Storage.backlog.mem_limit 5M
    Storage.metrics           On

    ## Monitoring (if required)
    HTTP_Server               true
    HTTP_Listen               0.0.0.0
    HTTP_Port                 2020
    Health_Check              On
    HC_Errors_Count           5
    HC_Retry_Failure_Count    5
    HC_Period                 60

[INPUT]
    Name   tail
    Tag    linux.messages
    Path   /fluentbit/conf/Lab01/sample/sample01_linux_messages.txt
    Storage.type   filesystem
    Read_from_head true
    DB     /fluentbit/tail_linux_messages.db
    Parser      syslog-messages

[OUTPUT]
    Name   stdout
    Match  linux.messages
  • Let’s run Fluent Bit with the sample config

    • You might need to remove the ‘DB’ file /fluentbit/tail_linux_messages.db to make Fluent Bit read the file from the beginning.

$ fluent-bit -c sample02_flb_tail_custom_parser.conf
  • You can see parsed messages in stdout.

    • There are ‘time’, ‘host’, ‘ident’, ‘pid’ and ‘message’ keys as expected.

    • Also, the timestamp written by Fluent Bit is the same as the timestamp in the raw messages.

[0] linux.messages: [1666887271.000000000, {"time"=>"Oct 27 16:14:31", "host"=>"fluent01", "ident"=>"systemd", "pid"=>"1", "message"=>"Started dnf makecache."}]
[1] linux.messages: [1666887629.000000000, {"time"=>"Oct 27 16:20:29", "host"=>"fluent01", "ident"=>"systemd", "pid"=>"1", "message"=>"Starting system activity accounting tool..."}]
[2] linux.messages: [1666887629.000000000, {"time"=>"Oct 27 16:20:29", "host"=>"fluent01", "ident"=>"systemd", "pid"=>"1", "message"=>"Started system activity accounting tool."}]
[3] linux.messages: [1666888829.000000000, {"time"=>"Oct 27 16:40:29", "host"=>"fluent01", "ident"=>"kubelet", "pid"=>"896", "message"=>"W1027 16:40:29.280967     896 watcher.go:95] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/sysstat-collect.service: no such file or directory"}]
[4] linux.messages: [1666888829.000000000, {"time"=>"Oct 27 16:40:29", "host"=>"fluent01", "ident"=>"kubelet", "pid"=>"896", "message"=>"W1027 16:40:29.281027     896 watcher.go:95] Error while processing event ("/sys/fs/cgroup/blkio/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/system.slice/sysstat-collect.service: no such file or directory"}]
[5] linux.messages: [1666888829.000000000, {"time"=>"Oct 27 16:40:29", "host"=>"fluent01", "ident"=>"kubelet", "pid"=>"896", "message"=>"W1027 16:40:29.281048     896 watcher.go:95] Error while processing event ("/sys/fs/cgroup/memory/system.slice/sysstat-collect.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/system.slice/sysstat-collect.service: no such file or directory"}]

Great, you finished the second exercise! You now know how to parse log messages collected by the ‘tail’ plugin using regex. In the next blog, you will learn how to parse log messages that span multiple lines.

Need more help? - We are here for you.

Through the Fluentd Subscription Network, we provide consultancy and professional services to help you run Fluentd and Fluent Bit with confidence by solving your pain points. A service desk is also available for your operations, and the team is equipped with Diagtool and practical knowledge of running Fluent Bit/Fluentd in production. Contact us anytime if you would like to learn more about our service offerings.
