Multiline Parsing Best Practice in Fluent Bit

Background and Overview

Many log files contain events that span multiple lines, such as stack traces. Parsing these events correctly with Fluent Bit improves the accuracy and usefulness of the data extracted from them. When multiline logs are not parsed properly, the result is errors, inconsistencies, and incomplete or inaccurate information in the extracted data.

By accurately parsing multiline logs, users can gain a more comprehensive understanding of their log data, identify patterns and anomalies that may not be apparent with single-line logs, and gain insights into application performance and potential issues. This can help organizations troubleshoot and optimize their applications and infrastructure, improving reliability and reducing downtime.

This blog post is the third and final part of our Fluent Bit use case series, as shown below.

  1. ‘tail’ in Fluent Bit - Standard Configuration

  2. Parsing in Fluent Bit using Regular Expression

  3. Multiline Parsing in Fluent Bit
    ↑ This blog will cover this section!

System Environments for this Exercise

The system environment used in the exercise below is as follows:

  • CentOS 8

  • Fluent Bit v2.0.6

  • VM specs: 2 CPU cores / 2GB memory

Exercise

The directory structure remains the same as in the two exercises from our earlier blog posts:

/fluentbit : root directory
  |--- conf
    |--- custom_parsers.conf
    |--- Lab01
      |-- (Lab01 configuration files)
      |-- sample
        |-- (Sample log files for exercise)
  |--- log
  |--- buffer

In the previous blog post, we parsed log data in which every line had the same format, using a regular expression (“regex”). Because all the lines shared a similar format, the log data was relatively easy to parse.

However, in some cases you need to merge multiple log lines into a single event. The following Fluentd log file, for instance, has a warning on line #3 followed by its stack trace on lines #4 to #22. These lines should be treated as a single log event for the message to be meaningful. This is where the ‘Multiline Parsing’ feature comes in.

  • sample02_multiline.txt (Fluentd log file example)

2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2022-10-21 23:42:04 +0000 [warn]: For security reason, setting private_key_passphrase is recommended when cert_path is specified
/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>'
	from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
	from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
	from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>'
	from /opt/fluentd/bin/fluentd:25:in `load'
	from /opt/fluentd/bin/fluentd:25:in `<main>'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.2.9'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'

The steps to enable the ‘multiline parsing’ feature are almost the same as those for custom parsing. The first step is to create a custom regex that matches the first line of each event and define it as a parsing rule. Here is the parser rule for the Fluentd log shown earlier:

[PARSER]
    Name        FLUENTD_LOG
    Format      regex
    Regex       /^(?<time>[^ ]* {1,2}[^ ]* [^ ]*)\s+\[(?<level>[\s\w]*)\]\:\s+(?<message>.*)$/
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S
    Time_Keep   On

Then configure the ‘multiline parsing’ settings in the tail ‘[INPUT]’ section.

  • ‘Multiline’ : Set to ‘On’ to enable the multiline parsing feature

  • ‘Parser_Firstline’ : Specify the name of the parser that matches the first line of each event

[INPUT]
    Name   tail
    Tag    linux.messages
    Path   /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type   filesystem
    Read_from_head true
    #DB     /fluentbit/tail_linux_messages.db
    Multiline         On
    Parser_Firstline  FLUENTD_LOG
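
Keep in mind that in this mode Fluent Bit generally learns that an event is complete only when the next first line arrives, so the last event in a file waits for a flush timer before it is emitted. The tail input exposes that timer as ‘Multiline_Flush’ (in seconds). The sketch below shows the same tail input with the option added. The value 5 is an arbitrary number picked for illustration, not a setting used in this exercise.

```
[INPUT]
    Name              tail
    Tag               linux.messages
    Path              /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type      filesystem
    Read_from_head    true
    Multiline         On
    Parser_Firstline  FLUENTD_LOG
    # Assumption for illustration: wait up to 5 seconds before emitting
    # a queued multiline event that has not been closed by a new first line.
    Multiline_Flush   5
```

A shorter value emits the trailing event sooner, at the risk of splitting an event whose continuation lines are written slowly.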

The complete sample configuration file is shown below:

  • sample03_flb_tail_multiline_parser.conf

[SERVICE]
    ## General settings
    Flush                     5
    Log_Level                 Info
    Daemon                    off
    Log_File                  /fluentbit/log/fluentbit.log
    Parsers_File              /fluentbit/conf/custom_parsers.conf

    ## Buffering and Storage
    Storage.path              /fluentbit/buffer/
    Storage.sync              normal
    Storage.checksum          Off
    Storage.backlog.mem_limit 5M
    Storage.metrics           On

    ## Monitoring (if required)
    HTTP_Server               true
    HTTP_Listen               0.0.0.0
    HTTP_Port                 2020
    Health_Check              On
    HC_Errors_Count           5
    HC_Retry_Failure_Count    5
    HC_Period                 60

[INPUT]
    Name   tail
    Tag    linux.messages
    Path   /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type   filesystem
    Read_from_head true
    #DB     /fluentbit/tail_linux_messages.db
    Multiline         On
    Parser_Firstline  FLUENTD_LOG

[OUTPUT]
    Name   stdout
    Match  linux.messages

Let's run Fluent Bit with the sample configuration.

  • Run Fluent Bit

$ fluent-bit -c sample03_flb_tail_multiline_parser.conf

  • Check the output

    • As you can see, lines #3 to #22 of the original file (the warning and its stack trace) were merged into a single event as expected.

[0] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-utmpx' version '0.5.0'"}]
[1] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-webhdfs' version '1.5.0'"}]
[2] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"warn", "message"=>"For security reason, setting private_key_passphrase is recommended when cert_path is specified
/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>'
        from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>'
        from /opt/fluentd/bin/fluentd:25:in `load'
        from /opt/fluentd/bin/fluentd:25:in `<main>'"}]
[3] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-splunk-hec' version '1.2.9'"}]
[4] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-systemd' version '1.0.5'%                  "}]

Congratulations! You finished the last exercise of this use case.

In this blog post, we shared one of the simplest ways to parse multiline log data using Fluent Bit. When ‘Multiline On’ is set in the ‘[INPUT]’ section, as in this exercise, Fluent Bit applies the same multiline configuration to every log coming through that input. All logs in that input are therefore parsed with the same first-line pattern, regardless of their content or format.

Alternatively, you can parse your data with a ‘[MULTILINE_PARSER]’ section, which lets you define multiple parsing rules for different log formats or sources. This gives you finer-grained control over how logs are parsed and lets you apply different parsing configurations to different inputs. For example, you can define one parsing rule for Apache logs and another for Nginx logs, each with its own pattern and configuration.
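
As a rough sketch of the ‘[MULTILINE_PARSER]’ approach, the Fluentd log from this exercise could be handled along the following lines. The parser name ‘multiline_fluentd’, the flush timeout, and both rule patterns are assumptions written for this sample file and were not part of the exercise. The section is assumed to live in the parsers file (custom_parsers.conf here), and the tail input references it through ‘multiline.parser’ instead of ‘Multiline’ and ‘Parser_Firstline’.

```
# custom_parsers.conf (sketch: name, timeout, and patterns are assumptions)
[MULTILINE_PARSER]
    Name          multiline_fluentd
    Type          regex
    Flush_Timeout 1000
    # rule | state name    | regex pattern | next state
    # A new event starts with a timestamp such as "2022-10-21 23:42:04 +0000".
    Rule   "start_state"   "/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4} /"   "cont"
    # Any line that does not start with a date is appended to the current event.
    Rule   "cont"          "/^(?!\d{4}-\d{2}-\d{2} ).*/"                         "cont"

# Main configuration file
[INPUT]
    Name              tail
    Tag               linux.messages
    Path              /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Read_from_head    true
    multiline.parser  multiline_fluentd
```

With this approach the merged text is expected to arrive under a single key (typically ‘log’), so extracting ‘time’, ‘level’, and ‘message’ as in the exercise would likely need an additional parser filter that applies FLUENTD_LOG to that key.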

Want to learn more? - Let’s get in touch.

Through the Fluentd Subscription Network, we provide consultancy and professional services to help you run Fluentd and Fluent Bit with confidence. A service desk is also available for your operations, and the team is equipped with Diagtool and practical knowledge of running Fluent Bit and Fluentd in production. Contact us anytime if you would like to learn more about our service offerings.
