Multiline Parsing Best Practice in Fluent Bit
Background and Overview
Parsing multiline log data correctly with Fluent Bit matters because many log files contain events that span multiple lines, such as stack traces. When these logs are not parsed properly, a single event can be split into several fragmented records, which leads to errors, inconsistencies, and incomplete or inaccurate data.
By parsing multiline logs accurately, users gain a more complete view of their log data, can identify patterns and anomalies that fragmented records would hide, and gain insight into application performance and potential issues. This helps organizations troubleshoot and optimize their applications and infrastructure, improving reliability and reducing downtime.
This blog post is the third and final part of our Fluent Bit use case series, and it covers the ‘Multiline Parsing in Fluent Bit’ section.
System Environment for This Exercise
The exercise below was run in the following environment:
CentOS 8
Fluent Bit v2.0.6
VM specs: 2 CPU cores / 2GB memory
Exercise
The directory structure remains the same as in the two exercises from our earlier blog posts:
/fluentbit : root directory
 |--- conf
 |     |--- custom_parsers.conf
 |     |--- Lab01
 |           |-- (Lab01 configuration files)
 |           |-- sample
 |                 |-- (Sample log files for exercise)
 |--- log
 |--- buffer
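If you are setting up the exercise from scratch, the directories can be created with a few lines of Python. This is a minimal sketch: it only creates the folders in the layout above (files such as custom_parsers.conf and the sample log are not created), and create_layout is a hypothetical helper name. Pass a different root to try it outside /fluentbit.

```python
from pathlib import Path

# Directories from the exercise layout; files are intentionally not created here.
LAYOUT = [
    "conf",
    "conf/Lab01",
    "conf/Lab01/sample",
    "log",
    "buffer",
]


def create_layout(root: str = "/fluentbit"):
    """Create the exercise directory tree under root and return the created paths.

    Safe to call repeatedly: existing directories are left untouched.
    """
    created = []
    for rel in LAYOUT:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, create_layout("/fluentbit") builds the tree in place, which may require write permission on the filesystem root.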
In the previous blog post, we used a regular expression (regex) to parse log data in which every line had the same format. Because all lines shared that format, the data was relatively easy to parse.
However, in some cases you want to merge multiple log lines into a single event. For instance, the following Fluentd log file contains a warning on line #3 followed by a stack trace that runs through line #22. These lines should be treated as a single log event for the message to be meaningful. This is where the ‘Multiline Parsing’ feature comes in.
sample02_multiline.txt (Fluentd log file example)
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2022-10-21 23:42:04 +0000 [warn]: For security reason, setting private_key_passphrase is recommended when cert_path is specified
/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>'
from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>'
from /opt/fluentd/bin/fluentd:25:in `load'
from /opt/fluentd/bin/fluentd:25:in `<main>'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.2.9'
2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
The steps to enable the ‘multiline parsing’ feature are almost the same as for custom parsing. The first step is to write a custom regex that matches the first line of each event and define it as a parsing rule. Here is the parser rule for the Fluentd log shown above:
[PARSER]
    Name        FLUENTD_LOG
    Format      regex
    Regex       /^(?<time>[^ ]* {1,2}[^ ]* [^ ]*)\s+\[(?<level>[\s\w]*)\]\:\s+(?<message>.*)$/
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S
    Time_Keep   On
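Before wiring the rule into Fluent Bit, it can help to check the regex against lines from the sample file. The Python sketch below is one way to do that. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is the FLUENTD_LOG regex with only that spelling changed. The helper name parse_first_line is hypothetical, and the sketch parses the time field with a %z offset, which is a choice of this sketch rather than the Time_Format used above.

```python
import re
from datetime import datetime

# FLUENTD_LOG regex, rewritten with Python's (?P<name>...) named-group syntax.
FLUENTD_LOG = re.compile(
    r"^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*)\s+\[(?P<level>[\s\w]*)\]\:\s+(?P<message>.*)$"
)


def parse_first_line(line: str):
    """Return the named fields if the line looks like the start of an event, else None."""
    m = FLUENTD_LOG.match(line)
    return m.groupdict() if m else None


sample = ("2022-10-21 23:42:04 +0000 [warn]: For security reason, setting "
          "private_key_passphrase is recommended when cert_path is specified")
fields = parse_first_line(sample)
print(fields)

# Stack-trace lines must NOT match, otherwise each one would start a new event.
trace = "from /opt/fluentd/bin/fluentd:25:in `load'"
print(parse_first_line(trace))  # None

# The captured time includes a UTC offset; %z turns it into epoch seconds.
ts = datetime.strptime(fields["time"], "%Y-%m-%d %H:%M:%S %z").timestamp()
print(ts)  # 1666395724.0
```

The epoch value matches the 1666395724 timestamp that appears on every record in the output later in this post.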
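Next, configure the multiline settings in the tail input section:
‘Multiline’ : ‘On’ enables the multiline parsing feature.
‘Parser_Firstline’ : the name of the parser that matches the first line of each multiline event. Lines that do not match it are appended to the preceding event.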
[INPUT]
    Name              tail
    Tag               linux.messages
    Path              /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type      filesystem
    Read_from_head    true
    #DB               /fluentbit/tail_linux_messages.db
    Multiline         On
    Parser_Firstline  FLUENTD_LOG
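To see why ‘Parser_Firstline’ alone is enough to merge the stack trace, here is a simplified Python model of the behavior described above: a line that matches the first-line regex starts a new event, and any other line is appended to the current event. It is an illustration, not Fluent Bit's tail implementation. It ignores details such as flush timeouts, and the newline separator and the handling of lines that appear before the first match are assumptions of this sketch. group_events is a hypothetical name.

```python
import re

# Same first-line pattern as the FLUENTD_LOG parser, in Python's (?P<name>...) syntax.
FIRST_LINE = re.compile(
    r"^(?P<time>[^ ]* {1,2}[^ ]* [^ ]*)\s+\[(?P<level>[\s\w]*)\]\:\s+(?P<message>.*)$"
)


def group_events(lines):
    """Merge continuation lines into the event started by the preceding first line.

    Assumption of this sketch: lines before the first match are grouped into a
    leading event of their own, and merged lines are joined with "\n".
    """
    events = []
    for line in lines:
        if FIRST_LINE.match(line) or not events:
            events.append([line])      # a first line opens a new event
        else:
            events[-1].append(line)    # a continuation line joins the current event
    return ["\n".join(event) for event in events]


sample = [
    "2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'",
    "2022-10-21 23:42:04 +0000 [warn]: For security reason, setting private_key_passphrase is recommended when cert_path is specified",
    "/opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT)",
    "from /opt/fluentd/bin/fluentd:25:in `load'",
    "from /opt/fluentd/bin/fluentd:25:in `<main>'",
    "2022-10-21 23:42:04 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.2.9'",
]

for event in group_events(sample):
    print(repr(event))
```

With this shortened sample, the [warn] line and its three trace lines collapse into one event, leaving three events in total.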
The complete configuration file is shown below:
sample03_flb_tail_multiline_parser.conf
[SERVICE]
    ## General settings
    Flush                     5
    Log_Level                 Info
    Daemon                    off
    Log_File                  /fluentbit/log/fluentbit.log
    Parsers_File              /fluentbit/conf/custom_parsers.conf
    ## Buffering and Storage
    Storage.path              /fluentbit/buffer/
    Storage.sync              normal
    Storage.checksum          Off
    Storage.backlog.mem_limit 5M
    Storage.metrics           On
    ## Monitoring (if required)
    HTTP_Server               true
    HTTP_Listen               0.0.0.0
    HTTP_Port                 2020
    Health_Check              On
    HC_Errors_Count           5
    HC_Retry_Failure_Count    5
    HC_Period                 60

[INPUT]
    Name              tail
    Tag               linux.messages
    Path              /fluentbit/conf/Lab01/sample/sample02_multiline.txt
    Storage.type      filesystem
    Read_from_head    true
    #DB               /fluentbit/tail_linux_messages.db
    Multiline         On
    Parser_Firstline  FLUENTD_LOG

[OUTPUT]
    Name   stdout
    Match  linux.messages
Let's run Fluent Bit with the sample configuration.
Run Fluent Bit
$ fluent-bit -c sample03_flb_tail_multiline_parser.conf
Check the output
As you can see, lines #3 to #22 of the original file (the warning and its stack trace) were merged into a single event, record [2], as expected.
[0] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-utmpx' version '0.5.0'"}]
[1] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-webhdfs' version '1.5.0'"}]
[2] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"warn", "message"=>"For security reason, setting private_key_passphrase is recommended when cert_path is specified /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `read': No such file or directory @ rb_sysopen - ./cert/fluent01.key.pem (Errno::ENOENT) from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:89:in `cert_option_load' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/cert_option.rb:65:in `cert_option_server_validate!' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/server.rb:330:in `configure' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin/in_forward.rb:102:in `configure' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/plugin.rb:187:in `configure' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:320:in `add_source' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:161:in `block in configure' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `each' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/root_agent.rb:155:in `configure' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:105:in `configure' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/engine.rb:80:in `run_configure' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/supervisor.rb:668:in `run_supervisor' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/lib/fluent/command/fluentd.rb:356:in `<top (required)>' from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require' from <internal:/opt/fluentd/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require' from /opt/fluentd/lib/ruby/gems/3.0.0/gems/fluentd-1.14.6/bin/fluentd:15:in `<top (required)>' from /opt/fluentd/bin/fluentd:25:in `load' from /opt/fluentd/bin/fluentd:25:in `<main>'"}]
[3] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-splunk-hec' version '1.2.9'"}]
[4] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", "level"=>"info", "message"=>"gem 'fluent-plugin-systemd' version '1.0.5'% "}]
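Each stdout record carries the event timestamp as epoch seconds, which Fluent Bit takes from the field named by Time_Key. A quick way to confirm that the time settings took effect is to convert that value back to a date and compare it with the "time" field. The Python sketch below does this for one record; the regex describing the record layout is an assumption based only on the sample output above, and describe_record is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Prefix of one printed record, e.g. "[2] linux.messages: [1666395724.000000000, {...}]".
# Layout inferred from the sample output only; other versions may print differently.
RECORD_HEADER = re.compile(r"^\[(?P<index>\d+)\] (?P<tag>[^:]+): \[(?P<ts>\d+\.\d+), ")


def describe_record(line: str):
    """Return (index, tag, event time in UTC) for one record line from the stdout output."""
    m = RECORD_HEADER.match(line)
    if m is None:
        raise ValueError("not a stdout record line")
    event_time = datetime.fromtimestamp(float(m.group("ts")), tz=timezone.utc)
    return int(m.group("index")), m.group("tag"), event_time


line = ('[2] linux.messages: [1666395724.000000000, {"time"=>"2022-10-21 23:42:04 +0000", '
        '"level"=>"warn", "message"=>"For security reason, ..."}]')
print(describe_record(line))
```

Here the event time converts back to 2022-10-21 23:42:04 UTC, the same instant as the "time" field of the record.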
Congratulations! You have finished the last exercise in this use case.
In this blog we shared one of the simplest ways to parse multiline log data with Fluent Bit. When ‘Multiline On’ is set in an ‘[INPUT]’ section, as in this exercise, Fluent Bit applies the same first-line rule to every log read by that input, regardless of its content or format. Alternatively, you can define ‘[MULTILINE_PARSER]’ sections, which the tail input references through its multiline.parser option. These let you define multiple parsing rules for different log formats or sources, giving you finer-grained control over how logs are parsed and letting you apply different multiline configurations to different inputs. For example, you can define one parsing rule for Apache logs and another for Nginx logs, each with its own pattern and configuration.
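A ‘[MULTILINE_PARSER]’ section describes its rules as a small state machine: each rule names a state, a regex, and the state to move to when the regex matches, and a record begins in "start_state". The Python sketch below is a simplified model of that idea, not Fluent Bit's implementation. Both rule sets, the sample lines, and the helper names are hypothetical, and emitting an unmatched line as its own record is an assumption of this sketch.

```python
import re

# Hypothetical rule sets as (state, regex, next_state) triples.
FLUENTD_RULES = [
    ("start_state", re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4} \[\w+\]:"), "cont"),
    ("cont", re.compile(r"^(?!\d{4}-\d{2}-\d{2} )"), "cont"),  # anything not starting with a date
]
JAVA_RULES = [
    ("start_state", re.compile(r"^[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2}"), "cont"),
    ("cont", re.compile(r"^\s+at "), "cont"),  # indented "at ..." stack frames
]


def _next_state(rules, state, line):
    """Return the next state if some rule for `state` matches `line`, else None."""
    for name, regex, nxt in rules:
        if name == state and regex.search(line):
            return nxt
    return None


def run_rules(lines, rules):
    """Group lines into records by walking the rule state machine."""
    records, current, state = [], None, None
    for line in lines:
        if current is not None:
            nxt = _next_state(rules, state, line)
            if nxt is not None:          # continuation: stay in the current record
                current.append(line)
                state = nxt
                continue
            records.append("\n".join(current))  # no rule matched: close the record
            current = None
        nxt = _next_state(rules, "start_state", line)
        if nxt is not None:              # a new record begins
            current, state = [line], nxt
        else:                            # assumption: pass unmatched lines through as-is
            records.append(line)
    if current is not None:
        records.append("\n".join(current))
    return records


java_sample = [
    'Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: boom',
    "    at com.example.App.main(App.java:5)",
    "Dec 14 06:41:09 next event",
]
for record in run_rules(java_sample, JAVA_RULES):
    print(repr(record))
```

Keeping one rule set per format is what allows each input to use rules that fit its own logs, instead of a single first-line pattern for everything.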
Want to learn more? Let’s get in touch.
In the Fluentd Subscription Network, we provide consulting and professional services to help you run Fluentd and Fluent Bit with confidence. A service desk is also available for your operations, and the team is equipped with the Diagtool and practical know-how for running Fluent Bit and Fluentd in production. Contact us anytime if you would like to learn more about our service offerings.