Parsing in Fluent Bit using Regular Expression
Background and Overview
Without proper parsing, log data can be difficult to interpret and use effectively, leading to potential issues such as increased troubleshooting time or missed opportunities to detect important events. Parsing data in Fluent Bit is important because it allows the user to extract relevant information from unstructured or semi-structured log data. This extracted data can then be further processed, analyzed, and sent to various outputs. It helps to make sense of log data and turn it into valuable insights.
When parsing data, regular expressions allow users to define complex patterns that match specific parts of the log data, such as timestamps, error messages, or IP addresses. This provides a powerful and flexible way to extract and organize relevant information from log data, making it easier to understand, analyze, and act upon.
In this blog, as the second exercise in our use case of building a flow with Fluent Bit and Fluentd, we will parse the collected log data using regular expressions. If you want to follow this use case from the beginning, where we covered the basics of the ‘tail’ plugin in Fluent Bit, feel free to check out the first blog from the link below.
Parsing in Fluent Bit using Regular Expression
↑ This blog will cover this section!
System Environments for this Exercise
The system environment used in the exercise below is as follows:
CentOS 8
Fluent Bit v2.0.6
VM specs: 2 CPU cores / 2GB memory
Directory Structure and Log file
Just like in our first blog, we will use the ‘tail’ plugin in Fluent Bit to read data from a typical Linux log file. The directory structure and log file remain the same as well.
Here is the directory structure of this case:
Here is the sample Linux log we’ll try to parse.
sample01_linux_messages.txt:
Now let’s move on to the exercise to parse this data!
Exercise
In the previous exercise, Fluent Bit just read the log messages from the target file and flushed them to stdout. By default, the whole message is stored under the ‘log’ key, but in some cases you may want to capture specific pieces of information. The following log contains information such as:
Timestamp : Oct 27 16:14:31
Hostname : fluent01
Ident : systemd
Process ID : 1
Message Body : Started dnf makecache.
Step 1: Create Regular Expression
Here is a way to define custom parsing rules with a regular expression (“regex”). A sample regex and the output it produces for the log pattern are shown below.
Regex:
Output:
Rubular and Regex101 are useful to try out your regular expressions.
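If you would rather experiment from a script, here is a minimal Python sketch of the same idea: one named capture group per field. Please treat everything in it as an assumption made for illustration. The raw line is rebuilt from the field values listed earlier, and the pattern is a hypothetical one, not necessarily the regex used in this exercise. Also note that Python writes named groups as (?P<name>...), while Fluent Bit's Ruby-style regex syntax writes them as (?<name>...), so a pattern needs that small change before it goes into a parser definition.

```python
import re

# Assumed raw line, rebuilt from the field values listed in this post.
SAMPLE_LINE = "Oct 27 16:14:31 fluent01 systemd[1]: Started dnf makecache."

# Hypothetical pattern for illustration only.
# Python spells named groups (?P<name>...); Fluent Bit uses (?<name>...).
SYSLOG_PATTERN = re.compile(
    r"^(?P<time>[^ ]+ +[^ ]+ +[^ ]+) "   # three space-separated tokens: month, day, clock
    r"(?P<host>[^ ]+) "                  # hostname
    r"(?P<ident>[a-zA-Z0-9_/.\-]+)"      # program name
    r"(?:\[(?P<pid>[0-9]+)\])?: "        # optional [pid], then ": "
    r"(?P<message>.*)$"                  # rest of the line
)


def parse_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = SYSLOG_PATTERN.match(line)
    return match.groupdict() if match else None


record = parse_line(SAMPLE_LINE)
print(record)
```

Every captured value is a string at this point, including ‘pid’. Lines that do not fit the pattern return None here, which is a handy way to spot log lines your regex does not cover yet.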
Step 2: Define Custom Parser
Once your regex is ready, the next step is to define a custom parser for Fluent Bit. We typically prepare a ‘custom_parsers.conf’ file and specify it in the ‘[SERVICE]’ section.
Here is a sample custom parser definition for Linux OS log messages. You can find the custom regex in the ‘Regex’ option.
As described in our first blog, Fluent Bit by default stamps each record with the time at which it read the line from the log file, which can differ from the timestamp written in the raw message. The time settings ‘Time_Key,’ ‘Time_Format’ and ‘Time_Keep’ help avoid this mismatch.
‘Time_Key’ : Specify the name of the field which provides time information.
‘Time_Format’ : Specify the format of the time field so it can be recognized and analyzed properly.
‘Time_Keep’ : By default, when a time key is recognized and parsed, the parser drops the original time field from the record. Enabling this option keeps the original time field and its value in the record. In either case, the record’s timestamp is taken from the parsed time field rather than from the time the line was processed.
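To make the keep option a little more concrete, here is a small Python sketch that mimics what happens to the time field in a record after it has been read. It is only a model of the behavior described above, not Fluent Bit code, and the function name apply_time_key and the sample record are made up for this illustration.

```python
def apply_time_key(record, time_key="time", time_keep=False):
    """Mimic how a parser treats the time field after reading it.

    Returns (raw_time, new_record). With time_keep=False the time field
    is dropped from the record; with time_keep=True it stays in place.
    """
    raw_time = record.get(time_key)
    if time_keep:
        return raw_time, dict(record)
    remaining = {k: v for k, v in record.items() if k != time_key}
    return raw_time, remaining


# Hypothetical parsed record, using the field values from this post.
parsed = {
    "time": "Oct 27 16:14:31",
    "host": "fluent01",
    "message": "Started dnf makecache.",
}

raw_a, dropped = apply_time_key(parsed, time_keep=False)
raw_b, kept = apply_time_key(parsed, time_keep=True)

print(sorted(dropped))  # keys left when the time field is dropped
print(sorted(kept))     # keys left when the time field is kept
```

In both calls the raw time value is still returned, which mirrors the point that the record timestamp comes from the log message either way. The option only decides whether the text field also remains in the record.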
In the sample parser, we capture the timestamp under the ‘time’ key. The raw value looks like ‘Oct 27 16:14:31,’ which corresponds to the format string ‘%b %d %H:%M:%S.’ For more on writing time format strings, try checking out this page.
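A quick way to sanity-check a format string before putting it into a parser is to try it with strptime in Python, which uses the same style of conversion specifiers. This is just a sketch for checking the format string quoted above. It says nothing about how Fluent Bit itself fills in the missing year, and it assumes the default English month names.

```python
from datetime import datetime

TIME_FORMAT = "%b %d %H:%M:%S"  # format string quoted in this post
RAW_TIME = "Oct 27 16:14:31"    # timestamp value from the sample log

parsed = datetime.strptime(RAW_TIME, TIME_FORMAT)

# The sample has no year, so Python falls back to its default of 1900.
print(parsed.month, parsed.day, parsed.hour, parsed.minute, parsed.second)
# prints: 10 27 16 14 31

# Formatting the value back with the same string reproduces the input.
print(parsed.strftime(TIME_FORMAT) == RAW_TIME)
```

If the format string does not match the raw value, strptime raises a ValueError, so a typo in the format shows up immediately.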
Step 3: Run Fluent Bit with Custom Parser
Let’s run Fluent Bit with the custom parser!
custom_parsers.conf
sample02_flb_tail_custom_parser.conf
Add ‘Parser syslog-messages’ to the ‘tail’ input section.
Make sure that the custom parser file is specified properly, such as:
Parsers_File /fluentbit/conf/custom_parsers.conf
Let’s run Fluent Bit with the sample config
You might need to remove the ‘DB’ file /fluentbit/tail_linux_messages.db so that Fluent Bit reads the log file from the beginning again.
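If you reset the state often while testing, a small helper can save some typing. The Python sketch below deletes a database file and, as an assumption, the ‘-wal’ and ‘-shm’ side files that SQLite may create next to it in write-ahead logging mode. The function name reset_tail_db is made up for this example.

```python
from pathlib import Path


def reset_tail_db(db_path):
    """Delete the offset database and any SQLite side files.

    Returns the names of the files that were actually removed.
    """
    removed = []
    # "-wal" and "-shm" are assumed side files from SQLite's WAL mode.
    for suffix in ("", "-wal", "-shm"):
        candidate = Path(str(db_path) + suffix)
        if candidate.exists():
            candidate.unlink()
            removed.append(candidate.name)
    return removed
```

For this exercise you would call reset_tail_db("/fluentbit/tail_linux_messages.db"), ideally while Fluent Bit is stopped, and then start Fluent Bit again.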
You can see parsed messages in stdout.
There are ‘time’, ‘host’, ‘ident’, ‘pid’ and ‘message’ keys as expected.
Also, the timestamp written by Fluent Bit is the same as the timestamp in the raw messages.
Great, you finished the second exercise! You now know how to parse the log messages using the ‘tail’ plugin and regex. In the next blog, you will learn how to parse log messages that have multiple log lines.
Need more help? - We are here for you.
In the Fluentd Subscription Network, we provide consultancy and professional services to help you run Fluentd and Fluent Bit with confidence. A service desk is also available for your operations, and the team is equipped with the Diagtool and practical know-how for running Fluent Bit/Fluentd in production. Contact us anytime if you would like to learn more about our service offerings.