Sanitizer - Mask sensitive data on the fly

February 10, 2021 Engineering

System operation with sensitive information

Huge amounts of data are collected from applications, systems and IoT devices. The collected data is consumed for purposes such as AI and big data analytics to gain insights. For system operators, making data accessible to such applications is an important task. At the same time, system operators need to consider security, since even log data from applications might contain sensitive information such as user credentials and confidential system information.

One approach to securing data is to mask sensitive information. Masking helps you mitigate security incidents such as data breaches and exfiltration by making the data meaningless to attackers. It also benefits internal consumers, because data can be shared with authorized users without the risk of unexpected data exposure.

The combination of Fluentd and Sanitizer makes it possible to mask sensitive information in data on the fly. Fluentd gives you great flexibility in parsing, filtering and routing incoming data. Sanitizer works with Fluentd as a filter plugin that lets you mask sensitive data with custom rules such as regular expressions and keywords. Sanitizer also has built-in options to mask IP addresses and FQDNs, which is useful for secure system operations.

Sanitizer

Sanitizer (“fluent-plugin-sanitizer”) is a Fluentd filter plugin that masks sensitive information. With Sanitizer, you can mask values on the fly, selected by the keys of key-value pairs, as records pass between Fluentd processing stages. Custom rules let you specify what to mask: IP addresses, hostnames in FQDN style, regular expressions and keywords. For IP addresses and hostnames, Sanitizer provides options that make it easy to mask them even when they are embedded in complex messages.

Here are examples of how each option works.

Option : pattern_ipv4

Example#1 : “192.168.10.10” → “IPv4_{hash}”
Example#2 : “https://192.168.10.10:8000/event” → “https://IPv4_{hash}:8000/event”
Example#3 : “Access from 192.168.10.10 was blocked” → “Access from IPv4_{hash} was blocked”

Option : pattern_fqdn

Example#1 : “app.demo.com” → “FQDN_{hash}”
Example#2 : “SASL_SSL://app.demo.com:9092” → “SASL_SSL://FQDN_{hash}:9092”
Example#3 : “app1.demo.com, app2.demo.com” → “FQDN_{hash1}, FQDN_{hash2}”

Option : pattern_regex

Example#1 : SSN “123-45-6789” → “SSN_{hash}”
Example#2 : Phone number “123-456-7890” → “Phone_{hash}”

Option : pattern_keywords

Example#1 : Custom keyword “password” → “Passwd_{value}”
Example#2 : Custom keyword “system” → “System_{value}”
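
To make the pattern_ipv4 and pattern_fqdn examples above more concrete, here is a minimal Python sketch of the general idea: find each matching substring and replace it with a prefixed hash token while leaving the rest of the string untouched. This is not the plugin's actual implementation. The regular expressions, the helper names and the use of salted MD5 are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical patterns for illustration; the plugin's own matching rules may differ.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
FQDN_RE = re.compile(r"\b(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")


def token(prefix: str, value: str, salt: str) -> str:
    """Build a 'PREFIX_<hex digest>' token. Salted MD5 is an assumption."""
    digest = hashlib.md5((salt + value).encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}"


def mask_ipv4(text: str, salt: str = "mysalt") -> str:
    """Replace every IPv4-looking substring, keeping the surrounding text."""
    return IPV4_RE.sub(lambda m: token("IPv4", m.group(0), salt), text)


def mask_fqdn(text: str, salt: str = "mysalt") -> str:
    """Replace every FQDN-looking substring, keeping the surrounding text."""
    return FQDN_RE.sub(lambda m: token("FQDN", m.group(0), salt), text)


print(mask_ipv4("https://192.168.10.10:8000/event"))
print(mask_fqdn("app1.demo.com, app2.demo.com"))
```

In this sketch the same input and salt always produce the same token, so two different hostnames in one message end up with two different tokens, as in Example#3 of pattern_fqdn.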

How to run Sanitizer

Installation

Sanitizer is provided as a RubyGem package, and you can easily install the Sanitizer plugin on both td-agent and OSS Fluentd. You can find details about dependencies on the RubyGems web page.

### For td-agent user :
$ td-agent-gem install fluent-plugin-sanitizer

### For OSS Fluentd user :
$ fluent-gem install fluent-plugin-sanitizer

Define custom rules

Here are the configuration parameters and options available in Sanitizer.

Configuration parameters and options

hash_salt (optional) : Salt used when calculating the hash value from the original information.
rule options :
- keys (mandatory) : Names of the keys whose values will be masked. You can specify multiple keys. When keys are nested, use {parent key}.{child key}, for example "kubernetes.master_url".
- pattern_ipv4 (optional) : Mask IP addresses in IPv4 format. Set to “true” or “false”.
- pattern_fqdn (optional) : Mask hostnames in FQDN style. Set to “true” or “false”.
- pattern_regex (optional) : Mask values that match a custom regular expression. You need to provide the regular expression in this option.
- pattern_regex_prefix (optional) : Define the prefix used for masked values.
- pattern_keywords (optional) : Mask values that match custom keywords. You can specify multiple keywords.
- pattern_keywords_prefix (optional) : Define the prefix used for masked values.

You can specify multiple rules in a single configuration. It is also possible to define multiple pattern options in a single rule, as in the following sample.

<filter **>
    @type sanitizer
    hash_salt mysalt
    <rule>
        keys source, kubernetes.master_url
        pattern_ipv4 true
        pattern_fqdn true
    </rule>
    <rule>
        keys hostname, host
        pattern_fqdn true
    </rule>
    <rule>
        keys message
        pattern_regex /^Hello World!$/
        pattern_keywords password, passwd
    </rule>
</filter>
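
The {parent key}.{child key} notation used in keys, such as kubernetes.master_url, can be read as a path into a nested record. The Python sketch below illustrates that reading only. The helper name apply_to_key and the placeholder masking function are hypothetical, and how the plugin itself resolves keys is not shown in this post.

```python
from typing import Any, Callable


def apply_to_key(record: dict, dotted_key: str, fn: Callable[[str], str]) -> None:
    """Apply fn in place to the string value at a dotted path, if present."""
    *parents, leaf = dotted_key.split(".")
    node: Any = record
    for name in parents:
        node = node.get(name) if isinstance(node, dict) else None
        if node is None:
            return  # path not present in this record; leave it untouched
    if isinstance(node, dict) and isinstance(node.get(leaf), str):
        node[leaf] = fn(node[leaf])


record = {"source": "10.0.0.1", "kubernetes": {"master_url": "https://10.0.0.2:443"}}
for key in ("source", "kubernetes.master_url", "missing.key"):
    apply_to_key(record, key, lambda value: "<masked>")
print(record)
```

Keys that do not exist in a given record are skipped in this sketch, so one rule can list keys that appear in only some of the incoming events.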

Use cases

Mask IP addresses and Hostnames

Masking IP addresses and hostnames is one of the typical use cases in security operations. You just need to specify the names of the keys whose values may contain IP addresses and hostnames. Here is a configuration sample, along with input and output samples.

Configuration sample

<filter **>
    @type sanitizer
    hash_salt mysalt
    <rule>
        keys ip
        pattern_ipv4 true
    </rule>
    <rule>
        keys host
        pattern_fqdn true
    </rule>
    <rule>
        keys system.url, system.log
        pattern_ipv4 true
        pattern_fqdn true
    </rule>
</filter>

Input sample

{
    "ip" : "192.168.10.10",
    "host" : "test01.demo.com",
    "system" : {
        "url" : "https://test02.demo.com:8000/event",
        "log" : "access from 192.168.10.100 was blocked"
    }
}

Output sample

{
    "ip" : "IPv4_94712b06963e277fe28469388323665d",
    "host" : "FQDN_37de34e3d799de477c742d8d7bb35550",
    "system" : {
        "url" : "https://FQDN_e9a59624f555d02f06209c9942dded19:8000/event",
        "log" : "access from IPv4_f7374d61e6d21dc1105f70358a5f8e8f was blocked"
    }
}

Mask words that match custom keywords and regular expressions

If log messages include sensitive information such as SSNs and phone numbers, Sanitizer can help there as well. If you know the exact keywords that need to be masked, you can use the keyword option (pattern_keywords). To mask information that matches a custom regular expression, use the regex option (pattern_regex).

Configuration sample

<filter **>
    @type sanitizer
    hash_salt mysalt
    <rule>
        keys user.ssn
        pattern_regex /^(?!(000|666|9))\d{3}-(?!00)\d{2}-(?!0000)\d{4}$/
        pattern_regex_prefix SSN
    </rule>
    <rule>
        keys user.phone
        pattern_regex /^\d{3}-?\d{3}-?\d{4}$/
        pattern_regex_prefix Phone
    </rule>
</filter>

Input sample

{
    "user" : {
        "ssn" : "123-45-6789",
        "phone" : "123-456-7890"
    }
}

Output sample

{
    "user" : {
        "ssn" : "SSN_f6b6430343a9a749e12db8a112ca74e9",
        "phone" : "Phone_0a25187902a0cf755627397eb085d736"
    }
}
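
Before putting a custom regular expression into a rule, it can help to try it against a few sample values. The sketch below checks the two expressions from the configuration sample using Python's re module. It assumes the patterns behave the same way in Python as in the plugin for single-line values like these, and the extra sample values are made up for illustration.

```python
import re

# The two patterns from the configuration sample, without the /.../ delimiters.
SSN_RE = re.compile(r"^(?!(000|666|9))\d{3}-(?!00)\d{2}-(?!0000)\d{4}$")
PHONE_RE = re.compile(r"^\d{3}-?\d{3}-?\d{4}$")

# The first two values come from the input sample; the rest are hypothetical.
samples = ["123-45-6789", "123-456-7890", "000-45-6789", "912-45-6789", "1234567890"]

for value in samples:
    # Report which pattern, if any, would select this value for masking.
    print(value, "ssn:", bool(SSN_RE.search(value)), "phone:", bool(PHONE_RE.search(value)))
```

Because both patterns are anchored with ^ and $, they only match when the whole value has the expected shape, which suits keys like user.ssn that hold a single value rather than free-form text.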

Tips : Debugging how Sanitizer works

When you design custom rules in the configuration file, you may want to see how Sanitizer maps original values to hash values. This information is available when you run td-agent/Fluentd with the debug option enabled. It appears in the td-agent/Fluentd log file, as in the following log message sample.

Log message sample

YYYY-MM-DD Time fluent.debug: {"message":"[pattern_regex] sanitize '123-45-6789' to 'SSN_f6b6430343a9a749e12db8a112ca74e9'"}
YYYY-MM-DD Time fluent.debug: {"message":"[pattern_regex] sanitize '123-456-7890' to 'Phone_0a25187902a0cf755627397eb085d736'"}
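
When there are many such lines, a small script can collect them into a list of original-to-masked pairs. The Python sketch below parses lines shaped like the sample above. The function name parse_debug_line is hypothetical, and the parsing assumes the exact message layout shown in the sample, which may differ between plugin versions.

```python
import json
import re

# Matches the message body shown in the sample, for example:
# [pattern_regex] sanitize '123-45-6789' to 'SSN_...'
MSG_RE = re.compile(r"^\[(?P<option>\w+)\] sanitize '(?P<original>.*)' to '(?P<masked>.*)'$")


def parse_debug_line(line: str):
    """Return (option, original, masked), or None if the line has another shape."""
    _, sep, payload = line.partition("fluent.debug: ")
    if not sep:
        return None  # not a fluent.debug line
    try:
        message = json.loads(payload).get("message", "")
    except (json.JSONDecodeError, AttributeError):
        return None  # payload is not a JSON object
    if not isinstance(message, str):
        return None
    m = MSG_RE.match(message)
    return (m["option"], m["original"], m["masked"]) if m else None


line = ("YYYY-MM-DD Time fluent.debug: "
        "{\"message\":\"[pattern_regex] sanitize '123-45-6789' "
        "to 'SSN_f6b6430343a9a749e12db8a112ca74e9'\"}")
print(parse_debug_line(line))
```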

Now, you are ready to use Sanitizer with Fluentd. Happy logging !!

Commercial Service - We are here for you.

Through the Fluentd Subscription Network, we provide consultancy and professional services to help you run Fluentd and Fluent Bit with confidence. A service desk is also available to support your operations, and the team is equipped with Diagtool and practical knowledge of running Fluentd in production. Contact us anytime if you would like to learn more about our service offerings.
