Diagtool - automated Fluentd troubleshooting.
Introduction to Diagtool
Diagtool is a simple yet powerful tool that simplifies the Fluentd troubleshooting process and makes it more secure. The code has been contributed to the Fluentd community, and you can download the instructions for Diagtool here. Diagtool delivers the functions you need to automate your process and accelerate problem determination:
Collect function:
Automates the collection of the data needed to diagnose problems with Fluentd and the instance it runs on.
Collects not only the Fluentd configuration and logs but also OS information such as resource usage, the processes running on the instance, and network connectivity.
Validate function:
Inspects the OS parameters and checks whether they meet the recommended values.
Mask function:
Masks sensitive information such as hostnames, IP addresses, and user-defined keywords.
This function is optional; you can enable or disable it with a command option.
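To illustrate the kind of check the Validate function performs, here is a minimal shell sketch (not Diagtool's actual implementation) that compares live kernel parameters with the recommended values Diagtool reports in its run log, such as net_core_somaxconn => 1024. Diagtool prints the keys with underscores; the sketch uses the dotted names that the `sysctl` command expects. The helpers `read_sysctl` and `validate_value` are hypothetical, and a Linux host with `sysctl` available is assumed.

```shell
#!/bin/sh
# Minimal sketch of a Validate-style check; NOT Diagtool's implementation.

# Hypothetical helper: read a live kernel parameter and collapse tabs or
# repeated spaces (e.g. in ip_local_port_range) into single spaces.
read_sysctl() {
  sysctl -n "$1" 2>/dev/null | tr -s ' \t' ' '
}

# Hypothetical helper: compare an actual value with a recommendation.
validate_value() {
  name=$1 actual=$2 recommended=$3
  if [ "$actual" = "$recommended" ]; then
    echo "OK: $name = $actual"
  else
    echo "WARN: $name = ${actual:-unknown}, recommendation is $recommended"
  fi
}

# Recommended values taken from Diagtool's validation output.
validate_value net.core.somaxconn "$(read_sysctl net.core.somaxconn)" 1024
validate_value net.core.netdev_max_backlog "$(read_sysctl net.core.netdev_max_backlog)" 5000
validate_value net.ipv4.tcp_tw_reuse "$(read_sysctl net.ipv4.tcp_tw_reuse)" 1
validate_value net.ipv4.ip_local_port_range "$(read_sysctl net.ipv4.ip_local_port_range)" "10240 65535"
```

Separating the read step from the comparison keeps the comparison easy to test and lets you feed it values from a saved `sysctl -a` dump instead of the live system.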
How to install Diagtool
Installation
Diagtool is distributed as a RubyGem and is easy to install with the gem command bundled with td-agent. Make sure you use the “/usr/sbin/td-agent-gem” command instead of the ordinary gem command.
# /usr/sbin/td-agent-gem install fluent-diagtool
Fetching fluent-diagtool-1.0.1.gem
Successfully installed fluent-diagtool-1.0.9
Parsing documentation for fluent-diagtool-1.0.9
Installing ri documentation for fluent-diagtool-1.0.9
Done installing documentation for fluent-diagtool after 0 seconds
1 gem installed
When you install with the /usr/sbin/td-agent-gem command, the fluent-diagtool command is placed in the "/opt/td-agent/embedded/lib/ruby/gems/2.4.0/bin/" directory. To run it without typing the full path, add that directory to $PATH in your .bash_profile.
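A minimal sketch for putting that directory on PATH, assuming the directory quoted above matches your td-agent installation. `add_to_path` is a hypothetical helper name; it appends the directory only once, so sourcing the file repeatedly does not grow PATH.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                     # already present: do nothing
    *) PATH="${PATH:+$PATH:}$1" ;;   # append (no leading colon if PATH was empty)
  esac
  export PATH
}

# Gem executable directory used by td-agent-gem, as stated above.
add_to_path /opt/td-agent/embedded/lib/ruby/gems/2.4.0/bin
```

To make the change permanent, place the same lines in ~/.bash_profile and start a new login shell.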
Alternatively, you can install Diagtool with the standard gem command. In this case, a Ruby version higher than 2.3 may be required.
# gem install fluent-diagtool
Successfully installed fluent-diagtool-1.0.1
Parsing documentation for fluent-diagtool-1.0.1
Installing ri documentation for fluent-diagtool-1.0.1
Done installing documentation for fluent-diagtool after 0 seconds
1 gem installed
Command options
Diagtool provides a few command options, which you can list with the "--help" option. Diagtool performs the validation function by default, while the mask function can be turned on or off depending on your use case.
# fluent-diagtool --help
Usage: fluent-diagtool -o OUTPUT_DIR -m {yes | no} -w {word1,[word2...]} -f {listfile} -s {hash seed}
--precheck | Run Precheck (Optional)
-t, --type fluentd|fluentbit | Select the type of Fluentd (Mandatory)
-o, --output DIR | Output directory (Mandatory)
-m, --mask yes|no | Enable mask function (Optional : Default=no)
-w, --word-list word1,word2 | Provide a list of user-defined words which will to be masked (Optional : Default=None)
-f, --word-file list_file | provide a file which describes a List of user-defined words (Optional : Default=None)
-s, --hash-seed seed | provide a word which will be used when generate the mask (Optional : Default=None)
-c, --conf config_file | provide a full path of td-agent configuration file (Optional : Default=None)
-l, --log log_file | provide a full path of td-agent log file (Optional : Default=None)
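The help text does not show the format of the word file passed with -f. The sketch below assumes one word per line (an assumption, not documented here) and combines -f with the mask (-m) and hash seed (-s) options. The file path /tmp/diagtool_words.txt and the seed value myseed are placeholders.

```shell
# Hypothetical word file; one word per line is an assumed format.
WORD_FILE=/tmp/diagtool_words.txt
printf '%s\n' passwd1 passwd2 > "$WORD_FILE"

# Run Diagtool only if it is installed. -t and -o are mandatory;
# -m yes enables masking, -f supplies the word file, -s sets the hash seed.
if command -v fluent-diagtool >/dev/null 2>&1; then
  fluent-diagtool -t fluentd -o /tmp -m yes -f "$WORD_FILE" -s myseed
fi
```

Keeping the sensitive words in a file rather than on the command line with -w also keeps them out of your shell history.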
Pre-check
To run correctly, Diagtool must be able to obtain basic information about Fluentd, such as the paths of its configuration and log files. Diagtool normally parses this information automatically from the environment of the td-agent daemon. The pre-check option is useful for confirming that Diagtool has collected this information as expected.
The following example shows a pre-check in which Diagtool properly collects the required information.
# fluent-diagtool --precheck -t fluentd
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] Fluentd Type = fluentd
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] Check OS parameters...
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] operating system = CentOS Linux 7 (Core)
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] kernel version = Linux 3.10.0-1127.10.1.el7.x86_64
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] Check td-agent parameters...
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] td-agent conf path = /etc/td-agent/
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] td-agent conf file = td-agent.conf
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] td-agent log path = /var/log/td-agent/
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] td-agent log = td-agent.log
2020-10-07 21:20:33 +0000: [Diagtool] [INFO] [Precheck] Precheck completed. You can run diagtool command without -c and -l options
In some cases, for example when td-agent is started with custom command-line options, Diagtool may fail to identify the paths of the Fluentd configuration and log files. In that case, you need to specify these paths manually with the “-c” and “-l” options.
The following example shows a failed pre-check, in which Diagtool cannot extract the paths of the td-agent configuration and log files.
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] Check OS parameters...
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] operating system = CentOS Linux 8 (Core)
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] kernel version = Linux 4.18.0-147.5.1.el8_1.x86_64
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] Check td-agent parameters...
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] td-agent conf path =
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] td-agent conf file =
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] td-agent log path =
2020-05-28 05:45:14 +0000: [Diagtool] [INFO] [Precheck] td-agent log =
2020-05-28 05:45:14 +0000: [Diagtool] [WARN] [Precheck] can not find td-agent conf path: please run diagtool command with -c {/path/to/td-agent conf file}
2020-05-28 05:45:14 +0000: [Diagtool] [WARN] [Precheck] can not find td-agent log path: please run diagtool command with -l {/path/to/td-agent log file}
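If you script Diagtool runs, a wrapper can look for the two WARN messages shown above and report which of -c and -l must be supplied. This is a minimal sketch that matches only those message texts; `needed_options` is a hypothetical helper. Whether Diagtool writes its messages to stdout or stderr is not stated here, so the usage line merges both streams.

```shell
# Hypothetical helper: read `fluent-diagtool --precheck` output on stdin and
# print the extra options that the WARN messages ask for (empty if none).
needed_options() {
  out=$(cat)
  opts=""
  case "$out" in
    *"can not find td-agent conf path"*) opts="$opts -c" ;;
  esac
  case "$out" in
    *"can not find td-agent log path"*) opts="$opts -l" ;;
  esac
  printf '%s\n' "${opts# }"   # drop the leading space
}

# Example usage, assuming fluent-diagtool is on PATH:
# fluent-diagtool --precheck -t fluentd 2>&1 | needed_options
```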
Run Diagtool
Once the pre-check has completed, you are ready to run the tool. The “-t” and “-o” options are mandatory, and the output is generated as a compressed tar file in the directory specified by the “-o” option.
(*) If the pre-check reports that it cannot find the “td-agent conf path” or the “td-agent log path”, use the “-c” or “-l” option, respectively, to specify the file path manually. The example below enables the mask function and adds two user-defined words, passwd1 and passwd2, to be masked.
# fluent-diagtool -t fluentd -o /tmp -w passwd1,passwd2 -m yes
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] Parsing command options...
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] Option : Output directory = /tmp
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] Option : Mask = yes
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] Option : Word list = ["passwd1", "passwd2"]
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] Option : Hash Seed =
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] Initializing parameters...
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] Loading the environment parameters...
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] operating system = CentOS Linux 7 (Core)
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] kernel version = Linux 3.10.0-1127.10.1.el7.x86_64
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] td-agent conf path = /etc/td-agent/
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] td-agent conf file = td-agent.conf
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] td-agent log path = /var/log/td-agent/
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] td-agent log = td-agent.log
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] Collecting log files of td-agent...
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] Collecting config file of td-agent...
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] config file is stored in ["/tmp/20201007212928/etc/td-agent/td-agent.conf", "/tmp/20201007212928/etc/td-agent/http_fld_system.conf"]
2020-10-07 21:29:28 +0000: [Diagtool] [INFO] [Collect] Collecting td-agent gem information...
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] td-agent gem information is stored in /tmp/20201007212928/output/tdgem_list.output
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] Collecting config file of OS log...
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Mask] Masking OS log file : /tmp/20201007212928/var/log/messages...
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] config file is stored in /tmp/20201007212928/var/log/messages.mask
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] Collecting date/time information...
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] date/time information is stored in /tmp/20201007212928/output/chronyc_sources.txt
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] Collecting command output : command = ps -eo pid,ppid,stime,time,%mem,%cpu,cmd
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Mask] Masking command output file : /tmp/20201007212928/output/ps_-eo_pid_ppid_stime_time_%mem_%cpu_cmd.txt...
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] Collecting command output ps stored in /tmp/20201007212928/output/ps_-eo_pid_ppid_stime_time_%mem_%cpu_cmd.txt.mask
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] Collecting command output : command = cat /proc/meminfo
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Mask] Masking command output file : /tmp/20201007212928/output/cat_-proc-meminfo.txt...
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] Collecting command output cat stored in /tmp/20201007212928/output/cat_-proc-meminfo.txt.mask
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Collect] Collecting command output : command = netstat -plan
2020-10-07 21:29:29 +0000: [Diagtool] [INFO] [Mask] Masking command output file : /tmp/20201007212928/output/netstat_-plan.txt...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] Collecting command output netstat stored in /tmp/20201007212928/output/netstat_-plan.txt.mask
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] Collecting command output : command = netstat -s
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Mask] Masking command output file : /tmp/20201007212928/output/netstat_-s.txt...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] Collecting command output netstat stored in /tmp/20201007212928/output/netstat_-s.txt.mask
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] Collecting systctl information...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] sysctl information is stored in /tmp/20201007212928/output/sysctl_-a.txt
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Validating systctl information...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_core_netdev_max_backlog => 5000 is correct (recommendation is 5000)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_core_rmem_max => 16777216 is correct (recommendation is 16777216)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_core_somaxconn => 1024 is correct (recommendation is 1024)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_core_wmem_max => 16777216 is correct (recommendation is 16777216)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_ipv4_ip_local_port_range => ["10240", "65535"] is correct (recommendation is ["10240", "65535"])
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_ipv4_tcp_max_syn_backlog => 8096 is correct (recommendation is 8096)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_ipv4_tcp_rmem => ["4096", "12582912", "16777216"] is correct (recommendation is ["4096", "12582912", "16777216"])
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_ipv4_tcp_slow_start_after_idle => 0 is correct (recommendation is 0)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_ipv4_tcp_tw_reuse => 1 is correct (recommendation is 1)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Sysctl: net_ipv4_tcp_wmem => ["4096", "12582912", "16777216"] is correct (recommendation is ["4096", "12582912", "16777216"])
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] Collecting ulimit information...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] ulimit information is stored in /tmp/20201007212928/output/sh_-c_'ulimit_-n'.txt
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] Validating ulimit information...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Valid] ulimit => 65536 is correct (recommendation is >65535)
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Mask] Masking td-agent config file : /tmp/20201007212928/etc/td-agent/td-agent.conf...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Mask] Masking td-agent config file : /tmp/20201007212928/etc/td-agent/http_fld_system.conf...
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Mask] Export mask log file : ./mask_20201007212928.json
2020-10-07 21:29:30 +0000: [Diagtool] [INFO] [Collect] Generate tar file /tmp/diagout-20201007212928.tar.gz
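Diagtool names the archive after the run's timestamp (diagout-20201007212928.tar.gz in the log above). Assuming that diagout-YYYYMMDDhhmmss.tar.gz naming, sorting the file names also sorts the runs chronologically, so a small hypothetical helper, `latest_diagout`, can pick the newest archive and list its contents with tar before you share it.

```shell
# Hypothetical helper: print the newest Diagtool archive in a directory.
# Assumes the diagout-YYYYMMDDhhmmss.tar.gz naming seen in the log, where
# lexical order equals chronological order.
latest_diagout() {
  dir=${1:-/tmp}
  latest=""
  for f in "$dir"/diagout-*.tar.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    latest=$f                 # globs expand in sorted order; keep the last match
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example: list the files collected in the most recent run under /tmp.
if archive=$(latest_diagout /tmp); then
  tar -tzf "$archive"
fi
```

Reviewing the file list this way is a quick check that the masked copies (the *.mask files) are present before you send the archive to anyone.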
You can learn more about how Diagtool works in the following video.
Questions?
You can always download the instructions on how to use Diagtool here.
Also, feel free to contact us at any time by opening a request on GitHub, or send your questions and enhancement requests directly to us. We’d love to hear from you!
Where is the tool heading?
Today, Diagtool supports Fluentd (td-agent) on Linux. We are working to extend its coverage to Windows as well as to Fluent Bit and plugins. Contributions to the tool are always welcome.
Commercial Service - We are here for you.
Through the Fluentd Subscription Network, we provide consulting and professional services that help you run Fluentd and Fluent Bit with confidence by solving your pain points. A service desk is also available to support your operations, and the team is equipped with Diagtool and practical knowledge of running Fluentd in production. Contact us at any time if you would like to learn more about our service offerings.