Logging
Structured Audit Logging
Hotrod is a distributed system with remote structured logging, saved in a database. The various log commands can help you track the server, the targets and the pipes running on those targets.
We'll start with a simple pipe:
# uptimex.yml
name: uptimex
context:
interval: 1s
input:
exec:
command: uptime
interval: '{{interval}}'
actions:
- extract:
input-field: _raw
remove: true
pattern: 'load average: (\S+), (\S+), (\S+)'
output-fields: [m1, m5, m15]
- convert:
- m1: num
- m5: num
- m15: num
- script:
let:
- k: count()
- filter:
condition: k % 2 == 0
output:
exec:
command: cat
Assume we have a target, have loaded this pipe, and attached it to the target:
$ hotrod targets add test0 --id 0
$ hotrod pipes add --file uptimex.yml
$ hotrod targets update test0 --add-pipe uptimex
[INFO] user=admin adding pipe=uptimex to target=test0
At this point, the server logs look like this - you have an audit trail detailing these operations and the user responsible.
$ hotrod daemon logs
timestamp | level | message
---------------------+-------+--------------------------------------------------
2020-06-16T14:17:50 | INFO | user=admin successful login from 127.0.0.1:38212
2020-06-16T14:17:59 | INFO | user=admin assigning id=0 to target='test0'
2020-06-16T14:18:22 | INFO | user=admin adding pipe=uptimex
2020-06-16T14:18:22 | INFO | user=admin rebuilding targets: []
2020-06-16T14:18:22 | INFO | user=admin rebuilding targets: []
2020-06-16T14:18:36 | INFO | user=admin adding pipe=uptimex to target=test0
The target logs show the pipe being attached:
$ hotrod targets logs
timestamp | target | level | message
---------------------+--------+-------+----------------------------------------
2020-06-16T14:18:13 | test0 | INFO | current []
2020-06-16T14:18:13 | test0 | INFO | allowed []
2020-06-16T14:18:13 | test0 | INFO | agent connected to hotrodd
2020-06-16T14:18:22 | test0 | INFO | current []
2020-06-16T14:18:22 | test0 | INFO | allowed []
2020-06-16T14:18:38 | test0 | INFO | New Pipe detected: hotrod-pipe-uptimex
2020-06-16T14:18:39 | test0 | INFO | current ["hotrod-pipe-uptimex"]
2020-06-16T14:18:39 | test0 | INFO | allowed ["hotrod-pipe-uptimex"]
2020-06-16T14:18:39 | test0 | INFO | Pipes Updated...
And the pipe logs show the pipe being launched:
$ hotrod pipes logs
timestamp | target | pipe | level | step | function | message
---------------------+--------+---------+-------+------+----------+----------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
Pipe Logging
Logging from pipes themselves is not enabled by default. It is opt-in, on a target-by-target basis.
$ hotrod targets settings --enable-pipe-logging test0
Can now switch debug mode on for the pipe:
$ hotrod pipes trace uptimex --enable-debugging --target test0
The pipe may only be scheduled to run in a few minutes time, but you can always force a restart:
$ hotrod pipes trace uptimex --command restart --target test0
After a few seconds waiting for a few samples to collect, can inspect the pipe logs:
$ hotrod pipes logs
timestamp | target | pipe | level | step | function | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
2020-06-16T14:42:49 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
2020-06-16T14:42:51 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
2020-06-16T14:42:53 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}
Debug mode prints out the event that will be written out. In this case, events just go to the journal, but they may be sent using HTTP POST, plain TCP, etc. In any case, inspecting pipe data should not require shell access to remote devices. 'step' is the zero-based index of the pipe step, where the input is 0 and here the output is 5.
The field k
comes from the Lua function count()
which increments endlessly, and it is even because we use filter
to drop
out odd values.
Debugging needs to be switched off explicitly with --disable-debugging
.
Tracing gives a lot more insight into the operation of a pipe. It prints out the trace id which thereafter references the result of this trace.
$ hotrod pipes trace --enable-tracing uptimex -t test0
trace id 1
By default, this will trace 5 samples and then switch off. After a few seconds, pass the trace id to see the resulting pipe logs:
$ hotrod pipes logs --trace-id 1
timestamp | target | pipe | level | step | function | message
---------------------+--------+---------+-------+------+----------------+------------------------------------------------------------------------------------------------------------------------------
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 1 | action-extract | {"_raw":" 15:41:14 up 50 days, 21:20, 1 user, load average: 0.42, 0.44, 0.44"} -> [{"m1":"0.42","m5":"0.44","m15":"0.44"}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 2 | action-convert | {"m1":"0.42","m5":"0.44","m15":"0.44"} -> [{"m1":0.42,"m5":0.44,"m15":0.44}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 3 | action-script | {"m1":0.42,"m5":0.44,"m15":0.44} -> [{"m1":0.42,"m5":0.44,"m15":0.44,"k":4957}]
2020-06-16T15:41:14 | test0 | uptimex | TRACE | 4 | action-filter | {"m1":0.42,"m5":0.44,"m15":0.44,"k":4957} -> []
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 1 | action-extract | {"_raw":" 15:41:15 up 50 days, 21:20, 1 user, load average: 0.39, 0.43, 0.44"} -> [{"m1":"0.39","m5":"0.43","m15":"0.44"}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 2 | action-convert | {"m1":"0.39","m5":"0.43","m15":"0.44"} -> [{"m1":0.39,"m5":0.43,"m15":0.44}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 3 | action-script | {"m1":0.39,"m5":0.43,"m15":0.44} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 4 | action-filter | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
2020-06-16T15:41:15 | test0 | uptimex | TRACE | 5 | output-exec | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}
...
This literally provides a step-by-step view of the data flow; we can see that when t
is 4957, it is not passed through
by action-filter
. In the next sample, data arrives at the output.
Traces can be enabled for pipes running on multiple targets. How can the outputs be distinguished? hotrod targets logs
and hotrod pipes logs
both have a --target
flag to specify one or more targets to search for.
Pipe Errors
The general principle with the pipe framework is: do not discard data. This leads to a tolerant way of viewing errors. But without good diagnostics this makes it hard to troubleshoot pipe problems, since we are often only one typo away from success.
Consider this pipe, with a flaw. The intention is to extract the number of days this particular machine has been up.
# err.yml
name: err
input:
exec:
command: uptime
interval: 2s
actions:
- extract:
warning: true
pattern: UP (\d+)
output-fields: [up]
- convert:
- up: num
output:
write: console
Why warning: true
? Because not matching a pattern is not always an error. However, we want to be sure in this case.
After adding this pipe and attaching it, here are the logs (with --pipe err
to exclude uptimex
logs):
$ hotrod pipes logs --pipe err
timestamp | target | pipe | level | step | function | message
---------------------+--------+------+-------+------+----------------+----------------------------------
2020-06-16T16:10:19 | test0 | err | INFO | 0 | | systemd pipe started
2020-06-16T16:10:20 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:22 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:24 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:26 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:10:28 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
To get all the complaints, switch on debugging as before:
$ hotrod pipes logs --pipe err
timestamp | target | pipe | level | step | function | message
---------------------+--------+------+-------+------+----------------+----------------------------------------------------------------------------------
2020-06-16T16:06:14 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:14 up 50 days, 21:45, 1 user, load average: 0.59, 0.57, 0.51"}
2020-06-16T16:06:16 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:06:16 | test0 | err | WARN | 2 | action-convert | fields not found up
2020-06-16T16:06:16 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:16 up 50 days, 21:45, 1 user, load average: 0.94, 0.64, 0.53"}
2020-06-16T16:06:18 | test0 | err | WARN | 1 | action-extract | extract: no captures with regex
2020-06-16T16:06:18 | test0 | err | WARN | 2 | action-convert | fields not found up
2020-06-16T16:06:18 | test0 | err | DEBUG | 3 | output-exec | {"_raw":" 16:06:18 up 50 days, 21:45, 1 user, load average: 0.94, 0.64, 0.53"}
We have two problems, flowing from the fact that 'UP' does not match 'up' - the pattern extraction, and then the failed conversion.
Again, a field not found in convert
might be fine, if we are dealing with events with different schemas.
So, the raw data - which may be precious - will always flow through.
Controlling Log Output
Logs have a level: trace, debug, warn, error. The minimum logging level is set with --level
(default 'trace').
Use --level error
to only show errors.
It is always possible to display logs in CSV format - this is one of the strengths of structured logging.
$ hotrod pipes logs --file csv
timestamp,target,pipe,level,step,function,message
2020-06-16T16:32:56,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:32:58,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:00,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:02,test0,err,WARN,1,action-extract,extract: no captures with regex
...
By default, you will get the last 25 records - use --num-records
to change this.
A start time can be specified, in a fairly relaxed format similar to that used by the Unix date
command:
$ hotrod pipes logs --start-time '16 June' --num-records 10
timestamp | target | pipe | level | step | function | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
2020-06-16T14:18:38 | test0 | uptimex | WARN | 0 | | pipe started
2020-06-16T14:18:39 | test0 | uptimex | INFO | 0 | | systemd pipe started
2020-06-16T14:42:49 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
2020-06-16T14:42:51 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
2020-06-16T14:42:53 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}
2020-06-16T14:42:55 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1458}
2020-06-16T14:42:57 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1460}
2020-06-16T14:42:59 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1462}
2020-06-16T14:43:01 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1464}
2020-06-16T14:43:03 | test0 | uptimex | DEBUG | 5 | output-exec | {"m1":0.3,"m5":0.37,"m15":0.44,"k":1466}
$
Specifying a date implies the time starts from midnight - in this case the data was first created at 14:00.
Can also specify a time-range explicitly. Times without dates are assumed to refer to the current day:
$ hotrod pipes logs --start-time '14:00' --end-time '17:00' --file csv > logs.csv
Metrics and Volumes
$ hotrod daemon traffic
metrics 43.47 kiB
logs 2.80 MiB
unsent logs 0.00 B
tarballs sent 599.50 kiB
$ hotrod volumes
date | targets | bboxes | pipes | lines | volume
------------+---------+--------+-------+-------+------------
2020.06.17 | 2 | 0 | 3 | 3629 | 287.11 kiB
2020.06.16 | 1 | 0 | 2 | 22992 | 1.78 MiB