Working With Data

JSON

Generally, all actions operate on valid JSON data. Each input line is a JSON document delimited by a line feed, which we call an 'event'. A JSON document is composed of keys followed by values, e.g. "key":"value". Values can be text, numbers or booleans (true or false). All numbers are stored as double-precision floating-point numbers (there is no integer/float distinction).
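For example, an event containing each of these value types might look like this (the field names are purely illustrative):

{"service":"web","load":0.48,"healthy":true}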

Likewise, all inputs provide JSON data: each line of output is made into a JSON document like so: {"_raw":"the line"}. (There may be other fields as well, for example with TCP/UDP inputs.)

So the default output of exec with command uptime will be something like {"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}.
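That output comes from an input section along these lines (a minimal sketch; any other exec options are left out):

input:
    exec:
        command: 'uptime'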

Using extract to Extract Fields using Patterns

This can be passed to the extract action like so:

- extract:
    input-field: _raw
    remove: true
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]
# {"m1":"0.48","m5","0.39","m15","0.31"}

(If we did not say remove: true then the output event would still contain _raw.)
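For instance, without remove: true the output would look something like this:

# {"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31","m1":"0.48","m5":"0.39","m15":"0.31"}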

The important point is that usually the input data starts as JSON, and continues to be processed as JSON.

By default, extract is tolerant: if it cannot match data it will let it pass through unaltered, unless you say drop: true.
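So a stricter version of the action above, which drops any event that fails to match, would look like this (a sketch; only drop: true is new):

- extract:
    input-field: _raw
    remove: true
    drop: true
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]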

This is the most general way to convert data, and requires some familiarity with regular expressions. If possible, use expand for delimited data.

Number and Unit Conversion

extract does not automatically convert strings into numbers. That is the job of convert.

# {"m1":"0.48","m5","0.39","m15","0.31"}
- convert
  - m1: num
  - m5: num
  - m15: num
# {"m1":0.48,"m5",0.39,"m15",0.31}

Working with Raw Text with raw

Sometimes data needs to enter the Pipe as raw text.

Suppose there is a tool with output like this:

netter v0.1
copyright Netter Corp
output
port,throughput
1334,45552
1335,5666

Suppose also that we would like to treat it as CSV (and assume there is no --shutup flag to suppress the banner). So we need to skip lines until we reach the header line; after that, we just wrap each line up as _raw for later processing.

I've put this text into netter.txt and run this pipe. We skip until the line that starts with "port,": raw: true stops exec wrapping each line as JSON, raw with discard-until skips lines until the pattern matches, and raw with to-json wraps the remaining lines as JSON again.

name: netter
input:
    exec:
        command: 'cat netter.txt'
        raw: true
actions:
- raw:
    discard-until: '^port,'
- raw:
    to-json: _raw
output:
    write: console
# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}

Converting from CSV

Once input data is in this form, we can use expand to convert CSV data.

# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}
- expand:
    input-field: _raw
    remove: true
    csv:
        header: true
# {"port":1334,"throughput":45552}
# {"port":1335,"throughput":5666}

Using an existing header is convenient, but the actual types of the fields are then worked out by auto-conversion, which may not be what you want.

Alternatively, fields specifies the names and types of the columns explicitly. The allowed types are "str", "num", "null" or "bool". Finally, field-file names a file containing "name:type" lines. Provide either fields or field-file.
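For example, the netter columns could be spelled out explicitly (a sketch: the exact shape of fields is assumed here to be a list of name: type pairs, by analogy with convert):

# {"_raw":"1334,45552"}
- expand:
    input-field: _raw
    remove: true
    csv:
        fields:
        - port: num
        - throughput: num
# {"port":1334,"throughput":45552}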

Headers may also be specified with header-field, a field containing the column names separated by the delimiter. If header-field-types: true then the format is 'name:type'.

This header-field only needs to be specified at the start, but can be specified again when the schema changes (i.e. the names and/or types of the columns change). collapse with header-field-on-change: true will write events in this format.
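As a sketch of the header-field mechanism (assuming the event carries a field cols with the value "port:num,throughput:num"; the exact output shape is also an assumption):

# {"cols":"port:num,throughput:num","_raw":"1334,45552"}
- expand:
    input-field: _raw
    remove: true
    csv:
        header-field: cols
        header-field-types: true
# {"port":1334,"throughput":45552}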

In the total absence of any column information, we can use gen_headers and the column names will be "_0", "_1", etc.
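For example (a sketch, assuming gen_headers is a boolean flag inside the csv section):

# {"_raw":"1334,45552"}
- expand:
    input-field: _raw
    remove: true
    csv:
        gen_headers: true
# {"_0":1334,"_1":45552}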

Some formats use a special marker to indicate null fields, like "-"; this is the purpose of nul, which is an array of such markers.
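For example, to treat "-" as null (a sketch; the list form of nul and the null in the output are assumptions based on the description above):

# {"_raw":"1334,-"}
- expand:
    input-field: _raw
    remove: true
    csv:
        gen_headers: true
        nul: ['-']
# {"_0":1334,"_1":null}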

If the fields were separated with space then we would add delim: ' ' to the csv section. (This is a special case and will skip any whitespace between fields.) '\t' is also understood for tab-separated fields.
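For instance, space-separated data with a header row could be handled like this (a sketch following the earlier examples):

# {"_raw":"port throughput"}
# {"_raw":"1334 45552"}
- expand:
    input-field: _raw
    remove: true
    csv:
        header: true
        delim: ' '
# {"port":1334,"throughput":45552}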

So expand takes a field containing some data, and converts it into JSON, possibly removing the original field.

Converting from Key-Value Pairs

A fairly popular data format is 'key-value pairs'.

# {"_raw":"a=1 b=2"}
- expand:
    input-field: _raw
    remove: true
    delim: ' '
    key-value:
        autoconvert: true
# output: {"a":1,"b":2}

You can also set the separator between the key and the value:

# {"_raw":"name:\"Arthur\",age:42"}
- expand:
    input-field: _raw
    remove: true
    delim: ','
    key-value:
        autoconvert: true
        key-value-delim: ':'
# output: {"name":"Arthur","age":42}

Working with Input JSON

If a field contains quoted JSON, then expand with json: true will parse and extract the fields, merging with the existing event.
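For example (a sketch; the field name payload and its contents are purely illustrative):

# {"id":1,"payload":"{\"name\":\"dolly\",\"age\":42}"}
- expand:
    input-field: payload
    remove: true
    json: true
# {"id":1,"name":"dolly","age":42}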

Another option is expand events. This is different, because it converts one event into multiple events by splitting the value of input-field with the delimiter.

# json: {"family":"baggins","data":"frodo bilbo"}
- expand:
    input-field: data
    remove: true
    delim: ' '
    events:
        output-split-field: name
# output:
# {"family":"baggins","name":"frodo"}
# {"family":"baggins","name":"bilbo"}

Output as Raw

Generally we pass on the final events as JSON, but sometimes the situation requires lines which are more unstructured. For instance, 'classic' Hotrod 2 pipes have their output captured by systemd, passed to the server through rsyslog, unpacked using logstash and routed into Elasticsearch.

To send events back using this route, you will need to prepend the event with "@cee: " using the raw action.

Like so, as the final action:

- raw:
    extract:
        replace: "@cee: $0"

($0 is the full match over the whole line.)

Outputs usually receive events as JSON documents separated by lines (so-called 'streaming JSON'), but this is not essential - single lines of text can be passed in most cases.

But creating and passing multi-line data is possible.

With add, if template-result-field is provided, then the template can be in some arbitrary format like YAML (note the ${field} expansions).

# {"one":1,"two":2}
- add:
    template-result-field: result
    template: |
        results:
            one: ${one}
            two: ${two}
# {"one":1,"two": 2,"result":"results:\n    one: 1\n    two: 2\n"}

Say you need to POST this arbitrary data to a server - then set body-field to be the 'result' field:

output:
    http-post:
        body-field: result
        url: 'http://localhost:3030'

Similarly, exec has input-field:

input:
    text: '{"name":"dolly"}'
actions:
- time:
    output-field: tstamp
- add:
    template-result-field: greeting
    template: |
        time: ${tstamp}
        hello ${name}
        goodbye ${name}
        ----------------
output:
    exec:
        command: 'cat'
        input-field: greeting
# output
time: 2019-02-19T09:27:03.943Z
hello dolly
goodbye dolly
----------------

The command itself can contain field expansions, like ${name}.

Assuming there is also a field called 'file', the document will be appended to that file:

output:
    exec:
        command: 'cat >> ${file}'
        input-field: greeting