Enriching Data

After the raw data has been converted into the desired JSON format (see Working with Data), it is enriched: reshaped and annotated so it can be consumed and stored more easily.

Fields with Target-specific Values

Data needs to be tagged with its location at source. There are standard context variables whose values differ for each target, and more can be added to the pipe contexts:

- add:
    output-fields:
    - site: '{{name}}'
    - pipe: '{{pipe}}'

Generated Fields

All data needs a timestamp - this is the processing time at the target:

- time:
    output-field: '@timestamp'

A sequence number can be added to each event (although this is not persistent across restarts):

- script:
    let:
    - seq: 'count()'

A better method is to use uuid(), which gives each event a unique id.

Calculated Fields

The script action can be used to calculate values for fields. For example, script with let can anonymize data using hash functions, and remove then drops the original fields:

- script:
    let:
    - name_hash: md5(name)
    - address_hash: md5(address)
- remove:
    - name
    - address

This can be used to clean up data that will be stored and further processed outside your private networks.

Hashes are one-way functions, so the original values cannot be recovered from them. If the values need to be recoverable, sensitive fields can be encrypted with encrypt() instead.
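The hash-and-remove step can be sketched outside the pipe in Python. This is only an illustration of the effect on an event: the helper name anonymize is hypothetical, and it assumes that md5() in a pipe returns the hexadecimal digest of the field's UTF-8 text, which this page does not confirm.

```python
import hashlib


def anonymize(event, fields):
    """Return a copy of event where each listed field is replaced by <field>_hash.

    Assumption: the hash is the hex MD5 digest of the value's UTF-8 text.
    """
    out = dict(event)
    for field in fields:
        if field in out:
            value = str(out.pop(field))  # drop the original field
            digest = hashlib.md5(value.encode("utf-8")).hexdigest()
            out[field + "_hash"] = digest
    return out


event = {"name": "Alice", "address": "1 Main St", "amount": 10}
anonymized = anonymize(event, ["name", "address"])
```

The same input always produces the same hash, so anonymized records can still be grouped or joined on name_hash without exposing the name itself.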

Conditional Fields

If condition is defined, then script will only add fields if the condition is true.

When adding literal strings, it is easier to use set than let. Since add and script never overwrite existing fields by default, this snippet sets the field quality to "good" if the value of field a is greater than 1, and to "bad" otherwise.

- script:
    condition: a > 1
    set:
    - quality: good
- script:
    set:
    - quality: bad

The cond function provides a more elegant solution:

- script:
    let:
    - quality: cond(a > 1,"good","bad")
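To see why the two forms are equivalent, here is a small Python sketch of the semantics. The helper names (set_if_absent, two_step, one_step) are hypothetical; the only behaviour taken from this page is that a field is added when the condition is true and that existing fields are not overwritten by default.

```python
def set_if_absent(event, field, value, condition=True):
    """Add field only if the condition holds and the field is not already set."""
    if condition and field not in event:
        event[field] = value
    return event


def cond(test, if_true, if_false):
    """Return if_true when test holds, otherwise if_false."""
    return if_true if test else if_false


def two_step(event):
    # First script: conditional set. Second script: fallback that cannot overwrite.
    set_if_absent(event, "quality", "good", condition=event["a"] > 1)
    set_if_absent(event, "quality", "bad")
    return event


def one_step(event):
    # Single script using cond().
    return set_if_absent(event, "quality", cond(event["a"] > 1, "good", "bad"))


high = two_step({"a": 5})
low = two_step({"a": 0})
```

The two-step version depends on the order of the scripts: if the unconditional one ran first, every event would be marked "bad".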

Table Lookup

enrich is an efficient and general way to enrich data with tables read from a CSV file. In the simplest case, if the value of an event field matches the value in one column of a row, then the value of another column in the same row is added to the event as a new field. For example, given this table:

id,name
23,Alice
12,Bob
13,John

So if iden in the event is the same as id in the table, then we will set nice_name to the value of name.

# input: {"iden":12}
# output: {"iden":12,"nice_name":"Bob"}
- enrich:
  - lookup-file: names.csv
    match:
    - type: num
      event-field: iden
      lookup-field: id
    add:
        event-field: nice_name
        lookup-field: name
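The lookup above can be sketched in Python to make the row matching explicit. This is an illustration of the idea, not how enrich is implemented: the function names are hypothetical, and leaving the event unchanged when no row matches is an assumption.

```python
import csv
import io

NAMES_CSV = "id,name\n23,Alice\n12,Bob\n13,John\n"


def load_table(text):
    """Parse CSV text into a list of row dictionaries keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def enrich_num(event, table, event_field, lookup_field, add_field, from_field):
    """Numeric match: copy from_field of the first matching row into add_field."""
    if event_field not in event:
        return event
    for row in table:
        if float(row[lookup_field]) == float(event[event_field]):
            enriched = dict(event)
            enriched.setdefault(add_field, row[from_field])  # do not overwrite
            return enriched
    return event  # assumption: no match leaves the event as it was


table = load_table(NAMES_CSV)
result = enrich_num({"iden": 12}, table, "iden", "id", "nice_name", "name")
```

CSV cells are read as text, which is why the match needs a declared type: here both sides are converted to numbers before they are compared.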

You have to specify a type for the match:

  • str: text values
  • num: numbers
  • ip: IPv4 addresses
  • cidr: IPv4 address ranges, like '192.168.1.0/16'
  • num-list: numbers separated by commas, like '10,20,30'
  • str-list: text values separated by commas, like 'office,home'
  • num-range: numeric ranges, like '10-23'

There can be multiple matches, all of which must be satisfied.
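One way to read the match types is sketched below in Python. The details are assumptions rather than documented behaviour: the table cell is taken to hold the list, range or CIDR block while the event holds a single value, ranges are treated as inclusive, and the names matches and row_matches are hypothetical.

```python
import ipaddress


def matches(match_type, event_value, table_value):
    """Compare one event value against one table cell for a given match type."""
    if match_type == "str":
        return str(event_value) == table_value
    if match_type == "num":
        return float(event_value) == float(table_value)
    if match_type == "ip":
        return ipaddress.ip_address(event_value) == ipaddress.ip_address(table_value)
    if match_type == "cidr":
        # strict=False accepts blocks written with host bits set
        network = ipaddress.ip_network(table_value, strict=False)
        return ipaddress.ip_address(event_value) in network
    if match_type == "num-list":
        return float(event_value) in [float(v) for v in table_value.split(",")]
    if match_type == "str-list":
        return str(event_value) in [v.strip() for v in table_value.split(",")]
    if match_type == "num-range":
        low, high = table_value.split("-", 1)  # non-negative bounds only
        return float(low) <= float(event_value) <= float(high)
    raise ValueError("unknown match type: " + match_type)


def row_matches(event, row, match_specs):
    """A row is selected only if every match in the list is satisfied."""
    return all(
        spec["event-field"] in event
        and matches(spec["type"], event[spec["event-field"]], row[spec["lookup-field"]])
        for spec in match_specs
    )


specs = [
    {"type": "cidr", "event-field": "src", "lookup-field": "net"},
    {"type": "num-range", "event-field": "port", "lookup-field": "ports"},
]
row = {"net": "192.168.1.0/16", "ports": "10-23"}
both = row_matches({"src": "192.168.1.7", "port": 15}, row, specs)
only_one = row_matches({"src": "192.168.1.7", "port": 24}, row, specs)
```

Here both is true because the address is inside the block and the port is inside the range, while only_one is false because the port falls outside the range even though the address still matches.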