It is very straightforward to extract columnar output from `ps`:

name: ps
input:
    exec:
        command: ps ax --no-header -o pid,cputime,rss,comm
        interval: 2s
actions:
- extract:
    input-field: _raw
    remove: true
    pattern: '(\S+)\s+(\S+)\s+(\S+)\s+(.+)'
    output-fields: [PID,TIME,RSS,CMD]
- filter:
    exclude:
    - RSS: '0'
output:
    write: console

We pull out the columns with the `extract` pattern, and use the `filter` `exclude` to drop kernel workers and similar processes, which report an RSS of 0.

This runs every 2 seconds, which produces a lot of data, so the rest of this example shows how to respond only to changes in values.

The first issue is that `cputime` has the format (DAY-)HOUR:MIN:SEC, so we extract these parts. The day part may be absent!

- extract:
    input-field: TIME
    pattern: '^(\d+-)*(\d+):(\d+):(\d+)'
    output-fields: [day,hour,min,sec]
- add:
    output-fields:
    - day: 0
- convert:
  - RSS: num
  - day: num
  - hour: num
  - min: num
  - sec: num

If the day part is absent, the `day` field is not captured. In that case the `add` step provides a default value; since `add` never overwrites existing fields by default, a captured day is left untouched.

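From these numbers, we can work out the total number of CPU seconds used by a process. For example, a `cputime` of 1-02:03:04 is 4 + 60*(3 + 60*(2 + 24*1)) = 93784 seconds. We also convert RSS, which `ps` reports in KiB, to MiB. Like `add`, `script` does not overwrite existing fields by default, so we set `overwrite: true` to replace `sec` and `RSS`.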

- script:
    overwrite: true
    let:
    - sec: sec + 60*(min + 60*(hour + 24*day))
    - RSS: round(RSS/1024)  # KiB to MiB
- remove: [hour,min,day,TIME]

Now for the interesting part: we don't want to see a process unless it has used at least a second of CPU time. `stream` with `operation: delta` looks for changes in the field it is watching, here `sec`, separately for each `PID`. With `only-changes: true`, the `elapsed` field will contain the time in milliseconds since the value last changed.

- stream:
    operation: delta
    group-by: PID
    elapsed-field: 'elapsed'
    only-changes: true
    watch: sec
- filter:
    condition: delta > 0

Once the initial data has passed through, you will only get events when a process has used at least one more second of CPU time.

The effective CPU utilization for that process can now be calculated from `delta` and `elapsed`.
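As a rough sketch of that calculation, a further `script` step could divide the CPU seconds gained (`delta`) by the wall-clock time over which they accumulated (`elapsed`, in milliseconds). The field name `cpu` is made up for this illustration. The sketch assumes that `delta` is in seconds and that `elapsed` is non-zero for every event reaching this step; the first event for each PID may need separate handling.

```yaml
# Hypothetical extra step, placed after the final filter (not part of the original pipe).
- script:
    let:
    # delta is CPU seconds, elapsed is wall-clock milliseconds,
    # so this is a percentage of one core.
    - cpu: round(100 * delta / (elapsed / 1000))
```

For example, a process that gains 1 CPU second over 4000 ms would come out at 25. A multi-threaded process can exceed 100, since it can accumulate CPU time on several cores at once.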

The full pipe definition:

name: ps
input:
    exec:
        command: ps ax --no-header -o pid,cputime,rss,comm
        interval: 2s
actions:
- extract:
    input-field: _raw
    remove: true
    pattern: '(\S+)\s+(\S+)\s+(\S+)\s+(.+)'
    output-fields: [PID,TIME,RSS,CMD]
- filter:
    exclude:
    - RSS: '0'
- extract:
    input-field: TIME
    pattern: '^(\d+-)*(\d+):(\d+):(\d+)'
    output-fields: [day,hour,min,sec]
- add:
    output-fields:
    - day: 0
- convert:
  - RSS: num
  - day: num
  - hour: num
  - min: num
  - sec: num
- script:
    overwrite: true
    let:
    - sec: sec + 60*(min + 60*(hour + 24*day))
    - RSS: round(RSS/1024)  # KiB to MiB
- remove: [hour,min,day,TIME]
- stream:
    operation: delta
    group-by: PID
    elapsed-field: 'elapsed'
    only-changes: true
    watch: sec
- filter:
    condition: delta > 0
output:
    write: console