Monitoring Watches

Set up monitoring and alerts of watchers

Once you have ArgusWatchers defined, you’re ready to start monitoring for notify events; perhaps you’ll even want to set up alerts on high priority events. There are generic logfiles included in both apps, and we provide out-of-the-box metrics handling with Prometheus so you’ll be able to receive time-series data that you can immediately monitor. Ultimately, it will be up to you to use your logging framework of choice to monitor the way you’re used to doing.

Topics

Logfiles

Using the glog logging framework that most Kubernetes-related projects also leverage, you can locate logfiles in both argusd and arguscontroller containers at /tmp/argusd.INFO and /tmp/arguscontroller.INFO. The argusd logs will show actual inotify events that were reported as they happened, where arguscontroller log will only include Kubernetes controller-related log messages such as receiving new watchers, connecting the controller to daemons via gRPC, and pods being created, deleted, and updated that a watcher cares about.

Monitoring with Prometheus

When running a master Prometheus server, in order to add the scraper that arguscontroller exposes, you will need to add the following definition to your prometheus.yml config:

scrape_configs:
  ...
  - job_name: 'argus'
    static_configs:
    - targets: ['argus-prometheus:2112']

After events are collected you will be able to see metrics streaming in around a couple specifications:

# Argus metrics (labels = arguswatcher, event, nodeName)
argus_events_total{arguswatcher="mywatcher",...}

# gRPC metrics (labels = grpc_method, grpc_service, grpc_type)
grpc_client_handled_total{grpc_service="argus.Argusd"...}
grpc_client_msg_received_total{grpc_service="argus.Argusd"...}
grpc_client_msg_sent_total{grpc_service="argus.Argusd"...}
grpc_client_started_total{grpc_service="argus.Argusd"...}

The metric argus_events_total is what you’ll be using to build your dashboards and alerting on priority events as they occur. The gRPC-related metrics are exposed to measure the performance between daemon and controller.

Alerting

If watching for changes on a critical section of your filesystem that should not receive any kind of event, one could set up an alert with Prometheus and Alertmanager. This could be accomplished using the increase() function, for example:

alert: PageOnModify
expr: increase(argus_events_total{arguswatcher="mywatcher",event="modify"}[1m]) > 0
labels:
  severity: page
...

Grafana Dashboards

With Prometheus as a data source in your Grafana deployment and once Prometheus starts scraping the argus-prometheus endpoint as a scrape target, you will be able to start using these collected metrics to build dashboards in Grafana.

We will provide some ready-to-go templates for Argus dashboards at a later time…