
Send Root Cause Detections to your Datadog Dashboards

Note: In addition to the integration described below, we have also built a custom Datadog Widget. For details, select Integrations in your Datadog UI and search for Zebrium, or contact Zebrium for further information.

Integration Overview

  1. Create an API Key in Datadog.
  2. Create a Datadog Integration in Zebrium using the information from step 1.
  3. Add Zebrium Root Cause Report events and Log metrics to your Datadog Dashboard.

Integration Details

STEP 1: Create an API Key in Datadog

  1. From the Main Navigation panel in Datadog, hover over your Datadog Login Name and select Organization Settings.
  2. Click on API Keys.
  3. Click the + New Key button.
  4. Enter a Name for the API Key and click Create Key.
  5. Copy and save the Key for use in STEP 2.
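
Before moving on to STEP 2, it can be useful to confirm that the copied key is accepted by Datadog. The following is a minimal sketch, not part of the Zebrium integration itself. It assumes Datadog's key validation endpoint (`GET /api/v1/validate` with a `DD-API-KEY` header) and the default site `datadoghq.com`; the helper names `build_validate_request` and `is_key_valid` are hypothetical, and the site should be adjusted for your Datadog region.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build a GET request for Datadog's API key validation endpoint.

    The default site is an assumption; use your own Datadog site if it differs.
    """
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def is_key_valid(api_key: str, site: str = "datadoghq.com") -> bool:
    """Return True if Datadog reports the key as valid (performs a network call)."""
    request = build_validate_request(api_key, site)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return bool(json.load(response).get("valid", False))
    except urllib.error.HTTPError:
        # An invalid key is rejected with an HTTP error status.
        return False
```

Calling `is_key_valid("<your key>")` should return `True` for the key saved in step 5 before you enter it in Zebrium.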

STEP 2: Create a Datadog Integration in Zebrium to Send Detections to Datadog

  1. From the User menu area in Zebrium, click on the Settings (hamburger) Menu.
  2. Select Integrations.
  3. Scroll to the Observability Dashboards section and click on Datadog Events and Metrics.
  4. Click on the Create a New Integration button.
  5. Click on the General tab.
  6. Enter an Integration Name for this integration.
  7. Select the Deployment for the integration.
  8. Select the Service Group(s) for the integration.
  9. Click on the Send Detections tab.
  10. Click on the Enabled button.
  11. Enter the API Key created in STEP 1 above.
  12. Click the Save button.

STEP 3: Add Zebrium Root Cause Report Detections and Log count metrics to any of your Datadog Dashboards

Zebrium sends events and metrics to Datadog as follows:

  1. Events: an event is sent each time a Zebrium Root Cause Report Detection occurs.
  2. Metrics: metrics are sent for counts of all log events, error log events, and anomaly log events.

Visualizing Zebrium Data in Datadog

Here is a sample Chart visualization showing:

  1. A Root Cause Finder panel that displays a vertical bar whenever a Zebrium detection occurs. This allows you to easily see detections that are aligned with other metrics on your dashboards.
  2. A Root Cause Reports Summary panel that lists summary information for each Zebrium detection.

Here is a screenshot showing the definition of the Root Cause Finder panel:

Here is a screenshot showing the definition of the Root Cause Reports Summary panel:

Table of Important Metric Names

| Metric Name | Description |
| --- | --- |
| zebrium.logs.all.count | Count of all log events received in a one-minute duration (per service_group and deployment) |
| zebrium.logs.anomalies.count | Count of anomaly log events received in a one-minute duration (per service_group and deployment) |
| zebrium.logs.errors.count | Count of error log events received in a one-minute duration (per service_group and deployment) |
| ze_service_group | Zebrium service group name for the corresponding metric or event |
| ze_deployment | Zebrium deployment name for the corresponding metric or event |
| ze_significance | Significance of the Root Cause Report (low, medium, or high) |
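
The names above can be combined into Datadog metric queries for dashboard panels. The sketch below builds a query string in Datadog's standard `aggregator:metric{scope}` form. It assumes that `ze_service_group` and `ze_deployment` are available as tags on the Zebrium metrics; the helper name `build_metric_query` and the example values `payments` and `production` are hypothetical.

```python
from typing import Optional


def build_metric_query(
    metric: str,
    service_group: Optional[str] = None,
    deployment: Optional[str] = None,
    aggregator: str = "sum",
) -> str:
    """Build a Datadog metric query string, optionally scoped by Zebrium tags."""
    filters = []
    if service_group:
        # Assumption: ze_service_group is attached to the metric as a tag.
        filters.append(f"ze_service_group:{service_group}")
    if deployment:
        # Assumption: ze_deployment is attached to the metric as a tag.
        filters.append(f"ze_deployment:{deployment}")
    scope = ",".join(filters) if filters else "*"
    return f"{aggregator}:{metric}{{{scope}}}"


print(build_metric_query("zebrium.logs.errors.count", "payments", "production"))
# sum:zebrium.logs.errors.count{ze_service_group:payments,ze_deployment:production}
print(build_metric_query("zebrium.logs.all.count"))
# sum:zebrium.logs.all.count{*}
```

The resulting string can be pasted into the query editor of a Datadog timeseries panel; whether `sum` or another aggregator is appropriate depends on how you want counts combined across service groups and deployments.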

Support

If you need help with this integration, please contact Zebrium by emailing support@zebrium.com.