Fresh/Brewed

When Systems Fail

This is not a sharp picture; it's not even that good. I was flying early in the morning on Friday, July 19th, when I realized something was really wrong. The ticket counter was in a flurry of checking lists and creating paper tickets. Within 30 minutes I would learn that Spirit Airlines was SOL at the airport and that the TSA was checking my paper ticket against printed manifests. I couldn't be prouder of my own teams at work for having dug in and weathered the CrowdStrike storm. It's a testament to having skilled, available and committed SREs. By the time I landed and was back at my desk to help, the main production systems were healthy. I can't really speak to the rest, and my day wrapped later than usual that Friday, but it was an amazing team effort and, really, I couldn't be prouder of the people I work with at WellSky.
Ansible: Fixing k3s Reset

Published May 5, 2023 by Isaac Johnson

I had grand plans for a telemetry deep dive today, but things just kept not sorting out for me. Sometimes, though, it’s in the problems that we find something worthwhile to share.

Spacelift.io

Published May 2, 2023 by Isaac Johnson

A colleague pointed out spacelift.io as a potential solution for Ansible and Terraform. It has an always-free tier. Today we’ll set up an account, do a demo and try out context, worker pools, and...

SigNoz: TLS, Pagerduty and Webhooks

Published Apr 27, 2023 by Isaac Johnson

In the last post, we focused on setting up a simple SigNoz system in the test cluster and trying Slack alert channels. Today, we’ll launch SigNoz in the main on-prem cluster, configure Ingress wit...

SigNoz: Setup, APM, Logs and Slack

Published Apr 25, 2023 by Isaac Johnson

At its core, SigNoz is an open-source application performance monitoring (APM) and observability tool. Built on open standards like OpenTelemetry, Jaeger and Prometheus, it makes it easy to collec...

Prometheus and Grafana

Published Apr 20, 2023 by Isaac Johnson

We all know that Prometheus and Grafana can be used to monitor Kubernetes workloads. But today we will explore how Grafana and Prometheus can also monitor databases, hosts and more.

Zabbix: Part 2: Configuration and Usage

Published Apr 18, 2023 by Isaac Johnson

We covered Zabbix last week, showing setup and basic usage. Let’s build on that with a more fully formed implementation.

Theme built by C.S. Rhymes