SigNoz: Setup, APM, Logs and Slack

Published: Apr 25, 2023 by Isaac Johnson

At its core, SigNoz is an open-source application performance monitoring (APM) and observability tool. Built on open standards like OpenTelemetry, Jaeger and Prometheus, it makes it easy to collect logs, traces and metrics, along with other insights into the health, performance and behavior of one’s stack.

There is, of course, a commercial side to SigNoz, with a Team plan starting at $200/mo that adds SSO and SAML, as well as “SigNoz Cloud” as a hosted PaaS option.

Pricing

There are two main paid plans - Team (US$200/mo) and Enterprise (US$1000/mo).

A quick breakdown

/content/images/2023/04/signoz-01.png

The pricing for, say, logs is comparable to other providers:

Vendor | Free per mo | Price per GB beyond | Additional costs | Retention
------ | ----------- | ------------------- | ---------------- | ---------
SigNoz | 0 | $0.40 | - | 15 days
Datadog | 0 | $0.10 | $1.70/million log events/mo | 15 days
New Relic | 100GB | $0.30 | $0.50/GB for advanced features (90d) | 30 days
Mezmo (LogDNA) | 0 | $1.30 | $0.80 (3-day) to $1.80 (30-day) retention | 3 to 30 days (14d shown)
Sumo Logic | 0 | $3.00 | free tier at 1GB/day | you set retention, 7d to 365d (pay per GB)
GCP Stackdriver (Logging, Monitoring, Operations Suite) | 50GB | $0.50 | 1 million monitored instances free, then $0.05/instance beyond (traces and logs combined) | -
Azure Logging | 5GB/day | $0.50 - $0.615 (basic) | types of logs vary: $0.50/GB pay-as-you-go basic, $2.30/GB for analytics logs, $0.005/GB for search queries | 30d

As we will be looking into the self-hosted option, the only costs will be my local infrastructure.

Setup

We’ll follow the guide for ‘other’ Kubernetes clusters; AWS and GCP have their own dedicated guides.

We’ll set a variable to our default storage class

$ DEFAULT_STORAGE_CLASS=$(kubectl get storageclass -o=jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}')
$ echo $DEFAULT_STORAGE_CLASS
local-path

Now we will patch it to allow Volume Expansion

$ kubectl patch storageclass "$DEFAULT_STORAGE_CLASS" -p '{"allowVolumeExpansion": true}'
storageclass.storage.k8s.io/local-path patched
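
A quick way to confirm the patch took effect is to read the flag back, which should print true (a sketch reusing the storage class variable from above):

$ kubectl get storageclass "$DEFAULT_STORAGE_CLASS" -o jsonpath='{.allowVolumeExpansion}'
true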

Now let’s add the SigNoz repo and update

$ helm repo add signoz https://charts.signoz.io
"signoz" has been added to your repositories

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "signoz" chart repository
...Successfully got an update from the "freshbrewed" chart repository
...Successfully got an update from the "myharbor" chart repository
...Successfully got an update from the "confluentinc" chart repository
...Successfully got an update from the "dapr" chart repository
...Successfully got an update from the "jfelten" chart repository
...Successfully got an update from the "ngrok" chart repository
...Successfully got an update from the "zabbix-community" chart repository
...Successfully got an update from the "adwerx" chart repository
...Successfully got an update from the "actions-runner-controller" chart repository
...Successfully got an update from the "novum-rgi-helm" chart repository
...Successfully got an update from the "kube-state-metrics" chart repository
...Successfully got an update from the "rhcharts" chart repository
...Successfully got an update from the "uptime-kuma" chart repository
...Successfully got an update from the "sumologic" chart repository
...Successfully got an update from the "hashicorp" chart repository
...Successfully got an update from the "castai-helm" chart repository
...Successfully got an update from the "azure-samples" chart repository
...Successfully got an update from the "rook-release" chart repository
...Successfully got an update from the "kuma" chart repository
...Successfully got an update from the "longhorn" chart repository
...Successfully got an update from the "epsagon" chart repository
...Successfully got an update from the "nginx-stable" chart repository
...Successfully got an update from the "lifen-charts" chart repository
...Successfully got an update from the "kubecost" chart repository
...Successfully got an update from the "sonarqube" chart repository
...Successfully got an update from the "elastic" chart repository
...Successfully got an update from the "open-telemetry" chart repository
...Successfully got an update from the "harbor" chart repository
...Successfully got an update from the "datadog" chart repository
...Successfully got an update from the "argo-cd" chart repository
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "crossplane-stable" chart repository
...Successfully got an update from the "newrelic" chart repository
...Successfully got an update from the "incubator" chart repository
...Successfully got an update from the "gitlab" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "rancher-latest" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈

We can now install it into a new “platform” namespace

$ helm --namespace platform install my-release signoz/signoz --create-namespace
coalesce.go:175: warning: skipped value for zookeeper.initContainers: Not a table.
NAME: my-release
LAST DEPLOYED: Sun Apr 23 11:29:57 2023
NAMESPACE: platform
STATUS: deployed
REVISION: 1
NOTES:
1. You have just deployed SigNoz cluster:

- frontend version: '0.18.2'
- query-service version: '0.18.2'
- alertmanager version: '0.23.0-0.2'
- otel-collector version: '0.66.7'
- otel-collector-metrics version: '0.66.7'

2. Get the application URL by running these commands:

  export POD_NAME=$(kubectl get pods --namespace platform -l "app.kubernetes.io/name=signoz,app.kubernetes.io/instance=my-release,app.kubernetes.io/component=frontend" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:3301 to use your application"
  kubectl --namespace platform port-forward $POD_NAME 3301:3301


If you have any ideas, questions, or any feedback, please share on our Github Discussions:
  https://github.com/SigNoz/signoz/discussions/713
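
If you want to deviate from the chart defaults (storage size, resource requests, retention), it’s worth pulling the chart’s full values and installing with an override file. A minimal sketch, with signoz-values.yaml being a hypothetical file name:

$ helm show values signoz/signoz > signoz-values.yaml
$ # edit signoz-values.yaml as needed, then apply it
$ helm upgrade --install my-release signoz/signoz --namespace platform -f signoz-values.yaml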

It takes a few minutes to spin up

$ kubectl get pods -n platform
NAME                                                        READY   STATUS     RESTARTS   AGE
my-release-signoz-frontend-85d47bcc96-xctlr                 0/1     Init:0/1   0          2m6s
my-release-signoz-query-service-0                           0/1     Init:0/1   0          2m6s
my-release-signoz-otel-collector-metrics-5f57c54f9f-z25kj   0/1     Init:0/1   0          2m6s
my-release-signoz-otel-collector-565477699c-2b5zj           0/1     Init:0/1   0          2m6s
my-release-signoz-alertmanager-0                            0/1     Init:0/1   0          2m6s
my-release-k8s-infra-otel-agent-b9p6b                       1/1     Running    0          2m6s
my-release-k8s-infra-otel-deployment-7ffd8dfbfb-kfrg4       1/1     Running    0          2m6s
my-release-k8s-infra-otel-agent-959f9                       1/1     Running    0          2m6s
my-release-k8s-infra-otel-agent-j45tp                       1/1     Running    0          2m6s
my-release-clickhouse-operator-5c5c5ddcf7-hs8ch             2/2     Running    0          2m6s
my-release-zookeeper-0                                      1/1     Running    0          2m6s
chi-my-release-clickhouse-cluster-0-0-0                     1/1     Running    0          66s

I was surprised to see the frontend’s init container is based on busybox. It seems a strange choice for a platform base

$ kubectl describe pod my-release-signoz-frontend-85d47bcc96-xctlr -nplatform | tail -n7
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2m57s  default-scheduler  Successfully assigned platform/my-release-signoz-frontend-85d47bcc96-xctlr to anna-macbookair
  Normal  Pulling    2m57s  kubelet            Pulling image "docker.io/busybox:1.35"
  Normal  Pulled     2m49s  kubelet            Successfully pulled image "docker.io/busybox:1.35" in 7.526860406s
  Normal  Created    2m49s  kubelet            Created container my-release-signoz-frontend-init
  Normal  Started    2m49s  kubelet            Started container my-release-signoz-frontend-init
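
To see exactly which images a pod pulls, init containers included, a standard jsonpath query works (shown against the same frontend pod):

$ kubectl get pod my-release-signoz-frontend-85d47bcc96-xctlr -n platform \
    -o jsonpath='{.spec.initContainers[*].image} {.spec.containers[*].image}'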

We can see it coming up

$ kubectl get pods -n platform
NAME                                                        READY   STATUS     RESTARTS   AGE
my-release-signoz-frontend-85d47bcc96-xctlr                 0/1     Init:0/1   0          4m22s
my-release-signoz-alertmanager-0                            0/1     Init:0/1   0          4m22s
my-release-k8s-infra-otel-agent-b9p6b                       1/1     Running    0          4m22s
my-release-k8s-infra-otel-deployment-7ffd8dfbfb-kfrg4       1/1     Running    0          4m22s
my-release-k8s-infra-otel-agent-959f9                       1/1     Running    0          4m22s
my-release-k8s-infra-otel-agent-j45tp                       1/1     Running    0          4m22s
my-release-clickhouse-operator-5c5c5ddcf7-hs8ch             2/2     Running    0          4m22s
my-release-zookeeper-0                                      1/1     Running    0          4m22s
chi-my-release-clickhouse-cluster-0-0-0                     1/1     Running    0          3m22s
my-release-signoz-query-service-0                           1/1     Running    0          4m22s
my-release-signoz-otel-collector-metrics-5f57c54f9f-z25kj   1/1     Running    0          4m22s
my-release-signoz-otel-collector-565477699c-2b5zj           1/1     Running    0          4m22s

$ kubectl get pods -n platform
NAME                                                        READY   STATUS    RESTARTS   AGE
my-release-k8s-infra-otel-agent-b9p6b                       1/1     Running   0          8m20s
my-release-k8s-infra-otel-deployment-7ffd8dfbfb-kfrg4       1/1     Running   0          8m20s
my-release-k8s-infra-otel-agent-959f9                       1/1     Running   0          8m20s
my-release-k8s-infra-otel-agent-j45tp                       1/1     Running   0          8m20s
my-release-clickhouse-operator-5c5c5ddcf7-hs8ch             2/2     Running   0          8m20s
my-release-zookeeper-0                                      1/1     Running   0          8m20s
chi-my-release-clickhouse-cluster-0-0-0                     1/1     Running   0          7m20s
my-release-signoz-query-service-0                           1/1     Running   0          8m20s
my-release-signoz-otel-collector-metrics-5f57c54f9f-z25kj   1/1     Running   0          8m20s
my-release-signoz-otel-collector-565477699c-2b5zj           1/1     Running   0          8m20s
my-release-signoz-frontend-85d47bcc96-xctlr                 1/1     Running   0          8m20s
my-release-signoz-alertmanager-0                            1/1     Running   0          8m20s
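
Rather than re-running kubectl get pods, we can also block until everything reports Ready. A sketch, assuming the chart applies the usual app.kubernetes.io/instance label (standard for Helm charts, but worth confirming):

$ kubectl wait pod -n platform -l app.kubernetes.io/instance=my-release \
    --for=condition=Ready --timeout=600s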

We can now port-forward to our SigNoz instance

$ kubectl --namespace platform port-forward svc/`kubectl get svc --namespace platform -l "app.kubernetes.io/component=frontend" -o jsonpath="{.items[0].metadata.name}"` 3301:3301
Forwarding from 127.0.0.1:3301 -> 3301
Forwarding from [::1]:3301 -> 3301
Handling connection for 3301
Handling connection for 3301
Handling connection for 3301
Handling connection for 3301
...

/content/images/2023/04/signoz-02.png

I’ll put in some details

/content/images/2023/04/signoz-03.png

We can now see the dashboard

/content/images/2023/04/signoz-04.png

Logs

Right off, we can see logs collected

/content/images/2023/04/signoz-05.png

I can use filters. For instance, we can filter down to just one namespace with k8s_namespace_name CONTAINS 'default'

/content/images/2023/04/signoz-06.png

Live Logs

Let’s say I wanted to monitor just one pod, such as the vote front end

$ kubectl get pods | grep vote
vote-back-azure-vote-1678278477-7dc4cf9668-gfxvw       1/1     Running   0             23d
vote-front-azure-vote-1678278477-67bb8bb55c-k9cxj      1/1     Running   0             14d

I can port-forward to the Vote App and click some choices

$ kubectl port-forward vote-front-azure-vote-1678278477-67bb8bb55c-k9cxj 8888:80
Forwarding from 127.0.0.1:8888 -> 80
Forwarding from [::1]:8888 -> 80
Handling connection for 8888
Handling connection for 8888
Handling connection for 8888

And see that reflected in Live Logs

/content/images/2023/04/signoz-07.png
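
If clicking around by hand gets tedious, a small loop against the port-forward generates a steady stream of requests (and access-log lines) to watch. A sketch; the count and sleep are arbitrary:

$ for i in $(seq 1 25); do curl -s -o /dev/null http://localhost:8888/; sleep 1; done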

Log based alerts

Before we can get an alert created, we need to pause and create an alert channel.

Alert Channels

In settings, choose “Alert Channels” and select “+ New Alert Channel”

/content/images/2023/04/signoz-15.png

I’ll add a Slack channel (our choices being Slack, Webhook and PagerDuty)

In a Slack channel, view its Integrations and then choose to add an integration. Here we can search for “Incoming WebHooks” (careful: there is an “Outgoing WebHooks” app as well, and free workspaces now have a limit on total integrations)

/content/images/2023/04/signoz-16.png

We then click “Add to Slack”

/content/images/2023/04/signoz-17.png

My main Slack workspace was over the limit, so I used a rarely touched workspace that had “room”

After I clicked “Add Incoming WebHooks Integration”

/content/images/2023/04/signoz-18.png

I saw a note posted in that “#Random” channel and was presented with a page containing the webhook URL

/content/images/2023/04/signoz-19.png

I then used that URL in the SigNoz alert channel page and clicked Test

/content/images/2023/04/signoz-20.png

I could see an uninteresting test message posted.

/content/images/2023/04/signoz-21.png
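
The same webhook can also be exercised from the CLI, which is handy when debugging delivery issues. The hooks.slack.com URL below is a placeholder for the one Slack generated:

$ curl -X POST -H 'Content-type: application/json' \
    --data '{"text":"Test message from the CLI"}' \
    https://hooks.slack.com/services/T0000000/B0000000/XXXXXXXXXXXXXXXXXXXXXXXX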

This means we have at least one integration. I’ll click “Save” at the bottom in SigNoz

Now I can see at least one alert channel defined

/content/images/2023/04/signoz-22.png

Log Based Alerts

Going to Alerts, we can click “+ New Alert”

/content/images/2023/04/signoz-08.png

We’ll pick “Log-based Alert” from the choices

/content/images/2023/04/signoz-09.png

This sets up a ClickHouse query that essentially counts all log lines, grouped into 30-minute intervals.

select 
toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 30 MINUTE) AS interval, 
toFloat64(count()) as value 
FROM signoz_logs.distributed_logs  
WHERE timestamp BETWEEN {{.start_timestamp_nano}} AND {{.end_timestamp_nano}}  
GROUP BY interval;

By clicking Run Query, we can see that, at present, the cluster has 2.9K log lines in the last day

/content/images/2023/04/signoz-10.png
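
As an aside, the same kind of query can be run directly against ClickHouse from the CLI for a sanity check. A sketch, assuming clickhouse-client is present in the stock ClickHouse image and substituting a concrete window for the {{.start_timestamp_nano}} template variables:

$ kubectl exec -n platform chi-my-release-clickhouse-cluster-0-0-0 -- clickhouse-client \
    --query "SELECT count() FROM signoz_logs.logs WHERE timestamp > toUnixTimestamp64Nano(now64() - INTERVAL 1 DAY)"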

Let’s change that to log lines per minute to see a more interesting graph

SELECT toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS interval, 
toFloat64(count()) AS value 
FROM signoz_logs.logs  
WHERE timestamp > toUnixTimestamp64Nano(now64() - INTERVAL 30 MINUTE)  
GROUP BY interval 
ORDER BY interval ASC;

/content/images/2023/04/signoz-11.png

I could trim the query to just logs from the vote pod

SELECT toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS interval, 
toFloat64(count()) AS value 
FROM signoz_logs.logs  
WHERE (timestamp > toUnixTimestamp64Nano(now64() - INTERVAL 30 MINUTE) AND 
       k8s_pod_name ILIKE 'vote-front-azure-vote%')
GROUP BY interval 
ORDER BY interval ASC;

/content/images/2023/04/signoz-12.png

The logical query for an alert might be to check if the body contains “error”

SELECT toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS interval, 
toFloat64(count()) AS value 
FROM signoz_logs.logs  
WHERE (timestamp > toUnixTimestamp64Nano(now64() - INTERVAL 30 MINUTE) AND 
       k8s_pod_name ILIKE 'vote-front-azure-vote%' AND
       body ILIKE '%error%')
GROUP BY interval 
ORDER BY interval ASC;

/content/images/2023/04/signoz-13.png

Now that I’m happy with that query, I’ll define the rest of the alert fields.

I’ll want a condition that triggers when the count stays above 0 for 5 minutes

/content/images/2023/04/signoz-14.png

Lastly, I’ll give it a name and select an alert channel. We can use the Slack one we defined earlier and click “Create Rule”

/content/images/2023/04/signoz-23.png

I can now see the alert defined and in a state of “OK”

/content/images/2023/04/signoz-24.png

Dashboards

Under dashboards, let’s create a new blank dashboard

/content/images/2023/04/signoz-25.png

I’ll choose configure so we can give it a name

/content/images/2023/04/signoz-26.png

I’ll give it a name and description and click save

/content/images/2023/04/signoz-27.png

Next, I’ll choose “+ Add Panel” and click the “Value” panel

/content/images/2023/04/signoz-28.png

This time I’ll want just a simple count of the log errors for the Azure Vote app

SELECT count() 
FROM signoz_logs.logs  
WHERE (timestamp > toUnixTimestamp64Nano(now64() - INTERVAL 30 MINUTE) AND 
       k8s_pod_name ILIKE 'vote-front-azure-vote%' AND
       body ILIKE '%error%');

You’ll see I gave it a panel title and description on the right, then filled in the query under “ClickHouse Query” below. I clicked “Stage and Run Query” to check it was accurate

/content/images/2023/04/signoz-29.png

Now when I click “Save”, I can see the panel in the page

/content/images/2023/04/signoz-30.png

From here, I’ll click “Save Layout” to save this initial dashboard. Now when I go to Dashboards, I can see “My First Dashboard” listed

/content/images/2023/04/signoz-31.png

APM Data

I won’t cover the full setup for Dapr and the Calculator App. You can find most of the samples here.

However, in short, I’m currently routing Zipkin traces to the IP of an OpenTelemetry collector by way of Dapr, via a Configuration

$ kubectl get configuration appconfig -o yaml | tail -n 8
  metric:
    enabled: true
  metrics:
    enabled: true
  tracing:
    samplingRate: "1"
    zipkin:
      endpointAddress: http://10.43.156.140:9411/api/v2/spans

That OpenTelemetry collector’s config is pointed out to Datadog to receive the APM trace data. Poking around the Calculator a bit, I can see the output here

/content/images/2023/04/signoz-32.png

If we look up services in the SigNoz namespace, we can see some collectors

$ kubectl get svc -n platform | grep otel
my-release-k8s-infra-otel-deployment       ClusterIP   10.43.100.12    <none>        13133/TCP                                        2d5h
my-release-signoz-otel-collector-metrics   ClusterIP   10.43.173.102   <none>        13133/TCP                                        2d5h
my-release-k8s-infra-otel-agent            ClusterIP   10.43.118.130   <none>        13133/TCP,8888/TCP,4317/TCP,4318/TCP             2d5h
my-release-signoz-otel-collector           ClusterIP   10.43.246.225   <none>        14250/TCP,14268/TCP,8888/TCP,4317/TCP,4318/TCP   2d5h

While they don’t expose a native Zipkin receiver (9411), I do see OTLP ports (4317, 4318)

I updated the appconfig then rotated all the pods

$ kubectl get svc -n platform
NAME                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                          AGE
my-release-signoz-alertmanager-headless    ClusterIP   None            <none>        9093/TCP                                         2d8h
my-release-zookeeper-headless              ClusterIP   None            <none>        2181/TCP,2888/TCP,3888/TCP                       2d8h
my-release-signoz-query-service            ClusterIP   10.43.81.119    <none>        8080/TCP,8085/TCP                                2d8h
my-release-signoz-alertmanager             ClusterIP   10.43.215.188   <none>        9093/TCP                                         2d8h
my-release-signoz-frontend                 ClusterIP   10.43.59.191    <none>        3301/TCP                                         2d8h
my-release-clickhouse-operator-metrics     ClusterIP   10.43.104.64    <none>        8888/TCP                                         2d8h
my-release-k8s-infra-otel-deployment       ClusterIP   10.43.100.12    <none>        13133/TCP                                        2d8h
my-release-signoz-otel-collector-metrics   ClusterIP   10.43.173.102   <none>        13133/TCP                                        2d8h
my-release-k8s-infra-otel-agent            ClusterIP   10.43.118.130   <none>        13133/TCP,8888/TCP,4317/TCP,4318/TCP             2d8h
my-release-zookeeper                       ClusterIP   10.43.9.137     <none>        2181/TCP,2888/TCP,3888/TCP                       2d8h
my-release-signoz-otel-collector           ClusterIP   10.43.246.225   <none>        14250/TCP,14268/TCP,8888/TCP,4317/TCP,4318/TCP   2d8h
my-release-clickhouse                      ClusterIP   10.43.219.182   <none>        8123/TCP,9000/TCP                                2d8h
chi-my-release-clickhouse-cluster-0-0      ClusterIP   None            <none>        8123/TCP,9000/TCP,9009/TCP                       2d8h

$ cat appconfig.yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"dapr.io/v1alpha1","kind":"Configuration","metadata":{"annotations":{},"creationTimestamp":"2023-04-25T19:36:08Z","generation":4,"name":"appconfig","namespace":"default","resourceVersion":"4981104","uid":"f4ba482b-9213-4704-a2e6-73d157f4b69f"},"spec":{"metric":{"enabled":true},"metrics":{"enabled":true},"tracing":{"samplingRate":"1","zipkin":{"endpointAddress":"http://10.43.156.140:9411/api/v2/spans"}}}}
  creationTimestamp: "2023-04-25T19:36:08Z"
  generation: 5
  name: appconfig
  namespace: default
  resourceVersion: "4982167"
  uid: f4ba482b-9213-4704-a2e6-73d157f4b69f
spec:
  metric:
    enabled: true
  metrics:
    enabled: true
  tracing:
    samplingRate: "1"
    otel:
      endpointAddress: "10.43.246.225:4317"
      isSecure: false
      protocol: grpc

$ kubectl apply -f appconfig.yaml

Then rotate all the pods with Dapr enabled

$ kubectl get pod -o=jsonpath='{.items[?(@.metadata.annotations.dapr\.io/enabled=="true")].metadata.name}' | xargs kubectl delete pod
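
For future endpoint tweaks, a merge patch can make the same change without hand-editing the YAML. The sketch below also swaps the raw ClusterIP for the collector Service’s cluster DNS name (the standard <service>.<namespace>.svc.cluster.local form), which survives the Service being recreated:

$ kubectl patch configuration appconfig --type merge -p \
    '{"spec":{"tracing":{"otel":{"endpointAddress":"my-release-signoz-otel-collector.platform.svc.cluster.local:4317","isSecure":false,"protocol":"grpc"}}}}'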

After poking around the Calculator app

/content/images/2023/04/signoz-33.png

I can now see it in SigNoz

/content/images/2023/04/signoz-34.png

We can see specific details on one of the microservices

/content/images/2023/04/signoz-35.png

We can also review all the traces

/content/images/2023/04/signoz-36.png

We can do all the things you would expect, such as breaking down results by time frame or call, and graphing by Rate/Sec

/content/images/2023/04/signoz-37.png

Under usage, we can see spans over time by service

/content/images/2023/04/signoz-38.png

There is a nice Animated Map as well

PVCs

We can set our retention, which will affect total storage

/content/images/2023/04/signoz-39.png

The data lands in our PVCs in the namespace

$ kubectl get pvc -n platform
NAME                                                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-release-zookeeper-0                                         Bound    pvc-19c35a26-0913-4084-a9bf-c8398b55fec9   8Gi        RWO            local-path     2d9h
storage-my-release-signoz-alertmanager-0                            Bound    pvc-c7767051-bb48-41b0-b96f-9a39d07ec3bf   100Mi      RWO            local-path     2d9h
signoz-db-my-release-signoz-query-service-0                         Bound    pvc-0dc7624d-114d-49dc-bec1-c22241b54569   1Gi        RWO            local-path     2d9h
data-volumeclaim-template-chi-my-release-clickhouse-cluster-0-0-0   Bound    pvc-f0f9383a-c7ee-4e75-8de7-4e697cf73254   20Gi       RWO            local-path     2d9h

I can check the disk consumed by running df in the container, which shows about 10% used

/content/images/2023/04/signoz-40.png
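
That check can be scripted as well. A sketch, assuming the stock ClickHouse layout where data lives under /var/lib/clickhouse:

$ kubectl exec -n platform chi-my-release-clickhouse-cluster-0-0-0 -- df -h /var/lib/clickhouse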

Summary

SigNoz was founded in 2020 by Ankit Nayan (current CTO) and Pranay Prateek, and the name is a play on the phrase “Signal vs Noise”.

They have an office in San Francisco, but judging by the developers listed on LinkedIn, most of their engineering team appears to be in Bengaluru (which matches this WorkAtAStartup listing).

As I see it, SigNoz has created a refreshingly complete open-source offering. I found it easy to consume both logs and traces out of the box. I did not explore metrics or routing logs from external sources (though they have guides for Fluent Bit, Fluentd and Logstash).

I checked in Datadog and found that SigNoz didn’t overburden the cluster

/content/images/2023/04/signoz-41.png

I could expose SigNoz externally, but port-forwarding is easy

$ kubectl --namespace platform port-forward svc/`kubectl get svc --namespace platform -l "app.kubernetes.io/component=frontend" -o jsonpath="{.items[0].metadata.name}"` 3301:3301
Forwarding from 127.0.0.1:3301 -> 3301
Forwarding from [::1]:3301 -> 3301
Handling connection for 3301
Handling connection for 3301
Handling connection for 3301
Handling connection for 3301
Handling connection for 3301

My next steps will likely be externalizing SigNoz, adding a proper webhook to Discord, and seeing how to route external cluster logs into a centralized SigNoz instance.

Tags: SigNoz, APM, Logging

Have something to add? Feedback? Try our new forums

Isaac Johnson

Cloud Solutions Architect

Isaac is a CSA and DevOps engineer who focuses on cloud migrations and devops processes. He also is a dad to three wonderful daughters (hence the references to Princess King sprinkled throughout the blog).
