Published: Jan 3, 2023 by Isaac Johnson
We’ve covered much of Dynatrace in our last two posts, but we have yet to use it in production. Today we’ll install it into a primary cluster and test.
We will look at profiling and optimization, then alerting with PagerDuty, MS Teams, and the mobile app. Lastly, we’ll wrap up the series with thoughts on usage, costs, and the “buy now” experience.
Testing on Production Cluster
Having seen that I could cleanly remove Dynatrace from the test cluster, I’ll now apply it to my production cluster
$ kubectl create namespace dynatrace
namespace/dynatrace created
$ kubectl apply -f https://github.com/Dynatrace/dynatrace-operator/releases/download/v0.10.0/kubernetes.yaml
poddisruptionbudget.policy/dynatrace-webhook created
serviceaccount/dynatrace-activegate created
serviceaccount/dynatrace-kubernetes-monitoring created
serviceaccount/dynatrace-dynakube-oneagent-privileged created
serviceaccount/dynatrace-dynakube-oneagent-unprivileged created
serviceaccount/dynatrace-operator created
serviceaccount/dynatrace-webhook created
customresourcedefinition.apiextensions.k8s.io/dynakubes.dynatrace.com created
clusterrole.rbac.authorization.k8s.io/dynatrace-kubernetes-monitoring created
clusterrole.rbac.authorization.k8s.io/dynatrace-operator created
clusterrole.rbac.authorization.k8s.io/dynatrace-webhook created
clusterrolebinding.rbac.authorization.k8s.io/dynatrace-kubernetes-monitoring created
clusterrolebinding.rbac.authorization.k8s.io/dynatrace-operator created
clusterrolebinding.rbac.authorization.k8s.io/dynatrace-webhook created
role.rbac.authorization.k8s.io/dynatrace-operator created
role.rbac.authorization.k8s.io/dynatrace-webhook created
rolebinding.rbac.authorization.k8s.io/dynatrace-operator created
rolebinding.rbac.authorization.k8s.io/dynatrace-webhook created
service/dynatrace-webhook created
deployment.apps/dynatrace-operator created
deployment.apps/dynatrace-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/dynatrace-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/dynatrace-webhook created
$ kubectl -n dynatrace wait pod --for=condition=ready --selector=app.kubernetes.io/name=dynatrace-operator,app.kubernetes.io/component=webhook --timeout=300s
pod/dynatrace-webhook-b9c6bd86b-lgwnx condition met
pod/dynatrace-webhook-b9c6bd86b-4fxvl condition met
$ kubectl apply -f /mnt/c/Users/isaac/Downloads/dynakube77.yaml
secret/k3smac77 created
dynakube.dynatrace.com/k3smac77 created
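For reference, the wizard-generated dynakube77.yaml pairs a token Secret with the DynaKube custom resource. A trimmed sketch of its shape (tenant URL and token values are placeholders; the full spec I used shows up later when we debug the OneAgent args):

apiVersion: v1
kind: Secret
metadata:
  name: k3smac77
  namespace: dynatrace
data:
  apiToken: <base64-encoded API token from the wizard>  # placeholder
---
apiVersion: dynatrace.com/v1beta1
kind: DynaKube
metadata:
  name: k3smac77
  namespace: dynatrace
  annotations:
    feature.dynatrace.com/automatic-kubernetes-api-monitoring: "true"
spec:
  apiUrl: https://<your-environment-id>.live.dynatrace.com/api
  oneAgent:
    classicFullStack:
      args:
        - --set-host-group=onprem
  activeGate:
    capabilities:
      - routing
      - kubernetes-monitoring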
My first test will be to see that the GH Actions runner works (frankly by pushing this commit).
I can see all the pods are in their usual states
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
test kaniko 0/1 Completed 0 146d
kube-system svclb-azure-vote-front-789db446-6dqnj 0/1 Pending 0 152d
kube-system svclb-azure-vote-front-789db446-qks9x 0/1 Pending 0 152d
kube-system svclb-azure-vote-front-789db446-rrt74 0/1 Pending 0 145d
kube-system svclb-sonarqube-ce-bc1d9b4c-znxdd 0/2 Pending 0 94d
kube-system svclb-react-form-25f016ef-bthrm 0/1 Pending 0 145d
kube-system svclb-react-form-25f016ef-th7xl 0/1 Pending 0 153d
kube-system svclb-sonarqube-ce-bc1d9b4c-r47p4 0/2 Pending 0 94d
kube-system svclb-react-form-25f016ef-lvghd 0/1 Pending 0 153d
kube-system svclb-sonarqube-ce-bc1d9b4c-mt55s 0/2 Pending 0 94d
kube-system svclb-sonarqube-ce-bc1d9b4c-2hqc7 0/2 Pending 0 94d
kube-system svclb-azure-vote-front-789db446-jpq9t 0/1 Pending 0 152d
dapr-system dapr-dashboard-9445ffcb5-2tws5 1/1 Running 0 153d
default nginx-ingress-release-nginx-ingress-5bb8867c98-pjdkr 1/1 Running 0 151d
default vote-front-azure-vote-7ddd5967c8-s4zvn 1/1 Running 0 152d
default node-subscriber-6d99bd4bd7-42wnw 2/2 Running 0 152d
default react-form-764468d8b-fjfmz 2/2 Running 0 152d
default my-dd-release-datadog-blp8l 4/4 Running 0 153d
default new-kafka-release-0 1/1 Running 1 (152d ago) 152d
default azure-vote-back-6fcdc5cbd5-wgj8q 1/1 Running 0 151d
default redis-replicas-0 1/1 Running 14 (25d ago) 153d
default argo-cd-argocd-repo-server-75c6f9c555-b9cwf 1/1 Running 0 51d
kube-system local-path-provisioner-6c79684f77-9qvpf 1/1 Running 0 151d
kube-system svclb-next-kafka-release-0-external-3983744f-gzdq2 1/1 Running 0 36d
dapr-system dapr-placement-server-0 1/1 Running 0 153d
test my-nginx-5dc8c6c4cd-qpbt4 1/1 Running 0 146d
default argo-cd-argocd-redis-75f748669c-9gcng 1/1 Running 0 51d
dapr-system dapr-sentry-74cdff5467-n6zdb 1/1 Running 0 153d
loft loft-844859c9c7-xfdq5 1/1 Running 0 51d
default new-kafka-release-zookeeper-0 1/1 Running 0 152d
kube-system coredns-d76bd69b-k57p5 1/1 Running 0 151d
default new-kafka-release-client 1/1 Running 0 39d
dapr-system dapr-sidecar-injector-85bc6b4597-gml88 1/1 Running 0 153d
kube-system svclb-nginx-ingress-release-nginx-ingress-d0903822-5xlqt 2/2 Running 0 153d
default my-dd-release-datadog-clusterchecks-854946dcd8-5b7n7 1/1 Running 0 151d
kube-system svclb-react-form-25f016ef-8k27h 0/1 Pending 0 153d
vcluster-codefresh-working-2-p5gby nginx-ingress-release-nginx-ingress-5bb8867c98-6s7kr-15fb39c8aa 1/1 Running 5 (51d ago) 51d
default next-kafka-release-0 1/1 Running 1 (36d ago) 36d
default next-kafka-release-zookeeper-0 1/1 Running 0 36d
default python-subscriber-79986596f9-78pcd 2/2 Running 0 152d
default harbor-registry2-core-6bd7984ffb-ffzff 1/1 Running 0 56d
default next-kafka-release-client 1/1 Running 0 36d
default argo-cd-argocd-server-55d686dbcb-g2b7z 1/1 Running 0 51d
default my-dd-release-datadog-cluster-agent-6d7c4cdcd4-cgx6g 1/1 Running 4 (51d ago) 153d
default redis-master-0 1/1 Running 0 153d
default harbor-registry2-chartmuseum-7577686667-pkbqp 1/1 Running 0 56d
default azure-vote-front-5f4b8d498-7dj5v 1/1 Running 0 151d
default harbor-registry2-exporter-648f957c7c-zs2cw 1/1 Running 0 56d
kube-system svclb-next-kafka-release-0-external-3983744f-d6pcl 1/1 Running 0 36d
test my-nginx-5dc8c6c4cd-9hhbc 1/1 Running 0 146d
default my-dd-release-datadog-clusterchecks-854946dcd8-h9xxp 1/1 Running 0 153d
default harbor-registry2-jobservice-57bfcc8bc8-zcsm7 1/1 Running 0 56d
default csharp-subscriber-66b7c5bcbc-gpt4c 2/2 Running 0 153d
cert-manager cert-manager-webhook-6c9dd55dc8-gdwsv 1/1 Running 0 151d
default harbor-registry2-redis-0 1/1 Running 0 56d
kube-system svclb-nginx-ingress-release-nginx-ingress-d0903822-9gcxr 2/2 Running 0 153d
default my-dd-release-datadog-hnbl4 4/4 Running 0 153d
default redis-replicas-1 1/1 Running 13 (25d ago) 153d
default harbor-registry2-notary-signer-845658c5bc-s2cf8 1/1 Running 0 51d
default kasarest-deployment-85d4cfbc94-ltcg8 1/1 Running 0 55d
default my-dd-release-kube-state-metrics-6754b98bfd-lxkw9 1/1 Running 0 153d
default vote-back-azure-vote-7ffdcdbb9d-gpqw5 1/1 Running 0 152d
default ubuntu 1/1 Running 0 35d
default my-dd-release-datadog-4l6sw 4/4 Running 8 (35d ago) 145d
kube-system svclb-next-kafka-release-0-external-3983744f-pnqzk 1/1 Running 0 35d
kube-system svclb-nginx-ingress-release-nginx-ingress-d0903822-22662 2/2 Running 4 (35d ago) 145d
default harbor-registry2-trivy-0 1/1 Running 0 35d
default gbwebui-7d8986b8b8-jdjpd 1/1 Running 0 22d
default mypostgres-postgresql-0 1/1 Running 2 (22d ago) 35d
default sonarqube-ce-7f4d8997cb-zrlqr 1/1 Running 7558 (22d ago) 51d
cert-manager cert-manager-64d9bc8b74-kpkjk 1/1 Running 60 (10d ago) 151d
default my-redis-release-redis-cluster-1 0/1 Running 3123 (52d ago) 151d
kube-system svclb-nginx-ingress-release-nginx-ingress-d0903822-jdgtv 2/2 Running 6 (10d ago) 153d
kube-system metrics-server-7cd5fcb6b7-5pq7t 1/1 Running 0 151d
default harbor-registry2-portal-7878768b86-r9qgw 1/1 Running 33 (10d ago) 56d
default kafka-release-2-client 0/1 Unknown 0 39d
kube-system svclb-next-kafka-release-0-external-3983744f-xz9ld 1/1 Running 1 (10d ago) 36d
default nfs-server-provisioner-1658802767-0 1/1 Running 244 (10d ago) 56d
actions-runner-system actions-runner-controller-6d64557877-f5rkc 2/2 Running 204 (10d ago) 51d
default harbor-registry2-registry-78fd5b8f56-fgnn7 2/2 Running 77 (10d ago) 56d
default argo-cd-argocd-application-controller-7f4bd87d6f-gvqb6 1/1 Running 60 (10d ago) 51d
vcluster-codefresh-working-2-p5gby codefresh-working-2-etcd-0 1/1 Running 6 (10d ago) 145d
default nfs-client-provisioner-544798bc88-vkv4d 1/1 Running 215 (10d ago) 51d
loft loft-agent-85d5b44d8b-rd9jt 1/1 Running 90 (10d ago) 145d
default my-dd-release-datadog-62n4p 4/4 Running 20 (10d ago) 153d
vcluster-codefresh-working-2-p5gby codefresh-working-2-controller-6b76577759-bv5ck 1/1 Running 4 (10d ago) 145d
cert-manager cert-manager-cainjector-6db6b64d5f-5dtxc 1/1 Running 140 (10d ago) 151d
default harbor-registry2-notary-server-6b4b47bb86-wlpj8 1/1 Running 12 (10d ago) 56d
vcluster-codefresh-working-2-p5gby codefresh-working-2-api-7f56ccd68f-n5gj9 1/1 Running 24 (10d ago) 145d
vcluster-codefresh-working-2-p5gby codefresh-working-2-75987c48b4-624xw 1/1 Running 328 (10d ago) 145d
default my-redis-release-redis-cluster-3 1/1 Running 11 (10d ago) 151d
adwerx adwerxawx-postgresql-0 1/1 Running 130 (10d ago) 35d
default redis-replicas-2 1/1 Running 12 (10d ago) 151d
default my-redis-release-redis-cluster-5 0/1 Running 13 (10d ago) 151d
default my-redis-release-redis-cluster-2 0/1 Running 8 (10d ago) 151d
vcluster-codefresh-working-2-p5gby coredns-5df468b6b7-6jtvk-x-kube-system-x-codefresh-working-2 1/1 Running 99 (10d ago) 145d
default my-redis-release-redis-cluster-0 0/1 Running 12 (10d ago) 151d
adwerx adwerxawx-68889fdd67-6ch9f 3/3 Running 3 (10d ago) 51d
dapr-system dapr-operator-747cd9748-kdt88 1/1 Running 683 (10d ago) 153d
default python-crfunction-7d44797b8b-g6bfh 2/2 Running 44 (10d ago) 51d
default my-redis-release-redis-cluster-4 0/1 CreateContainerError 3123 (10d ago) 151d
default new-jekyllrunner-deployment-tbq6z-z9q7t 2/2 Running 0 5d2h
default new-jekyllrunner-deployment-tbq6z-dqqjv 2/2 Running 0 40h
default my-dd-release-datadog-cluster-agent-6d7c4cdcd4-vt6j7 1/1 Running 5 (16h ago) 151d
test my-nginx-5dc8c6c4cd-hdfnh 0/1 ImagePullBackOff 0 (52d ago) 146d
dynatrace dynatrace-webhook-b9c6bd86b-9g899 1/1 Running 0 2m8s
dynatrace dynatrace-operator-766c7f4778-2vbpt 1/1 Running 0 2m8s
dynatrace dynatrace-webhook-b9c6bd86b-9sh7h 1/1 Running 0 2m8s
When I ran this test before, it was my GH Runner that fell down, so I’ll keep an eye on those pods
$ kubectl get pods --all-namespaces | grep runner
actions-runner-system actions-runner-controller-6d64557877-f5rkc 2/2 Running 204 (10d ago) 51d
default new-jekyllrunner-deployment-tbq6z-dqqjv 2/2 Running 0 40h
default new-jekyllrunner-deployment-tbq6z-h9svg 2/2 Running 0 77s
# checking mid build
$ kubectl get pods --all-namespaces | grep runner
actions-runner-system actions-runner-controller-6d64557877-f5rkc 2/2 Running 204 (10d ago) 51d
default new-jekyllrunner-deployment-tbq6z-dqqjv 2/2 Running 0 40h
default new-jekyllrunner-deployment-tbq6z-h9svg 2/2 Running 0 3m24s
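Rather than re-running the grep by hand, a simple watch loop does the same check every few seconds:

$ watch -n 10 "kubectl get pods --all-namespaces | grep runner"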
I can see the old cluster stopping and the new cluster starting in my Dynatrace dashboard
That said, after a while, I did not see data coming in
Perhaps I need to use the wizard to add a fresh cluster. I would have assumed I could reuse tokens.
Then I’ll apply it
$ kubectl apply -f /mnt/c/Users/isaac/Downloads/dynakube\ \(1\).yaml
secret/mac77 created
dynakube.dynatrace.com/mac77 created
This time it took
That said, I was getting a lot of DT-based PagerDuty alerts
They had detected hosts were now unavailable
which I could see in more detail in PD
In Dynatrace, we can either find the window in which the hosts went offline and, if they are in fact active, “Close Problem”
Or, if we do not wish to be alerting about hosts missing anymore, we can disable the Host Anomaly checks.
Sadly, you cannot exclude specific hosts. You can only really modify alerts based on a few predefined filters (and host name isn’t amongst them)
If I were to handle this in production, I might modify my receiver (e.g. PagerDuty) to ignore hosts of a type or name.
In the rule’s “Do these things” section, pick Suppress Alert
However, as the day progressed, I saw more and more PD alerts crop up.
Many came from the Dynatrace OneAgent pods themselves crashing
I can see which two pods are crashing, but again, Dynatrace captured no pod logs
I can see the same when I check with kubectl
$ kubectl get pods --all-namespaces | grep oneagent
dynatrace mac77-oneagent-5wzp2 1/1 Running 0 12h
dynatrace mac77-oneagent-mth94 1/1 Running 0 12h
dynatrace mac77-oneagent-fbdjz 0/1 CrashLoopBackOff 144 (57s ago) 12h
dynatrace mac77-oneagent-wzpml 0/1 CrashLoopBackOff 145 (25s ago) 12h
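The standard way to get at these is to pull the prior container’s logs, since the pod keeps restarting (the --previous flag fetches the crashed instance):

$ kubectl logs mac77-oneagent-fbdjz -n dynatrace --previous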
When I check the logs, I see it’s from a parameter I added:
03:29:09 Starting installer...
21:29:10 Error: Unrecognized parameter: '--set-system-logs-access-enabled'. Did you forget '='?
Usage: Dynatrace-OneAgent-Linux.sh [-h] [-v] [--set-server=https://server_address:server_port] [--set-tenant=tenant] [--set-tenant-token=tenant_token] [--set-proxy=proxy_address:proxy_port|no_proxy] [--set-host-group=host_group] [--set-infra-only=false|true] [INSTALL_PATH=absolute_path] [--set-app-log-content-access=false|true] [USER=username] [GROUP=groupname] [NON_ROOT_MODE=0|1] [DISABLE_ROOT_FALLBACK=0|1]
-h, --help Display this help and exit.
-v, --version Print version and exit.
INSTALL_PATH Installation path to be used, must be absolute and not contain any spaces.
DATA_STORAGE Path to the directory for large runtime data storage, must be absolute and not contain any spaces.
LOG_PATH Logs path to be used, must be absolute and not contain any spaces.
USER The name of the unprivileged user for OneAgent processes. Must contain 3-32 alphanumeric characters. Defaults to 'dtuser'
GROUP The name of the primary group for OneAgent processes, defaults to the value of USER. May only be used in conjunction with USER.
NON_ROOT_MODE Enables non-privileged mode. For details, see: https://www.dynatrace.com/support/help/shortlink/root-privileges#linux-non-root-mode
DISABLE_ROOT_FALLBACK Disables temporary elevation of the privileges in environments where ambient capabilities are unavailable.
For details, see: https://www.dynatrace.com/support/help/shortlink/linux-custom-installation
We can see I missed the “=true”
$ kubectl get dynakube -n dynatrace -o yaml | grep set-system-logs-access-enabled
{"apiVersion":"dynatrace.com/v1beta1","kind":"DynaKube","metadata":{"annotations":{"feature.dynatrace.com/automatic-kubernetes-api-monitoring":"true"},"name":"mac77","namespace":"dynatrace"},"spec":{"activeGate":{"capabilities":["routing","kubernetes-monitoring"],"group":"onprem","resources":{"limits":{"cpu":"1000m","memory":"1.5Gi"},"requests":{"cpu":"500m","memory":"512Mi"}}},"apiUrl":"https://kaz10218.live.dynatrace.com/api","networkZone":"onprem","oneAgent":{"classicFullStack":{"args":["--set-host-group=onprem","--set-system-logs-access-enabled"],"env":[{"name":"ONEAGENT_ENABLE_VOLUME_STORAGE","value":"false"}],"tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane","operator":"Exists"}]}},"skipCertCheck":false}}
- --set-system-logs-access-enabled
I’ll correct and re-apply
$ kubectl get dynakube -n dynatrace -o yaml > dynakube77.yaml
$ vi dynakube77.yaml
$ kubectl get dynakube -n dynatrace -o yaml > dynakube77.yaml.bak
$ diff dynakube77.yaml.bak dynakube77.yaml
35c35,36
< - --set-system-logs-access-enabled
---
> - --set-system-logs-access-enabled=true
> - --set-app-log-content-access=true
$ kubectl apply -f dynakube77.yaml -n dynatrace
dynakube.dynatrace.com/mac77 configured
Within a couple minutes this was fine
$ kubectl get pods -n dynatrace | grep one
mac77-oneagent-5wzp2 1/1 Running 0 12h
mac77-oneagent-8822g 0/1 Running 0 30s
mac77-oneagent-sxpcm 0/1 Running 0 29s
mac77-oneagent-q85ct 0/1 Running 0 20s
$ kubectl get pods -n dynatrace | grep one
mac77-oneagent-8822g 1/1 Running 0 2m42s
mac77-oneagent-sxpcm 1/1 Running 0 2m41s
mac77-oneagent-q85ct 1/1 Running 0 2m32s
mac77-oneagent-7jm8b 1/1 Running 0 77s
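Rather than polling with grep, one could also wait on the DaemonSet rollout directly (assuming the DaemonSet is named mac77-oneagent, as the pod names suggest):

$ kubectl rollout status daemonset/mac77-oneagent -n dynatrace --timeout=300s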
But not before triggering even more PD alerts
Simply because of the noise I’d gotten, I decided to disable the PD notifications for now
While my updates on the production cluster still didn’t capture pod logs, I will say that having left it running for a day, no negative effects were seen on any other workload. This suggests whatever I experienced on my home cluster a year ago was an aberration.
A day later…
I found the notification re-enabled itself and PagerDuty started to blow up again
I decided to try another route.
This time I created an Alerting Profile “ShutupPD” in “Settings/Alerting”
Quite simply, it alerts only if an error happens and it occurs on a non-existent tag (asdfasdfasdf) - a condition that will never match.
Now I’ll change my Problem Notifications in Integration to only trigger PD on the “ShutupPD” Alerting Profile
I can see it is now changed (so we’ll see if it comes back)
I hopped into PD and resolved the lot of existing incidents
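For a big pile of them, the PagerDuty REST API can also resolve incidents in bulk. A minimal sketch, assuming a REST API key, your login email for the From header, and a placeholder incident ID:

$ curl -X PUT 'https://api.pagerduty.com/incidents' \
    -H 'Accept: application/vnd.pagerduty+json;version=2' \
    -H 'Authorization: Token token=<REST_API_KEY>' \
    -H 'Content-Type: application/json' \
    -H 'From: you@example.com' \
    -d '{"incidents":[{"id":"<INCIDENT_ID>","type":"incident_reference","status":"resolved"}]}'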
Profiling and Optimization
I won’t get too deep into profiling, other than to say I continue to be amazed at how deep Dynatrace can go into my code just from watching containers.
I dug into my react-form and could break down the calls by method
Alerting
There is little to show for alerting. I searched the UI for anything akin to notification destinations.
It seems one can create profiles that can be consumed by other apps.
There just isn’t much there, even if you read the docs
I can create a PagerDuty problem notification
In PagerDuty, I need to set up an integration to get the key and URL
I can then click “Send test notification”
confirmation
and see it was sent
To use it, we go to Settings, then Anomaly Detection.
Here, for instance, we can trigger on a metric anomaly.
I create one based on “Pod count”
I can set a threshold as well as a few filters, such as by entity or a dimension filter
I can set a name
and once all the fields are set, I can save changes (using the distant bottom left save)
Unlike Datadog, there is no warn vs alert threshold. Nor is there an inline preview. However, we can click the “Show in explorer” button
which will render the metric on our time window
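Incidentally, if you would rather sanity-check the metric from a shell, the same data can be pulled over the Environment API v2. A sketch, assuming an API token with the metrics-read scope and that builtin:kubernetes.pods is the pod-count metric key (both are assumptions on my part):

$ curl -s -H "Authorization: Api-Token $DT_API_TOKEN" \
    "https://<environment-id>.live.dynatrace.com/api/v2/metrics/query?metricSelector=builtin:kubernetes.pods&from=now-2h" | jq .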
In about a minute I saw a Problem arise
We can close a problem or make comments on it
A lot of the various alerting is via predefined areas in Anomaly Detection.
For instance, instead of building a metric alert on containers stuck in Pending, one looks to Workloads in the Kubernetes section
with some configurable settings
I still do not see why my Kubernetes alerts (Problems) do not trigger PagerDuty
I created a new Metric alert and that too didn’t trigger anything
Oddly, I tracked it down to my notification integration being absent
Once recreated, a new metric alert triggered PagerDuty as I would expect
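Had I thought to check from the CLI first, listing the configured notifications would have exposed the gap. A sketch, assuming the v1 Configuration API notifications endpoint is available on your tenant:

$ curl -s -H "Authorization: Api-Token $DT_API_TOKEN" \
    "https://<environment-id>.live.dynatrace.com/api/config/v1/notifications" | jq .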
MS Teams
Let’s create a channel for Dynatrace. Choose Add Channel
Give it a name, like “Dynatrace Notifications”
Then create a Connector
Configure an “Incoming Webhook”
I like to create one with a nice Dynatrace icon
Which will give me a URL like “https://princessking.webhook.office.com/webhookb2/adsf@asdf/IncomingWebhook/asdf/asdf”
I can then make a “Custom Integration” in “Settings/Integration/Problem Notifications”.
A simple payload:
{
  "type": "message",
  "attachments": [
    {
      "contentType": "application/vnd.microsoft.card.adaptive",
      "contentUrl": null,
      "content": {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.2",
        "body": [
          {
            "type": "TextBlock",
            "text": "State: {State}, ProblemID: {ProblemID}, ProblemTitle: {ProblemTitle}"
          }
        ]
      }
    }
  ]
}
I can send a test notification
and we can see it in Teams
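The webhook can also be smoke-tested outside Dynatrace by POSTing the same payload straight to the Teams URL (saving the simple payload above as payload.json):

$ curl -H "Content-Type: application/json" \
    -d @payload.json \
    "https://princessking.webhook.office.com/webhookb2/adsf@asdf/IncomingWebhook/asdf/asdf"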
Or a slightly more complicated version
{
  "type": "message",
  "attachments": [
    {
      "contentType": "application/vnd.microsoft.card.adaptive",
      "content": {
        "type": "AdaptiveCard",
        "body": [
          {
            "type": "TextBlock",
            "size": "Medium",
            "weight": "Bolder",
            "text": "State: {State} :: ProblemID: {ProblemID}"
          },
          {
            "type": "TextBlock",
            "text": "{ProblemTitle}"
          }
        ],
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.0"
      }
    }
  ]
}
which renders
Mobile App
There is a mobile app one can use to monitor problems.
Once you sign in, you can view the Problems for your Account
Clicking into the Environment shows our issues
I can look at the details of an active or closed problem
Such as Pending Pods
If I click “Show in Webview”, it loads the website onto the phone
Usage
When it comes time to pay for things, we’ll want to get an idea of usage.
At any point we can head over to the “Manage/Consumption” section to view our current usage
As one might expect, the usage details above match the time window in the upper right.
If I change to the last 30d, I can get a 30d rolling report
I like the fact that you can view by “Monitored Entities” when trying to see which systems are using Dynatrace the most
If you click your user icon in the upper right, you can also see a top-level summary of usage
In my case, I very easily used up all credits in my trial window and the system stopped
Buy Now
Yet again, we are stuck with phone calls or chat. Buy Now just takes me to an “I must negotiate” page
Which I really really hate. This tells me prices are fluid and depend on what a salesperson negotiates.
Yes, they list prices, but there is no “buy” button - it’s only “start a trial” and “Contact Sales”
If we compare to Datadog
I can pay as I go, or I can just pick from a menu.
And while Datadog arguably lets you pick A LOT of menu items, I can use Usage to see what I really use
Or if we compare to Sumo Logic, another of my APM suites, we can just buy Sumo Credits (they have a model where everything breaks down into credits to spend)
Or New Relic (who still have rather high logging prices in my opinion)
Perhaps I’m being harsh, but I just do not trust companies that cannot be upfront with costs and require me to chat with someone to get a real price.
I’ll agree that Dynatrace lists some price guidance, but if you see “* starts at” with no more details, you know you are in for a ride.
I did some searching and again found threads about Dynatrace pricing
One of the ones that hit home was by a government-based software developer: “The solution is a SaaS. If we were to stop paying the subscription entirely, the service would end shortly afterward, based on the contractual arrangements we have with them”
This leads to another reality: New Relic, Sumo Logic, and Datadog all have free tiers (which I use). If I don’t pay, I just get reduced to a free-tier plan; I don’t lose my recent data.
I guess I see Dynatrace as the Peloton of APM. It’s very nice - no argument there, feature rich and a wonderful interface. But it comes with fees that render it useless without an active contract.
To make my point, when the trial expires, you have two options: “buy now” or “contact sales”
And the “Buy Now” takes me to, surprise surprise, a Chat now or call me later option
The link in the upper right “Buy or extend” also takes us to this ‘purchase’ page. They literally make it impossible to keep going without a conversation.
Trying to purchase before expiring
Trying to purchase after
I also found it interesting that they email you from a ‘natrial’ email address:
But if you reply to it
it just bounces back
And near the end of my trial, a banner clearly prompted me to “buy now”. Only, as you can see, that was impossible
Even the “Chat now” was disabled
Summary
In our first post, we covered signing up for Dynatrace and installing into a fresh on-prem K3s cluster. We looked at the various monitoring options we get out of the box from the K8s collector: services, metrics, and traces. We then wrapped up by setting up the OpenTelemetry collector with Dapr and directing Zipkin traces to the Dynatrace Zipkin OTel endpoint.
In the second, we focused on serverless monitoring of GCP Cloud Run and AWS Lambda, agentless (JavaScript/web-based) monitoring, monitoring of hosts (infrastructure), and then how to remove the agent from Kubernetes, which was one of the tests we set out to check at the start.
Today we covered installing into a production cluster, profiling and optimization, alerting, the mobile app, usage, and some wrap-up thoughts on costs and “buy now”.
Dynatrace is an amazing tool for gathering application performance data, including traces and metrics. Its ability to deeply profile containers is probably the best in the business. One can find and fix issues in the stack with relative ease.
The OneAgent setup is a mixed bag. You have the advantage of installing one mega-agent across your cluster, infra, etc., knowing it picks up its configuration from the server. In a way, that is good: you can control your monitoring selections from a central control plane instead of going out to tweak Helm charts or confs. However, it also means my agent could do wild things outside my control if someone starts flipping toggles in the settings. If you have that control, perhaps it’s a non-issue. If your department’s relationship with Operations is - delicate - perhaps you do not wish to cede configuration control to whichever group manages the APM suite.
As for logging, I simply could not get it to work. I’ll assume it can work, but it didn’t pick up the logs in my containers, and I wasn’t interested in events-as-logs. Additionally, throughout the UI we have a time picker but no LIVE view; everything is time-shifted. Often I wanted to see the relative impact of a settings change or a pod rotation, but with no ‘live’ button to click, I had to repeatedly refresh the UI waiting for a change to get picked up.
While they do have a form of alerting, it is weird. I have to go into Settings, to an Anomaly Detection area, and then create queries based on some presets. Even then, there is no live or rendered view of the data (I had to click a button back to the Data Explorer to see if I was right). I also can’t easily tailor the alerts to different audiences. I say easily because there is an Alerting Profile mechanism that lets one filter problems on a profile, and I would assume this somehow plays into notifications - but the linkage was not obvious to me.
Lastly, my largest bone to pick is pricing. What is it? I can see usage; I can sort of equate the “Davis” units to a “starts at” price on a pricing page. But I cannot buy now, extend my trial, or even fall back to a free tier. Once the trial is up, you are done with that account unless you start talking to salespeople.
And I only bring this up because it was a long time ago and I think that salesperson moved on - but when I did talk to sales, that salesperson called me daily. I got to the point where I told that person (in a somewhat kidding-on-the-square voice) “If you call me again tomorrow, I’ll block your number”. First thing in the morning, that person called me and I then ghosted them for a year after. Thus, I’m very hesitant to provide contact details to what I’ve experienced to be high-pressure sales.
But perhaps you don’t worry about that kind of thing; a director or VP or Procurement person above you will have to handle it. Some things I can see could be a “me-issue”. At the end of the day, Dynatrace is great for its namesake - tracing. It’s manageable for monitoring. It’s rather weak on alerting and who-knows on logging.