Published: Mar 1, 2022 by Isaac Johnson
Kubecost started as an OSS tool in 2019 as a way to give developers insight into their Kubernetes spend. In a TechCrunch interview, Webb Brown, one of the co-founders, noted “There has been a problem in the space around cost… teams were getting benefits, but totally sacrificing visibility in spending, kind of like having a payroll of millions of dollars, but not knowing what department, team or individuals are getting paid what.”
Initially they leveraged tools like Prometheus and Grafana to accomplish the goal. Later, the founders, Webb Brown and Ajay Tripathy, left their infrastructure monitoring roles at Google to found the company, seeing a path to productizing it while still keeping the core tool open source.
Today, Kubecost is an easy-to-install containerized app that has a solid free offering as well as commercial add-ons such as SSO, combined reports, and support. Stackwatch is also looking at offering a “hosted Kubecost” value-add for teams.
In this writeup, we’ll set up Kubecost both on-prem and in AKS to look at what information and recommendations it provides.
Company History and Size
Webb and Ajay started Kubecost in 2019 (the company’s legal name is Stackwatch). According to PitchBook, they have between 26 and 50 employees and 11 investors, and are now at the Series A stage (a seed round in Mar 2021 and a recent $25m Series A just last week, the same day as their TechCrunch feature). From angel.co, one can see they are actively hiring. I also noted that while Webb and Ajay are the founders, Matt Bolt is their “founding engineer”.
Kubecost Setup
Create a namespace then install the Helm Chart to get going.
$ kubectl create namespace kubecost
namespace/kubecost created
$ helm repo add kubecost https://kubecost.github.io/cost-analyzer/
"kubecost" has been added to your repositories
$ helm install kubecost kubecost/cost-analyzer --namespace kubecost --set kubecostToken="aXNhYWMuam9obnNvbkBnbWFpbC5jb20=xm343yadf98"
I0220 06:56:37.223735 21495 request.go:668] Waited for 1.172902322s due to client-side throttling, not priority and fairness, request: GET:https://192.168.1.77:6443/apis/rbac.authorization.k8s.io/v1?timeout=32s
W0220 06:56:38.344728 21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0220 06:56:38.353636 21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0220 06:56:39.281060 21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0220 06:56:39.281475 21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: kubecost
LAST DEPLOYED: Sun Feb 20 06:56:37 2022
NAMESPACE: kubecost
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
--------------------------------------------------
Kubecost has been successfully installed. When pods are Ready, you can enable port-forwarding with the following command:
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
Next, navigate to http://localhost:9090 in a web browser.
Having installation issues? View our Troubleshooting Guide at http://docs.kubecost.com/troubleshoot-install
Now we can test
$ kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
At first I saw errors, but on closer inspection that was only because it had not yet had time to gather data.
After a day, I relaunched port-forward to see some new numbers:
Looking at the Overview, we can see costs broken down by time with efficiency numbers
Since this is an on-prem cluster, presumably this is all CapEx, as I’m only really paying for electricity. I was curious what the costs entailed:
It seemed to be picking up mostly on CPU, RAM and Persistent Volumes.
It was extrapolating the monthly costs from what it had tracked thus far. We can see those details under assets:
I was curious about how those numbers are calculated, so I read up on their site. In a blog post on this topic, they break down how they set those initial costs:
The cost of a node of an on-premise Kubernetes cluster is more complicated to calculate since you pay for the hardware upfront as a capital expenditure and use the servers (that make up the cluster) for an estimated period of 5 years before disposing of them. You may also install licensed software such as a Windows operating system. Installing the servers in a data center also requires space, power, and cooling. In addition, you must account for the labor costs to install and maintain the server over time.
...We have arbitrarily chosen five years because of our earlier assumption that the server hardware will be disposed of after five years to purchase a newer model.
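As a rough illustration of that reasoning, the sketch below spreads a server's purchase price over five years and adds yearly running costs to get an hourly node cost. The function name and every dollar figure are made-up assumptions for illustration; this is not Kubecost's actual formula or its default values.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def hourly_node_cost(hardware_cost, yearly_opex, lifetime_years=5):
    """Amortize the up-front hardware cost over its lifetime and add yearly running costs."""
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    yearly_capex = hardware_cost / lifetime_years
    return (yearly_capex + yearly_opex) / HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical numbers: a $4,380 server plus $876 per year for power, space and labor.
    cost = hourly_node_cost(hardware_cost=4380.0, yearly_opex=876.0)
    print(f"${cost:.4f} per node-hour")
```

Changing lifetime_years shows how sensitive the hourly figure is to the (admittedly arbitrary) five-year disposal assumption.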
In our settings page we can see (and override) the default on-prem cost values (as well as currency):
I noticed some other neat tweaks in the settings.
For instance, AWS loves to “negotiate” a discount. Every company I have worked for in the last 10 years thought themselves to be quite special (“because we got a special discount”). I do not want to ‘yuck their yum’ as my sister-in-law puts it, but from experience I know that the AWS sticker price is about as fixed as one might find at a used car lot.
Attributing costs
So how do we attribute costs to teams and departments? We can actually see the labels defined there in the settings page:
- owner
- team
- department
- app
- env
I’ll try labeling some Deployments and Pods in my AKS cluster using the department label
$ kubectl label pods helloworld-go-v2-5c7f767d67-s854p department=development
pod/helloworld-go-v2-5c7f767d67-s854p labeled
$ kubectl label pods helloworld-go-v1-66cb794895-25tmf department=development
pod/helloworld-go-v1-66cb794895-25tmf labeled
$ kubectl label deployment helloworld-go-v2 department=development
deployment.apps/helloworld-go-v2 labeled
$ kubectl label deployment helloworld-go-v1 department=development
deployment.apps/helloworld-go-v1 labeled
$ kubectl label deployment waypoint-runner department=devops
deployment.apps/waypoint-runner labeled
I should add that the cost breakdown actually can look at many different factors besides label:
If I break it down by container, for instance, I can see which containers are consuming the most resources in the cluster.
After a few minutes, the department results came in showing that it is really the pods that are used for calculation
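To make that observation concrete, here is a minimal sketch of label-based attribution when only pod labels count. It is not Kubecost's implementation: the cost figures, the "waypoint-runner-0" pod name, and the "__unallocated__" bucket name are all hypothetical.

```python
from collections import defaultdict

UNALLOCATED = "__unallocated__"  # hypothetical bucket for pods that lack the label


def cost_by_label(pods, label):
    """Sum per-pod cost by the value of one label; pods without it go to UNALLOCATED."""
    totals = defaultdict(float)
    for pod in pods:
        key = pod.get("labels", {}).get(label, UNALLOCATED)
        totals[key] += pod["cost"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up costs. The last pod stands in for a workload whose Deployment was
    # labeled but whose pods were not, so it carries no department label.
    pods = [
        {"name": "helloworld-go-v1-66cb794895-25tmf", "labels": {"department": "development"}, "cost": 0.5},
        {"name": "helloworld-go-v2-5c7f767d67-s854p", "labels": {"department": "development"}, "cost": 0.25},
        {"name": "waypoint-runner-0", "labels": {}, "cost": 1.0},
    ]
    print(cost_by_label(pods, "department"))
```

A label on a Deployment's own metadata is not copied to its pods, and a label applied directly to a pod is lost when that pod is replaced. For attribution that sticks, the label belongs in the Deployment's pod template (spec.template.metadata.labels).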
Setting up AKS
Let’s add this to an AKS cluster
$ az aks list -o table
Name Location ResourceGroup KubernetesVersion ProvisioningState Fqdn
-------- ---------- --------------- ------------------- ------------------- ---------------------------------------------------------
idjaks45 centralus idjaks45 1.21.9 Succeeded idjaks45-idjaks45-8defc6-c18d1418.hcp.centralus.azmk8s.io
$ az aks get-credentials -n idjaks45 -g idjaks45 --admin
Merged "idjaks45-admin" as current context in /home/builder/.kube/config
We really do not need to create the namespace first as Helm will do that for us, if we ask.
$ helm install kubecost kubecost/cost-analyzer --namespace kubecost --set kubecostToken="aXNhYWMuam9obnNvbkBnbWFpbC5jb20=xm343yadf98" --create-namespace
W0222 05:48:17.053745 17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 05:48:17.081216 17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 05:48:18.761426 17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 05:48:18.763110 17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: kubecost
LAST DEPLOYED: Tue Feb 22 05:48:16 2022
NAMESPACE: kubecost
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
--------------------------------------------------
Kubecost has been successfully installed. When pods are Ready, you can enable port-forwarding with the following command:
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
Next, navigate to http://localhost:9090 in a web browser.
Having installation issues? View our Troubleshooting Guide at http://docs.kubecost.com/troubleshoot-install
In AKS it is pretty easy to just add an external LB to route traffic to the service. Since a service with the same name as the deployment already exists, we’ll need to give ours a different name.
$ kubectl expose deployment kubecost-cost-analyzer -n kubecost --port=9090 --target-port 9090 --name kubecost-external-svc --type LoadBalancer
service/kubecost-external-svc exposed
With kubectl get svc -n kubecost, we can see the service’s external IP
$ kubectl get svc -n kubecost
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubecost-cost-analyzer ClusterIP 10.0.152.12 <none> 9001/TCP,9003/TCP,9090/TCP 7m7s
kubecost-external-svc LoadBalancer 10.0.175.247 52.242.225.5 9090:32590/TCP 55s
kubecost-grafana ClusterIP 10.0.238.65 <none> 80/TCP 7m7s
kubecost-kube-state-metrics ClusterIP 10.0.2.92 <none> 8080/TCP 7m7s
kubecost-prometheus-node-exporter ClusterIP None <none> 9100/TCP 7m7s
kubecost-prometheus-server ClusterIP 10.0.35.183 <none> 80/TCP 7m7s
This has data almost immediately when we load the page:
However, we’ll need to wait a few minutes for specifics:
After a few minutes, we can see some details
As far as recommendations, it has a few ideas on how to save some money:
When it comes to “Right Sizing” the cluster, Kubecost suggests I drop down to B1LS for my current load
In the Azure Portal, I can see I’m using the default DS2_v2
While I appreciate the guidance, I can see from Azure that those nodes just don’t have enough memory to use as a deploy pool:
The second check was to see if the Nodepool is horizontally scaled properly. Here we see that it has determined that one node could be safely removed.
If we click the down arrow, we see how it came to that conclusion
Looking at abandoned workloads, it uses a default duration of 2 days to see what is wasting time and space (interesting that it identifies itself as abandoned)
Health
Another feature that is less about cost and more about cluster general health is the Health page.
Here we can see most checks pass, but two are marked, namely having a single master and lack of using multiple failure zones.
What I thought was particularly nice was the detailed explanation for why it is an issue. For instance, clicking on the “Cluster does not have replicated masters” text pops up
Assets
The asset page breaks down the Node, Disk and other infrastructure that makes up your cluster and shows what costs it has tracked thus far
Reports and alerts
The reporting page can let us create downloadable reports
which looks like this in Excel
And the alert page can set up alerts on Slack or Email.
For instance, we could set up a daily alert on costs over all namespaces
You can check out the docs for more interesting examples, which include configuring alerts using the Kubecost Helm chart itself.
I do have some concerns about whether the alerts will persist after a pod cycles, having looked at the container logs:
$ kubectl logs kubecost-cost-analyzer-865cf54dd9-gvwps -n kubecost cost-model | tail -n 10
I0222 13:32:52.265635 1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:32:52.265807 1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:02.266170 1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:02.266280 1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:12.267084 1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:12.267165 1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:22.267860 1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:22.267928 1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:32.269177 1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:32.269246 1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
To pull Azure pricing, we’ll need to set up a rate card role for the underlying service principal (SP)
$ cat azure_rate_card.json
{
  "Name": "KubecostRateCardRole",
  "IsCustom": true,
  "Description": "Rate Card query role",
  "Actions": [
    "Microsoft.Compute/virtualMachines/vmSizes/read",
    "Microsoft.Resources/subscriptions/locations/read",
    "Microsoft.Resources/providers/read",
    "Microsoft.ContainerService/containerServices/read",
    "Microsoft.Commerce/RateCard/read"
  ],
  "AssignableScopes": [
    "/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61"
  ]
}
then apply
$ az role definition create --verbose --role-definition @azure_rate_card.json
{
  "assignableScopes": [
    "/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61"
  ],
  "description": "Rate Card query role",
  "id": "/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61/providers/Microsoft.Authorization/roleDefinitions/cc93d6f0-e186-4c0f-a390-2dd235b37970",
  "name": "cc93d6f0-e186-4c0f-a390-2dd235b37970",
  "permissions": [
    {
      "actions": [
        "Microsoft.Compute/virtualMachines/vmSizes/read",
        "Microsoft.Resources/subscriptions/locations/read",
        "Microsoft.Resources/providers/read",
        "Microsoft.ContainerService/containerServices/read",
        "Microsoft.Commerce/RateCard/read"
      ],
      "dataActions": [],
      "notActions": [],
      "notDataActions": []
    }
  ],
  "roleName": "KubecostRateCardRole",
  "roleType": "CustomRole",
  "type": "Microsoft.Authorization/roleDefinitions"
}
Command ran in 1.707 seconds (init: 0.108, invoke: 1.599)
then assign that role to a service principal
$ az ad sp create-for-rbac --name "idjaks45sp.mediwareinformationsystems.onmicrosoft.com" --role "KubecostRateCardRole" --sdk-auth true
Option '--sdk-auth' has been deprecated and will be removed in a future release.
Found an existing application instance of "b2049cf3-082e-47e5-a141-24305989b5a5". We will patch it
Creating 'KubecostRateCardRole' role assignment under scope '/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61'
The output includes credentials that you must protect. Be sure that you do not include these credentials in your code or check the credentials into your source control. For more information, see https://aka.ms/azadsp-cli
'name' property in the output is deprecated and will be removed in the future. Use 'appId' instead.
{
  "clientId": "b2049cf3-082e-47e5-a141-24305989b5a5",
  "clientSecret": "XZSLZO-AJ9VU.Or_GVTksG4.3xVfw9INyL",
  "subscriptionId": "8defc61d-657a-453d-a6ff-cb9f91289a61",
  "tenantId": "15d19784-ad58-4a57-a66f-ad1c0f826a45",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}
We can save in the UI
Or via Helm
helm install kubecost ./cost-analyzer -n kubecost --set kubecostProductConfigs.azureSubscriptionID=<> --set kubecostProductConfigs.azureClientID=<> --set kubecostProductConfigs.azureTenantID=<> --set kubecostProductConfigs.azureClientPassword=<> --set kubecostProductConfigs.createServiceKeySecret=true
e.g.
$ helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost --set kubecostToken="aXNhYWMuam9obnNvbkBnbWFpbC5jb20=xm343yadf98" --set kubecostProductConfigs.azureSubscriptionID="8defc61d-657a-453d-a6ff-cb9f91289a61" --set kubecostProductConfigs.azureClientID="b2049cf3-082e-47e5-a141-24305989b5a5" --set kubecostProductConfigs.azureTenantID="15d19784-ad58-4a57-a66f-ad1c0f826a45" --set kubecostProductConfigs.azureClientPassword="XZSLZO-AJ9VU.Or_GVTksG4.3xVfw9INyL" --set .kubecostProductConfigs.createServiceKeySecret=true --create-namespace
W0222 07:44:36.541614 19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.571626 19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.611000 19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.633341 19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.656889 19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.686549 19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Release "kubecost" has been upgraded. Happy Helming!
NAME: kubecost
LAST DEPLOYED: Tue Feb 22 07:44:35 2022
NAMESPACE: kubecost
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
--------------------------------------------------
Kubecost has been successfully installed. When pods are Ready, you can enable port-forwarding with the following command:
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090
Next, navigate to http://localhost:9090 in a web browser.
Having installation issues? View our Troubleshooting Guide at http://docs.kubecost.com/troubleshoot-install
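I gave it a fair amount of time, but the AKS integration did not seem to work. One possible factor: the leading dot in the “--set .kubecostProductConfigs.createServiceKeySecret=true” flag I passed may have kept Helm from setting the intended kubecostProductConfigs value.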
Commercial offering
The single-cluster monitor is free and pretty feature rich for what it does. They have more options, however, including Business and Enterprise tiers, with the extras one would expect: support, multiple clusters and SSO.
While the pricing page lacks details, the “Upgrade” page (which I assume by the broken image link is undergoing updates) does clue us in:
- US$200/mo for 50 nodes
- US$300 for up to 150 nodes
- US$400 for up to 250 nodes
and they have a “contact us” for 300+
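For quick budgeting, those tiers can be captured in a small lookup. This is only a sketch of the numbers listed above as I read them: it assumes all three prices are monthly (only the first is marked "/mo") and treats anything over 250 nodes as "contact us", since the page lists no price between 250 and 300.

```python
# Tiers as listed on the "Upgrade" page at the time of writing: (max nodes, US$ per month).
TIERS = [(50, 200), (150, 300), (250, 400)]


def monthly_price(nodes):
    """Return the listed monthly price for a node count, or None when no price is listed."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    for max_nodes, price in TIERS:
        if nodes <= max_nodes:
            return price
    return None  # above 250 nodes: no listed price, contact sales


if __name__ == "__main__":
    for n in (10, 120, 250, 400):
        price = monthly_price(n)
        print(n, "nodes:", f"US${price}/mo" if price is not None else "contact us")
```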
Summary
Kubecost is a pretty nice start to a suite. Out of the box, I could see teams using it to find abandoned deployments and other waste. I liked the health checks and recommendations, though I wouldn’t necessarily follow them all. I saw a demo on YouTube where they talked about Kubecost and tested it with K3s on CIVO, tweaking the costs for that cloud provider. Since one has controls to tweak the values, I could even see calculating my own local on-prem costs based on electricity alone.
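As a back-of-the-envelope version of that last idea, the sketch below turns a server's power draw into an hourly and monthly electricity cost. The wattage, the price per kWh, and the 730-hour month are all assumptions for illustration; how the result maps onto Kubecost's custom pricing fields is left open.

```python
HOURS_PER_MONTH = 730  # assumption: average month length (8760 hours / 12)


def electricity_cost_per_hour(watts, price_per_kwh):
    """Cost of running a constant load for one hour."""
    if watts < 0 or price_per_kwh < 0:
        raise ValueError("watts and price_per_kwh must be non-negative")
    return (watts / 1000.0) * price_per_kwh


def electricity_cost_per_month(watts, price_per_kwh):
    """Hourly electricity cost scaled to an average month."""
    return electricity_cost_per_hour(watts, price_per_kwh) * HOURS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical: a small server drawing a steady 100 W at US$0.15 per kWh.
    print(f"${electricity_cost_per_month(100, 0.15):.2f} per month")
```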