Kubecost

Published: Mar 1, 2022 by Isaac Johnson

Kubecost started as an OSS tool in 2019 as a way to give developers insight into their Kubernetes spend. In a TechCrunch interview, Webb Brown, one of the co-founders, noted “There has been a problem in the space around cost… teams were getting benefits, but totally sacrificing visibility in spending, kind of like having a payroll of millions of dollars, but not knowing what department, team or individuals are getting paid what.”

Initially they leveraged tools like Prometheus and Grafana to accomplish the goal. Later, the founders, Webb Brown and Ajay Tripathy, left their infrastructure monitoring roles at Google to found the company, seeing a path to productizing the tool while still keeping the core open source.

Today, Kubecost is an easy-to-install containerized app with a solid free offering as well as commercial add-ons such as SSO, combined reports, and support. Stackwatch is also looking at offering a “hosted Kubecost” value-add for teams.

In this writeup, we’ll set up Kubecost both on-prem and in AKS to look at what information and recommendations it provides.

Company History and Size

Webb and Ajay started Kubecost in 2019 (the company’s legal name is Stackwatch). According to PitchBook, they have between 26 and 50 employees and 11 investors, with a Seed round in March 2021 and a recent $25M Series A announced just last week, the same day as their TechCrunch feature. From angel.co, one can see they are actively hiring. I also noted that while Webb and Ajay are the founders, Matt Bolt is their “founding engineer”.

Kubecost Setup

Create a namespace then install the Helm Chart to get going.

$ kubectl create namespace kubecost
namespace/kubecost created

$ helm repo add kubecost https://kubecost.github.io/cost-analyzer/
"kubecost" has been added to your repositories

$ helm install kubecost kubecost/cost-analyzer --namespace kubecost --set kubecostToken="aXNhYWMuam9obnNvbkBnbWFpbC5jb20=xm343yadf98"
I0220 06:56:37.223735   21495 request.go:668] Waited for 1.172902322s due to client-side throttling, not priority and fairness, request: GET:https://192.168.1.77:6443/apis/rbac.authorization.k8s.io/v1?timeout=32s
W0220 06:56:38.344728   21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0220 06:56:38.353636   21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0220 06:56:39.281060   21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0220 06:56:39.281475   21495 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: kubecost
LAST DEPLOYED: Sun Feb 20 06:56:37 2022
NAMESPACE: kubecost
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
--------------------------------------------------
Kubecost has been successfully installed. When pods are Ready, you can enable port-forwarding with the following command:

    kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090

Next, navigate to http://localhost:9090 in a web browser.

Having installation issues? View our Troubleshooting Guide at http://docs.kubecost.com/troubleshoot-install

Now we can test

$ kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090

/content/images/2022/02/kubecost-01.png

At first I saw errors, but on closer inspection that was only because Kubecost had not yet had time to gather data.

After a day, I relaunched port-forward to see some new numbers:

/content/images/2022/02/kubecost-02.png

Looking at the Overview, we can see costs broken down by time with efficiency numbers

/content/images/2022/02/kubecost-03.png

Since this is an on-prem cluster, presumably this is all CapEx and I’m only really paying for electricity, so I was curious what the costs entailed.

It seemed to be picking up mostly on CPU, RAM and Persistent Volumes.

/content/images/2022/02/kubecost-04.png

It was extrapolating the monthly costs from what it had tracked thus far. We can see those details under assets:

/content/images/2022/02/kubecost-05.png
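The extrapolation described above can be sketched roughly as follows. This is only an illustration of scaling an observed cost to a full month, not Kubecost's actual implementation; the function name, the 730-hour month, and the sample numbers are all assumptions.

```python
HOURS_PER_MONTH = 730  # assumption: average month length (8760 hours / 12)


def projected_monthly_cost(tracked_cost: float, hours_tracked: float) -> float:
    """Scale the cost observed so far up to a full month."""
    if hours_tracked <= 0:
        raise ValueError("hours_tracked must be positive")
    hourly_rate = tracked_cost / hours_tracked
    return hourly_rate * HOURS_PER_MONTH


# Hypothetical numbers: $1.20 tracked over the first 24 hours of data
print(round(projected_monthly_cost(1.20, 24), 2))
```

The shorter the tracking window, the noisier the projection, which is one reason the early numbers are worth revisiting after a few days.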

I was curious how those numbers are calculated, so I read up on their site. In this blog post, they break down how they set those initial costs:

The cost of a node of an on-premise Kubernetes cluster is more complicated to calculate since you pay for the hardware upfront as a capital expenditure and use the servers (that make up the cluster) for an estimated period of 5 years before disposing of them. You may also install licensed software such as a Windows operating system. Installing the servers in a data center also requires space, power, and cooling. In addition, you must account for the labor costs to install and maintain the server over time.

...We have arbitrarily chosen five years because of our earlier assumption that the server hardware will be disposed of after five years to purchase a newer model.
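To make the quoted reasoning concrete, here is a minimal sketch of amortizing a server over five years and adding yearly running costs. The class name, field names, and dollar figures are hypothetical and chosen only to keep the arithmetic easy to follow; they are not Kubecost defaults.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class OnPremNode:
    hardware_cost: float          # upfront purchase price (capital expenditure)
    yearly_operating_cost: float  # licenses, space, power, cooling, labor
    lifetime_years: int = 5       # disposal horizon from the quote above

    def hourly_cost(self) -> float:
        # Spread the purchase price evenly over the expected lifetime,
        # then add the recurring costs and convert to an hourly rate.
        yearly_capex = self.hardware_cost / self.lifetime_years
        return (yearly_capex + self.yearly_operating_cost) / HOURS_PER_YEAR


# Hypothetical server: $5000 up front, $752 per year to run
node = OnPremNode(hardware_cost=5000.0, yearly_operating_cost=752.0)
print(f"{node.hourly_cost():.4f} per node-hour")
```

An hourly figure like this is the kind of value one could then compare against the defaults on the settings page.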

/content/images/2022/02/kubecost-06.png

In our settings page we can see (and override) the default on-prem cost values (as well as currency):

/content/images/2022/02/kubecost-07.png

I noticed some other neat tweaks in the settings.

For instance, AWS loves to “negotiate” a discount. Every company I have worked for in the last 10 years thought themselves to be quite special (“because we got a special discount”). I do not want to ‘yuck their yum’ as my sister-in-law puts it, but from experience I know that the AWS sticker price is about as fixed as one might find at a used car lot.

/content/images/2022/02/kubecost-08.png
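As a rough illustration of what a negotiated discount setting amounts to, the sketch below applies a percentage discount to a list price. The function name and the sample rate are assumptions for illustration; how Kubecost applies the setting internally is not shown here.

```python
def discounted_price(list_price: float, discount_percent: float) -> float:
    """Apply a negotiated percentage discount to a list price."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return list_price * (1 - discount_percent / 100)


# Hypothetical: a $0.096/hour list price with a 25% negotiated discount
print(f"{discounted_price(0.096, 25):.4f}")
```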

Attributing costs

So how do we attribute costs to teams and departments? We can see the labels Kubecost looks for right there in the settings page:

  • owner
  • team
  • department
  • app
  • env

/content/images/2022/02/kubecost-09.png

I’ll try labeling some Deployments and Pods in my AKS cluster using the department label

$ kubectl label pods helloworld-go-v2-5c7f767d67-s854p department=development
pod/helloworld-go-v2-5c7f767d67-s854p labeled
$ kubectl label pods helloworld-go-v1-66cb794895-25tmf department=development
pod/helloworld-go-v1-66cb794895-25tmf labeled
$ kubectl label deployment helloworld-go-v2 department=development
deployment.apps/helloworld-go-v2 labeled
$ kubectl label deployment helloworld-go-v1 department=development
deployment.apps/helloworld-go-v1 labeled
$ kubectl label deployment waypoint-runner department=devops
deployment.apps/waypoint-runner labeled

I should add that the cost breakdown actually can look at many different factors besides label:

/content/images/2022/02/kubecost-23.png

If I break it down by container, for instance, I can see which containers are consuming the most resources in the cluster.

/content/images/2022/02/kubecost-24.png

After a few minutes, the department results came in, showing that it is really the labels on the pods that are used for the calculation:

/content/images/2022/02/kubecost-25.png
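Conceptually, attributing cost by label is a group-by over pods. The following is a minimal sketch of that idea, not Kubecost's code; the pod records, the cost figures, and the `__unallocated__` bucket name are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def cost_by_label(pods: List[dict], label: str,
                  unallocated: str = "__unallocated__") -> Dict[str, float]:
    """Sum pod costs per value of the given label."""
    totals: Dict[str, float] = defaultdict(float)
    for pod in pods:
        # Pods without the label land in a catch-all bucket
        key = pod["labels"].get(label, unallocated)
        totals[key] += pod["cost"]
    return dict(totals)


# Hypothetical pods and costs
pods = [
    {"name": "helloworld-go-v1", "labels": {"department": "development"}, "cost": 0.5},
    {"name": "helloworld-go-v2", "labels": {"department": "development"}, "cost": 0.25},
    {"name": "some-devops-pod", "labels": {"department": "devops"}, "cost": 1.0},
    {"name": "unlabeled-pod", "labels": {}, "cost": 2.0},
]
print(cost_by_label(pods, "department"))
```

Note that in this sketch a label set only on a Deployment would not count unless it is also present on the pods themselves.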

Setting up AKS

Let’s add this to an AKS cluster

$ az aks list -o table
Name      Location    ResourceGroup    KubernetesVersion    ProvisioningState    Fqdn
--------  ----------  ---------------  -------------------  -------------------  ---------------------------------------------------------
idjaks45  centralus   idjaks45         1.21.9               Succeeded            idjaks45-idjaks45-8defc6-c18d1418.hcp.centralus.azmk8s.io

$ az aks get-credentials -n idjaks45 -g idjaks45 --admin
Merged "idjaks45-admin" as current context in /home/builder/.kube/config

We really do not need to create the namespace first as Helm will do that for us, if we ask.

$ helm install kubecost kubecost/cost-analyzer --namespace kubecost --set kubecostToken="aXNhYWMuam9obnNvbkBnbWFpbC5jb20=xm343yadf98" --create-namespace
W0222 05:48:17.053745   17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 05:48:17.081216   17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 05:48:18.761426   17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 05:48:18.763110   17740 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: kubecost
LAST DEPLOYED: Tue Feb 22 05:48:16 2022
NAMESPACE: kubecost
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
--------------------------------------------------
Kubecost has been successfully installed. When pods are Ready, you can enable port-forwarding with the following command:

    kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090

Next, navigate to http://localhost:9090 in a web browser.

Having installation issues? View our Troubleshooting Guide at http://docs.kubecost.com/troubleshoot-install

In AKS it is pretty easy to just add an external LB to route traffic to the service. Since a service with the deployment’s name already exists, we’ll need to give the new one a different name.

$ kubectl expose deployment kubecost-cost-analyzer -n kubecost --port=9090 --target-port 9090 --name kubecost-external-svc --type LoadBalancer
service/kubecost-external-svc exposed

With kubectl get svc -n kubecost, we can see the service’s external IP

$ kubectl get svc -n kubecost
NAME                                TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
kubecost-cost-analyzer              ClusterIP      10.0.152.12    <none>         9001/TCP,9003/TCP,9090/TCP   7m7s
kubecost-external-svc               LoadBalancer   10.0.175.247   52.242.225.5   9090:32590/TCP               55s
kubecost-grafana                    ClusterIP      10.0.238.65    <none>         80/TCP                       7m7s
kubecost-kube-state-metrics         ClusterIP      10.0.2.92      <none>         8080/TCP                     7m7s
kubecost-prometheus-node-exporter   ClusterIP      None           <none>         9100/TCP                     7m7s
kubecost-prometheus-server          ClusterIP      10.0.35.183    <none>         80/TCP                       7m7s

This has data almost immediately when we load the page:

/content/images/2022/02/kubecost-10.png

However, we’ll need to wait a few minutes for specifics:

/content/images/2022/02/kubecost-11.png

After a few minutes, we can see some details

/content/images/2022/02/kubecost-12.png

As far as recommendations, it has a few ideas on how to save some money:

/content/images/2022/02/kubecost-13.png

When it comes to “Right Sizing” the cluster, Kubecost suggests I drop down to B1LS for my current load

/content/images/2022/02/kubecost-14.png

In the Azure Portal, I can see I’m currently using the default DS2_v2:

/content/images/2022/02/kubecost-15.png

While I appreciate the guidance, I can see from Azure that those nodes just don’t have enough memory to use as a deploy pool:

/content/images/2022/02/kubecost-16.png

The second check was to see if the Nodepool is horizontally scaled properly. Here we see that it has determined that one node could be safely removed.

/content/images/2022/02/kubecost-17.png

If we click the down arrow, we see how it came to that conclusion

/content/images/2022/02/kubecost-18.png
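I do not know the exact logic Kubecost uses here, but one simple way to reason about whether a node pool has a spare node is to compare total resource requests against per-node capacity with some headroom. The sketch below does that; the function names, the 80% target utilization, and the request totals are assumptions, and it ignores bin-packing and per-pod constraints.

```python
import math


def nodes_needed(cpu_requested: float, mem_requested_gib: float,
                 node_cpu: float, node_mem_gib: float,
                 target_utilization: float = 0.8) -> int:
    """Smallest node count that fits the requests at the target utilization."""
    by_cpu = math.ceil(cpu_requested / (node_cpu * target_utilization))
    by_mem = math.ceil(mem_requested_gib / (node_mem_gib * target_utilization))
    return max(by_cpu, by_mem, 1)


def removable_nodes(current_nodes: int, cpu_requested: float,
                    mem_requested_gib: float, node_cpu: float,
                    node_mem_gib: float) -> int:
    """How many nodes could be dropped under this simplified model."""
    needed = nodes_needed(cpu_requested, mem_requested_gib, node_cpu, node_mem_gib)
    return max(current_nodes - needed, 0)


# Hypothetical load on three DS2_v2-sized nodes (2 vCPU, 7 GiB each)
print(removable_nodes(3, cpu_requested=2.4, mem_requested_gib=6.0,
                      node_cpu=2, node_mem_gib=7))
```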

Looking at abandoned workloads, it uses a default duration of 2 days to see what is wasting time and space (interesting that it identifies itself as abandoned):

/content/images/2022/02/kubecost-21.png
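A check like this can be pictured as a threshold test over a time window. The sketch below flags workloads whose daily average traffic stayed under a threshold for the whole window. Using network traffic as the signal, the 500 bytes/sec threshold, and the sample data are my assumptions for illustration; only the 2-day default duration comes from the UI above.

```python
from typing import Dict, List


def find_abandoned(traffic_by_workload: Dict[str, List[float]],
                   window_days: int = 2,
                   threshold_bytes_per_sec: float = 500.0) -> List[str]:
    """Return workloads whose traffic stayed below the threshold all window."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    flagged = []
    for name, daily_rates in traffic_by_workload.items():
        recent = daily_rates[-window_days:]
        # Only judge workloads that have a full window of history
        if len(recent) == window_days and all(
            rate < threshold_bytes_per_sec for rate in recent
        ):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical daily average traffic in bytes/sec, oldest first
traffic = {
    "helloworld-go-v1": [1200.0, 30.0, 10.0],
    "waypoint-runner": [4000.0, 3500.0, 5000.0],
    "brand-new-job": [5.0],
}
print(find_abandoned(traffic))
```

Under a rule like this, a quiet monitoring tool with little inbound traffic could easily flag itself, which would match what the screenshot shows.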

Health

Another feature that is less about cost and more about general cluster health is the Health page.

Here we can see most checks pass, but two are marked, namely having a single master and lack of using multiple failure zones.

/content/images/2022/02/kubecost-19.png

What I thought was particularly nice was the detailed explanation of why each item is an issue. For instance, clicking on the “Cluster does not have replicated masters” text pops up this explanation:

/content/images/2022/02/kubecost-20.png

Assets

The asset page breaks down the Node, Disk and other infrastructure that makes up your cluster and shows what costs it has tracked thus far

/content/images/2022/02/kubecost-22.png

Reports and alerts

The reporting page lets us create downloadable reports:

/content/images/2022/02/kubecost-26.png

The downloaded report looks like this in Excel:

/content/images/2022/02/kubecost-27.png

The alerts page can send alerts to Slack or email.

For instance, we could set up a daily alert on costs over all namespaces

/content/images/2022/02/kubecost-28.png

You can check out the docs for more interesting examples, which include configuring alerts using the Kubecost Helm chart itself.

Having looked at the container logs, I do have some concerns about whether the alerts will persist after a pod cycles:

$ kubectl logs kubecost-cost-analyzer-865cf54dd9-gvwps -n kubecost cost-model | tail -n 10
I0222 13:32:52.265635       1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:32:52.265807       1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:02.266170       1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:02.266280       1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:12.267084       1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:12.267165       1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:22.267860       1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:22.267928       1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied
I0222 13:33:32.269177       1 log.go:47] [Info] Alert Configs Changed. Writing Updated Config to disk.
I0222 13:33:32.269246       1 log.go:32] [Warning] Failed to write alerts configuration after change: Error writing config alerts/alerts.json. Error message: open /var/configs/alerts/alerts.json: permission denied

To get Azure alerts, we’ll need to set up a rate card role for the underlying service principal (SP):

$ cat azure_rate_card.json
{
    "Name": "KubecostRateCardRole",
    "IsCustom": true,
    "Description": "Rate Card query role",
    "Actions": [
        "Microsoft.Compute/virtualMachines/vmSizes/read",
        "Microsoft.Resources/subscriptions/locations/read",
        "Microsoft.Resources/providers/read",
        "Microsoft.ContainerService/containerServices/read",
        "Microsoft.Commerce/RateCard/read"
    ],
    "AssignableScopes": [
        "/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61"
    ]
}

Then create the role definition:

$ az role definition create --verbose --role-definition @azure_rate_card.json
{
  "assignableScopes": [
    "/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61"
  ],
  "description": "Rate Card query role",
  "id": "/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61/providers/Microsoft.Authorization/roleDefinitions/cc93d6f0-e186-4c0f-a390-2dd235b37970",
  "name": "cc93d6f0-e186-4c0f-a390-2dd235b37970",
  "permissions": [
    {
      "actions": [
        "Microsoft.Compute/virtualMachines/vmSizes/read",
        "Microsoft.Resources/subscriptions/locations/read",
        "Microsoft.Resources/providers/read",
        "Microsoft.ContainerService/containerServices/read",
        "Microsoft.Commerce/RateCard/read"
      ],
      "dataActions": [],
      "notActions": [],
      "notDataActions": []
    }
  ],
  "roleName": "KubecostRateCardRole",
  "roleType": "CustomRole",
  "type": "Microsoft.Authorization/roleDefinitions"
}
Command ran in 1.707 seconds (init: 0.108, invoke: 1.599)

Next, create the service principal with that role:

$ az ad sp create-for-rbac --name "idjaks45sp.mediwareinformationsystems.onmicrosoft.com" --role "KubecostRateCardRole" --sdk-auth true
Option '--sdk-auth' has been deprecated and will be removed in a future release.
Found an existing application instance of "b2049cf3-082e-47e5-a141-24305989b5a5". We will patch it
Creating 'KubecostRateCardRole' role assignment under scope '/subscriptions/8defc61d-657a-453d-a6ff-cb9f91289a61'
The output includes credentials that you must protect. Be sure that you do not include these credentials in your code or check the credentials into your source control. For more information, see https://aka.ms/azadsp-cli
'name' property in the output is deprecated and will be removed in the future. Use 'appId' instead.
{
  "clientId": "b2049cf3-082e-47e5-a141-24305989b5a5",
  "clientSecret": "XZSLZO-AJ9VU.Or_GVTksG4.3xVfw9INyL",
  "subscriptionId": "8defc61d-657a-453d-a6ff-cb9f91289a61",
  "tenantId": "15d19784-ad58-4a57-a66f-ad1c0f826a45",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

We can save these values in the UI

/content/images/2022/02/kubecost-31.png

Or via Helm

helm install kubecost ./cost-analyzer -n kubecost --set kubecostProductConfigs.azureSubscriptionID=<> --set kubecostProductConfigs.azureClientID=<> --set kubecostProductConfigs.azureTenantID=<> --set kubecostProductConfigs.azureClientPassword=<> --set kubecostProductConfigs.createServiceKeySecret=true

e.g.

$ helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost --set kubecostToken="aXNhYWMuam9obnNvbkBnbWFpbC5jb20=xm343yadf98" --set kubecostProductConfigs.azureSubscriptionID="8defc61d-657a-453d-a6ff-cb9f91289a61" --set kubecostProductConfigs.azureClientID="b2049cf3-082e-47e5-a141-24305989b5a5" --set kubecostProductConfigs.azureTenantID="15d19784-ad58-4a57-a66f-ad1c0f826a45" --set kubecostProductConfigs.azureClientPassword="XZSLZO-AJ9VU.Or_GVTksG4.3xVfw9INyL" --set .kubecostProductConfigs.createServiceKeySecret=true --create-namespace
W0222 07:44:36.541614   19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.571626   19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.611000   19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.633341   19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.656889   19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0222 07:44:36.686549   19998 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Release "kubecost" has been upgraded. Happy Helming!
NAME: kubecost
LAST DEPLOYED: Tue Feb 22 07:44:35 2022
NAMESPACE: kubecost
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
--------------------------------------------------
Kubecost has been successfully installed. When pods are Ready, you can enable port-forwarding with the following command:

    kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090

Next, navigate to http://localhost:9090 in a web browser.

Having installation issues? View our Troubleshooting Guide at http://docs.kubecost.com/troubleshoot-install
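Since the service principal output and the Helm values use different field names, it can help to spell out the mapping. The sketch below builds the `--set` arguments from the JSON that `az ad sp create-for-rbac` printed. The helper name is hypothetical, the credentials are placeholders, and the Helm value names are taken from the install command above (written without a leading dot on `createServiceKeySecret`, which I am assuming is the intended key).

```python
import json
from typing import List

# Left: field names in the service principal JSON output.
# Right: Helm value names as used in the install command above.
FIELD_TO_HELM_VALUE = {
    "subscriptionId": "kubecostProductConfigs.azureSubscriptionID",
    "clientId": "kubecostProductConfigs.azureClientID",
    "tenantId": "kubecostProductConfigs.azureTenantID",
    "clientSecret": "kubecostProductConfigs.azureClientPassword",
}


def helm_set_args(sp_json: str) -> List[str]:
    """Turn service principal JSON into a list of `--set key=value` arguments."""
    sp = json.loads(sp_json)
    args: List[str] = []
    for field, helm_value in FIELD_TO_HELM_VALUE.items():
        args += ["--set", f"{helm_value}={sp[field]}"]
    args += ["--set", "kubecostProductConfigs.createServiceKeySecret=true"]
    return args


# Placeholder credentials, not real ones
example = json.dumps({
    "clientId": "<client-id>",
    "clientSecret": "<client-secret>",
    "subscriptionId": "<subscription-id>",
    "tenantId": "<tenant-id>",
})
print(" ".join(helm_set_args(example)))
```

Passing the list straight to a subprocess call, rather than pasting a long command line, also avoids the line-wrap problems that long one-liners invite.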

I gave it a fair amount of time, but the AKS integration did not seem to work:

/content/images/2022/02/kubecost-32.png

Commercial offering

The single-cluster offering is free and pretty feature rich for what it does. They have more options, however, including Business and Enterprise tiers, which add the extras one would expect: support, multiple clusters and SSO.

/content/images/2022/02/kubecost-29.png

While the pricing page lacks details, the “Upgrade” page (which I assume by the broken image link is undergoing updates) does clue us in:

  • US$200/mo for 50 nodes
  • US$300 for up to 150 nodes
  • US$400 for up to 250 nodes

and they have a “contact us” for 300+
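For quick budgeting, the tiers above can be expressed as a small lookup. This is only a sketch based on the numbers listed on the Upgrade page at the time of writing; the function name is hypothetical, I am assuming every tier is billed monthly, and anything above 250 nodes is treated as "contact us" because the page does not say what applies between 250 and 300.

```python
from typing import Optional

# (max nodes, US$ per month) as listed on the Upgrade page
TIERS = [(50, 200), (150, 300), (250, 400)]


def monthly_price(nodes: int) -> Optional[int]:
    """Return the listed monthly price, or None when sales must be contacted."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    for max_nodes, price in TIERS:
        if nodes <= max_nodes:
            return price
    return None  # beyond the published tiers


for count in (10, 120, 250, 400):
    print(count, monthly_price(count))
```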

/content/images/2022/02/kubecost-30.png

Summary

Kubecost is a pretty nice start to a suite. Out of the box, I could see teams using it to find abandoned deployments and other waste. I liked the health checks and recommendations, though I wouldn’t necessarily follow them all. I saw a demo on YouTube where they talked about Kubecost and tested it with K3s on CIVO, tweaking the costs for that cloud provider. Since one has controls to tweak the values, I could even see calculating my own local on-prem costs based on electricity alone.
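As a back-of-the-envelope version of that electricity-only idea, the sketch below converts a node's power draw into a monthly cost. The wattage, the price per kWh, and the 730-hour month are all hypothetical inputs, and the function name is mine.

```python
HOURS_PER_MONTH = 730  # assumption: average month length (8760 hours / 12)


def electricity_cost(watts: float, price_per_kwh: float,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running a device drawing `watts` continuously for `hours`."""
    kwh = watts / 1000 * hours
    return kwh * price_per_kwh


# Hypothetical: a 100 W node at $0.12 per kWh
print(f"{electricity_cost(100, 0.12):.2f} per month")
```

Dividing the monthly figure across a node's CPU and RAM would give custom rates to enter on the settings page.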


Isaac Johnson


Cloud Solutions Architect

Isaac is a CSA and DevOps engineer who focuses on cloud migrations and devops processes. He also is a dad to three wonderful daughters (hence the references to Princess King sprinkled throughout the blog).
