Published: Jul 15, 2025 by Isaac Johnson
Today we are going to explore setting up a full Grafana OSS stack including Alloy for OpenTelemetry (OTLP) collection, Tempo for Traces, Prometheus for Metrics, Loki for Logs, and lastly Grafana for reviewing and reporting on all of those.
My goal for all this is to really explore what kind of telemetry data we can pull from the Gemini CLI. Can we monitor usage? Logs? Tokens? What does tracing mean when it comes to a tool like Gemini CLI?
We’ll explore all that and more below. Let’s start with Alloy first, Grafana’s open-source OpenTelemetry (OTel) collector.
Grafana Alloy
As we use Kubernetes here, we’ll follow the pattern of setting this all up with Helm charts in our cluster. It should be noted that one can use Docker just as well if doing this outside of Kubernetes.
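For reference, the same collector can be run with Docker in a one-liner. This is just a sketch (not what I use below), assuming a local config.alloy and the published grafana/alloy image:

$ docker run -d --name alloy \
    -p 12345:12345 -p 4317:4317 -p 4318:4318 \
    -v $(pwd)/config.alloy:/etc/alloy/config.alloy \
    grafana/alloy:latest \
    run /etc/alloy/config.alloy --server.http.listen-addr=0.0.0.0:12345 --storage.path=/var/lib/alloy/data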
Let’s add the Grafana helm repo and update
$ helm repo add grafana https://grafana.github.io/helm-charts
"grafana" already exists with the same configuration, skipping
$ helm repo update
Grafana
I already have a namespace, so I’ll first just add Grafana there
$ helm install grafana -n grafana grafana/grafana
NAME: grafana
LAST DEPLOYED: Thu Jul 10 19:53:19 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:
kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
grafana.grafana.svc.cluster.local
Get the Grafana URL to visit by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 3000
3. Login with the password from step 1 and the username: admin
#################################################################################
###### WARNING: Persistence is disabled!!! You will lose your data when #####
###### the Grafana pod is terminated. #####
#################################################################################
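That warning is fine for a lab, but if you want Grafana to keep dashboards and data sources across pod restarts, the chart exposes a persistence block. A minimal sketch (the file name, size, and storage class here are my assumptions - adjust for your cluster):

persistence:
  enabled: true
  size: 5Gi
  # storageClassName: local-path   # K3s default; set to whatever your cluster provides

$ helm upgrade grafana -n grafana -f grafana.pvc.values.yaml grafana/grafana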
Alloy
I’ll upgrade my Alloy release with the following values
$ cat grafana.values.yaml
alloy:
  configMap:
    create: false
    key: config.alloy
    name: alloy-config
  extraPorts:
    - name: otelgrpc
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: otelhttp
      port: 4318
      protocol: TCP
      targetPort: 4318
    - name: zipkinhttp
      port: 9411
      protocol: TCP
      targetPort: 9411
$ helm upgrade -n grafana alloy -f grafana.values.yaml grafana/alloy
Release "alloy" has been upgraded. Happy Helming!
NAME: alloy
LAST DEPLOYED: Thu Jul 10 19:57:09 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 3
TEST SUITE: None
NOTES:
Welcome to Grafana Alloy!
I realized my last Alloy deployment was in a bad state, so I did a helm delete and then changed the values to create the ConfigMap (cm)
$ cat grafana.values.yaml
alloy:
  configMap:
    create: true
    key: config.alloy
    name: alloy-config
  extraPorts:
    - name: otelgrpc
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: otelhttp
      port: 4318
      protocol: TCP
      targetPort: 4318
    - name: zipkinhttp
      port: 9411
      protocol: TCP
      targetPort: 9411
And installed
$ helm install -n grafana alloy -f grafana.values.yaml grafana/alloy
NAME: alloy
LAST DEPLOYED: Thu Jul 10 20:10:05 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Welcome to Grafana Alloy!
This time it worked
$ kubectl get po -n grafana
NAME READY STATUS RESTARTS AGE
alloy-2h559 2/2 Running 0 5m2s
alloy-6cqcw 2/2 Running 0 5m3s
alloy-bbqvt 2/2 Running 0 5m2s
alloy-l467k 2/2 Running 0 5m2s
grafana-7b9777bf67-lg7xp 1/1 Running 0 21m
Prometheus
Grafana is used to visualize and alert, and Alloy is an OTel collector, but we need storage backends in between for Metrics and Logs.
For Metrics, we’ll use Prometheus.
I can add the chart and update, if needed
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" already exists with the same configuration, skipping
$ helm repo update
Most people put Prometheus at the root of their cluster (the default namespace), but for this, I’m leaving it in the grafana namespace
$ helm install prometheus -n grafana prometheus-community/prometheus
NAME: prometheus
LAST DEPLOYED: Fri Jul 11 06:18:56 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.grafana.svc.cluster.local
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 9090
The Prometheus alertmanager can be accessed via port 9093 on the following DNS name from within your cluster:
prometheus-alertmanager.grafana.svc.cluster.local
Get the Alertmanager URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=alertmanager,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 9093
#################################################################################
###### WARNING: Pod Security Policy has been disabled by default since #####
###### it deprecated after k8s 1.25+. use #####
###### (index .Values "prometheus-node-exporter" "rbac" #####
###### . "pspEnabled") with (index .Values #####
###### "prometheus-node-exporter" "rbac" "pspAnnotations") #####
###### in case you still need it. #####
#################################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-prometheus-pushgateway.grafana.svc.cluster.local
Get the PushGateway URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app=prometheus-pushgateway,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
To reach this externally, we can expose a NodePort service (though I won’t necessarily use it with the OTel Collector, but will for other things)
$ kubectl expose service prometheus-server -n grafana --type=NodePort --target-port=9090 --name=prometheus-server-ext
service/prometheus-server-ext exposed
We can now see Alloy, Grafana and Prometheus ports
$ kubectl get svc -n grafana
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alloy ClusterIP 10.43.154.188 <none> 12345/TCP,4317/TCP,4318/TCP,9411/TCP 10h
grafana ClusterIP 10.43.37.230 <none> 80/TCP 10h
prometheus-alertmanager ClusterIP 10.43.44.14 <none> 9093/TCP 2m27s
prometheus-alertmanager-headless ClusterIP None <none> 9093/TCP 2m27s
prometheus-kube-state-metrics ClusterIP 10.43.62.62 <none> 8080/TCP 2m27s
prometheus-prometheus-node-exporter ClusterIP 10.43.32.2 <none> 9100/TCP 2m27s
prometheus-prometheus-pushgateway ClusterIP 10.43.204.146 <none> 9091/TCP 2m27s
prometheus-server ClusterIP 10.43.112.167 <none> 80/TCP 2m27s
prometheus-server-ext NodePort 10.43.128.3 <none> 80:31182/TCP 48s
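As a quick sanity check on that NodePort (node IP from my cluster; the 31182 comes from the listing above), hitting the Prometheus build-info endpoint should return a small JSON blob with the version:

$ curl -s http://192.168.1.34:31182/api/v1/status/buildinfo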
Loki
Lastly, let’s add Loki for Logs.
Logs can be a big ask, so Loki can be installed as a monolith, as microservices, or in a scalable setup. This is just about how the subservices of Loki are run (monolith is all-in-one and good for small setups, scalable is really just a scalable monolith, and microservices breaks each component out into its own independent service).
I’ll use the monolith mode for my use case. We can use MinIO for storage; I have a separate blog post on setting that up on a NAS if you want a guide.
That said, by default, when we enable MinIO in the helm values below, it adds its own single-pod MinIO instance.
I’ll create a loki_values.yaml - this enables MinIO but does not otherwise configure it
loki:
  commonConfig:
    replication_factor: 3
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  pattern_ingester:
    enabled: true
  limits_config:
    allow_structured_metadata: true
    volume_enabled: true
  ruler:
    enable_api: true

minio:
  enabled: true

deploymentMode: SingleBinary

singleBinary:
  replicas: 3

# Zero out replica counts of other deployment modes
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0

ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0
We can now install
$ helm install loki -n grafana grafana/loki -f loki_values.yaml
NAME: loki
LAST DEPLOYED: Fri Jul 11 06:33:43 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
NOTES:
***********************************************************************
Welcome to Grafana Loki
Chart version: 6.31.0
Chart Name: loki
Loki version: 3.5.0
***********************************************************************
** Please be patient while the chart is being deployed **
Tip:
Watch the deployment status using the command: kubectl get pods -w --namespace grafana
If pods are taking too long to schedule make sure pod affinity can be fulfilled in the current cluster.
***********************************************************************
Installed components:
***********************************************************************
* loki
Loki has been deployed as a single binary.
This means a single pod is handling reads and writes. You can scale that pod vertically by adding more CPU and memory resources.
***********************************************************************
Sending logs to Loki
***********************************************************************
Loki has been configured with a gateway (nginx) to support reads and writes from a single component.
You can send logs from inside the cluster using the cluster DNS:
http://loki-gateway.grafana.svc.cluster.local/loki/api/v1/push
You can test to send data from outside the cluster by port-forwarding the gateway to your local machine:
kubectl port-forward --namespace grafana svc/loki-gateway 3100:80 &
And then using http://127.0.0.1:3100/loki/api/v1/push URL as shown below:
---
curl -H "Content-Type: application/json" -XPOST -s "http://127.0.0.1:3100/loki/api/v1/push" \
--data-raw "{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$(date +%s)000000000\", \"fizzbuzz\"]]}]}" \
-H X-Scope-OrgId:foo
---
Then verify that Loki did receive the data using the following command:
---
curl "http://127.0.0.1:3100/loki/api/v1/query_range" --data-urlencode 'query={job="test"}' -H X-Scope-OrgId:foo | jq .data.result
---
***********************************************************************
Connecting Grafana to Loki
***********************************************************************
If Grafana operates within the cluster, you'll set up a new Loki datasource by utilizing the following URL:
http://loki-gateway.grafana.svc.cluster.local/
***********************************************************************
Multi-tenancy
***********************************************************************
Loki is configured with auth enabled (multi-tenancy) and expects tenant headers (`X-Scope-OrgID`) to be set for all API calls.
You must configure Grafana's Loki datasource using the `HTTP Headers` section with the `X-Scope-OrgID` to target a specific tenant.
For each tenant, you can create a different datasource.
The agent of your choice must also be configured to propagate this header.
For example, when using Promtail you can use the `tenant` stage. https://grafana.com/docs/loki/latest/send-data/promtail/stages/tenant/
When not provided with the `X-Scope-OrgID` while auth is enabled, Loki will reject reads and writes with a 404 status code `no org id`.
You can also use a reverse proxy, to automatically add the `X-Scope-OrgID` header as suggested by https://grafana.com/docs/loki/latest/operations/authentication/
For more information, read our documentation about multi-tenancy: https://grafana.com/docs/loki/latest/operations/multi-tenancy/
> When using curl you can pass `X-Scope-OrgId` header using `-H X-Scope-OrgId:foo` option, where foo can be replaced with the tenant of your choice.
Let’s just do a quick test as it suggests
$ curl "http://loki-gateway.grafana.svc.cluster.local/loki/api/v1/query_range" --data-urlencode 'query={job="test"}' ...
I’ll do as it suggests. In one window
$ kubectl port-forward --namespace grafana svc/loki-gateway 3100:80
Forwarding from 127.0.0.1:3100 -> 8080
Forwarding from [::1]:3100 -> 8080
Handling connection for 3100
In another
$ curl -H "Content-Type: application/json" -XPOST -s "http://127.0.0.1:3100/loki/api/v1/push" \
> --data-raw "{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$(date +%s)000000000\", \"fizzbuzz\"]]}]}" \
> -H X-Scope-OrgId:foo
Then ask for that back
$ curl "http://127.0.0.1:3100/loki/api/v1/query_range" --data-urlencode 'query={job="test"}' -H X-Scope-OrgId:foo | jq .data.result
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2853 0 2825 100 28 18708 185 --:--:-- --:--:-- --:--:-- 18769
[
{
"stream": {
"detected_level": "unknown",
"job": "test",
"service_name": "test"
},
"values": [
[
"1752234158000000000",
"fizzbuzz"
]
]
}
]
Connecting Loki and Prometheus to Grafana
To view our data, we first need to log in to Grafana. The admin password can be retrieved from a k8s secret
$ kubectl get secret -n grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode
AAdfsdfasdASasdsfasASDASdasa
To make it easier, I’ll fire up an external NodePort for Grafana as I did for Prometheus
$ kubectl expose service grafana -n grafana --type=NodePort --target-port=3000 --name=grafana-ex
service/grafana-ex exposed
$ kubectl get svc grafana-ex -n grafana
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana-ex NodePort 10.43.32.173 <none> 80:30168/TCP 32s
So now I can use port 30168 to get to Grafana without needing to leave a port-forward running. I can use any node in the cluster with that port to get there
I can set up the Prometheus data source and try to set up Loki using the UI
At this point I’m a bit stumped why Loki, even on a NodePort, is rejecting connections
$ curl "http://192.168.1.34:32151/loki/api/v1/query_range" --data-urlenco
de 'query={job="test"}' -H X-Scope-OrgId:foo
curl: (7) Failed to connect to 192.168.1.34 port 32151: Connection refused
It would seem a service-to-service setup was not going to cut it. I did some debugging on the side and came up with an external NodePort service that routes directly to the pods
$ cat loki-gateway-ext2
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: gateway
    app.kubernetes.io/instance: loki
  name: loki-gateway-ext2
  namespace: grafana
spec:
  ports:
  - name: http-metrics
    port: 80
    protocol: TCP
    targetPort: http-metrics
  selector:
    app.kubernetes.io/component: gateway
    app.kubernetes.io/instance: loki
    app.kubernetes.io/name: loki
  type: NodePort
$ kubectl apply -f ./loki-gateway-ext2
service/loki-gateway-ext2 created
$ kubectl get svc loki-gateway-ext2 -n grafana
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
loki-gateway-ext2 NodePort 10.43.53.51 <none> 80:30392/TCP 14s
This worked in a local test
$ curl "http://192.168.1.34:30392/loki/api/v1/query_range" --data-urlenco
de 'query={job="test"}' -H X-Scope-OrgId:foo
{"status":"success","data":{"resultType":"streams","result":[{"stream":{"detected_level":"unknown","job":"test","service_name":"test"},"values":[["1752234158000000000","fizzbuzz"]]}],"stats":{"summary":{"bytesProcessedPerSecond":509,"linesProcessedPerSecond":31,"totalBytesProcessed":32,"totalLinesProcessed":2,"execTime":0.062778,"queueTime":0.00424,"subqueries":0,"totalEntriesReturned":1,"splits":5,"shards":5,"totalPostFilterLines":2,"totalStructuredMetadataBytesProcessed":16},"querier":{"store":{"totalChunksRef":0,"totalChunksDownloaded":0,"chunksDownloadTime":0,"queryReferencedStructuredMetadata":false,"chunk":{"headChunkBytes":0,"headChunkLines":0,"decompressedBytes":0,"decompressedLines":0,"compressedBytes":0,"totalDuplicates":1,"postFilterLines":0,"headChunkStructuredMetadataBytes":0,"decompressedStructuredMetadataBytes":0},"chunkRefsFetchTime":0,"congestionControlLatency":0,"pipelineWrapperFilteredLines":0}},"ingester":{"totalReached":13,"totalChunksMatched":2,"totalBatches":12,"totalLinesSent":2,"store":{"totalChunksRef":0,"totalChunksDownloaded":0,"chunksDownloadTime":0,"queryReferencedStructuredMetadata":false,"chunk":{"headChunkBytes":32,"headChunkLines":2,"decompressedBytes":0,"decompressedLines":0,"compressedBytes":0,"totalDuplicates":0,"postFilterLines":2,"headChunkStructuredMetadataBytes":16,"decompressedStructuredMetadataBytes":0},"chunkRefsFetchTime":2643929,"congestionControlLatency":0,"pipelineWrapperFilteredLines":0}},"cache":{"chunk":{"entriesFound":0,"entriesRequested":0,"entriesStored":0,"bytesReceived":0,"bytesSent":0,"requests":0,"downloadTime":0,"queryLengthServed":0},"index":{"entriesFound":0,"entriesRequested":0,"entriesStored":0,"bytesReceived":0,"bytesSent":0,"requests":0,"downloadTime":0,"queryLengthServed":0},"result":{"entriesFound":0,"entriesRequested":0,"entriesStored":0,"bytesReceived":0,"bytesSent":0,"requests":0,"downloadTime":0,"queryLengthServed":0},"statsResult":{"entriesFound":3,"entriesRequested":3,"entriesStored":1,"bytesReceived":522,"bytesSent":0,"requests":4,"downloadTime":3933691,"queryLengthServed":1382000000000},"volumeResult":{"entriesFound":0,"entriesRequested":0,"entriesStored":0,"bytesReceived":0,"bytesSent":0,"requests":0,"downloadTime":0,"queryLengthServed":0},"seriesResult":{"entriesFound":0,"entriesRequested":0,"entriesStored":0,"bytesReceived":0,"bytesSent":0,"requests":0,"downloadTime":0,"queryLengthServed":0},"labelResult":{"entriesFound":0,"entriesRequested":0,"entriesStored":0,"bytesReceived":0,"bytesSent":0,"requests":0,"downloadTime":0,"queryLengthServed":0},"instantMetricResult":{"entriesFound":0,"entriesRequested":0,"entriesStored":0,"bytesReceived":0,"bytesSent":0,"requests":0,"downloadTime":0,"queryLengthServed":0}},"index":{"totalChunks":0,"postFilterChunks":0,"shardsDuration":0,"usedBloomFilters":false}}}}
To add this in Grafana, be aware we have to set the org ID header as well (just as we did with that test).
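I’ll do it through the UI (Connection URL plus an HTTP header), but if you’d rather provision the data source, a file along these lines should do it - a sketch, with the tenant matching the foo org from our test:

# loki-datasource.yaml (sketch) - provisioned Loki data source with the tenant header
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.grafana.svc.cluster.local
    jsonData:
      httpHeaderName1: "X-Scope-OrgID"
    secureJsonData:
      httpHeaderValue1: "foo"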
My goal is to sort this out with Loki and Prometheus. Let’s use Gemini CLI for this
It did as I suggested and looked up the local k8s services
It had figured out Loki, but had a placeholder for metrics. So I asked it to try a bit harder
It needed me to be a bit more explicit on the ask
I was surprised how thrown it got by that last ask… it kept chugging for a while
I finally stopped it after 3.5 minutes at that last step
This actually looks rough
$ cat alloy.config
logging {
  level = "info"
  format = "logfmt"
}

// Loki configuration: Collect logs from Kubernetes pods and send them to the local Loki instance.
loki.source.kubernetes "pods" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki.grafana.svc.cluster.local:3100/loki/api/v1/push"
  }
}

// Prometheus configuration: Scrape metrics from the local Prometheus server instance.
prometheus.scrape "default" {
  targets = [
    {"__address__" = "prometheus-server.grafana.svc.cluster.local:80", "job" = "prometheus"},
  ]
  forward_to = [prometheus.remote_write.default.receiver]
}

// Define where to send the scraped Prometheus metrics.
// TODO: Replace the placeholder URL with your actual metrics endpoint.
prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus-server.grafana.svc.cluster.local:80/api/v1/write"
  }
}
I’m going to pitch something similar to Claude Code
using “kubectl get svc -n grafana” to find the loki and prometheus services, please suggest a reasonable grafana alloy configuration (alloy.config) that would listen to OTLP and send metrics to Prometheus and logs to the Loki service. My goal is to listen to OTLP for metrics logs and traces and send metrics and logs through to prometheus and loki to view in Grafana.
Like Gemini CLI, Claude asks for some permissions to run kubectl commands
Initially it wanted to merge in new configs
But I told it that the local file is garbage and to overwrite it
It wrapped up
and all in, it cost less than a quarter, which isn’t too bad
Other than the fact it invented a Jaeger collector endpoint for traces (I’ll later use Zipkin), it did a good job
$ cat alloy.config
logging {
  level = "info"
  format = "logfmt"
}

// OTLP receiver for metrics, logs, and traces
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    metrics = [otelcol.processor.batch.default.input]
    logs = [otelcol.processor.batch.default.input]
    traces = [otelcol.processor.batch.default.input]
  }
}

// Batch processor for performance optimization
otelcol.processor.batch "default" {
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    logs = [otelcol.exporter.loki.default.input]
    traces = [otelcol.exporter.otlp.traces.input]
  }
}

// Prometheus exporter for metrics
otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.prometheus.receiver]
}

// Loki exporter for logs
otelcol.exporter.loki "default" {
  forward_to = [loki.write.loki.receiver]
}

// OTLP exporter for traces (configure endpoint as needed)
otelcol.exporter.otlp "traces" {
  client {
    endpoint = "http://jaeger-collector:14317"

    tls {
      insecure = true
    }
  }
}

// Prometheus remote write configuration
prometheus.remote_write "prometheus" {
  endpoint {
    url = "http://prometheus-server.grafana.svc.cluster.local/api/v1/write"
  }
}

// Loki write configuration
loki.write "loki" {
  endpoint {
    url = "http://loki-gateway.grafana.svc.cluster.local/loki/api/v1/push"
  }
}
Zipkin
Note to reader: I later decide to just use Tempo with Grafana, as the Alloy collector does NOT have a Zipkin exporter
In fact, let’s sort that out now:
I’ll fire off a quick deployment of zipkin
$ kubectl create deployment zipkin --image openzipkin/zipkin -n grafana
deployment.apps/zipkin created
Then expose it on a NodePort
$ kubectl expose deployment zipkin -n grafana --type=NodePort --target-port=9411 --port=9411 --name zipkin-svc-ext
service/zipkin-svc-ext exposed
$ kubectl get svc -n grafana zipkin-svc-ext
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
zipkin-svc-ext NodePort 10.43.91.230 <none> 9411:31568/TCP 21s
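A quick check that Zipkin is answering on that NodePort (node IP from my cluster) - this should return a JSON array of service names, empty until spans arrive:

$ curl -s http://192.168.1.33:31568/api/v2/services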
I’ll be quite exacting, but give Gemini a chance to fix up the file to switch traces from Jaeger to Zipkin
Oh good, it caught both the exporter type and output block which I might have missed:
Now my file looks like this:
$ cat alloy.config
logging {
  level = "info"
  format = "logfmt"
}

// OTLP receiver for metrics, logs, and traces
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    metrics = [otelcol.processor.batch.default.input]
    logs = [otelcol.processor.batch.default.input]
    traces = [otelcol.processor.batch.default.input]
  }
}

// Batch processor for performance optimization
otelcol.processor.batch "default" {
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    logs = [otelcol.exporter.loki.default.input]
    traces = [otelcol.exporter.zipkin.default.input]
  }
}

// Prometheus exporter for metrics
otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.prometheus.receiver]
}

// Loki exporter for logs
otelcol.exporter.loki "default" {
  forward_to = [loki.write.loki.receiver]
}

// Zipkin exporter for traces
otelcol.exporter.zipkin "default" {
  endpoint = "http://192.168.1.33:31568/api/v2/spans"
}

// Prometheus remote write configuration
prometheus.remote_write "prometheus" {
  endpoint {
    url = "http://prometheus-server.grafana.svc.cluster.local/api/v1/write"
  }
}

// Loki write configuration
loki.write "loki" {
  endpoint {
    url = "http://loki-gateway.grafana.svc.cluster.local/loki/api/v1/push"
  }
}
It’s minor, but before I go whacking the CM and rotating pods, I need to ensure Alloy doesn’t come back in and replace it later.
This means setting the “create” to “false” in the helm values
$ cat ../jekyll-blog/grafana.values.yaml
alloy:
  configMap:
    create: false
    key: config.alloy
    name: alloy-config
  extraPorts:
    - name: otelgrpc
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: otelhttp
      port: 4318
      protocol: TCP
      targetPort: 4318
    - name: zipkinhttp
      port: 9411
      protocol: TCP
      targetPort: 9411
$ helm upgrade -n grafana alloy -f ../jekyll-blog/grafana.values.yaml grafana/alloy
Release "alloy" has been upgraded. Happy Helming!
NAME: alloy
LAST DEPLOYED: Fri Jul 11 08:00:21 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
Welcome to Grafana Alloy!
I can then create the alloy config from the file
$ kubectl create configmap -n grafana alloy-config "--from-file=config.alloy=./alloy.config"
configmap/alloy-config created
And see it created
$ kubectl get cm -n grafana
NAME DATA AGE
alloy-config 1 8s
grafana 1 39h
kube-root-ca.crt 1 334d
loki 1 29h
loki-gateway 1 29h
loki-minio 5 29h
loki-runtime 1 29h
prometheus-alertmanager 1 29h
prometheus-server 6 29h
I rotated the pods
builder@DESKTOP-QADGF36:~/Workspaces/alloy-setup$ kubectl get po -n grafana -l app.kubernetes.io/instance=alloy
NAME READY STATUS RESTARTS AGE
alloy-2h559 2/2 Running 0 39h
alloy-6cqcw 2/2 Running 0 39h
alloy-bbqvt 2/2 Running 0 39h
alloy-l467k 2/2 Running 0 39h
builder@DESKTOP-QADGF36:~/Workspaces/alloy-setup$ kubectl delete po -n grafana -l app.kubernetes.io/instance=alloy && sleep 10 && kubectl get po -n grafana -l app.kubernetes.io/instance=alloy
pod "alloy-2h559" deleted
pod "alloy-6cqcw" deleted
pod "alloy-bbqvt" deleted
pod "alloy-l467k" deleted
NAME READY STATUS RESTARTS AGE
alloy-cqfmv 1/2 CrashLoopBackOff 1 (8s ago) 13s
alloy-kdcft 1/2 CrashLoopBackOff 1 (5s ago) 12s
alloy-m6tqr 1/2 CrashLoopBackOff 1 (11s ago) 14s
alloy-qr5js 1/2 CrashLoopBackOff 1 (11s ago) 14s
But I saw there was an error in the logs
$ kubectl logs alloy-cqfmv -n grafana
Error: /etc/alloy/config.alloy:41:1: cannot find the definition of component name "otelcol.exporter.zipkin"
40 | // Zipkin exporter for traces
41 | otelcol.exporter.zipkin "default" {
| ^^^^^^^^^^^^^^^^^^^^^^^
42 | endpoint = "http://192.168.1.33:31568/api/v2/spans"
Error: /etc/alloy/config.alloy:26:16: component "otelcol.exporter.zipkin.default.input" does not exist or is out of scope
25 | logs = [otelcol.exporter.loki.default.input]
26 | traces = [otelcol.exporter.zipkin.default.input]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
27 | }
interrupt received
Error: could not perform the initial load successfully
It would seem that, out of the box, Zipkin does not consume OTLP (though there are add-ons), so we cannot use otelcol.exporter.otlp either. Scrolling the list of exporters in the Grafana docs, at present there is no exporter for Zipkin (the format).
Tempo
I think that’s so you use one of their newer open-source projects, Tempo.
$ helm install tempo -n grafana grafana/tempo
NAME: tempo
LAST DEPLOYED: Sat Jul 12 12:03:03 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
TEST SUITE: None
OOTB, that should support OTLP:
receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268
  opencensus: null
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
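So the fix is to swap the Zipkin exporter for an OTLP one pointed at Tempo. A minimal sketch of the change - the otlphttp component name matches what shows up in the Alloy logs later, and the tempo.grafana.svc URL assumes the chart's default service exposes the OTLP HTTP port 4318:

// Batch processor now hands traces to the Tempo exporter
otelcol.processor.batch "default" {
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    logs = [otelcol.exporter.loki.default.input]
    traces = [otelcol.exporter.otlphttp.tempo.input]
  }
}

// OTLP/HTTP exporter for traces, sending to Tempo in-cluster
otelcol.exporter.otlphttp "tempo" {
  client {
    endpoint = "http://tempo.grafana.svc.cluster.local:4318"
  }
}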
I can fix and rotate the pods
$ kubectl delete cm alloy-config -n grafana && kubectl create configmap -n grafana alloy-config "--from-file=config.alloy=./alloy.config" && kubectl get po -n grafana -l app.kubernetes.io/instance=alloy && kubectl delete po -n grafana -l app.kubernetes.io/instance=alloy && sleep 10 && kubectl get po -n grafana -l app.kubernetes.io/instance=alloy
configmap "alloy-config" deleted
configmap/alloy-config created
NAME READY STATUS RESTARTS AGE
alloy-cqfmv 1/2 CrashLoopBackOff 8 (4m28s ago) 20m
alloy-kdcft 1/2 CrashLoopBackOff 8 (4m6s ago) 20m
alloy-m6tqr 1/2 CrashLoopBackOff 8 (4m29s ago) 20m
alloy-qr5js 1/2 CrashLoopBackOff 8 (4m16s ago) 20m
pod "alloy-cqfmv" deleted
pod "alloy-kdcft" deleted
pod "alloy-m6tqr" deleted
pod "alloy-qr5js" deleted
NAME READY STATUS RESTARTS AGE
alloy-9mrxk 1/2 Running 0 12s
alloy-gktqn 1/2 Running 0 12s
alloy-tw728 1/2 Running 0 12s
alloy-vgjf7 1/2 Running 0 11s
I want a reachable port for Gemini to use outside the cluster, so I’m going to create a NodePort copy of the Alloy service, which is currently ClusterIP
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: networking
    app.kubernetes.io/instance: alloy
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: alloy
    app.kubernetes.io/part-of: alloy
    app.kubernetes.io/version: v1.9.2
    helm.sh/chart: alloy-1.1.2
  name: alloy-svc-nodeport
  namespace: grafana
spec:
  type: NodePort
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-metrics
    port: 12345
    protocol: TCP
    targetPort: 12345
  - name: otelgrpc
    port: 4317
    protocol: TCP
    targetPort: 4317
  - name: otelhttp
    port: 4318
    protocol: TCP
    targetPort: 4318
  - name: zipkinhttp
    port: 9411
    protocol: TCP
    targetPort: 9411
  selector:
    app.kubernetes.io/instance: alloy
    app.kubernetes.io/name: alloy
  sessionAffinity: None
I didn’t set the nodePorts, expecting K3s to determine and assign them for me after I apply
$ kubectl apply -f ./alloy-svc-nodeport.yaml -n grafana
service/alloy-svc-nodeport created
$ kubectl get svc alloy-svc-nodeport -n grafana
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alloy-svc-nodeport NodePort 10.43.215.63 <none> 12345:30367/TCP,4317:30921/TCP,4318:31235/TCP,9411:30093/TCP 65s
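Before pointing Gemini at it, a quick reachability check on the OTLP HTTP NodePort (4318 mapped to 31235 above). Any HTTP status back - rather than a connection refused - tells us the port is wired through to Alloy:

$ curl -s -o /dev/null -w "%{http_code}\n" -X POST -H "Content-Type: application/json" -d '{}' http://192.168.1.33:31235/v1/logs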
Gemini to Alloy
While we can create a markdown file (GEMINI.md) locally or in a top-level ~/.gemini folder, we can also handle some settings in a settings.json file
builder@DESKTOP-QADGF36:~/Workspaces/pybsposter/.gemini$ ls
GEMINI.md
builder@DESKTOP-QADGF36:~/Workspaces/pybsposter/.gemini$ vi ~/.gemini/
google_account_id installation_id oauth_creds.json settings.json tmp/
Initially, all that is there is our theme and auth type. Now I’m going to add telemetry as well
$ cat ~/.gemini/settings.json
{
  "theme": "Shades Of Purple",
  "selectedAuthType": "oauth-personal",
  "telemetry": {
    "enabled": true,
    "target": "local",
    "otlpEndpoint": "http://192.168.1.33:30921",
    "logPrompts": true
  }
}
One other quick change I did was add “GOOGLE_CLOUD_PROJECT=myanthosproject2” to ~/.env so it would remember it between sessions.
Now I can fire up Gemini and make an ask
It proposed some fixes
Again, I just love making Gemini a surly angry developer - I asked for this in my Gemini.md, mind you
I see no metrics or logs in Grafana, so I checked the Alloy logs.
Debugging configuration issues
The first issue I see relates to Prometheus
$ kubectl logs alloy-vgjf7 -n grafana | tail -n 10
ts=2025-07-12T17:14:53.125345325Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=0e3dd0b6a561d7c554b39ebb13cfdc15 node_id=otelcol.exporter.otlphttp.tempo duration=43.161µs
ts=2025-07-12T17:14:53.125530754Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=0e3dd0b6a561d7c554b39ebb13cfdc15 node_id=otelcol.processor.batch.default duration=156.905µs
ts=2025-07-12T17:14:53.12565605Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=0e3dd0b6a561d7c554b39ebb13cfdc15 node_id=otelcol.receiver.otlp.default duration=90.33µs
ts=2025-07-12T17:14:53.125704882Z level=info msg="finished node evaluation" controller_path=/ controller_id="" trace_id=0e3dd0b6a561d7c554b39ebb13cfdc15 node_id=tracing duration=20.037µs
ts=2025-07-12T17:14:53.125728566Z level=info msg="finished complete graph evaluation" controller_path=/ controller_id="" trace_id=0e3dd0b6a561d7c554b39ebb13cfdc15 duration=2.747739ms
ts=2025-07-12T17:14:53.125784542Z level=info msg="config reloaded" service=http
ts=2025-07-12T17:14:53.125968878Z level=info msg="scheduling loaded components and services"
ts=2025-07-12T18:11:43.013625508Z level=info msg="Done replaying WAL" component_path=/ component_id=prometheus.remote_write.prometheus subcomponent=rw remote_name=6cd6b9 url=http://prometheus-server.grafana.svc.cluster.local/api/v1/write duration=59m50.036241866s
ts=2025-07-12T18:11:43.641318664Z level=error msg="non-recoverable error" component_path=/ component_id=prometheus.remote_write.prometheus subcomponent=rw remote_name=6cd6b9 url=http://prometheus-server.grafana.svc.cluster.local/api/v1/write failedSampleCount=2 failedHistogramCount=0 failedExemplarCount=0 err="server returned HTTP status 404 Not Found: remote write receiver needs to be enabled with --web.enable-remote-write-receiver\n"
ts=2025-07-12T18:11:53.643882427Z level=error msg="non-recoverable error" component_path=/ component_id=prometheus.remote_write.prometheus subcomponent=rw remote_name=6cd6b9 url=http://prometheus-server.grafana.svc.cluster.local/api/v1/write failedSampleCount=20 failedHistogramCount=0 failedExemplarCount=0 err="server returned HTTP status 404 Not Found: remote write receiver needs to be enabled with --web.enable-remote-write-receiver\n"
Another issue comes from logs going to Loki and not having the Org ID
$ kubectl logs alloy-tw728 -n grafana | tail -n 10
ts=2025-07-12T18:17:31.561629028Z level=error msg="final error sending batch" component_path=/ component_id=loki.write.loki component=client host=loki-gateway.grafana.svc.cluster.local status=401 tenant="" error="server returned HTTP status 401 Unauthorized (401): no org id"
ts=2025-07-12T18:17:52.066427905Z level=error msg="final error sending batch" component_path=/ component_id=loki.write.loki component=client host=loki-gateway.grafana.svc.cluster.local status=401 tenant="" error="server returned HTTP status 401 Unauthorized (401): no org id"
The first can be done with values.
While Sonnet suggested
prometheus:
  server:
    extraFlags:
      - web.enable-remote-write-receiver
Reviewing the helm values (with --all), at least in the community chart, the flags just live under a top-level “server” block
I set some values and upgraded my Prometheus (keeping the existing web.enable-lifecycle flag I saw there by default)
$ cat ./values.yaml
server:
  extraFlags:
    - web.enable-lifecycle
    - web.enable-remote-write-receiver
$ helm upgrade prometheus -n grafana -f ./values.yaml prometheus-community/prometheus
Release "prometheus" has been upgraded. Happy Helming!
NAME: prometheus
LAST DEPLOYED: Sat Jul 12 13:35:06 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.grafana.svc.cluster.local
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 9090
The Prometheus alertmanager can be accessed via port 9093 on the following DNS name from within your cluster:
prometheus-alertmanager.grafana.svc.cluster.local
Get the Alertmanager URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=alertmanager,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 9093
#################################################################################
###### WARNING: Pod Security Policy has been disabled by default since #####
###### it deprecated after k8s 1.25+. use #####
###### (index .Values "prometheus-node-exporter" "rbac" #####
###### . "pspEnabled") with (index .Values #####
###### "prometheus-node-exporter" "rbac" "pspAnnotations") #####
###### in case you still need it. #####
#################################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-prometheus-pushgateway.grafana.svc.cluster.local
Get the PushGateway URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace grafana -l "app=prometheus-pushgateway,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace grafana port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
The other fix was to add a header to the loki write block in the alloy.config
// Loki write configuration
loki.write "loki" {
  endpoint {
    url = "http://loki-gateway.grafana.svc.cluster.local/loki/api/v1/push"

    headers = {
      "X-Scope-OrgID" = "foo",
    }
  }
}
I then updated the config and rotated the Alloy pods to pull it in
builder@DESKTOP-QADGF36:~/Workspaces/alloy-setup$ kubectl delete cm alloy-config -n grafana && kubectl create configmap -n grafana alloy-config "--from-file=config.alloy=./alloy.config" && kubectl get po -n grafana -l app.kubernetes.io/instance=alloy && kubectl delete po -n grafana -l app.kubernetes.io/instance=alloy && sleep 10 && kubectl get po -n grafana -l app.kubernetes.io/instance=alloy
configmap "alloy-config" deleted
configmap/alloy-config created
NAME READY STATUS RESTARTS AGE
alloy-9mrxk 2/2 Running 0 85m
alloy-gktqn 2/2 Running 0 85m
alloy-tw728 2/2 Running 0 85m
alloy-vgjf7 2/2 Running 0 85m
pod "alloy-9mrxk" deleted
pod "alloy-gktqn" deleted
pod "alloy-tw728" deleted
pod "alloy-vgjf7" deleted
NAME READY STATUS RESTARTS AGE
alloy-6wknt 1/2 Running 0 11s
alloy-dpgx9 1/2 Running 0 12s
alloy-gsmng 2/2 Running 0 12s
alloy-sjz8b 2/2 Running 0 12s
I’ll ask it to tweak the README.md again
Now that it’s finished
As I make updates, I keep using Gemini to test.
Here I asked it to render out the diagrams
OpenTelemetry data in Grafana
Now that we have Gemini sending telemetry data to Alloy, which sends metrics to Prometheus, logs to Loki, and (hopefully) traces to Tempo, we should be able to view the metrics from Prometheus in Grafana itself.
For instance, using the query gemini_cli_token_usage_total{model="gemini-2.5-pro"} we can view our token usage for gemini-2.5-pro (versus flash) for the last hour
Or, perhaps I care to track how many of my daily 1000 requests I’ve used
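Gemini CLI also emits a request counter. Assuming it lands in Prometheus as gemini_cli_api_request_count_total (the same dots-to-underscores pattern as the token metric - that name is my assumption), something like this approximates how much of the daily 1,000-request allowance I’ve burned:

sum(increase(gemini_cli_api_request_count_total[24h]))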
Dashboards in Grafana
Let’s create a new Dashboard first
I’m just going to skip adding things here and save it
I’ll call it “Gemini Usage”
Back on the metrics explorer, I can add to a dashboard and select this existing one from the list
I’m going to tweak the graph a bit to sum by type so we pull all sessions together
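In PromQL terms that tweak is roughly the following, assuming the type label carries the token categories (input, output, etc.) the CLI exports:

sum by (type) (gemini_cli_token_usage_total)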
Loki (Logs) data
We can see Info and Errors by volume via our Loki data source
Or log lines returned
I think this is a pretty good dashboard now for viewing my Gemini Usage, at least when local
If I wanted to, I would just need to punch a hole in my firewall to allow traffic in to the 192.168.1.33 node (or any node in the cluster) on port 30921, and I could keep sending OTLP data when I’m remote
$ cat ~/.gemini/settings.json
{
  "theme": "Shades Of Purple",
  "selectedAuthType": "oauth-personal",
  "telemetry": {
    "enabled": true,
    "target": "local",
    "otlpEndpoint": "http://192.168.1.33:30921",
    "logPrompts": true
  }
}
Tempo
I added the tempo http listen port
$ helm get values --all tempo -n grafana | grep listen
http_listen_port: 3200
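That 3200 port is what the Grafana data source needs. If you’d rather provision it than click through the UI, a sketch (assuming the chart’s default tempo service name):

# tempo-datasource.yaml (sketch)
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo.grafana.svc.cluster.local:3200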
I could view a trace with some poking. I didn’t see much by way of details, so I didn’t add it to my dashboard.
To my eyes, it seemed like just a simple OAuth flow
Summary
Let’s review the setup here.
First, we set up the OpenTelemetry collector (Alloy) with a config that sends metrics to Prometheus, traces to Tempo, and logs to Loki, then brings them all together in Grafana for reporting and visualization.
In the end, we set it up so we could collect data and view it all in a nice Grafana dashboard
Next Steps
If we wished to set up SMTP to use something like Resend or SendGrid, we would update the email settings under General settings
Then define some alerting rules which could trigger off our metrics
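For example, a Prometheus-style rule that fires as I approach the daily request cap might look like the sketch below (same assumed metric name as earlier; with the prometheus-community chart this could go under serverFiles.alerting_rules.yml, or be recreated as a Grafana-managed alert):

groups:
  - name: gemini-usage
    rules:
      - alert: GeminiDailyRequestBudgetNearlyUsed
        expr: sum(increase(gemini_cli_api_request_count_total[24h])) > 900
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gemini CLI has used over 900 of its 1000 daily requests"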