Published: Mar 26, 2026 by Isaac Johnson
I came across a YT video extolling the virtues of Pi CLI. While yes, it is another CLI, it has a very light footprint and just enough skills (like command line access) to do things.
We’ll test it on a few different hosts and models; both cloud and local. After some basic tests (involving good old Commander Keen), we’ll use Pi + Gemini to build a Pomodoro app.
Installing Pi
Like many other tools we can install npm
(base) builder@LuiGi:~/Workspaces/piagent$ npm install -g @mariozechner/pi-coding-agent
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 257 packages in 21s
34 packages are looking for funding
run `npm fund` for details
(base) builder@LuiGi:~/Workspaces/piagent$
Now I just need to fire pi to launch
I gave a quick test to ask for Commander Keen as ASCII art but it just gave me a penguin afaik
It claims to have spent 12.5c on that
~/Workspaces/piagent
↑77k ↓1.6k R37k $0.124 0.8%/1.0M (auto) gemini-2.5-pro • medium
If I search models, we only see Google ones. This is because it noticed I only had a GEMINI API key set in my env vars so it just assumed I would use Google
Next, I tried setting an OPENAI KEY and BASE to match my GPT 5 Nano deployment in Azure AI Foundry.
However, as we can see, it doesn’t respect the env var for OPENAI_API_BASE to a cognativeservices URL.
However, once I switched to a proper OPENAI API Key, then it worked just fine
We can see it picked up my existing agent skills
Revisiting the providers docs, I think I need to use their format for the Azure OpenAI instances
Let’s set those
export AZURE_OPENAI_API_KEY=BgxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxiC
export AZURE_OPENAI_BASE_URL=https://isaac-mgp1gfv5-eastus2.cognitiveservices.azure.com
export AZURE_OPENAI_DEPLOYMENT=gpt-5-nano
export AZURE_OPENAI_API_VERSION=2024-12-01-preview
Though trying a few permutations for resource name, nothing seemed to work on GPT 5 via Azure AI Foundry
But we can see it works fine with Gemini Flash
Let’s compare to Gemini CLI
I thought it was interesting that this used 17k tokens to make a cow
Whereas Pi used just over 7k. Just to be sure of this, I tried again
and indeed it was about 4.5k used.
~/Workspaces/piagent
↑4.5k ↓63 R2.0k $0.003 0.2%/1.0M (auto) (google) gemini-3-flash-preview • medium
Custom Models and Ollama
Right now we have no custom models
cat: /home/builder/.pi/agent/models.json: No such file or directory (os error 2)
example Ollama
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434/v1",
"api": "openai-completions",
"apiKey": "ollama",
"models": [
{ "id": "llama3.1:8b" },
{ "id": "qwen2.5-coder:7b" }
]
}
}
}
So, when in my home network, I could use
(base) builder@LuiGi:~/Workspaces$ cd piagent/
(base) builder@LuiGi:~/Workspaces/piagent$ ls
commander_keen.png cow.txt 'search?q=commander+keen&tbm=isch'
commander_keen.txt cow_ascii.txt search_results.html
(base) builder@LuiGi:~/Workspaces/piagent$ cat ~/.pi/agent/
auth.json bin/ models.json sessions/ settings.json skills/
(base) builder@LuiGi:~/Workspaces/piagent$ cat ~/.pi/agent/models.json
{
"providers": {
"ollama": {
"baseUrl": "http://192.168.1.143:11434/v1",
"api": "openai-completions",
"apiKey": "ollama",
"models": [
{ "id": "gemma3:4b" },
{ "id": "qwen3:8b" },
{ "id": "qwen2.5-coder:1.5b" },
{ "id": "llama3.1:8b" },
{ "id": "llama3.2:3b" },
{ "id": "deepseek-r1:7b" }
]
}
}
}
I tried asking gemma3 but it didn’t support tools. I then asked qwen3:8b and it just went out to lunch
However, qwen2.5-coder worked
I did find llama3.1:8b, while slow, did a pretty decent job
Those were using an Ollama on a dedicated host in network. I was curious how well a laptop with no real video card would handle an 8b
(base) builder@LuiGi:~/Workspaces/piagent$ ollama pull llama3.1:8b
pulling manifest
pulling 667b0c1932bc: 100% ▕█████████████████████████████████████████████████████▏ 4.9 GB
pulling 948af2743fc7: 100% ▕█████████████████████████████████████████████████████▏ 1.5 KB
pulling 0ba8f0e314b4: 100% ▕█████████████████████████████████████████████████████▏ 12 KB
pulling 56bb8bd477a5: 100% ▕█████████████████████████████████████████████████████▏ 96 B
pulling 455f34728c9b: 100% ▕█████████████████████████████████████████████████████▏ 487 B
verifying sha256 digest
writing manifest
success
Then I switched up my models to use localhost
(base) builder@LuiGi:~/Workspaces/piagent$ vi ~/.pi/agent/models.json
(base) builder@LuiGi:~/Workspaces/piagent$ cat ~/.pi/agent/models.json
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434/v1",
"api": "openai-completions",
"apiKey": "ollama",
"models": [
{ "id": "llama3.1:8b" }
]
}
}
}
It ran, and after about 10 minutes did display something
I tried a simple python app request
Again, on the LG Gram laptop it was slow, but it did work.
I’ll try on my laptop with a proper GPU
builder@builder-Lenny16:~/Workspaces/pitest$ cat ~/.pi/agent/models.json
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434/v1",
"api": "openai-completions",
"apiKey": "ollama",
"models": [
{ "id": "mistral-nemo:12b-instruct-2407-q4_K_M" },
{ "id": "qwen2.5-coder:14b" },
{ "id": "gemma3:12b" },
{ "id": "deepseek-r1:14b" },
{ "id": "qwen3:14b" }
}
}
}
I did record it building an app, but it really failed at writing files the first time. Then got hung up on a python library that doesn’t exist.
Here you can see it with a local 14b model improve the UI.
note: it did leak the API key in the video above so I expired it right after
Mixed use
I had the idea of mixing things up. What if I built out the basic app with local models and Pi, then pivoted to Gemini CLI with Stitch to make it ‘pretty’.
I think that would be the most optimized on token use.
I stewed a bit on an idea before coming up with one.
The used to be an Adobe Air app, pomodorio that was this very clean minimalist UI that was reminiscent of WinAmp. I found it very useful. Too many pomodoro apps over complicate things or take the full screen. I need small.
To get this done though, I have a bit of housecleaning. I need to stash my skills, now that I’m actually using them in my regular flow.
I’ll create a private repo in my own Git system (Forgejo)
Part of my flow is to let an admin create repos and then invite my lower privileged user to collaborate
I’m going to try and do this in a plan to plan to do approach.
So that means making an initial plan:
$ cat plan.md
---
name: implementation-planner
description: Creates detailed implementation plans for new features.
prompt: You are an expert software planner. Your task is to generate a comprehensive plan in markdown format.
tools:
- name: FileSystem
actions: [read, search]
---
# Plan Request
This app should create a 25 minute timer with 5 minute rest period in the form of the Pomodoro technique.
Required features:
- light/dark mode
- settings with
- changes to default work/rest times
- ability to set notification sound (or disable)
- size (scale with font and UI from 50% to 200% size).
This should build self contained with a dockerfile.
This app should allow download and upload of planned tasks and track "pom"s on each task completed. we can mark tasks closed. The input and output should be in a JSON format.
# required outputs
Dockerfile for the app
Helm chart to install the app with
- deployment
- service
- option ingress with annotations
## UI
The UI should be simple and angular based on the original WinAmp style (https://en.wikipedia.org/wiki/Winamp)
We should have a few options for color themes
I fired it up and it went really quite fast
I saw it made some form of an app, tests and dockerfile, but no evidence of a helm chart
It did more on a second pass, but still left some things out
I spent another 5 minutes in Gemini CLI letting it do cleanup and work which didn’t use that many tokens
I continued to test and make minor improvements with Gemini - adding a proper whip sound then fixing the helm chart
Launching in Kubernetes
Now that I have an app, I want to actually test it in my hosted environment.
I need a quick A record first
$ az account set --subscription "Pay-As-You-Go" && az network dns record-set a add-record -g idjdnsrg -z tpk.pw -a 76.156.69.232 -n pomo
{
"ARecords": [
{
"ipv4Address": "76.156.69.232"
}
],
"TTL": 3600,
"etag": "1aec837a-3d39-47c8-929d-756b65cbedaa",
"fqdn": "pomo.tpk.pw.",
"id": "/subscriptions/d955c0ba-13dc-44cf-a29a-8fed74cbb22d/resourceGroups/idjdnsrg/providers/Microsoft.Network/dnszones/tpk.pw/A/pomo",
"name": "pomo",
"provisioningState": "Succeeded",
"resourceGroup": "idjdnsrg",
"targetResource": {},
"trafficManagementProfile": {},
"type": "Microsoft.Network/dnszones/A"
}
I pushed my container to Dockerhub
and after a quick helm deploy, I had a pretty good functional app
You are welcome to use the app as it’s now hosted at pomo.tpk.pw.
Also, because caring is sharing, I put all the code into Github: https://github.com/idjohnson/pomoApp.
Summary
Thus far Pi looks like a pretty good local tool. I get annoyed that it sometimes fails to write files, but seems to do fine after the files are initially laid down.
I found it’s pretty performant on decent hardware. Yes, that is like saying sports cars go fast, but it’s worth noting that on my Lenovo Legion with a 12gb 5070 it runs smooth, but on the LG Gram with just CPU its a good 10+ minute wait.
This isn’t a deal breaker, however, there are times I’m just happy to ask the LLM to do some work and then set the laptop down and watch a show or get back to other work.
I want to explore pushing the limits of tools next. As we know, the downside to local models is they get out of date and have fixed knowledge. Making sure they can reach out to the interwebs and fetch latest data is key to really making them useful.
However, I plan to definitely keep Pi in my stack as writing tools or coming up with ideas when I’m either offline or in low internet areas is quite useful.


























