Published: Aug 27, 2024 by Isaac Johnson
At my recent DevDays conference I worked through Sisense, a commercial BI tool. While it is quite powerful, it is also quite expensive.
Looking for an open-source product with a free tier, I came across Knime. It has a free downloadable instance and a simple free hosted offering (Community Hub).
Today we’ll set it up and give it a go to see if Knime can solve some of our basic BI tooling needs.
Setup
To start, I’ll download the Windows binary.
I have to go through a few settings - the key one being memory, which defaults to wanting half of all my host's memory while running (I dropped that down a bit). Then I can install.
When done, I can launch it
We are now going to focus on the newest UI, which is a bit different from the walk-through examples you might find on knime.com.
We can go to “Local space” and choose the big yellow “+” to create a new workflow
I’ll give it a name, like “MyFirstProject” and click create
I’m going to start by grabbing a file reader and dragging it into my workflow
I’ll then add a JSON to table filter and link the “File Reader” by dragging an arrow over
I’ll then do a table writer
Choosing to save it in My Documents
I can click Apply and OK to save (which will use C:\Users\isaac\OneDrive\Documents\MyOutput.table).
I’ll do similar with the “File Reader” box telling it to pull from “MyInput.json”
Which reads from C:\Users\isaac\OneDrive\Documents\MyInput.json
Testing
I want some basic JSON data so I’ll pull the JSON output of my primary cluster nodes into a file
builder@DESKTOP-QADGF36:/mnt/c/Program Files$ kubectl get nodes -o json > /mnt/c/Users/isaac/OneDrive/Documents/MyInput.json
I can now use F7 or click “Execute all” to run the flow
I see it’s having some issues, so I’ll save the current workflow before I fix it
The problem is “File Reader” really expects a CSV or similar formatted data file.
We need the “JSON Reader” instead.
It’s quite easy to swap that out, let’s do that together:
The “table” output is a binary file. For me, Notepad rendered it as Chinese characters (I think).
Reading it in a terminal shows the readable string ‘gspec.xml’ at the start of the file
builder@DESKTOP-QADGF36:/mnt/c/Program Files$ cat /mnt/c/Users/isaac/OneDrive/Documents/MyOutput.table | head -n 10
gspec.xml (followed by unprintable binary data, omitted here)
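The readable ‘gspec.xml’ string hints that the file may be a container with named entries inside it. That is only a guess on my part. A minimal Python sketch to test it, assuming the container is a zip archive (the helper name `list_archive_members` is made up for this example):

```python
import zipfile
from pathlib import Path
from typing import Optional


def list_archive_members(path: str) -> Optional[list]:
    """Return the entry names if `path` is a zip archive, otherwise None."""
    # is_zipfile only inspects the file's structure; it does not extract anything.
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Path from this post as seen from WSL; adjust it for your machine.
    table_path = Path("/mnt/c/Users/isaac/OneDrive/Documents/MyOutput.table")
    if table_path.exists():
        print(list_archive_members(str(table_path)))
    else:
        print(f"{table_path} not found")
```

If this prints None, the zip assumption is wrong and the file uses some other binary layout.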
One issue I saw was that the JSON Reader was seeing just one object (and I know I have 4 nodes). That is because kubectl wraps the nodes in a single list object. To solve this, we can pull out each element of its items array using the JSONPath $.items[*]
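To make the effect of $.items[*] concrete, here is a minimal Python sketch of what that path selects. It uses only the standard library rather than a JSONPath engine, and the two-node document is a made-up, heavily trimmed stand-in for the real kubectl output (the node names are hypothetical).

```python
import json

# Hypothetical stand-in for `kubectl get nodes -o json`: one "List" object
# whose "items" array holds one entry per node.
SAMPLE = """
{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {"kind": "Node", "metadata": {"name": "node-a"}},
    {"kind": "Node", "metadata": {"name": "node-b"}}
  ]
}
"""


def select_items(document: dict) -> list:
    """Plain-Python equivalent of $.items[*]: each element of the top-level items array."""
    return list(document.get("items", []))


doc = json.loads(SAMPLE)

# Read as-is, the file is a single object, which is why only one row shows up.
print(type(doc).__name__, "with keys", sorted(doc))

# Selecting the array elements gives one object (one row) per node.
for node in select_items(doc):
    print(node["metadata"]["name"])
```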
That looks much better when I click the run action
I can now see 4 rows in the Table transform
I’m going to add a “Table view” node so we can see the table data
I could also do something like kick it to a chart, such as a bar chart.
Here I show hosts and CPU counts (one host has 8 CPUs and 3 have 4 CPUs)
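For comparison, the same host-to-CPU view can be sketched in a few lines of Python. This assumes the chart is fed from each node's status.capacity.cpu field (where Kubernetes reports CPU capacity, as a string); the host names below are placeholders, with counts matching the ones above.

```python
from collections import Counter

# Placeholder nodes shaped like the items in `kubectl get nodes -o json`.
# Host names are hypothetical; CPU counts mirror the chart (one 8, three 4s).
NODES = [
    {"metadata": {"name": "host-1"}, "status": {"capacity": {"cpu": "8"}}},
    {"metadata": {"name": "host-2"}, "status": {"capacity": {"cpu": "4"}}},
    {"metadata": {"name": "host-3"}, "status": {"capacity": {"cpu": "4"}}},
    {"metadata": {"name": "host-4"}, "status": {"capacity": {"cpu": "4"}}},
]


def cpu_by_host(nodes: list) -> dict:
    """Map each node name to its CPU count (assumes whole-number values such as "4")."""
    return {
        node["metadata"]["name"]: int(node["status"]["capacity"]["cpu"])
        for node in nodes
    }


cpus = cpu_by_host(NODES)

# Crude text bar chart: one '#' per CPU.
for host, count in cpus.items():
    print(f"{host:<8} {'#' * count} ({count})")

# How many hosts share each CPU count.
hosts_per_cpu_count = Counter(cpus.values())
print(dict(hosts_per_cpu_count))
```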
Google Sheets
Let’s add a Workflow that will use data stored in Google Sheets.
Here I’ll set up a Google Authenticator block which ties to a connector and then to the Sheets picker.
In my case, I did the interactive login and gave it “Google Sheets” permissions
I can now pick one of my Google Sheets to do work against
I could then feed a sheet, for instance, into a statistics view
Community Hub
While they don’t have a hosted workflow engine, we can get “Community Hub” to store and share our workflows
We can click “Sign up for free” from the Pricing page
As I signed up via Google, all that is really asked is a Picture for my account
I can now create and store workflows in the Community Hub
For instance, if I wanted to share out my first example, I could right-click in the UI and choose “Export” on the “MyFirstProject” flow
I’ll store it in the root of my user dir
Back in Public, I’ll import the workflow
Pick the file
Now it’s in a shared space
that anyone can search for and use
In the free tier we can have Public, Private and a unique space as well (I just called mine “New Space”)
Next steps
There are great walkthroughs that use a provided Excel sheet here, and the Community Hub has great components as well as workflows
Here is a quick test of a component and a workflow. It took me a bit to figure out that one is just supposed to drag and drop them in from a web browser (here I use Chrome and Edge):
Side note: that Python install did complete; it just took a long while. Once done, I was prompted to restart for it to take effect
While we are running Knime locally, the Pricing page shows other plans. For instance, the next level up is the Team plan, which runs workflows for about US$0.12 a minute and lets you share with a team
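To get a feel for what that per-minute rate adds up to, here is a rough Python sketch. It assumes billing is strictly linear at the approximate US$0.12 per minute quoted above, with no minimums, rounding rules or base fee; I have not checked the actual billing terms.

```python
from decimal import Decimal

# Approximate rate quoted above; linear per-minute billing is an assumption.
RATE_PER_MINUTE = Decimal("0.12")


def estimated_cost(minutes: int) -> Decimal:
    """Estimated cost in US dollars for a given number of execution minutes."""
    return RATE_PER_MINUTE * minutes


for minutes in (1, 10, 60):
    print(f"{minutes:>3} min -> US${estimated_cost(minutes):.2f}")
```

At that rate an hour of workflow execution comes to about US$7.20, so short scheduled jobs stay cheap while long-running flows add up quickly.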
Summary
Knime is a very powerful Java-based open-source BI suite (we didn’t explore it, but their source is on GitHub here). They have a simple drag-and-drop interface and it wasn’t too hard to build out a basic workflow.
My trouble is always about when to use BI tools. The programmer in me just wants to pull the data into structures and then kick out an HTML page using something like Chart.js or CanvasJS.
There have been times when, just to answer a question for a meeting, I’ll kick data to CSV and use Excel or Google Sheets - low tech, of course, but I’m just trying to determine a trend.
The whole thing that turned me on to BI was doing a Sisense workshop last week, where it was quick to build multi-widget dashboards.
But then there is licensing and the care/love/feeding of yet-another tool.
I do hope you enjoyed this little tour of Knime. I may come back later to build a much more complicated flow and share it out.