Published: Apr 28, 2026 by Isaac Johnson
In my last MAI post, we focused only on the MAI-Image-2 image generation model. It was basic but functional.
In the time since, Microsoft has rolled out “MAI-Transcribe-1” and “MAI-Voice-1”, an audio transcription model and a text-to-speech model.
Transcribe
My first attempt to drop in a lecture series failed, as the maximum file size seems to be 10MB.
I trimmed my recording to 11 minutes, set the bitrate to 110kbps VBR, and tried again (this time it was under 10MB).
This time I just got an invalid request error.
I trimmed further to 5 minutes. This time it worked!
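As a rough sanity check on those numbers, here is a minimal sketch of the size arithmetic. It assumes the 10MB limit I observed (not a documented figure), treats the VBR setting as an average bitrate, and ignores container overhead; the function names are my own for illustration.

```python
def estimated_size_mb(duration_s: float, avg_kbps: float) -> float:
    """Approximate audio size in MB (10**6 bytes) at an average bitrate.

    VBR files only hit this on average, and container overhead is ignored,
    so treat the result as an estimate.
    """
    total_bits = duration_s * avg_kbps * 1000
    return total_bits / 8 / 1_000_000


def max_duration_s(limit_mb: float, avg_kbps: float) -> float:
    """Longest clip, in seconds, that should fit under limit_mb at avg_kbps."""
    return limit_mb * 1_000_000 * 8 / (avg_kbps * 1000)


if __name__ == "__main__":
    # ASSUMPTION: 10 MB is the limit I inferred from the failed upload.
    limit_mb = 10
    for minutes in (11, 5):
        size = estimated_size_mb(minutes * 60, 110)
        print(f"{minutes} min at 110 kbps is roughly {size:.1f} MB")
    longest = max_duration_s(limit_mb, 110) / 60
    print(f"Longest clip under {limit_mb} MB: about {longest:.1f} min")
```

By this estimate the 11-minute file is about 9.1 MB, which fits the "under 10MB" observation. Since it still failed with an invalid request, size alone probably wasn't the issue; there may also be a duration or format constraint I didn't identify.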
In fairness, I am using a 1990s cassette tape recording of my Grandfather that I imported to the PC, so the fidelity isn’t great.
In other words, I’m going to set something down on the record so that you’ll know right from the beginning, and that is that I’m not an expert in the devotional Christian classics. That would take a lifetime, and if one would take a lifetime, one would become an expert in only maybe a couple of these devotional Christian classics. One could spend a lifetime in Augustine’s Confessions and in the life of Augustine, and at the end of a lifetime, you’re still almost a beginner because you’re dealing with a man of fantastic genius. But I am a learner and I am a believer, not just a Christian believer, but I’m also a strong believer in the worth of the devotional Christian classics. I was very pleased. The secretary sent me your bulletin from last week and also the church newsletter. And as I was reading the newsletter, I thought, “Wow, John is really on target on this subject.” And that this is the season that we get closer to Jesus. And then all at once he introduced the Christian Classics in my presentation. And I was amazed how close we were on that theme. And that’s the only intention for wanting to know the Christian Classics. And by the time this morning is over, I think I’ll make the the proof very clear. But I’m absolutely convinced regarding the possibility of growing spiritually if we invest ourselves in reading and living with these significant books. Now, we can’t read them all in one day. We know that. But we have years before us, and it depends on the kind of commitment that we make. Let me share with you a letter that one of our members received from Chuck Swindoll. Chuck Swindoll, as you know, is an outstanding preacher. I think he lives in Texas now. But I’m not sure of the question that she wrote, but here’s the letter she received. And Jean will appreciate this because he knows her husband. This is Marilyn Ash that received this letter from Dr. Swindoll. “Dear Marilyn, few characteristics are more disheartening about our times than shallowness. 
So many, it seems, are satisfied to live their lives like a rushing river, a mile wide and about three inches deep.” And then he quotes Richard Foster. “Superficiality is the curse of our age. The doctrine of instant satisfaction is a primary spiritual problem. The desperate need today is not for a greater number of intelligent people or gifted people, but for deep people.” Then he goes on to say, “But depth is neither popular nor easily attained, not these days and not in most places of the world, certainly not in America. Having grown up in this great country and having spent the majority of my life here, I have observed that many have achieved the good life and some the fast life, but precious few the deep life. Part of the reason is obvious. Nothing from this world system encourages it. Various media may entreat or fascinate or entertain us, but they do not deepen us, not really. As persuasive as our politicians may be, few of them prompt either depth of character or a life of quiet faith.” And then one last quotation: “Rarely do I hear someone extol the virtues of slowing down, of being still, of mediating, of meditating on the things that matter, like reordering one’s private life and reshaping one’s priorities.” Quite a quotation by a very knowledgeable person. I think this shallowness in the church is a plague. We’re part of a shallowness of an age. When you think of all the things that hurry up, when you turn on your television today, you expect the picture and the sound to be there right now. Compare that with the old television sets where you had to wait for things to warm up. And just think how impulsive we are at stop signs. We hit a lot of stop signs today. Also, a train that was going across the tracks, which delayed us about 10 or 15 minutes. But there’s something that keeps pushing inside. I must say that about 20 years ago, no longer ago than that, in 1969 and ‘70, I took a sabbatical leave. 
And probably one of the best things I learned on that sabbatical leave was that I wasn’t God. And I learned to slow down. And I haven’t rubbed the motor in front of a stop sign since. But there is a tendency we want to keep pushing. What about going to McDonald’s? We want that food right now. Our whole temperament, the climate of the times is hurry up, fast, fast, fast. Well, we’re in an area that’s not fast, but it is deep. And this makes a difference. You have received, hopefully, the outline of the class sessions. And what did I do with my own outline? Which I– could I have one, please? Thank you.
I then thought, what about music?
Early in his life, my younger brother was in the Minnesota Boychoir, and I had imported this 1995 cassette tape into the computer as audio files.
Let’s try this track “Freedom is Coming”.
It has high voices singing in unison, so I was curious how it would handle that.
I tried it twice and it couldn’t figure out the words.
I also have a rare(r) Unkle album, Unkle WW III, with lots of movie tracks in it, which I had copied to my computer (as it’s rather hard to replace).
I tried the first track of that:
Come with us into the realm of imagination. I’m within range of the reconnaissance vessel, but I’m hit on my left front stabilizer. I’m switching circuits to compensate, but it’s gonna take some time. I need those fighters. The fighters are locked to your coordinates. Stay on target for the rendezvous phase. I’m losing stability. I can’t shake it. I’m gonna switch to the– 77, are you there? 77, do you copy? Copy. 77, do you copy? 77, do you copy?
That sounds about right to me.
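To put a number on "about right", one option is word error rate (WER): the word-level edit distance between a known-good reference and the model's transcript, divided by the reference length. This is a minimal sketch, not something I ran against the MAI output. The sample strings below are hypothetical, borrowing one line from the transcript above with a made-up mistake.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, straighten curly apostrophes, drop punctuation, split into words."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z0-9']+", text)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # HYPOTHETICAL pair: the reference is a line from the transcript above,
    # the hypothesis has one invented error ("face" for "phase").
    reference = "Stay on target for the rendezvous phase."
    hypothesis = "Stay on target for the rendezvous face."
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

A WER of 0 means a perfect match. Values above 1 are possible when the transcript contains many extra words. It needs a trusted reference text, which I only really have for the audiobook sample.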
I tried another: a section from the audiobook of Stephen King’s 1977 novel “The Shining”, as read by Campbell Scott. I had bought that audiobook as MP3s at the time.
Again, it did just fine with newer content.
- The Front Porch The Torrance family stood together on the long front porch of the Overlook Hotel as if posing for a family portrait, Danny in the middle, zippered into last year’s fall jacket which was now too small and starting to come out at the elbow, Wendy behind him, with one hand on his shoulder, and Jack to his left, his own hand resting lightly on his son’s head. Mr. Ullman was a step below them, buttoned into an expensive-looking brown mohair overcoat. The sun was entirely behind the mountains now, edging them with gold fire, making the shadows around things look long and purple. The only three vehicles left in the parking lots were the hotel truck, Ullman’s Lincoln Continental, and the battered Torrance VW. “You’ve got your keys, then?” Ullman said to Jack. “And you understand fully about the furnace and the boiler?” Jack nodded, feeling some real sympathy for Ullman. Everything was done for the season—the ball of string was neatly wrapped up until next May twelfth, not a day earlier or later. And Ullman, who was responsible for all of it, and who referred to the hotel in the unmistakable tones of infatuation, could not help looking for loose ends. “I think everything is well in hand,” Jack said. “Good. I’ll be in touch.” But he still lingered for a moment, as if waiting for the wind to take a hand and perhaps gust him down to his car. He sighed. “All right. Have a good winter, Mr. Torrance, Mrs. Torrance.” “You too, Danny.” “Thank you, sir,” Danny said. “I hope you do, too.” “I doubt it,” Ullman repeated, and he sounded sad. “The place in Florida is a dump, if the out-and-out truth is to be spoken. Busy work. The Overlook is my real job. Take good care of it for me, Mr. Torrance.” “I think it will be here when you get back next spring,” Jack said, and a thought flashed through Danny’s mind. “But will we?” And was gone. “Of course. Of course it will.” Ullman looked out toward the playground where the hedge animals were clattering in the wind. 
Then he nodded once more, in a business-like way. “Goodbye, then.” He walked quickly and prissily across to his car—a ridiculously big one for such a little man—and tucked himself into it. The Lincoln’s motor purred into life, and the taillights flashed as he pulled out of his parking stall. As the car moved away, Jack could read the small sign at the head of the stall: “Reserved for Mr. Ullman, Manager.” “Right,” Jack said softly. They watched until the car was out of sight, headed down the eastern slope. When it was gone, the three of them looked at each other for a silent, almost frightened moment. They were alone. Aspen leaves whirled and skittered in aimless packs across the lawn that was now neatly mowed and tended for no guest’s eyes. There was no one to see the autumn leaves steal across the grass but the three of them. It gave Jack a curious, shrinking feeling, as if his life force had dwindled to a mere spark while the hotel and the grounds had suddenly doubled in size and become sinister, dwarfing them with sullen, inanimate power. Then Wendy said, “Look at you, Doc. Your nose is running like a firehose. Let’s get inside.” And they did, closing the door firmly behind them against the restless whine of the wind.
I’m a bit of a data hoarder. I had set aside old memory cards from phones, and I found a terrible-quality voice memo from 2004, recorded on an old Nokia:
I can barely make that out, but MAI actually parsed the background talking.
My final test was to use a basic Android app to record a memo, in this case the summary for this article.
I then used the MAI playground to convert it to text.
Voice
The Voice generator has quite a few styles that roughly translate to emotions.
It also has some interestingly named voices.
I’ll try “wave” (British, gender unclear) with “disgust”.
It worked fine, though I don’t really hear disgust in it.
I tried the same prompt with “Alder (American)” and “anger”.
MAI 2
Let’s try some of those prompts from last time. They are still blocked on even loosely copyrighted stuff.
However, talking near the topic seemed to do better now:
Horror images are also blocked, as before (trying the same prompt as last time).
(Spoiler: see the notes on AI Foundry later in this article.)
Let’s try the “efficient” version with the prompt we did last time:
“In block letters on white background “Hello World” with a gentle artistic notes of grass and flower as if the letters were on a spring forest glen. light tones, watercolor”
Here we see that we can now get 4 responses, which is pretty handy.
Diagrams
Last time, the diagram ask turned out pretty bad:
Let’s use the “Refined” option this time for one image that should be improved:
That is WAY better!
The “No Public Internet” line is a bit strange - I would likely remove that, and I would change “Private IP” to “Private IPs”, but otherwise that would be very usable in a deck.
AI Foundry deployment with custom filter
As with Flux, let’s try deploying the new MAI model (2e) to our AI Foundry instances.
I’ll then edit the deployment and set the Content filter to the lowest settings.
That actually worked.
Granted, I’m now paying for these generations, but without (as many) nanny filters.
This is really important. I have given feedback to Google several times that blocking knives gets in the way of real work I do. When I need to show medical setups, I cannot have surgeons using plastic kitchen knives or dinner utensils, yet Gemini blocks anything sharp in most images.
Here I asked for: “Surgical room, patient prepped for surgery, modern operating room, table with surgeons utensils including scalpel and forceps”
I got a pretty reasonable response.
I then tried “fast” mode in Gemini and it looks reasonable, but there are a LOT of clips there on the table.
Even though I explicitly mentioned a scalpel (same prompt), Midjourney (MJ) wouldn’t give it to me.
And just to test, I asked Gemini in “Thinking” mode to create the horror image and it seemed to do it okay now:
Photorealistic painting of a Victorian woman with the upper half of her head wrapped in bandages. The lower half of her face reveals a mouth full of monstrous teeth that are falling out. Numerous other teeth hang from above on threads. Red liquid stains around mouth. Insanely intricately detailed white fabric clothing decorated with teeth. Dramatic lighting. Esoteric, horror, creepy, unsettling, disorienting, dreamcore. Surreal
My last quick test of MAI is to do the one that is always flagged for violence/BDSM: “Software developers in a meeting room tied up in chairs with ropes”.
That is awesome. I might tweak the size of the room, but that was exactly what I was hoping for: the ropes are around the chairs, not loose. Some of the developers are gagged, which was a bit more than I wanted, but hey, it’s riffing on my ask.
Using URLs
I was curious if MAI could pull from a URL. I asked it:
show an office building with the logo from https://www.logo.wine/logo/Control_Data_Corporation
It figured out the words, but not the logo.
Google Gemini did succeed and, funnily enough, gave it a retro look with 1970s cars.
Summary
Well, I do think MAI has had some pretty impressive improvements in the last few weeks period. Last time I wrote an article, I really thumbed my nose at it and suggested it was very far behind period. But as we saw, the voice transcription is very good.
The voice creation, the text to voice is okay. And the image model has greatly improved, if for no other reason, diagrams are actually usable period. I think going forward, I will regularly check back on the MAI playground and see if it’s improved at all period. My final test here is using a Android app that’s a voice recorder to record this summary and transcribe it. We’ll see how it goes.
Addendum
I was experimenting with the AI Foundry deployment after I wrote that summary and in light of that, I do think MAI2 will be one of my go-tos when I’m nanny-blocked by MJ and Gemini. I was especially impressed with the tied-up-developers image.
Also, in just the week I was away at Next26, I noticed UI improvements on the MAI page, with the layout now a bit tighter:
compared to just a bit over a week ago:
So this is one to watch.

































