Bootcamp Finale & Whisper Experiments
Hello, everyone! It's the very last day of the 6-week AI Builders Bootcamp I've been taking, led by the enviably articulate and insightful Shawhin Talebi. I just dropped my last message to the peer group I'd volunteered to lead throughout the course.
Bootcamp Finale
"Hello hello everyone! Can hardly believe today is the last session for the bootcamp; it flew by like the 5-hour data limit window before Claude boots you out.
This week, I focused on a tiny fine-tuning experiment, more about shoring up my understanding of foundation models than producing anything polished. Since I plan to make hands-free accessibility central to future projects, I worked with the open-source automatic speech recognition (ASR) model Whisper. What a time to be alive that these models are available as a baseline! By the second epoch, Whisper had already outpaced my little CNN model that was learning from scratch.
I wanted to fine-tune Whisper on a key voice marker I designed for an "AI reading/voice notes partner" (a week-3 project), but the dataset is still way too small. The big lesson this week: fine-tuning's bottleneck is almost always the dataset. While I didn't land a real fine-tuned application, I now have a clearer sense of what a fine-tuning plan would need.
Outside the bootcamp bubble, I overheard a fascinating conversation at a coffee shop between folks from a mission-driven design agency (Partner & Partners). They were debating whether tracking time helps or just adds overhead, when someone joked, "Isn't AI supposed to handle that for us?" ...at which point I swooped in. Hoping to follow up with them, hear more about their pain points, and maybe sketch out a business-requirements doc as practice. Besides, I'm genuinely interested in tinkering with some AI systems to help with time-tracking, even just for myself! I'm also curious: have any of you tinkered with AI for time-tracking yet?
It's been such a pleasure learning alongside you all these past weeks. I'll be bouncing between Madrid, Columbia (South Carolina), and São Paulo this year, so reach out if your travels overlap! In particular, if you're in Madrid (or know someone there who'd like to chat), let me know, as that is the long-term destination goal, and I currently know zero Madrileños. In the meantime, let's keep in touch on LinkedIn or email. And I promise I'm not intentionally a sneaky conversation creeper."
Automatic Speech Recognition - Hands-Free Accessibility
If you're interested in the pedagogical notebook I mentioned in my last bootcamp post, you can find it here: Automatic Speech Recognition Fine-Tune vs. CNN from Scratch
ASR is absolutely central to my interest in hands-free accessibility with tech. ...And hands-free accessibility is central to my AI dev ideas because:
- wow, arthritis, way to truncate that post-carpal tunnel bliss!
- as if m typos arent bda eougnh arlaedy, I can't imagine them surviving the baby-cradling, toddler-chasing chapters of my life.
Whisper, trained on 680,000 hours of multilingual data by OpenAI, is already remarkably capable: it processes mel-frequency representations that approximate how humans perceive speech, and captures most utterances with striking fidelity.
FIG. A: While the CNN treats all frequencies equally, Whisper has learned to emphasize the frequency ranges where human vocal information is concentrated, hence the narrower, more focused band.
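If you'd like to poke at this without opening the notebook, here's a minimal sketch of that pipeline using the Hugging Face transformers API; the openai/whisper-small checkpoint and the voice_note.wav file name are illustrative assumptions, not what the notebook actually uses.

```python
# A minimal sketch (not the notebook's exact code): extract Whisper's log-mel
# features and transcribe a clip. Checkpoint and file path are assumptions.
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Whisper expects 16 kHz mono audio; librosa resamples on load.
audio, sr = librosa.load("voice_note.wav", sr=16000)

# The processor turns raw samples into the 80-bin log-mel spectrogram the
# model was trained on (the perceptual frequency scale mentioned above).
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Generate token IDs and decode them back into text.
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```

Swap in a larger checkpoint if quality matters more than latency; the pipeline itself stays the same.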
But fine-tuning remains crucial, as I show in the notebook. Inevitably, bias emerges through the training data: Whisper's corpus leans heavily toward polished presentation speech, often in English, often by fluent speakers.
So what happens when a voice doesn't fit those statistical norms? Be it someone new to English, someone who prefers not to speak English, someone just learning to speak at all (hello, lil' tot)...or someone who has run out of rods to spin another plate and just needs to ramble a bit. In those last cases, the ellipses and marginalia may actually carry the most signal.
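For a sense of what "a fine-tuning plan" involves mechanically, here is a hedged sketch of the standard Hugging Face Seq2SeqTrainer recipe for Whisper; the dataset name, column names, and hyperparameters below are placeholders, not the settings from my experiment.

```python
# A hedged sketch of Whisper fine-tuning with Hugging Face Seq2SeqTrainer.
# "my_voice_notes", its columns, and the hyperparameters are placeholders.
import torch
from datasets import load_dataset, Audio
from transformers import (WhisperProcessor, WhisperForConditionalGeneration,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="en", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Hypothetical dataset with "audio" and "text" columns, resampled to 16 kHz.
ds = load_dataset("my_voice_notes")["train"]
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

def prepare(batch):
    # Audio -> log-mel input features; transcript -> label token IDs.
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    batch["labels"] = processor.tokenizer(batch["text"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)

def collate(features):
    # Pad spectrograms and label sequences separately; mask padding in the loss.
    inputs = processor.feature_extractor.pad(
        [{"input_features": f["input_features"]} for f in features],
        return_tensors="pt")
    labels = processor.tokenizer.pad(
        [{"input_ids": f["labels"]} for f in features], return_tensors="pt")
    label_ids = labels["input_ids"].masked_fill(labels["attention_mask"] == 0, -100)
    # Drop the leading start token if present; the model re-adds it when it
    # shifts labels to build decoder inputs.
    if (label_ids[:, 0] == model.config.decoder_start_token_id).all().item():
        label_ids = label_ids[:, 1:]
    inputs["labels"] = label_ids
    return inputs

args = Seq2SeqTrainingArguments(
    output_dir="whisper-finetune", per_device_train_batch_size=8,
    learning_rate=1e-5, num_train_epochs=2, fp16=torch.cuda.is_available())
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=ds,
                         data_collator=collate)
trainer.train()
```

The collator is the fiddly part: spectrograms and label sequences get padded separately, and padding tokens are masked out of the loss with -100 so the model is only graded on real transcript tokens.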
Precursor to the Fine-Tuning Experiment: Saluda AI
I first explored this transcription challenge with Whisper back in Week 3 of the bootcamp. The experiment was to develop an AI that remembers your book reactions and uses them to inform its responses, whether for brainstorming or emotional processing. I recorded myself reading passages that struck me and then riffed with a few thoughts.
Voluble I might be, but emotions feel slipperier than dream memory. I suspect Reason is really a combination of logic and sentiment. So I wonder: what personal emotional texture might our smarty-pants AI kin infer when I nerd out about books rather than trying so hard to "feel"? A very preliminary POC lives here: Saluda Cognitive Fingerprinting (POC)
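Purely to make the shape of that idea concrete, here is a toy sketch of a "remembered reactions" loop; the file name, helper functions, and prompt wording are hypothetical stand-ins, not the POC's actual code.

```python
# Toy sketch of a reading-partner memory loop (hypothetical names throughout):
# store transcribed reactions, then fold recent ones back into a prompt
# for whatever chat model you like.
import json
from pathlib import Path

MEMORY_FILE = Path("book_reactions.json")  # hypothetical store

def remember(reaction_text: str) -> None:
    """Append a transcribed reaction to a simple JSON memory file."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    memory.append(reaction_text)
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def build_prompt(question: str, k: int = 5) -> str:
    """Fold the k most recent reactions into a prompt as lightweight context."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    context = "\n".join(f"- {m}" for m in memory[-k:])
    return (f"Here are my recent reading reactions:\n{context}\n\n"
            f"Using that context, {question}")

remember("That passage about rivers made me think of Saluda summers.")
print(build_prompt("help me brainstorm an essay opening."))
```

The interesting design question is what to remember: raw transcripts are a start, but the "emotional texture" above would need something richer than an append-only list.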
But "Saluda AI" is a story for another time, alongside the many other experiments I cobbled together during this bootcamp journey.
As for time-tracking AI: nothing to recommend just yet (for the design agency of which I am now an honorary meeting member). But! If you're looking for a time management solution for diving into the AI sandbox, Shaw's bootcamp is one expertly structured place to start.