We Didn't Have a Native App. We Shipped Voice Anyway
A hybrid click-less + click-ful workflow designed for noisy, frontline reality, and handed off with build-ready component states.

CoachHub
Product Designer
Product Manager, Engineering
Half a Quarter
Context
AIMY is CoachHub's AI coach: a chat-based product where employees reflect, set goals, and work through real workplace challenges. It's like having a coach with you whenever you need one.
When I joined the AIMY project, the product had been live for about two months and Voice mode was already generating a steady stream of bug tickets and negative feedback. The flow felt unnatural, AIMY sometimes responded before users had finished speaking, and it was especially painful in noisy environments. Mobile was even worse: we didn't have a native app, only a web app built for desktop. The first survey after launch had returned a deeply negative NPS. Voice was the lowest-rated dimension, and respondents also complained that onboarding wasn't building enough trust.
Then a new pilot client signed: frontline workers in a medical environment. They had no desks, no offices, and no company laptops. If AI coaching was going to work for them, it had to work on their personal phones. We needed to make this happen, or the pilot wouldn't succeed.
Challenge
Within a few weeks, I needed to get AIMY’s voice experience working on mobile web, while working within the inherent limitations of not having a native app.
But once I started digging in, it was obvious the “mobile problem” was sitting on top of a bigger one: voice mode had a bunch of interaction issues that were already confusing people (unclear modes, unclear system status, messy switching between voice + text). So I treated this as a two-for-one moment: solve the urgent mobile constraint, and ship a set of smaller UX/UI fixes that made the whole conversation flow feel more intuitive.
Process
Audit: discovering a broken middle state

Before jumping into solutions, I mapped the full interaction system: text chat, voice mode in full-screen, and a weird “in-between” voice mode where chat was still visible.
That middle mode was a big clue: it confused users because it wasn’t clear what state they were in (typing? speaking? listening?), and it wasn’t obvious how to switch modes, even for me at first. This shaped the guiding principles for the redesign:
Make modes explicit
Make system status impossible to miss
Make the entire interaction more intuitive
I also flagged the biggest usability issues around system status visibility, error recovery, match with real-world expectations, and consistency. This gave me a concrete list of high-impact, easy wins (clearer labels, tooltips, predictable toggles) and one bigger, structural problem to solve: how voice mode actually worked on mobile.
Benchmarking: borrowing what users already know
I didn't have time for user research before the pilot cohort started. So instead of inventing interaction patterns from scratch, I applied Jakob's Law: users spend most of their time in other products and bring those mental models with them. If I could align AIMY's voice behaviour with what users already knew from ChatGPT, Gemini, and Claude, I could reduce cognitive load without a single interview.
I looked at:
How each product signals mode: text vs. voice input
How they show listening vs. thinking vs. speaking states
What they do when audio fails or cuts out
How they keep icon-only controls understandable without adding clutter
What they do on mobile when real-time voice conversation isn't reliable
This benchmarking gave me a default pattern library I could reference as I iterated on the design. It also helped me frame decisions in terms of reducing cognitive load rather than designer preference.
Storyboarding: making the case for a bigger fix
There was still no time for user interviews, but we did speak with the HR manager overseeing the pilot group to understand where and how they’d realistically use AIMY: on breaks, between tasks, in noisy spaces, and exclusively on personal phones.
Then I built a lightweight persona and storyboards to test my early ideas. This helped in two ways:
It exposed gaps. Our existing voice flow assumed users were in a quiet room with reliable audio, allowing for uninterrupted back-and-forth. That’s not real life for frontline workers.
It made alignment possible. I could show PM and engineering why making the existing click-less mode mobile-friendly wasn't enough, and why we needed to overhaul the entire interaction model.
The storyboards shifted the conversation from "how do we improve what exists" to "what does this user actually need that we don't have yet." The answer was a second mode entirely, one where the user controls when they start and stop speaking, and AIMY can't misfire because of background noise. That wasn't the original scope, and the storyboards were what made that investment impossible to argue against.
Aligning on the solution: two contexts, two modes
After the audit, benchmarking, and storyboards, my recommendation was a hybrid approach:
Keep click-less conversation mode: for quiet, intentional coaching moments. The user speaks freely, AIMY listens and responds with voice. Full-screen, immersive, hands-free.
Add click-ful dictation mode: for noisy, interrupted environments. The user taps to start recording, speaks at their own pace, and taps again when done. AIMY replies in text, not voice. Background noise can't trigger early responses, and spotty audio can't cut the user off mid-sentence. Voice-note style, not live conversation.
Replying in text rather than voice was a deliberate tradeoff: it keeps the speed benefit of speaking while avoiding the most common mobile voice failure modes, namely noisy audio confusing speech detection and latency issues that make the experience feel broken.

I ran an impact/effort prioritization with PM and the team to keep our scope focused, guided by one question: does this help us ship a working voice experience for mobile users in time? The key decisions we aligned on together:
We add a click-ful dictation mode while keeping the full-screen, focused conversation mode click-less (the hybrid approach)
Some UI improvements are essential (status clarity, tooltips, mode entry)
Some ideas are “nice to have” (e.g., listen-to-message controls everywhere)
One item that didn't make the cut: allowing users to stop AIMY mid-response while it was speaking. Genuinely useful, but the technical complexity wasn't justified for a V1.
Solution
Before the redesign, AIMY had one voice mode that tried to serve every context. It didn't work, especially not on mobile, and especially not in noisy environments.
Once the direction was agreed, I moved into prototyping with a very specific goal: make every interaction state unambiguous for the single engineer building it.
In the prototype, I detailed:
Entry points: how users enter conversation mode directly from the main chat
Control logic: mic on/off behaviour, what happens when typing starts, when buttons hide and show
System status: clear listening vs. speaking indicators
Exit behaviour: how to leave voice mode cleanly from any state
Error states: what happens when audio isn't picked up
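To make the control logic above concrete, the two voice modes can be read as a small state machine. The TypeScript below is a hypothetical sketch, not the shipped implementation: the mode names come from this case study, but the status values, event names, the `transition` function, and rules such as "typing returns you to text mode" or "a tap recovers from an error" are illustrative assumptions.

```typescript
// Hypothetical sketch of the voice interaction states described above.
// Status and event names, and every transition rule, are assumptions.

type Mode = "text" | "dictation" | "conversation";
type Status = "idle" | "listening" | "recording" | "thinking" | "speaking" | "error";

interface VoiceState {
  mode: Mode;
  status: Status;
}

type VoiceEvent =
  | { type: "ENTER_MODE"; mode: Mode }
  | { type: "TAP_MIC" }      // explicit user tap on the mic button
  | { type: "SPEECH_END" }   // automatic end-of-speech detection fired
  | { type: "REPLY_READY" }  // AIMY's reply is available
  | { type: "REPLY_DONE" }   // AIMY finished speaking its reply
  | { type: "NO_AUDIO" }     // no usable audio was picked up
  | { type: "START_TYPING" }
  | { type: "EXIT" };

function transition(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case "ENTER_MODE":
      // Conversation mode is click-less, so it starts listening immediately;
      // dictation waits for a tap.
      return { mode: event.mode, status: event.mode === "conversation" ? "listening" : "idle" };
    case "EXIT":
    case "START_TYPING":
      // Assumed rule: leaving voice or starting to type always lands in text mode.
      return { mode: "text", status: "idle" };
    case "TAP_MIC":
      if (state.mode === "conversation") {
        // Assumed recovery path: a tap resumes listening after an error.
        return state.status === "error" ? { ...state, status: "listening" } : state;
      }
      if (state.mode !== "dictation") return state;
      // Dictation: first tap starts recording, second tap sends it.
      if (state.status === "idle" || state.status === "error") return { ...state, status: "recording" };
      if (state.status === "recording") return { ...state, status: "thinking" };
      return state;
    case "SPEECH_END":
      // Only conversation mode reacts to automatic end-of-speech detection.
      return state.mode === "conversation" && state.status === "listening"
        ? { ...state, status: "thinking" }
        : state;
    case "REPLY_READY":
      if (state.status !== "thinking") return state;
      // Dictation replies in text (back to idle); conversation replies with voice.
      return { ...state, status: state.mode === "conversation" ? "speaking" : "idle" };
    case "REPLY_DONE":
      // After speaking, conversation mode hands the turn back to the user.
      return state.status === "speaking" ? { ...state, status: "listening" } : state;
    case "NO_AUDIO":
      // Surface an explicit error state instead of failing silently.
      return state.status === "listening" || state.status === "recording" || state.status === "thinking"
        ? { ...state, status: "error" }
        : state;
    default:
      return state;
  }
}
```

In this sketch, dictation mode simply ignores `SPEECH_END`: only an explicit tap ends a recording, which is the property that keeps background noise from triggering an early reply.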
I also used the prototype to run structured workshops with PM and engineering, so we could decide edge cases together. This helped avoid interpretation gaps, sped up refinement conversations, and made for clearer tickets at hand-off.
The final design
Two modes, each designed for a different reality.
Outcome
The feature shipped roughly six weeks after I joined the team. The pilot client converted into a six-month paid contract shortly after. About a month post-launch, support tickets related to voice usability had dropped significantly. The problems that had dominated feedback before the redesign (unclear modes, AIMY responding before users finished speaking, nothing working on mobile) stopped coming up. We also shipped a clearer interaction model with three distinct modes (text, dictation, and conversation), which made voice feel more controllable and predictable, especially in noisy or mobile contexts.
The NPS impact became measurable later, when the full body of work (voice, onboarding, and avatar diversity) shipped together in a major product release a few months after the voice feature landed. NPS moved from -56 to +8.3. Voice had been the weakest dimension in the first survey; by the second, it was no longer what held the score back. The score doesn't belong to any single fix, but voice was the most structural intervention, and the one that unlocked the client relationship that made the timeline possible.
Beyond the pilot itself, the project had a longer-term effect: it triggered a broader process of enhancing AIMY's mobile web experience.
Reflection
This project was a good reminder that deadline work doesn't have to mean "only ship the quick fix." While the headline goal was making voice usable on mobile web, I used that urgency to also ship a set of interaction fixes that removed confusion across the whole flow.
Storyboarding was the turning point: it made the problems with our existing voice modes impossible to ignore, and it helped me make the case for a bigger solution instead of getting stuck debating preferences.
Benchmarking kept everything moving. By leaning on users' existing mental models, I could make fast, confident design calls without reinventing how AI voice chat works, which mattered on such a tight deadline.
The bigger pattern: designing voice for an AI product in constrained conditions isn't just about the happy path. It's about making every failure state legible, every mode transition predictable, and every interaction forgiving enough to survive real-world noise.



