Faris Baidoun

Work

About me

Contact

We Didn't Have a Native App. We Shipped Voice Anyway

How I redesigned voice for an AI coach that had to work in noisy environments, on mobile, for users who had no other option.

Company

CoachHub

Industry

SaaS, Personal Development, Human Resources

Role

Product Designer

Team

Product Manager, Engineering

Timeline

Half a Quarter

Context

AIMY is CoachHub's AI coach: a chat-based product where employees reflect, set goals, and work through real workplace challenges, like having a coach available whenever they need one.

When I joined the project, AIMY had been live for about two months. Voice mode was already generating a steady stream of bug reports and negative feedback. The flow felt unnatural, AIMY sometimes responded before users finished speaking, and it was especially painful in noisy environments. The first post-launch survey returned a deeply negative NPS, and voice was the lowest-rated dimension. Users rated the voice experience 4.2 out of 10 for how natural and fluid it felt to use. Nearly 30% of users who skipped voice altogether cited not having a private space to speak.

Then a sales pilot was agreed with a large enterprise client: a frontline workforce in a medical environment. They worked with no desks, no offices, and no laptops, only their personal phones. This wasn't a typical CoachHub buyer. It was a new segment entirely, and the pilot's success would determine whether a larger contract followed.

Challenge

Improving voice mode to work on mobile was the immediate goal. We had six weeks before the pilot launched. If voice still didn't work by then, we'd lose the client. The assumption was that fixing what was already there would be enough.

But once I started digging in, it was clear mobile was only part of the problem. Voice mode already had serious interaction issues that affected everyone: unclear states, no system status visibility, messy switching between voice and text.

So I treated this as a two-for-one: solve the urgent mobile constraint for the pilot, and fix the underlying interaction model at the same time.

Process

Audit: discovering a broken middle state

The tickets and survey feedback told a consistent story before I even started the audit: modes feeling broken, AIMY cutting users off mid-sentence. What the audit needed to answer was what exactly was causing that confusion, and whether the fixes would be surface-level or something more structural.

I mapped the full interaction system: text chat, full-screen voice mode, and an ambiguous middle state where chat remained visible while voice was active.

That middle state was a significant clue. It confused users because it wasn't clear which mode they were in (typing, speaking, or listening), and it wasn't obvious how to switch, even for me at first. That became the foundation for three guiding principles:

Make modes explicit
Make system status impossible to miss
Make the entire interaction more intuitive

I also flagged the highest-impact usability issues: system status visibility, error recovery, match to real-world expectations, and consistency across states. This gave me a concrete list of high-impact quick wins (clearer labels, tooltips, predictable toggles), and bigger structural problems to solve: namely how the voice interaction model worked in noisy, interrupted environments on mobile.

Benchmarking: borrowing what users already know

Instead of reinventing the wheel, I applied Jakob's Law to benchmark how ChatGPT, Gemini, and Claude already handle voice, a risk-reduction move under a tight timeline.

I looked at:

How each product signals mode: text vs. voice input
How they show listening vs. thinking vs. speaking states
What they do when audio fails or cuts out
How they handle controls without clutter on icon-only buttons

This gave me a pattern library to reference throughout the project, and a key insight: every major AI voice product already separated conversational and dictation modes. AIMY only had one, and it was already confusing users. This pointed in a clear direction, but whether dictation mode was the right answer for this specific user group still needed grounding in their actual context.

Building the case: grounding the direction in reality

Through the sales process, I managed to speak with the HR manager overseeing the pilot group. What she described grounded the benchmarking insight in reality: these workers are hands-on and on the go most of the time, voice is specifically expected to work for them, and typing isn't always a viable option.

From that, I built a lightweight persona and storyboards to visualise how this user group would actually move through a session with AIMY in their environment. That's where it became clear that a new mode was needed, not a fix to the existing one.

The PM's instinct was to fix what was already there: cleaner states, better feedback, resolve the existing complaints. The storyboards reframed it. This wasn't a UI problem. The interaction model couldn't serve this user group regardless of how well it was executed. Dictation wasn't something extra to build. It was what the client specifically expected from the product.

Aligning on the solution: two contexts, two modes

The solution was a hybrid approach:

Keep conversation mode: for quiet, intentional coaching moments, where the user speaks freely, AIMY listens and responds with voice, full-screen, immersive, and hands-free.
Add dictation mode: for noisy, interrupted environments, where the user taps to start, speaks at their own pace, and taps again when done. AIMY replies in text rather than voice, which keeps the speed benefit of speaking while sidestepping the two most common mobile failure modes: background noise cutting responses short, and audio latency making the experience feel broken.
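As a rough illustration of the dictation flow described above, here is a minimal state-machine sketch. The state and event names are my own assumptions for illustration, not the shipped implementation. The point it shows is that recording ends only on an explicit tap, never on detected silence, and that the reply arrives as text.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"            # text chat visible, microphone off
    RECORDING = "recording"  # user tapped to start and is speaking
    THINKING = "thinking"    # user tapped done; waiting for the reply
    REPLIED = "replied"      # reply is shown as text, not spoken


# Hypothetical event names. Only the transitions listed here are allowed.
TRANSITIONS = {
    (State.IDLE, "tap_mic"): State.RECORDING,
    (State.RECORDING, "tap_done"): State.THINKING,
    (State.RECORDING, "cancel"): State.IDLE,
    (State.THINKING, "reply_text"): State.REPLIED,
    (State.THINKING, "error"): State.IDLE,
    (State.REPLIED, "tap_mic"): State.RECORDING,
}


def next_state(state, event):
    """Return the next state, or stay in the current one if the event is not valid here."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=State.IDLE):
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

Because there is no silence-based transition out of RECORDING, background noise or a pause mid-thought cannot end the user's turn early, which is the behaviour the noisy-environment case calls for.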

I ran an impact/effort prioritisation with the PM to keep scope tight, aligning on the hybrid approach, adding essential UI improvements around status clarity and mode entry, and parking the rest.

One item that didn't make the cut: letting users stop AIMY mid-response. It was genuinely useful, but the technical complexity wasn't justified for V1.

Solution

Before the redesign, voice mode had one state for every context, no way to know what AIMY was doing, and no path for users who couldn't use it in quiet conditions.

Before

Once the direction was agreed, I moved into prototyping with one specific goal: make every interaction state unambiguous for the engineer building it alone. I detailed entry points, control logic, state transitions, exit behaviour, and error states, then ran structured walkthroughs with the PM and engineering to align on edge cases before handoff.

The final design

Two modes, each designed for a different reality.

Dictation mode: designed for noisy, interrupted use. Tap to speak, tap when done, AIMY replies in text. Every state is labelled, every control is explained. Explore the hotspots for the thinking behind each detail.

Dictation Mode

Conversation mode: designed for focused, quiet use. Speak freely, AIMY listens and responds with voice. State changes are visually distinct so you always know what's happening. Explore the hotspots for the thinking behind each detail.

Conversation Mode

Impact

Pilot converted to a 6-month paid contract

Voice tickets dropped to near zero within a month of launch

Voice naturalness 4.2 → 6.1 in three months post-launch

The feature shipped roughly six weeks after I joined the team. The pilot client converted into a six-month paid contract shortly after. Support tickets related to voice usability dropped to almost zero within a month of launch. The complaints that had dominated feedback before (modes being unclear, AIMY cutting users off mid-sentence, nothing working on mobile) stopped coming up.

Three months after launch, a second survey put voice naturalness at 6.1 out of 10, up from 4.2 before the redesign. Not perfect, but a meaningful shift in how users experienced speaking to AIMY.

As part of a larger release that included onboarding improvements and customisation for diversity, NPS moved from -56 to +8.3. That number belongs to the full release, not to voice alone.

The project also established something with longer reach: AIMY now had a voice model that could serve frontline and mobile-first users, opening the door for future clients in similar environments.

Reflection

I joined this project after launch, when the team was already under pressure fixing multiple things at once. There was no clean brief, no research in scope, and six weeks to make it work for a client nobody had designed for before. What I had wasn't much: some audit findings, benchmarking, and one conversation with someone close to the client. But it was enough to define a direction and get it built.

Storyboarding was the turning point. It made the interaction problems impossible to ignore and gave me the language to get buy-in for a bigger solution, rather than getting stuck in a debate about whether a new mode was worth building. Dictation mode wasn't on anyone's radar when the project started. Making the problem visible first is what made it possible to build.

The one thing I'd do differently: get closer to the actual end users. I spoke to the HR manager overseeing the pilot, which gave me useful context, but it's not the same as observing or testing with the workers themselves.

Let's work together.

Get in touch
