Flipkart Grocery - Multi-modal Conversational Assistant

Voice-based conversational shopping assistant for new internet users.

As a product designer in the Conversational Assistant team, I led the Voice User Interface (VUI) design to build an interaction model that would help our customers shop online using the Flipkart application.

I collaborated with the developers, and conversational and content designers to realize this solution, as we tried to effectively optimize the interplay of touch, speech, and visual modalities.

During my time in the Conversational Assistant team, I was responsible for the entire suite of voice and conversational products on Flipkart. Flipkart had ventured into the voice domain across five different product channels. The final goal was to streamline these into an integrated multi-modal conversational AI voice assistant, which would eventually serve as a single experience for assistance.

Surface Areas

  1. Flippi (Lifestyle Assistant - Category Specific) - a conversational commerce assistant that performed speech-to-text-to-navigation throughout the shopping journey in the lifestyle category
  2. Voice Search - transcribed text from vernacular speech
  3. Decision Assistant - a chatbot that helps answer customer queries about a product
  4. Grocery Assistant - a conversational commerce assistant that performed speech-to-text-to-navigation throughout the shopping journey in the grocery category
  5. Customer Support (CX Bot) - a chatbot that assists with customer support queries

The Grocery Voice Assistant we’ll discuss in this post was Flipkart’s first foray into using voice technology to tap into the emerging markets of India. For Flipkart, India’s largest e-retailer, the next wave of growth was coming from new internet users in regions of India where even Hindi is not the first language, let alone English. For many, using Flipkart might have been their first interaction with technology. Many new internet users struggle with lower literacy and depend on technologies such as voice to gain confidence and ease into using products and services such as Flipkart.

This project was a key milestone for me on my journey into the voice industry, as it served as a test for the user understanding and skill set that I had developed while working in Flipkart's Fintech arm. Not only did it provide me with the responsibility I was seeking, but it also allowed me to display my horizontal play and develop systemic thinking within Flipkart's growth charter, helping me to continuously enhance my evolving role as a designer and gain valuable insight into adjacent roles.

Flippi, what’s the big deal with voice?

Voice has become an integral part of interacting with the internet, especially for new internet users. Several studies have demonstrated an increase in the use of voice technology, with Flipkart's data showing a steady climb in voice usage traffic. With over a million new internet users coming online every day, many of whom wouldn't be able to interact with technology without the ability to use speech input, it is clear that voice is an invaluable tool for many.

Not only is voice helpful in those moments when you are unable to type, such as while driving or cooking, but it is also critical for many people's daily engagements with technology. Voice has the potential to transform entire industries; however, it also presents designers with complex, ambiguous challenges.

Having spent time at Flipkart working on building Flippi, our in-app multi-modal conversational assistant, I have identified several key learnings from our studies and interactions with new internet users. In this article, I have used those learnings to create a set of principles that could help to build more successful voice experiences for everyone.

Why voice? - It’s empowering.
  • 1. Increased Self-sufficiency
    Many new internet users, especially those with lower literacy, find voice to be a powerful tool that lets them perform tasks independently and confidently. With voice, they can explore the internet without needing the assistance of others, which raises their self-esteem and lets them take control of their online experience. As users grow more familiar with voice technology, they become more comfortable using it and can do more and more with it.
“I can’t type, voice is the only way I can talk to the world”
  • 2. Understandable Output
    Output is a crucial component of voice functionality that is often overlooked. Although a lot of web content comes in the form of text, many new internet users find it difficult to understand written content. This is why text-to-speech (TTS) is so important; it allows users to understand digital content using their devices, even if they have difficulty with reading. Without TTS, it can be difficult for some users to fully understand the content they are presented with, leaving them unable to use their devices to their fullest potential.
“The voice reading out the page is good. It would get into my head more than reading.”
  • 3. Simplified Input
    Typing in Indic and other non-Latin-script languages is much more complex and time-consuming than typing in Latin characters, which makes voice input a great advantage for users of any literacy level. Users with lower literacy levels especially benefit from voice input, as they can avoid the complexities of spelling. English is often seen as the global language, and many devices come pre-set to English as the default language. With voice input, new internet users who may not be able to read, write, or type in English can still make use of their devices. Voice is of particular benefit to users of lower-end devices that feature traditional T9 keyboards. Furthermore, sending voice messages is becoming increasingly common and lets users mix languages more easily, since switching between languages while typing can be cumbersome.
Typing in Hindi can be 3x slower than typing in English
  • 4. Ability to multitask
    The ability to multitask is a major advantage of using voice technology for internet users all over the world. When we spoke to people, many of them shared that they are often trying to balance personal tasks, carry out household duties, and manage multiple sources of income. This is a reality for countless individuals, and voice functions integrated into devices can help make multitasking easier and more efficient. Not only do these tools streamline the process of managing multiple tasks, but they can also help to save time and reduce stress, allowing people to focus their energy and attention on the things that matter most.

Forms of Voice Technology

The forms in which it exists in customers’ hands

Our understanding of voice assistants or voice-based products is typically limited to Siri, Alexa, or Google Assistant-style general-purpose Assistants. However, the reality is that there is a much larger scope of voice-based products beyond the popular ones that many of us are not familiar with.

There are four main types of voice interactions that are widely used today; each one is built on a unique voice interaction model. This can confuse the mental models of voice as some of these interactions may or may not be prompted by the same microphone icon.

1. Recording

Use Case: Messaging Apps (WhatsApp, Instagram DM, WeChat)
Action: Share a recording of the user's voice

2. Dictation

Use Case: Word editors, Text input, Keyboards
Action: The user says a word or phrase and it gets transcribed as text

3. Commands

Use Case: Flipkart’s Search experiences (Voice Search)
Action: The user says a word or phrase as a query and receives results

4. Conversational

Use Case: Flipkart’s Grocery Assistant and Flippi. Other examples: Siri, Google Assistant, Alexa
Action: Conversationally interact with devices. Users say commands and listen for a response.
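
One way to make the differences between these four interaction types concrete is to model them as data. The sketch below is purely illustrative: the class names and the three boolean flags are assumptions derived from the "Action" descriptions above, not part of any Flipkart system.

```python
from dataclasses import dataclass
from enum import Enum


class VoiceInteraction(Enum):
    RECORDING = "recording"
    DICTATION = "dictation"
    COMMAND = "command"
    CONVERSATIONAL = "conversational"


@dataclass(frozen=True)
class InteractionModel:
    # Flags are assumptions inferred from the descriptions above.
    transcribes_speech: bool  # is the speech turned into text?
    returns_results: bool     # does the system answer the query with results?
    speaks_back: bool         # does the system respond with a spoken reply?
    example: str


MODELS = {
    VoiceInteraction.RECORDING: InteractionModel(False, False, False, "Voice note in a messaging app"),
    VoiceInteraction.DICTATION: InteractionModel(True, False, False, "Keyboard dictation"),
    VoiceInteraction.COMMAND: InteractionModel(True, True, False, "Voice Search"),
    VoiceInteraction.CONVERSATIONAL: InteractionModel(True, True, True, "Grocery Assistant, Flippi"),
}


def expectations(kind: VoiceInteraction) -> list[str]:
    """List what a user should expect after tapping the microphone."""
    model = MODELS[kind]
    steps = ["tap the microphone and speak"]
    if model.transcribes_speech:
        steps.append("speech is transcribed to text")
    else:
        steps.append("the audio itself is shared")
    if model.returns_results:
        steps.append("results are shown for the query")
    if model.speaks_back:
        steps.append("the assistant replies by voice")
    return steps


if __name__ == "__main__":
    for kind in VoiceInteraction:
        print(kind.value, "->", expectations(kind))
```

All four types begin with the same first step, which is exactly why a shared microphone icon can blur the user's mental model: the icon alone does not say which of the later steps will follow.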

Democratizing Access to Technology

The reduced consumption in most urban areas was offset to a great extent by the sudden rise of e-commerce transactions from the non-urban regions, or tier-2 and tier-3 cities. Flipkart quickly realized that a significant share of its earnings was coming from these regions, where consumers had woken up to the benefits of online commerce. As Flipkart explored ways to enable diverse segments of consumers in the tier-2 and tier-3 cities of India, it soon came to understand that English-centric apps with a focus on touch UI are a major hurdle for most smartphone users in these regions. Lack of English literacy was a constraint, and establishing the behavior of regular touch user interfaces as muscle memory was intimidating to begin with.

This is when a lightbulb moment occurred for Flipkart: voice and vernacular are the biggest drivers to onboard these consumers. To capitalize on this, major e-commerce brands have set up dedicated teams to understand the unique needs of this market and win over nearly 500 million consumers, who will be the biggest market for these brands in the next decade. That is an incredible number of customers, roughly one and a half times the population of the USA, waiting to do business online!

Flipkart understood that in this market, Voice (with a Vernacular edge) has emerged as the most powerful driver that solves multiple unique challenges for these consumers. It was estimated that 9 out of 10 new users were likely to be local language users who were new to Flipkart and had to pay the ‘English Tax’ to use these apps.

The eventual business objective for Flipkart was clear-cut - stimulating growth in the number of new users on the platform, which would eventually lead to a rise in GMV (Gross Merchandise Value). To achieve this, Flipkart decided to invest in initiatives that would improve usability and accessibility, particularly in emerging markets, with Voice and Vernacular being the two main focuses.

It was with this in mind that Flipkart acquired Liv.ai in 2018, in order to provide consumers with a voice assistant that could understand different accents in nine Indian languages through natural language processing. This was in addition to the launch of Saathi, a smart assistive technology that helped users navigate the website through both audio and textual instructions, and which was later developed into Flippi and the Grocery Assistant, an AI-powered grocery shopping assistant. This was an important step in enabling users from emerging markets to access the Flipkart platform easily.

Looking Sideways

Amazon started with Alexa as the ingress in the US market and then moved to a mic ingress in 2020. In the Indian market, they have leveraged the popularity of the microphone ingress only. The microphone ingress acts as both Voice Search and Assistant in different parts of the user journey. Amazon brings in Alexa's personality and branding in certain scenarios and, in some implementations, hotword detection of “Hey Alexa” as well as music, general knowledge, and chit-chat.

For the first time in a major online sale, customers experienced the Amazon Great Indian Festival in regional languages: Hindi, Kannada, Tamil, Telugu and Malayalam.

Apart from this vernacular push, Amazon is also powering up its app with Alexa, its homegrown Voice Assistant that can invoke searches and specific actions in the app.

To get more users to engage with and try the Alexa Voice Assistant in India, Amazon even announced roping in Amitabh Bachchan as the voice of Alexa in India! This illustrates how seriously Voice is being considered inside Amazon for the Indian market, and the kind of no-holds-barred investments being made in it.

Jio launched an Assistant on their devices with a microphone button on the feature phone. They are leveraging the same popularity on the TV/DTH remote. The assistant handles both end-to-end assistance and Voice Search from the same entry point, based on query type.

KaiOS is an operating system that powers non-touch smart feature phones like Reliance’s JioPhone and JioPhone 2. On KaiOS, people have the option to speak to a voice assistant in their regional language and perform actions like sending messages on WhatsApp. Reliance and Google have invested in KaiOS. Jio KaiOS devices are set to dominate the Indian market of Tier-2 and Tier-3 regions in the coming years by selling as many as 100 million devices! When Google Assistant was launched on KaiOS, Google Assistant’s overall usage jumped by 6x.

Google is one of the biggest players in voice technology adoption. Google started with Voice Search and Assistant as separate products. They have since merged the two behind a single ingress, a branded microphone. Assistant is invoked based on user type, query type, and keyword detection (hotword detection).
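
The single-ingress pattern described here, where one microphone serves both Voice Search and the Assistant, can be sketched as a small router. This is a minimal illustration and not any vendor's actual logic: the hotword, the list of action verbs, and the `assistant_enabled` flag (standing in for "user type") are all hypothetical.

```python
from enum import Enum


class Destination(Enum):
    VOICE_SEARCH = "voice_search"
    ASSISTANT = "assistant"


# Hypothetical values, chosen only for illustration.
HOTWORDS = ("hey flippi",)
ACTION_VERBS = {"add", "remove", "reorder", "checkout"}


def route(transcript: str, assistant_enabled: bool = True) -> Destination:
    """Decide which experience handles an utterance from the shared mic."""
    text = transcript.lower().strip()

    # User type: users without the assistant always get plain voice search.
    if not assistant_enabled:
        return Destination.VOICE_SEARCH

    # Keyword (hotword) detection: an explicit invocation wins.
    if text.startswith(HOTWORDS):
        return Destination.ASSISTANT

    # Query type: action-like utterances go to the assistant,
    # everything else is treated as a search query.
    words = text.split()
    if words and words[0] in ACTION_VERBS:
        return Destination.ASSISTANT
    return Destination.VOICE_SEARCH


if __name__ == "__main__":
    for utterance in ["basmati rice", "add two kilos of rice", "Hey Flippi, what's on offer?"]:
        print(repr(utterance), "->", route(utterance).value)
```

A real system would classify query type with a language model rather than a verb list; the point of the sketch is only that the user sees one entry point while the routing decision happens behind it.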

Hindi is the second most used language in the entire world on Google Assistant.
"These new users would much rather speak to the internet than tap or type, so as a result, for instance, voice search queries in India are growing at 270% per year, which is staggering. We are already a video-first internet and if you ask me I would say that we will become the world's first voice-first internet." - Google India Head
1. Grocery Shoppers are creatures of "Trust and Habit”

a. Users tend to stick with the first e-retailer they choose, or go ahead with their regular offline shop
b. They are very skeptical about their first online purchase, but slowly understand the benefits over subsequent sessions
"85% of e-commerce customers continue using the same retailer after the first purchase due to easier reorders. This is why it's very important to make the first shopping experience as easy as possible." Speechly, 2019
"If buying groceries online, given the option, users will choose their current retailer (85%) and another omni-channel grocer (11%)." Google x Bain, 2019
2. Personalize and Inspire

a. Voice assistants seem to be solving more for inspire & browse stages at present, rather than purchase. There is a big opportunity to shape demand via merchandising
b. Helping them uncover the best deals and creating a personalized experience, even in features like shopping lists, would be big opportunities to grow business
"43% of CPG (consumer packaged goods) shoppers said they used search to become inspired, browse, or research." Google x Bain, 2019
"Over half of all consumers are using voice to research products and a fifth are using voice to make a purchase." Speechly, 2019
3. Embrace Multi-Modality

a. They are very skeptical about their first online purchase, but slowly understand the benefits over subsequent sessions
b. Voice interfaces which have a multi-modal experience are better than smart speakers
"Voice assistant usage on smartphones is three times higher than smart speakers. Speakers also cannot show the product." Speechly, 2019
"Product-related details being insufficient, missing, inaccurate, with unclear images is the biggest complaint." Onespace, 2018 Q1 & Q2

Flippi, who are you meant for?

How to design technology for the next 500 million users?

E-commerce comprises less than 5% of the retail market in India. The primary driver of e-commerce in the past has been the deals and discounts that marketplaces offered, especially during the well-advertised sale days closer to the key spending seasons.

This was further amplified by the explosion of mobile device sales across India and the entry of a variety of mobile device brands to India. Year-on-year mobile sales grew rapidly to reach a vast base of 500 million people.

As with most things in India, it is hard to define a clear boundary. But there’s a high probability that a large portion of the next 500 million users come from India’s Tier 2/3/4 cities and smaller towns, mostly because that’s where 92% of India’s population lives.

That’s also where the maximum growth is happening in mobile device adoption and first-time internet usage (which will be on a phone, not a laptop or PC), and that’s where giants like Jio are headed with their data and affordable entertainment devices.

200 million+ Indians access the internet through their mobile phone

Over the next five years in India, 500 million first-time internet users are expected to come online via their mobile phones. This is a population segment that has been technologically excluded, underserved, and disempowered until now, and that’s who Flipkart's voice assistant, Flippi, is meant for.

“The next 500 million users are conspicuous consumers, meaning that they are willing to spend more for a product or service (than the middle class for instance) if given added safety nets and quality services, which they have for the longest time been denied.”

What’s troubling them? - Key barriers

1. Paucity of local language content on the internet

Symbols and language from unfamiliar shopping experiences mean very little to this populace — and can be intimidating. The next 500 million users have not been exposed to a self service experience in shopping (using a cart, going to “checkout”). Their typical experience is walking up to a counter and being served by the shop owner who actively helps them decide what to buy.

Local language content is key in bringing the next 500 million users online as they are not native English speakers. 70 percent of Indians found local language digital content more reliable, according to a KPMG/Google study in May 2017.

2. User Interfaces and Experiences not adapted to their social/cultural context

Building trust with the users is of utmost importance to capitalize on this demographic. For that to happen, the first step is prioritizing simplicity and clarity in the communication about a product and its benefits. Universally recognised signs and symbols can be used instead of text to drive conversions and actions, localizing the UI for cultural context; something I like to call designing a Local Language Interface.

Breaking the Language Barrier

With Grocery as one of its key growth verticals, Flipkart launched an In-App Voice Assistant in early 2020 to make buying groceries online as intuitive as shopping from a neighborhood store. The In-App Grocery Voice Assistant could understand local dialects, variations, colloquial terms, and mixed-language commands.

The top cities that used the Flipkart Voice Assistant, Flippi, are Bangalore, Hyderabad, Delhi, Chennai, and Mumbai. Interestingly, more than 20% of the new users who visit Flipkart every day use regional languages on the Flipkart App.

For many Indians, using a service such as Flipkart by voice rather than touch and text is their first choice — rather than typing searches and adding items to their cart, the huge mass of these new entrants to the internet are using voice to navigate across the shopping funnel.

According to Google, searches in local languages have seen 66% growth but still English remains the most used language across Google’s product platforms in the country. Moreover, according to the search engine giant, 9 out of 10 new internet users in the country at present are consuming online content in Indian languages.

To adjust to this new reality, Flippi, a multimodal conversational assistant, was introduced. Purpose-driven innovations such as Flippi can create opportunities to improve the lives of the next 500 million mobile phone users, who can now access much-needed services.

Keyboard vs Voice Input

Voice is beneficial to users of all literacy levels, but especially those with lower literacy. We have observed that those with lower textual literacy often resort to voice.

As users become more textually literate, they are more likely to use the keyboard; however, this varies by location. When voice fails, many of these users fall back to the keyboard.

Barrier to using Voice

It can be frustrating.
1. Misinterpretation

Many new internet users blame themselves for a failed voice experience. Voice interpretation and speech recognition technology is not perfect yet, so misinterpretations happen to everyone; however, new internet users often blame their own sophistication and experience level instead. Almost every internet novice we met expressed their frustration that it “didn't understand my accent.” After a few poor experiences, they are far more likely to abandon voice input.

Furthermore, the interaction models of voice products fail to become part of their muscle memory. There are multiple voice interaction types in use today, as we discussed above (e.g. dictation, commands, conversational, recording), and it’s not very clear to users why and how they differ. This makes transitioning between them complicated and confusing.

For example, many, if not most, mobile device users hold their devices too close to their mouth when using voice, assuming it will help with word detection. They tend to hold their phones awkwardly and end up not looking at the screen, making it tough to see if the transcription is correct during text input. When the input is wrong, users get confused about how to correct an error and are constantly trying to fix misinterpretations with their voice by saying “pause” or “change ‘water’ to ‘weather’”. When this fails, those who can, resort to typing but it doesn’t solve the confusion and implies to them that voice isn’t as supportive as it should be.

2. Self-perception and Privacy

There is a widely accepted notion that voice tools help address illiteracy. Many users expressed their fear of being seen using voice because it could make them seem uneducated, or that their friends would make fun of them.

Another inhibitor is privacy: people are often surrounded by others and worry about being overheard.

Ghost Work - I

A lot of work that often goes into designing a product doesn't always make it to the public eye. These are often documents, Slack channels, brainstorming canvases where I spend a lot of time to get a better understanding of the problem. I look at competitors, stay up-to-date with the movements in the fintech industry, have brainstorming workshops with product managers, and prepare specifications for developers - all of which are critical parts of the process of delivering a valuable experience.

Flippi, what’s wrong?

Grocery Assistant had been live for 100% of users in English and Hindi for over a year. While tracking some of the health metrics, we identified that:

  1. The user-level adoption for Grocery Assistant was at 1.1% and the weekly new user share was at 80% (low retention).
  2. About 45% of users converted to the Browse stage in their journey on the Assistant.
  3. Grocery was primarily present in Tier-1 and Tier-2 cities, with a low vernacular share (~1.8% pure Hindi) and high RFM (37% Platinum, Gold, and Silver).

In spite of the great potential the technology offered its users, Flipkart’s grocery assistant was struggling to gain ground. It contributed only 0.3% of the total orders placed on Flipkart and a mere 16% Add to Basket (A2B). From our user survey studies, we had identified a set of hypotheses, primarily based on gaps in the interaction model of the assistant. The problem called for some SOS problem solving.

Curious case of finding PMF - Hypotheses and Validation

Before we dived into defining the problem statement, we wanted to test some of our assumptions so that we could frame the problem space better.

  • 1. Low Awareness : False
    57% were aware, but 36% did not try it despite knowing. Mic icon is universally associated with Voice Interactions.
  • 2. Wrong Perception or Comprehension : False
    73% of those who tried it said it worked as expected; 82% of those who tried it would use it in the future.
  • 3. Usefulness of Product : True
    Multiple Hindi users stated it’s their key method to shop for groceries.

Identifying Use Cases and Way Forward

To identify key leverage points in the customer journey, we qualitatively compared different parts of the shopping funnel to find the pain points and advantages of three modes of grocery shopping: Offline, Online (as it exists today), and Ideal Online with Voice (the best online experience, where voice can be leveraged as an even stronger differentiator). The output of this activity was a set of clearly identified use cases that we could tap into and that helped us prioritize initiatives.

Based on our research, we identified key experience enhancement areas and prioritized improvements accordingly; merging the voice experience into the core grocery experience ranked as our top priority.

The Grocery Voice product was built in parallel to the core grocery category experience in order to test the waters; thus, we had historically kept the two separate. We realized that this decision came at a cost, as many features that were core to the user experience for the Grocery category were absent from the Grocery Assistant experience (examples: Quantity Collapse, Category Browse, Product Page Support, Sticky Basket). After seeing good adoption of all these features, we asked ourselves a question: should we merge the Assistant experience with the current Grocery experience, with the Grocery Assistant manifesting on top of the base Grocery experience, or should we continue to exist as a separate experience?

This posed a difficult decision for us, as each approach comes with its own set of advantages and disadvantages, and only after careful consideration and thorough analysis were we able to arrive at a decision that would best benefit our users. We decided that we would pick up the grocery merge initiative, for the following reasons:

Tradeoff of Experiences:

We felt that it was unfair for users entering the standalone voice experience to miss out on the features that were developed for the core grocery experience. The trade-off users had to make in choosing either experience was just not justified.

Tech Effort:

I realized that there would be considerable savings in Product, Design, and Engineering effort for new launches once this parity was achieved. For instance, once we took up this merge, ~60% of JIRA backlog issues would be resolved.

Long-term View and Philosophy:

Since our long term vision was to have one assistant across the entire Flipkart app, this activity would help us solve for various use cases that grocery as a category demanded and help us better prepare for the future as we roll out Flippi across different categories - such as defining principles for the ingress point and modal-interactive layer of the assistant.

Breadboarding Flippi

The data showed that the current user interface (UI) was not meeting the objective. We had many usability issues to address in order to implement the interaction model we had envisioned for our users. At times, I was unsure what "better" should look like.

Northstar Metrics
1. Increase # of Users with at least 1 query
2. Increase # of Users adding at least 1 item to basket
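
Both north-star metrics are counts of distinct users, so they can be computed directly from an interaction log. The sketch below assumes a hypothetical log of `(user_id, event_type)` pairs with made-up event names; the query-to-basket rate is an extra derived figure added for illustration, not one of the stated metrics.

```python
# Hypothetical event log: (user_id, event_type). Names and data are illustrative.
events = [
    ("u1", "query"), ("u1", "add_to_basket"),
    ("u2", "query"), ("u2", "query"),
    ("u3", "open"),
    ("u4", "query"), ("u4", "add_to_basket"), ("u4", "add_to_basket"),
]


def users_with_event(log, event_type):
    """Return the set of distinct users who fired the event at least once."""
    return {user for user, kind in log if kind == event_type}


def north_star(log):
    active = {user for user, _ in log}
    queried = users_with_event(log, "query")
    added = users_with_event(log, "add_to_basket")
    return {
        "active_users": len(active),
        "users_with_query": len(queried),        # north-star metric 1
        "users_with_add_to_basket": len(added),  # north-star metric 2
        # Derived (assumption): share of querying users who also added an item.
        "query_to_basket_rate": len(added & queried) / len(queried) if queried else 0.0,
    }


if __name__ == "__main__":
    for name, value in north_star(events).items():
        print(f"{name}: {value}")
```

Counting distinct users rather than raw events keeps a single heavy user (like `u4` above) from inflating the numbers.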

Thankfully, this happened just as a remote research study was about to be conducted. I was immediately presented with the specific problems that needed to be solved and, upon seeing them, I was suddenly filled with a plethora of ideas for potential solutions. I didn't want to slow down and start writing any one of them in detail because I was afraid I would lose the whole bunch, so I decided to create a “dump” on Miro without worrying if it was right or not. I simply placed the ideas in one column, one after the other, with no structure and no relationships between them.

I then needed to observe the participants using the grocery assistant and note every single step they took, from start to finish, including steps outside the app. Whenever they veered off the golden path and applied a compensating behavior, I would flag that. This would often indicate a problem, and the compensating behavior showed what a solution could look like. I found this to be the most effective way to gain an understanding of the reasons behind the lack of adoption.

The flagged areas would then be the starting points for the design work, allowing me to make the necessary changes to improve the interaction model.

As a VUI (Voice User Interface) designer, I realized that my role had changed drastically from my previous positions. Not only did I need to understand the dialogue flow, the language model, and how the user interface would react to user inputs, but I also had to be on top of my game, with solid systems thinking across all the surface areas of Flipkart where the assistant could be used. This presented quite a challenge, as many of the problems we encountered were caused by the lack of consistency stemming from the various individual efforts in voice that Flipkart had developed over the last 1-2 years.

Ghost Work - II

Efficient communication is essential for productive and satisfying relationships in the workplace. I practice what I preach.

Improving Processes and Fostering Collaboration
Look for ways to enhance the design workflow and collaborate with engineers and product managers to standardize processes.

Upholding Design Quality
Continually raise the quality bar by demonstrating solid craftsmanship, providing feedback, and actively participating in bug bash sessions for engineers.

Diagnosis, Insight, Action
I work with the team to identify problems and opportunities by analyzing insights and leveraging existing knowledge.

Solution Space

Solving for this merge brought together a huge set of problems, many of which were related to the core UI structure of Flipkart’s design language. To narrow the focus, let us look at three factors that had to be balanced while designing for the Assistant on Grocery:

  • 1. Consistency across Flipkart for Voice and Assistant Experience
  • 2. Handle nuances and unique user behaviour on categories like Grocery/Quick
  • 3. Combining Assistant and Voice Search

Problem 1 - Voice Ingress

When Flippi (Name of Flipkart’s voice assistant) was initially launched in the lifestyle category to a small segment of our user base, we had it manifested into a Floating Action Button using Flippi’s nemonic for the ingress. We knew it then and we knew it now that this the ingress’ manifestation as a FAB wasn’t going to scale across the app. At some point we knew we had to come back to it. And since our broader goal was to have once integrated assistant across the entire app we wanted Flippi’s design to be frame of reference for all the decisions we took.

Additionally, we needed to think through about the nemonic that was going to be used on the ingress, since the ingress to the current voice experience in Grocery was through a microphone icon whereas for Flippi it was via Flippi’s nemonic.

Flippi’s mnemonic was chosen as the right approach since it provided the visual identity of the Assistant’s Persona(lity) design. We had our share of concerns early on, but our research suggested otherwise.

A major problem in the state of voice products has been that not all voice experiences are represented by the same icon across different apps and devices. This has created confusion for users around expectations and even privacy. It not only muddies the mental model but also creates visual problems.

For the two problems that surfaced here, the Ingress Manifestation and the Ingress Iconography, we decided to go ahead with a microphone-based ingress on the toolbar. This was based on the following reasons:

  • 1. Market Precedence: The microphone icon is a natural choice for many voice products in the market, particularly those that feature an assistant. We found that a single point of entry for every voice-enabled feature increases users' recall and usage of the product: one readily identifiable icon lets users reach voice-enabled features quickly and intuitively. And since the microphone icon is already a widely recognized symbol, it further reduces the learning curve, allowing customers to become comfortable with the product quickly.
  • 2. Piggybacking on Voice Search: Voice search has seen strong adoption among the next 500 million customers, which serves as a large top of the funnel for assistants.
  • 3. Safe UI: The ingress, in the form of a FAB, was interfering with some of the key calls to action (CTAs) - the Grocery Category experience includes a sticky bottom bar that holds grocery category navigation, the basket-building construct and the Add-to-Basket (A2B) CTA. Furthermore, it led to a higher rate of accidental clicks and an unhealthy number of curious clicks, all of which hurt the effectiveness of the overall funnel. To prevent this from happening again, we determined that an ingress on the top header of the page was essential.

Problem 2 - Assistive Panel

The core grocery experience included a sticky bottom bar which included grocery category navigation, basket building construct and Add-to-Basket (A2B) CTA. To ensure the best user experience, we needed to evaluate how the assistant’s panel would interact with these elements. As the sticky bottom-bar and basket building features could change with time, it was important to consider the impact of the panel design on the overall user experience. We needed to make sure that the design of the panel was future-proof and able to accommodate any potential changes to the sticky bottom-bar and basket building interactions.

Like the Ingress, we took a two-fold approach with this problem:

For the business-critical release, whenever Flippi was invoked via the Ingress, the assistant panel would transition the sticky basket bottom-sheet out of the viewport and take its place. We extended the same interaction behaviour to the Quantity Selection sheet: the assistant panel would replace it. This meant that users could not use voice for quantity selection. To work around this, we decided that Flippi would continue in a passive state behind the panel (something we could easily configure through our Dialog Manager).

As with all our decisions, we had to consider the potential business impact of the designs we proposed. I had a broad spectrum of explorations to bring parity between the core grocery experience and the grocery voice assistant. One of them sent me down a rabbit hole of spatial interfaces. I wanted to find a solution that would feel natural to our customers and let us use more surface area to provide a universal assistive layer across the app, especially since we needed to support features such as conversational support, which would require both the keyboard and voice modalities. But the solution I had in mind would need to be stress-tested across various categories, which is why we took this exploration forward with the Design System team to continue with the next steps of this problem - defining a pixel-conservative, category-agnostic interaction model for spawning the assistant.

Problem 3 - Integrated Assistant Flow

In the Grocery Category specifically, the assistant was absent from product pages. This was an inconsistent experience for users, who expected the assistant to be present on the product pages as well. Since the Assistant was only accessible from the Home Page, users couldn’t enter the experience from anywhere further along the funnel. This was turning out to be a major problem. And since Grocery Voice Search was essentially a subset of the Grocery Voice Assistant (Browse and Filter use-cases), we decided to merge Voice Search with the Assistant in such a way that it would appear as if the user were using only one product, except that whenever they entered a category that wasn’t supported by Flippi, the query would not be handled by the Dialog Manager.
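The routing rule behind this merge can be sketched roughly as follows. This is only an illustration under stated assumptions, not the production logic: the category names, the `ASSISTANT_SUPPORTED` set, and the `routeVoiceQuery` function are all hypothetical. The point it shows is that a single ingress can hand the query to the Assistant (and its Dialog Manager) in supported categories and fall back to plain Voice Search everywhere else.

```typescript
// Hypothetical category identifiers, used only for illustration.
type Category = "grocery" | "lifestyle" | "electronics" | "furniture";

// Which backend handles a spoken query behind the single microphone ingress.
type Handler = "assistant" | "voiceSearch";

// Assumption: the categories where the Assistant is available.
const ASSISTANT_SUPPORTED: ReadonlySet<Category> = new Set<Category>([
  "grocery",
  "lifestyle",
]);

// Supported categories go to the Assistant (Dialog Manager);
// everything else falls back to Voice Search in the same panel.
function routeVoiceQuery(category: Category): Handler {
  return ASSISTANT_SUPPORTED.has(category) ? "assistant" : "voiceSearch";
}
```

From the user's side the panel and the ingress stay the same in both cases; only the handler behind them changes.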

Participants could understand voice search on the top bar, but they were unable to distinguish between Voice Search and the Grocery Assistant.

This was a critical piece of the puzzle, not because of the design deliverables, but because of the specifications that had to be written for such a merge. We were to use all the existing technology with only tweaks to the UI, since Voice Search would now have the same panel as the Assistant. Another key problem that needed to be addressed in these specifications was the masking of latency whenever there was a switch between Voice Search and the Assistant. Since the latency could go beyond 1500ms, we needed to handle this as elegantly as we could.
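One common way to mask a switch like this is to stage the panel's feedback by how long the user has already waited. The sketch below is a minimal illustration of that idea, not our actual specification: the stage names, the `maskStage` function, and the 300ms hold threshold are assumptions, and only the 1500ms figure comes from the latency mentioned above.

```typescript
// Hypothetical stages the shared panel can show while the handler switches.
type MaskStage = "hold" | "filler" | "reassure" | "ready";

interface MaskConfig {
  holdMs: number;     // assumption: keep the previous content this long
  longWaitMs: number; // 1500ms, the latency figure mentioned in the text
}

const DEFAULT_MASK: MaskConfig = { holdMs: 300, longWaitMs: 1500 };

// Decide what the panel shows, given the time elapsed since the switch began.
function maskStage(
  elapsedMs: number,
  responseReady: boolean,
  cfg: MaskConfig = DEFAULT_MASK,
): MaskStage {
  if (responseReady) return "ready";                // show the result at once
  if (elapsedMs < cfg.holdMs) return "hold";        // too short to notice
  if (elapsedMs < cfg.longWaitMs) return "filler";  // e.g. a listening animation
  return "reassure";                                // e.g. a short spoken or visual cue
}
```

Keeping the decision in a pure function like this makes the thresholds easy to tune and to test without a device in hand.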

The Conclusion

After considering all the nuances of the Grocery Assistant journey, I began to understand that a great portion of conversational design was focused on dealing with errors and making up for any technological limitations. Even though many of my designs were not implemented in the short-term, they still served an important purpose in helping the team ask the right questions and make informed decisions. As the project progressed, I became more and more aware of the importance of these designs and the integral role they played in the success of the project.


Looking back, there are so many things I would have done differently on the project and the experience taught me so many lessons that have helped me to grow professionally.

Let's Fork it

I learned during this project that one of the most powerful things designers can do is to keep reminding stakeholders of the problem at hand. It can be difficult to remain focused on the problem rather than the solution as more time is spent discussing, building, and iterating on a specific solution.

Power in words

I had never written so much in my previous projects, but this experience taught me to never underestimate the power of taking the lead on a first draft. Words can also be used to identify a problem space and break it down into prioritized chunks. Writing a clear proposal or creating a prototype of an experience can have a significant impact on a discussion. The first draft can be the source of truth, not just an opinion. In my experience, the first draft can determine 90% of a project. So, if you create it, you have already "won" the argument before it has even begun.

Sell Aspirations

NBUs aren't disabled, so technology designed for them should be marketed in a way that lets them boast about it.

Among all the daily tasks and roadmaps, our users are our compass. They will guide us to the next stage of e-commerce, and beyond, into the next computing era. Our work has a far-reaching impact, and it's important to remember that we are not only teaching our users to use our platform; we are learning by eliminating their obstacles.

Context Context Context

This was my first major 1 → N project at Flipkart, and it was a unique experience for me. It was a project that intersected with all other shopping categories, making it complex and multifaceted. It was an important lesson in project management, as I had to ensure that everyone was on the same page and that all the teams - design, marketing, business, and engineering - were collaborating effectively. The project also taught me the importance of alignment and the need to ensure that all the teams involved were working towards a common goal.

Letting it Settle in

Our reaction to feedback is often to consider how to act on it right away. However, it can be beneficial to take a step back and reflect. Doing so can help us internalize the feedback, listen to different perspectives, and then decide if and what action is necessary.

Moonshot Ideas

Long-shot problems, such as this one, require a different kind of nurturing environment. Moon-shot problems require a vision that is equally ambitious, as well as approaches to problem-solving that are equally creative.

What worked for a nuanced co-branded card program, which involved partnerships with external partners, will not necessarily work for a problem like this one, where technology adoption will be slow.

Selecting the right metrics is key; the bottom line or financial value generated from such projects should never be the sole measure of success.

Be a Salesman

Evangelizing your project is as important as crafting all pieces of the project together. Being informed single-handedly about all data points can only go so far; I should have evangelized the problem statement more across the company.

Special thanks to all the designers, engineers, and marketers who assisted me on this journey: Nandini, Neha, Shivangi, Shrey, Krutika, Deepak.