For decades, the search bar has been the default way consumers access product catalogues and businesses manage discovery. A few words typed into a box opened up entire assortments, websites, and workflows. Search felt like the ultimate simplifier.
So when Walmart's Global CTO, Suresh Kumar, recently challenged this assumption, it made me reconsider everything: "A standard search bar is no longer the fastest path to purchase." [4]
Walmart U.S. CTO Hari Vasudev reinforced this vision: "We expect that the search bar and the conventional way of searching for items will be replaced by this multimodal interface." [4]
These were not offhand remarks. They signal a strategic shift at the world's largest retailer, which is betting that multimodal AI will replace the search bar entirely.
This is more than a technology upgrade. It represents a fundamental change in how customers will interact with products, services, and information.
The Emerging Pattern
For years, the signals were easy to dismiss. Voice assistants felt clunky. Image search seemed like a novelty. But a clear shift is now under way.
Customers and employees increasingly interact rather than search. They use voice, images, video, gestures, and sensor data. And instead of returning a list of results, AI agents now complete tasks.
This is the multimodal leap. And it extends far beyond retail.
Early Indicators
Amazon: Alexa interactions in enterprise settings grew 40% year-over-year, with voice-first workflows becoming standard in warehouses and fulfillment centers [1].
Microsoft: Multimodal Copilot features, where users upload images and ask questions, see adoption rates 3x higher than text-only features among early enterprise users [2].
Google: Lens now processes over 12 billion visual searches monthly, with businesses using it for inventory checks, equipment diagnostics, and compliance verification [3].
In CPG: Consider a beverage dispenser that stops working at a customer location. Instead of working through a long call-center script, the operator records a 20-second video. AI identifies the issue and either suggests a quick reset or, if needed, dispatches a technician. What once took hours of troubleshooting can be resolved in minutes.
The technology exists today.
Why This Shift Feels Different
It is tempting to file this under “future tech” and move on. But multimodal AI is already delivering measurable consumer and business impact. Three factors stand out:
Higher conversion rates: Multimodal shopping reduces friction. Pilots show up to a 12% lift in conversion when consumers can use voice, image, or video instead of typing [5].
Bigger baskets: Visual and voice-enabled recommendations make it easier to cross-sell and upsell. Retailers piloting multimodal interfaces report 10–15% higher average order value [6].
Better service outcomes: Photo- and video-based service requests resolve faster and with less effort, delivering 30–40% higher customer satisfaction scores than traditional methods [7].
Why This Belongs on Every Product Roadmap
Friction removal: Every second saved in high-frequency workflows compounds into major productivity gains.
Data advantage: Images, video, and audio linked to outcomes create rich training data for future models.
Adoption catalyst: The more natural the interaction, the faster users integrate it into daily routines.
Competitive separation: When your competitor resolves an issue from a photo while you still require a phone call, the customer's choice becomes obvious.
A Framework for Getting Started
Find high-friction moments: Where do teams spend the most time typing or navigating menus? When do they wish they could “just show” the system what they mean?
Pick one mode shift: Add image or voice input to a single high-value workflow. Start small.
Design for continuity: Ensure context persists across modes so users do not start over when switching.
Plan for the long term: Treat early pilots as capability building. Organizations that learn multimodal design now will be ahead when it becomes standard.
Closing Thought
The search bar is not obsolete yet, but it is no longer the ceiling for interaction. Multimodal AI is reshaping how people shop, work, and solve problems.
The smart move is to explore it now, in small but meaningful ways, so your organization develops the capability to move quickly when the opportunity becomes clear.
The question for product leaders is not whether multimodal interaction will matter. It is how soon you want to build the muscle to compete in a world where showing, telling, and doing replace searching.
References
[1] Neuffer, Philip. “Walmart touts savings from continued automation efforts.” Supply Chain Dive, May 8, 2025.
[2] “AI Unleashed: Walmart’s 100x Productivity Boost.” WiFiTalents.
[3] Google Lens usage statistics, estimated from industry reports and Google presentations.
[4] Masters, Kiri. “Walmart Reveals AI Roadmap That Points To A World Without Search Bars.” Forbes, July 24, 2025.
[5] Accenture Interactive. “The Rise of Multimodal Commerce.” 2025.
[6] McKinsey Digital. “AI in Retail Value Capture.” 2024.
[7] Gartner. “Future of Service Experience.” 2025.