Selecting a model for semantic search at Dropbox scale

Nautilus is our search engine for finding documents and other files in Dropbox. Introduced in 2018, Nautilus uses a conventional keyword-based approach that, while functional, has some inherent shortcomings. Because Nautilus has limited contextual understanding of what someone may be looking for, users must recall a file’s exact name or the specific keywords within it. For instance, a search for “employment contract” may overlook relevant “job agreement” or “offer letter” documents, because Nautilus does not capture their contextual similarity. And for multilingual users, Nautilus expects queries and documents to be in the same language, which hinders retrieval when the content is in a different language from the query.
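To make the “employment contract” example concrete, here is a minimal sketch of strict keyword matching. It is not how Nautilus is implemented; the `tokenize` and `keyword_match` helpers and the sample titles are hypothetical, and a real engine would use a proper analyzer and an inverted index. The sketch only illustrates why term overlap alone misses related documents.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return set(text.lower().split())


def keyword_match(query: str, title: str) -> bool:
    """Return True only if every query term appears in the title."""
    return tokenize(query) <= tokenize(title)


# Hypothetical file titles, based on the example in the text.
titles = [
    "Employment contract 2024",
    "Job agreement signed",
    "Offer letter final",
]

hits = [t for t in titles if keyword_match("employment contract", t)]
print(hits)  # ['Employment contract 2024']
```

The “job agreement” and “offer letter” files share no terms with the query, so a matcher of this kind cannot surface them, however relevant they are.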

To mitigate these limitations, we considered techniques such as stemming, spelling correction, and query expansion for improved flexibility. However, we wondered if we could elevate the Dropbox search experience further. Could it be possible to help users find their content without needing to know the exact search term?

Enter semantic search. Rather than rely on exact keyword matches, semantic search aims to better understand the relationship between user queries and document content. This functionality ultimately enables Dropbox users to locate crucial information more quickly, so they can spend less time searching and more time focusing on the task at hand.

For multilingual users, semantic search also unlocks another capability: cross-lingual search. This advanced feature allows users to search in one language and receive relevant results in other languages, further enhancing accessibility and usability.

We’re excited to share that Dropbox now supports semantic search (powered by Nautilus), which adds both of these capabilities. We rolled it out to Dropbox users internally in early 2024 and then externally, as an experiment, to a subset of Pro and Essential users in May 2024. With this release, we observed a nearly 17% reduction in empty search sessions (measured by ZRR, or zero-results rate) and a 2% lift in search session success (measured by qCTR, or qualified click-through rate). Based on these positive results, we made semantic search generally available to all Pro and Essential users in August 2024, with availability for Business users coming in early 2025.
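For readers unfamiliar with the metric, the sketch below shows one way a zero-results rate and its relative reduction could be computed. It assumes ZRR is the share of search sessions that returned no results, which may differ from the exact definition we use internally. The function names are hypothetical, and the session data is made up purely so the arithmetic is easy to follow; it is not our experiment data.

```python
def zero_results_rate(result_counts: list[int]) -> float:
    """Fraction of sessions whose search returned no results (assumed definition)."""
    if not result_counts:
        raise ValueError("need at least one session")
    empty = sum(1 for n in result_counts if n == 0)
    return empty / len(result_counts)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


# Made-up result counts per session, for illustration only.
control = [0, 3, 0, 5, 0, 1, 0, 2, 0, 4, 0, 7]    # 6 of 12 sessions empty
treatment = [0, 3, 2, 5, 0, 1, 0, 2, 0, 4, 0, 7]  # 5 of 12 sessions empty

zrr_control = zero_results_rate(control)
zrr_treatment = zero_results_rate(treatment)
print(f"{relative_reduction(zrr_control, zrr_treatment):.1%}")
```

Note that the reduction is relative to the control rate, not a drop in percentage points.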
