State-of-the-art hybrid search for RAG and AI Apps

Posted on July 25, 2024 by cloudmatrix.website

Snowflake Cortex Search, a fully managed search service for documents and other unstructured data, is now in public preview. With Cortex Search, organizations can effortlessly deploy retrieval-augmented generation (RAG) applications with Snowflake, powering use cases like customer service, financial research and sales chatbots. Cortex Search offers state-of-the-art semantic and lexical search over your text data in Snowflake behind an intuitive user interface, and it comes with the robust security and governance features that Snowflake is known for.

Solving the challenges of building high-quality RAG applications

From the beginning, Snowflake’s mission has been to empower customers to extract more value from their data. In the era of enterprise AI, this mission extends more than ever to unstructured data, where RAG has become a standard approach to customizing generative chat applications with proprietary data. RAG empowers organizations to create, among many other things, powerful customer service, sales and R&D applications that accurately leverage their proprietary data.

Yet, while retrieval is a fundamental component of any AI application stack, creating a high-quality, high-performance RAG system remains challenging for most enterprises. Consider the components one must manage to successfully deploy RAG at scale:

- Infrastructure and operations: Platform teams have to deploy and manage numerous retrieval components, including hosted embedding models, vector databases, data indexing pipelines, hosted reranking models, observability tools and more.
- Search-quality tuning: Engineers and data scientists have to spend time evaluating models and parameter configurations to tune the retrieval and ranking components to their specific business use cases.
- Security and governance: Security teams have to conduct extensive reviews to ensure that each component in the stack is treating data securely and respecting governance policies.

Cortex Search provides hybrid search at enterprise scale

Cortex Search is natively integrated into Snowflake and is built to serve queries in 200-300 ms over large volumes of text. It supports “fuzzy” search: the service takes in natural language queries and returns the most relevant text results, along with associated metadata. It’s optimized for low latency, making it an ideal backend for interactive end-user applications. And when combined with industry-leading LLMs in Cortex AI, Cortex Search can be used to develop powerful chatbots.

*Figure 1: RAG with Cortex Search and Cortex LLM functions*
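
The flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Snowflake's implementation: `retrieve` and `complete` are hypothetical callables standing in for a Cortex Search query and a Cortex LLM function call, and the prompt template is made up for this example.

```python
from typing import Callable, List


def build_rag_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved text chunks and the user question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # hypothetical: wraps a search query
    complete: Callable[[str], str],             # hypothetical: wraps an LLM call
    top_k: int = 3,
) -> str:
    """Retrieve the top_k most relevant chunks, then ask the LLM to answer."""
    chunks = retrieve(question, top_k)
    return complete(build_rag_prompt(question, chunks))
```

Passing the retriever and the LLM in as plain callables keeps the sketch independent of any client library, so either side can be swapped or stubbed out in tests.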

Cortex Search provides world-class, AI-powered search capabilities at a lower total cost of ownership (TCO). This means you can spend less time on infrastructure management and retrieval-quality tuning, and more time on building delightful AI-powered applications for end users. It’s designed with the following tenets in mind:

- Easy to use: Fully managed infrastructure means that operational responsibilities are handled by Snowflake. Cortex Search offers automated, incremental ingestion with low-latency serving.
- State-of-the-art search quality: Get state-of-the-art “fuzzy” search capabilities out of the box, with no tuning required.
- Secure and governed: Benefit from the same security and governance features as the rest of your Snowflake data.

“We’ve developed Coda Brain, an AI platform that understands and allows users to act on all of their company’s structured and unstructured data. For Coda Brain’s unstructured RAG system, we needed high-quality search results, and we didn’t want to be in the business of managing search infrastructure at scale for each tenant. Cortex Search is a great fit to power Coda Brain — we’re getting better search quality than we’d get from similar search products, with minimal operational overhead on our side.”

Shishir Mehrotra, Co-Founder and CEO, Coda

Fully managed indexing and serving

Cortex Search enables anyone in the organization to leverage a powerful search engine. The service automatically indexes and embeds your data in an incremental fashion, meaning it only processes changed rows from the underlying data source.

All of the operational complexity of building the search service is abstracted into a single SQL statement for service creation. This removes the burden of creating and managing multiple processes for ingestion, embedding and serving, ultimately freeing up time to focus on developing cutting-edge AI applications.

*Figure 2: One SQL statement – declarative interface for defining a Cortex Search service*
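
As a rough idea of what that single statement can look like, the sketch below assembles a `CREATE CORTEX SEARCH SERVICE` statement as a string. The clause layout follows the preview-era Snowflake documentation as best recalled and should be checked against the current reference; every object name (`support_ticket_search`, `ticket_text`, `search_wh`, `support_tickets`) is a hypothetical placeholder.

```python
from typing import Sequence


def create_search_service_sql(
    service_name: str,
    search_column: str,
    attribute_columns: Sequence[str],
    warehouse: str,
    target_lag: str,
    source_query: str,
) -> str:
    """Build a CREATE CORTEX SEARCH SERVICE statement as a string.

    Identifiers are interpolated as-is, so only pass trusted names.
    """
    attributes = ", ".join(attribute_columns)
    return (
        f"CREATE OR REPLACE CORTEX SEARCH SERVICE {service_name}\n"
        f"  ON {search_column}\n"             # text column to index and search
        f"  ATTRIBUTES {attributes}\n"        # columns usable as metadata filters
        f"  WAREHOUSE = {warehouse}\n"        # warehouse used to refresh the index
        f"  TARGET_LAG = '{target_lag}'\n"    # how stale the index may become
        f"  AS (\n    {source_query}\n  );"
    )


# All object names below are hypothetical placeholders.
statement = create_search_service_sql(
    service_name="support_ticket_search",
    search_column="ticket_text",
    attribute_columns=["region", "ticket_year"],
    warehouse="search_wh",
    target_lag="1 hour",
    source_query="SELECT ticket_text, region, ticket_year FROM support_tickets",
)
print(statement)
```

The statement is declarative: it names the source query and a freshness target, and leaves embedding, indexing and incremental refresh to the service.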

Once the service is created, it’s easy to query it from your application via REST or Python APIs. This includes both applications hosted in Snowflake (e.g., Streamlit in Snowflake) and applications hosted in an external environment.
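
To make the REST path more concrete, here is a hedged sketch that only builds the request (URL and JSON body) without sending it. The endpoint path and the body fields (`query`, `columns`, `limit`, `filter`) are assumptions based on the preview-era documentation and should be verified before use; the account URL and object names are hypothetical, and authentication is left out entirely.

```python
import json
from typing import Any, Dict, List, Optional


def build_query_request(
    account_url: str,
    database: str,
    schema: str,
    service: str,
    query: str,
    columns: List[str],
    limit: int = 5,
    filter_: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Return the URL and JSON body for a search request (nothing is sent)."""
    # Assumed endpoint shape; confirm against the current REST API reference.
    url = (
        f"{account_url.rstrip('/')}/api/v2/databases/{database}"
        f"/schemas/{schema}/cortex-search-services/{service}:query"
    )
    body: Dict[str, Any] = {"query": query, "columns": columns, "limit": limit}
    if filter_ is not None:
        body["filter"] = filter_
    return {"url": url, "body": json.dumps(body)}


# Hypothetical account URL and object names, for illustration only.
request = build_query_request(
    account_url="https://example-account.snowflakecomputing.com/",
    database="support_db",
    schema="public",
    service="support_ticket_search",
    query="customer cannot reset password",
    columns=["ticket_text", "region"],
    limit=3,
)
print(request["url"])
print(request["body"])
```

An application would POST this body to the URL with an authorization header issued for a role that has usage on the service.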

State-of-the-art search quality with hybrid search

Cortex Search provides state-of-the-art search quality behind a user-friendly interface. The backbone of Cortex Search retrieval is the vector search component, which is powered by Arctic Embed M, Snowflake’s high-performance, cost-effective model. On top of Arctic Embed, Cortex Search leverages lexical search and reranking in what is known as a “hybrid” approach to retrieving and ranking. Thus, each search query to a Cortex Search service uses:

- Vector search for retrieving semantically similar documents
- Keyword search for retrieving lexically similar documents
- Semantic reranking for ranking the most relevant documents in the result set

This ensemble retrieval technique additionally supports metadata filtering on all queries, allowing you to filter the search to a subset of relevant documents. For example, a customer support agent could filter their fuzzy search query to only tickets from the year 2024 from customers in the EMEA region.
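
The toy example below illustrates the general idea of hybrid retrieval with a metadata filter. It is not how Cortex Search is implemented: the post does not say how the vector and keyword results are combined, so this sketch swaps in reciprocal rank fusion (RRF), a common fusion technique, uses hand-made two-dimensional "embeddings", and omits the semantic reranking step. All data and field names are invented.

```python
import math
from typing import Any, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> int:
    """Count distinct query terms that appear in the text (crude lexical signal)."""
    words = set(text.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in words)


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    query: str,
    query_vec: List[float],
    docs: List[Dict[str, Any]],
    metadata_filter: Dict[str, Any],
    top_k: int = 3,
) -> List[str]:
    # 1. Keep only documents whose metadata matches every filter key.
    candidates = [
        d for d in docs
        if all(d["meta"].get(key) == value for key, value in metadata_filter.items())
    ]
    # 2. Rank the candidates by semantic similarity and by lexical overlap.
    by_vector = [d["id"] for d in sorted(candidates, key=lambda d: -cosine(query_vec, d["vec"]))]
    by_keyword = [d["id"] for d in sorted(candidates, key=lambda d: -keyword_score(query, d["text"]))]
    # 3. Fuse the two rankings into one result list.
    return rrf_fuse([by_vector, by_keyword])[:top_k]


# Invented tickets with made-up 2-d "embeddings".
tickets = [
    {"id": "t1", "text": "password reset email not arriving", "vec": [1.0, 0.0],
     "meta": {"region": "EMEA", "year": 2024}},
    {"id": "t2", "text": "cannot log in after changing credentials", "vec": [0.9, 0.1],
     "meta": {"region": "EMEA", "year": 2024}},
    {"id": "t3", "text": "password reset fails", "vec": [1.0, 0.0],
     "meta": {"region": "AMER", "year": 2024}},
    {"id": "t4", "text": "invoice shows wrong amount", "vec": [0.0, 1.0],
     "meta": {"region": "EMEA", "year": 2023}},
]
results = hybrid_search("password reset", [1.0, 0.0], tickets, {"region": "EMEA", "year": 2024})
print(results)
```

Ticket `t3` matches the query well but is excluded by the region filter, and `t4` is excluded by the year filter, which mirrors the EMEA/2024 support example above.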

Hybrid search outperforms vector or keyword search alone

Cortex Search combines the strengths of vector search, keyword search and semantic reranking into a single search interface. Our internal research shows that this approach yields higher-quality search results across a variety of RAG-oriented search workloads than vector search or keyword search alone. This means that you get an out-of-the-box quality boost over standalone vector databases, which typically provide only vector search without lexical search or reranking. In fact, on a sampled set of public and proprietary “question-and-answer”-style benchmarks, we found that Cortex Search’s hybrid retrieval approach improved retrieval quality by more than 12% compared with vector search alone, and drastically outperformed keyword search alone (see Figure 3 below).

*Figure 3: Retrieval quality with Cortex Search compared to vector or keyword search alone*
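
For readers who want to run a similar comparison on their own data, the sketch below shows one common way to score a ranked result list and to express a relative improvement. The post does not name the metric behind its figures, so the use of NDCG@k here is an assumption, and the numbers in the usage lines are invented purely to show the arithmetic.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (rank 1 is index 0)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same list.

    `relevances` holds the graded relevance of every judged document,
    in the order the search system ranked them.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_boost(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline, e.g. 0.12 for +12%."""
    return (candidate - baseline) / baseline


# Invented scores, only to illustrate the calculation.
vector_only_score = 0.50
hybrid_score = 0.56
print(f"{relative_boost(vector_only_score, hybrid_score):.0%}")
```

Averaging a per-query score such as this over a benchmark's queries, once per retrieval method, gives the kind of per-method comparison a chart like Figure 3 reports.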

This complex retrieval and reranking stack is fully managed, saving you from having to stitch together and tune hyperparameters for multiple retrieval and reranking services. More details on the research behind the Cortex Search retrieval stack will be shared on our Snowflake Engineering Blog.

Upholding Snowflake’s high security and governance standards

While ease of use and great search quality are important features, we know that a strong security and governance posture is absolutely critical for any enterprise developing AI-powered applications.

Secure: All Cortex Search operations, including vector embeddings and search query serving, run fully within the Snowflake perimeter and each customer’s data is isolated from all others.

Governed: Cortex Search services are schema-level objects in Snowflake and integrate with existing role-based access control (RBAC) policies in a Snowflake account. This means you can grant usage on a service like you would any other Snowflake object. For document- or chunk-level access controls, you can use metadata filtering to ensure that the service only returns the results that the client is authorized to view.
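
The two governance mechanisms above can be sketched as follows. The `GRANT` statement shape and the `@or` / `@eq` filter operators are assumptions based on the preview-era documentation, and the `access_group` column, role and service names are hypothetical; treat this as an illustration rather than a reference.

```python
from typing import Any, Dict, List


def grant_usage_sql(service_name: str, role: str) -> str:
    """Build a statement granting a role usage on a search service."""
    return f"GRANT USAGE ON CORTEX SEARCH SERVICE {service_name} TO ROLE {role};"


def access_filter(authorized_groups: List[str], column: str = "access_group") -> Dict[str, Any]:
    """Build a metadata filter matching only documents the caller may see.

    `column` is a hypothetical attribute column that stores, per document
    or chunk, the group allowed to read it.
    """
    if not authorized_groups:
        # Fail closed: never run an unfiltered search for a caller with no groups.
        raise ValueError("caller has no authorized groups")
    clauses = [{"@eq": {column: group}} for group in authorized_groups]
    return clauses[0] if len(clauses) == 1 else {"@or": clauses}


# Hypothetical names, for illustration only.
print(grant_usage_sql("support_ticket_search", "support_agent"))
print(access_filter(["emea_support", "tier2"]))
```

For this to act as an access control, the filter has to be built on the server from the caller's verified identity; a filter supplied by the client itself would not restrict anything.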

Real-world use cases of Cortex Search

Snowflake customers are developing a range of AI-powered search applications in Snowflake, including:

Research and productivity assistants: Chatbots, enabled with the context of your enterprise’s proprietary data, help to drive team efficiency, simplifying the process of searching through large sets of documents for relevant information. Examples include:
- Customer Support: Help support agents to quickly and efficiently triage tickets and find the best answer for the customer’s question
- Financial: Enable financial analysts to quickly retrieve and compare earnings reports across companies
- Sales: Provide sales personnel with case studies and pitch materials most relevant to their customer
- R&D: Assist researchers in finding and synthesizing scientific literature that is pertinent to their ongoing research

AI-powered search: Minimize time to finding relevant unstructured information, resulting in more effective and efficient use of the most valuable assets. Examples include:
- Product search: Lexical and semantic search for products in an online catalog based on product title, specifications and reviews
- Documentation search: Site search for navigating technical support and product documentation pages

This is just the beginning for Cortex Search, and we’re excited to see what customers build next with this powerful search capability in Snowflake.

Try out Cortex Search

Starting today, Cortex Search is in public preview in these Snowflake regions, and rolling out soon to an extended set of global regions. Check out the following resources to get started: