Concepts
This section gives a high-level explanation of how the chat and search implementations work; understanding these aspects will allow better use and configuration of the services.
How search works
Search begins with your catalogs. Products and pages are indexed, fields are declared for filters/facets, and text content is embedded with the chosen OpenAI embeddings model so it can be stored in the vector index.
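As a concrete illustration, indexing can be pictured as turning each catalog record into a stored vector. The sketch below is a minimal example rather than the service's actual code; the field names, the in-memory index, and the choice of text-embedding-3-small are assumptions.

```python
# Minimal indexing sketch using the official openai Python client.
# Field names and the in-memory "index" are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def document_text(doc: dict) -> str:
    # Build the text representation from semantically relevant fields.
    return " ".join(str(doc.get(field, "")) for field in ("name", "description", "category"))

def embed(text: str) -> list[float]:
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

index: list[tuple[dict, list[float]]] = []  # stand-in for the real vector store
catalog = [{"name": "Trail runner", "description": "Lightweight waterproof running shoe",
            "category": "footwear"}]
for product in catalog:
    index.append((product, embed(document_text(product))))
```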
When a query arrives, a configuration is selected (either by API parameter or by UI configuration). That configuration defines the search mode (semantic, fulltext, hybrid, etc), optional catalog inference (product vs pages), and any filters or facets to apply.
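For instance, a client might pin the mode and filters per request. The payload below is hypothetical: the parameter names (query, mode, catalog, filters, facets) are assumptions chosen to mirror the description above, not the documented API shape.

```python
# Hypothetical search request body; all parameter names are illustrative.
request = {
    "query": "waterproof hiking boots",
    "mode": "hybrid",              # e.g. semantic, fulltext, hybrid
    "catalog": "products",         # omit to let catalog inference decide
    "filters": {"brand": "Acme"},  # field filters declared at indexing time
    "facets": ["category", "price_range"],
}
```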
The engine runs a semantic vector search (and, depending on the mode, a fulltext search as well) to collect candidates; see Search modes. Relevance pruning happens next: clustering can drop low-score clusters, and a relevancy threshold can exclude or mark weak hits.
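A rough sketch of the threshold side of pruning, assuming (document, score) pairs and an illustrative cutoff; the real service applies these rules server-side according to the active configuration:

```python
# Relevancy-threshold pruning sketch; the 0.75 cutoff is illustrative.
def prune(hits: list[tuple[dict, float]], threshold: float = 0.75):
    kept, weak = [], []
    for doc, score in sorted(hits, key=lambda hit: hit[1], reverse=True):
        (kept if score >= threshold else weak).append((doc, score))
    return kept, weak  # weak hits can be excluded outright or only marked
```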
Finally, post-processing plugins (such as redirects or variant un-grouping) run in order, and the service returns ordered results plus facets/metrics.
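Conceptually, that final stage is a chain of functions applied in the configured order. The sketch below assumes dict-shaped hits and invented plugin signatures; only the plugin names come from the text above.

```python
# Conceptual post-processing chain; each plugin takes and returns a result list.
def apply_redirects(results: list[dict]) -> list[dict]:
    # Could replace the result set with a redirect target (illustrative no-op).
    return results

def ungroup_variants(results: list[dict]) -> list[dict]:
    # Flatten grouped product variants into individual hits (illustrative).
    flat = []
    for hit in results:
        flat.extend(hit.get("variants") or [hit])
    return flat

PLUGINS = [apply_redirects, ungroup_variants]  # executed in configured order

def post_process(results: list[dict]) -> list[dict]:
    for plugin in PLUGINS:
        results = plugin(results)
    return results
```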
How chat works
Chat reuses the search pipeline to ground answers in your catalog. Each user question can be contextualized with chat memory (when enabled) so follow-up questions are rewritten before search. The service infers or respects the requested document type (products or pages) using the inference prompt, then executes search with the active configuration, including filters, clustering, and relevancy thresholds. The top N documents are sent to the LLM together with the configured prompt (or pages prompt when the query targets pages), which controls tone, formatting, and how products should be listed. If nothing relevant survives the filters, a relevancy prompt can be used to ask the user for better context instead of hallucinating an answer.
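Put together, the chat flow can be sketched as follows. The openai client call, the model name, and the way the relevancy prompt is surfaced are assumptions for illustration; the search function stands in for the pipeline described above.

```python
# Chat grounding sketch: contextualize, search, then answer from the top hits.
from openai import OpenAI

client = OpenAI()

def chat_answer(question: str, memory: list[str], search, chat_prompt: str,
                relevancy_prompt: str, top_n: int = 5) -> str:
    # 1. Rewrite follow-up questions with chat memory (when enabled).
    standalone = question if not memory else " ".join(memory) + " " + question
    # 2. Run the search pipeline; filters, clustering and thresholds apply inside.
    documents = search(standalone)[:top_n]
    if not documents:
        # 3a. Nothing relevant survived: answer under the relevancy prompt,
        #     which asks the user for better context instead of guessing.
        system_prompt = relevancy_prompt
        context = ""
    else:
        # 3b. Ground the answer on the retrieved documents.
        system_prompt = chat_prompt
        context = "\n\nCatalog context:\n" + "\n".join(str(d) for d in documents)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": standalone + context}],
    )
    return response.choices[0].message.content
```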
Terminology
- semantic search: Vector similarity search over embedded catalog content to rank results by meaning instead of exact keyword matches; a minimal ranking sketch appears after this list. See Search modes
- fulltext search: Keyword-based matching with stemming/stop-words that favors documents containing more of the query terms; used alone or to re-rank semantic results in hybrid mode. See Search modes
- document: A single indexed record from a catalog (a product or a page), including its fields, raw text, and stored embeddings. A document contains all the information provided in the catalog, plus a text representation generated from selected fields, which is used to create the semantic embedding; see embedding below.
- embedding: Numeric representation of text produced during indexing with the selected OpenAI embeddings model (e.g., OpenAI's text-embedding-3-small or text-embedding-3-large models) and used for semantic similarity scoring. For each document, a text representation is generated from semantically relevant fields, and the embedding of that text is stored to run semantic searches.
- vector store: The index that holds the embeddings of every document so nearest-neighbor semantic queries can be executed quickly.
- prompt: The instructions given to the LLM (chat prompt, pages prompt, inference prompt, relevancy prompt) that shape how answers are generated from search results.
- clustering: A relevancy filter that groups results by score and returns only the top clusters to remove noise; see the sketch after this list.
- LLM: Large language model invoked for catalog inference, optional query contextualization, and final chat answers based on retrieved documents. Example implementations: OpenAI's ChatGPT, Google's Gemini, DeepSeek, etc.
- back-office or admin panel: A web application that allows store administrators to create their data catalogs, configure the search and chat services, and test both in a playground.
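As referenced from the semantic search entry above, ranking by meaning reduces to nearest-neighbor search over the stored embeddings. A minimal cosine-similarity sketch, assuming plain Python lists as vectors:

```python
# Cosine-similarity ranking sketch over precomputed embeddings.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vector: list[float], index: list[tuple[dict, list[float]]]):
    # index holds (document, embedding) pairs, as built at indexing time.
    return sorted(index, key=lambda pair: cosine(query_vector, pair[1]), reverse=True)
```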
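And for the clustering entry, one simple way to group results by score is to split the ranked list at large score gaps and keep only the leading cluster(s). The gap value below is an assumption:

```python
# Score-clustering sketch: split ranked hits where the score drops sharply,
# then keep only the top cluster(s). The 0.1 gap threshold is illustrative.
def top_clusters(scored_hits: list[tuple[dict, float]], gap: float = 0.1, keep: int = 1):
    clusters, current = [], []
    for doc, score in scored_hits:  # assumed sorted by score, descending
        if current and current[-1][1] - score > gap:
            clusters.append(current)
            current = []
        current.append((doc, score))
    if current:
        clusters.append(current)
    return [hit for cluster in clusters[:keep] for hit in cluster]
```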