December 11, 2025 in platform, update by Romain Perennes, Dimitri Tombroff and Simon Cariou (5 minute read)
Fred’s latest release strengthens retrieval: Lucene HNSW with cosine, semantic/hybrid/strict modes, corpus/global scope controls, chunk visibility, and attachment summaries injected into context. This post explains the why and the how.
Fred has always been about trustworthy, grounded answers. In the last month we pushed a set of retrieval upgrades that make the document backbone faster, clearer, and more controllable for anyone who simply wants answers with receipts.
Most of our users run Fred on-premise because their documents are sensitive, regulated, or simply too valuable to entrust to public-cloud LLM APIs. That makes retrieval quality and control non-negotiable: corporate corpora are complex, access-scoped, and often enormous. This post explains how the new stack serves those realities while staying approachable to teams that just want reliable answers from their own data.
Retrieval-augmented generation (RAG) stands on three pillars: vectors that accurately capture meaning, controls that let you widen or narrow the search surface, and enough visibility that users trust the answers they see. If semantic similarity drifts, you start from the wrong evidence. If scope is fixed, you can’t serve both “corpus-only” compliance and “bring in more context” exploration. And if users can’t see or steer what’s happening, confidence erodes. The work in this release reinforces all three pillars so that grounding is both precise and observable.
Every document chunk is turned into a vector: a long list of numbers that captures meaning. Similar texts have vectors that “point” in similar directions. When you search, your question is also embedded into a vector. Vector search then finds the closest vectors—semantically similar chunks—rather than just keyword matches.
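As a toy illustration (the hash-based embedding below is a stand-in, not a real embedding model or Fred's pipeline), vector search boils down to "find the chunks whose vectors point in the most similar direction":

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding for illustration only: hash words into a fixed-size vector.
    A real deployment uses a sentence-embedding model instead."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 = vectors point the same way (similar meaning), ~0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

chunks = [
    "GPU quotas are assigned per project by the platform team.",
    "The cafeteria menu changes every Tuesday.",
]
chunk_vectors = [embed(c) for c in chunks]

query_vector = embed("How are GPU quotas assigned?")
best = max(zip(chunks, chunk_vectors), key=lambda pair: cosine(query_vector, pair[1]))
print(best[0])  # the chunk whose vector is closest to the question's vector
```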
Fred supports multiple stores (Chroma for lightweight/dev, OpenSearch for production scale). The recent work focused on OpenSearch because it’s distributed, battle-tested, and now offers Lucene-native HNSW with cosine similarity out of the box.
This is why OpenSearch matters here: it lets us mix lexical and vector legs, fuse scores, and enforce filters at query time so the model only sees the right slices of your corpus.
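As a sketch of what query-time scoping can look like (the index layout and field names such as `embedding` and `corpus_id` are invented for illustration, not Fred's actual schema), the filter rides inside the vector query itself, so off-scope documents never even become candidates:

```python
# Illustrative search body: the embedded question plus a hard scope filter.
query_vector = [0.1] * 768  # placeholder for the embedded user question

scoped_search_body = {
    "size": 5,
    "query": {
        "knn": {
            "embedding": {
                "vector": query_vector,
                "k": 5,
                # Enforced inside the query itself, not by post-filtering results:
                "filter": {"term": {"corpus_id": "hr-policies"}},
            }
        }
    },
}
```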
We upgraded to OpenSearch 2.19+, enabling Lucene-native HNSW with cosine similarity. That brings tighter score distributions, better recall, and more stable latency. We also validate index compatibility (dimension/engine/space/method) on startup, so misconfigured indices fail fast instead of failing silently.
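To make this concrete, here is a minimal sketch of such an index definition and the fail-fast check, using opensearch-py; the index name, field names, dimension, and HNSW parameters are illustrative rather than Fred's exact configuration:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Illustrative knn index for OpenSearch 2.19+ with Lucene-native HNSW + cosine.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "corpus_id": {"type": "keyword"},
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,  # must match the embedding model's output size
                "method": {
                    "name": "hnsw",
                    "engine": "lucene",           # Lucene-native HNSW
                    "space_type": "cosinesimil",  # cosine similarity
                    "parameters": {"ef_construction": 128, "m": 16},
                },
            },
        }
    },
}
client.indices.create(index="fred-chunks", body=index_body)

# Fail-fast compatibility check at startup: compare the live mapping with what
# the embedding model expects instead of failing silently at query time.
props = client.indices.get_mapping(index="fred-chunks")["fred-chunks"]["mappings"]["properties"]
assert props["embedding"]["dimension"] == 768, "embedding dimension mismatch"
assert props["embedding"]["method"]["engine"] == "lucene", "unexpected knn engine"
```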
Hybrid mode now uses OpenSearch’s hybrid query plus a post-processing pipeline to normalize and blend BM25 and vector scores. This keeps “exact word matches” and “semantic matches” in the same ranked list without one starving the other.
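Here is a rough sketch of the wiring with opensearch-py: a normalization search pipeline plus a hybrid query with one lexical leg and one vector leg. The pipeline name, field names, and weights are illustrative, not Fred's defaults:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# 1. A search pipeline that normalizes BM25 and vector scores, then blends them.
client.transport.perform_request(
    "PUT",
    "/_search/pipeline/hybrid-norm",
    body={
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [0.4, 0.6]},  # [lexical, vector]
                    },
                }
            }
        ]
    },
)

# 2. A hybrid query: exact word matches and semantic matches in one ranked list.
query_vector = [0.1] * 768  # placeholder for the embedded question
hybrid_query = {
    "size": 10,
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text": "gpu quota policy"}},                   # lexical leg
                {"knn": {"embedding": {"vector": query_vector, "k": 10}}}, # vector leg
            ]
        }
    },
}

response = client.search(
    index="fred-chunks",
    body=hybrid_query,
    params={"search_pipeline": "hybrid-norm"},
)
```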
The UI isn’t a thin veneer; it exposes the same controls the backend enforces. Scope toggles (corpus-only, global, hybrid) feed directly into the agent runtime so the chosen policy is honored end-to-end. Score filtering is semantic-only by design, with a default of 0.0 to avoid dropping low-but-relevant hits; it is transparent and opt-in. Top-k and pipeline selection are surfaced so teams can tune breadth versus latency and pick processing paths without redeploying. Chunk visibility shows how many snippets per document are in play, turning retrieval from a black box into something users can inspect and trust. Together these contracts keep the UI honest and the core predictable.
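One way to picture this contract is a single settings object that the UI fills in and the agent runtime enforces; the names and defaults below are illustrative, not Fred's actual API:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class RetrievalSettings:
    """Illustrative contract between the UI controls and the retrieval backend."""
    scope: Literal["corpus_only", "global", "hybrid"] = "corpus_only"
    search_mode: Literal["semantic", "hybrid", "strict"] = "hybrid"
    top_k: int = 10                  # breadth vs. latency trade-off
    min_semantic_score: float = 0.0  # opt-in filter; 0.0 keeps low-but-relevant hits
    pipeline: str = "default"        # which processing path to use
    show_chunks: bool = True         # surface per-document snippet counts in the UI

def validate(settings: RetrievalSettings) -> None:
    """Reject obviously inconsistent settings before they reach the retriever."""
    if not 0.0 <= settings.min_semantic_score <= 1.0:
        raise ValueError("min_semantic_score must be between 0.0 and 1.0")
    if settings.top_k < 1:
        raise ValueError("top_k must be at least 1")

# The UI serializes the user's choices; the backend honors them end-to-end.
settings = RetrievalSettings(scope="corpus_only", top_k=5)
validate(settings)
```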
Not every file a user drops into chat should be sent through the full RAG ingestion flow—indexing every ad-hoc upload would be slow, costly, and often unnecessary. Yet it’s useful for a user to say “consider this PDF I just shared” and have the agent stay grounded on that content during the conversation.
Fred addresses this with lightweight attachment processing pipelines instead of immediate full-corpus indexing. When users drop files into chat, we run a summarizer that produces compact "agentic RAM" snippets and injects them into the conversation context, so the agent can ground on the shared content right away.
If a document later proves valuable for broader reuse, it can be promoted into the full RAG corpus via the heavier ingestion path. Until then, chat-time summaries give users immediate grounding without paying the indexing cost.
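A rough sketch of the pattern, with a deliberately naive placeholder summarizer standing in for the real pipeline:

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    """Minimal stand-in for a chat session's context window."""
    messages: list[str] = field(default_factory=list)

def summarize(document_text: str, max_chars: int = 800) -> str:
    """Placeholder summarizer: a real pipeline would call an LLM or a dedicated
    summarization model, not just truncate the text."""
    return document_text[:max_chars]

def attach_to_chat(conversation: Conversation, filename: str, document_text: str) -> None:
    """Inject a compact 'agentic RAM' snippet instead of indexing the file."""
    snippet = summarize(document_text)
    conversation.messages.append(f"[Attachment summary: {filename}]\n{snippet}")

# Usage: the file grounds the conversation immediately; full RAG ingestion
# happens later only if the document is promoted into the corpus.
chat = Conversation()
attach_to_chat(chat, "quarterly-report.pdf", "Revenue grew 12% quarter over quarter...")
print(chat.messages[-1][:60])
```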
When a provider safety filter triggers, backend logs now clearly label guardrail/refusal states with type, status, request ID, and details. Operators can tell if a response was blocked by policy versus an actual failure. Users get a clear fallback that explains the block without guessing.
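In sketch form (field names are illustrative, not the actual log schema), the idea is a structured guardrail event plus an explicit user-facing fallback:

```python
import json
import logging

logger = logging.getLogger("fred.llm")

def log_guardrail_event(request_id: str, guardrail_type: str, status: str, details: str) -> None:
    """Label provider refusals explicitly so operators can tell
    'blocked by policy' apart from 'backend failure'."""
    logger.warning(json.dumps({
        "event": "guardrail_triggered",
        "request_id": request_id,
        "type": guardrail_type,   # e.g. content_filter, jailbreak, pii
        "status": status,         # e.g. blocked, redacted
        "details": details,
    }))

def user_facing_fallback(guardrail_type: str) -> str:
    """A clear explanation of the block instead of a generic error."""
    return (
        "The provider's safety filter blocked this response "
        f"({guardrail_type}). Try rephrasing the question."
    )

log_guardrail_event("req-42", "content_filter", "blocked", "provider refused completion")
print(user_facing_fallback("content_filter"))
```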
The value is practical: strict and corpus-only modes keep answers honest when evidence is thin, while hybrid mode blends precision and recall without guesswork. Operational guardrails—index health checks, explicit policies, and richer logs—cut down on on-call surprises. The UI mirrors backend controls, so even non-experts can see and steer what’s being retrieved. And because the stack runs on Lucene HNSW with cosine and applies filtering only where it makes sense, you get relevance and speed without silently dropping useful results.
Fred’s agents, UI, and security continue to evolve, but the retrieval backbone is already stronger, clearer, and more controllable. If you want grounded answers with real levers for control, give the new stack a spin.