August 17, 2025 · in architecture, agents, rag · by Thomas Hedan, Simon Cariou, Alban Capitant and Dimitri Tombroff · 4 minutes
Fred uses REST for graph-driven RAG (Rico, Rico Pro) and MCP for tool-driven RAG (Dominic). Running both lets us measure latency, reliability, governance, and developer ergonomics—without changing our UI or schemas.
As part of the Fred platform, we’re exploring two integration styles for Retrieval-Augmented Generation (RAG):

- REST for graph-driven RAG (Rico, Rico Pro)
- MCP for tool-driven RAG (Dominic)
Both paths return the same Pydantic schema — VectorSearchHit — so our UI, citations, and analytics stay consistent while we experiment.
We standardized on a single hit model:

# fred_core/vector_search.py
from pydantic import BaseModel

class VectorSearchHit(BaseModel):
    content: str
    page: int | None = None
    section: str | None = None
    uid: str
    title: str
    # ... (author, created, modified, file_name, tags, score, rank, etc.)
Everything that does retrieval — HTTP clients, MCP tools, or services — returns VectorSearchHit[]. That’s what makes A/B comparisons honest.
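Because every path emits the same shape, comparing the two reduces to comparing lists of hits. A minimal sketch of what "honest A/B" can look like in practice (the overlap metric and function name are ours, not Fred's; hits are shown as plain dicts, as produced by `hit.model_dump()`):

```python
# Sketch: compare two result lists that share the VectorSearchHit shape.
# The Jaccard-overlap helper is illustrative, not part of Fred.

def uid_overlap(rest_hits: list[dict], mcp_hits: list[dict]) -> float:
    """Jaccard overlap of document uids between two hit lists."""
    rest_uids = {h["uid"] for h in rest_hits}
    mcp_uids = {h["uid"] for h in mcp_hits}
    if not rest_uids and not mcp_uids:
        return 1.0  # both empty: trivially identical
    return len(rest_uids & mcp_uids) / len(rest_uids | mcp_uids)

rest = [{"uid": "a", "title": "Doc A"}, {"uid": "b", "title": "Doc B"}]
mcp = [{"uid": "b", "title": "Doc B"}, {"uid": "c", "title": "Doc C"}]
print(uid_overlap(rest, mcp))  # 1 shared uid out of 3 total -> 0.333...
```

The same trick works for rank correlation or score drift; the point is that no adapter layer is needed between the two paths.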
Path A: REST (Rico, Rico Pro). The agent calls a typed client (VectorSearchClient) inside the graph, then attaches [hit.model_dump(), …] to response_metadata["sources"] for the UI.

Strengths: lower latency; full, deterministic graph control; pure-Python unit tests.

Trade-offs: governance stays app-level; credentials are distributed across agents; adding a capability means a code deploy.
Path B: MCP (Dominic). The model decides when to call the retrieval tool (search_documents_using_vectorization). Tool results come back as VectorSearchHit[], validated and attached to metadata.

Strengths: governance and credentials centralized (and rotated) server-side; tool discovery and versioning via MCP.

Trade-offs: slightly higher latency from tool-call overhead; the model decides when/what to call; testing needs an MCP client or mocks.
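On the MCP side, the tool itself is just a function that returns serialized hits; server registration is out of scope here. The sketch below fakes the retrieval backend to show the contract, and everything but the tool name is an assumption of ours — in Fred the server would run a real vector search and serialize validated VectorSearchHit objects:

```python
import json

# Illustrative sketch of what a tool like search_documents_using_vectorization
# might return. The in-memory "index" is a stand-in for the vector store.

FAKE_INDEX = [
    {"uid": "doc-1", "title": "Intro", "content": "vector search basics"},
    {"uid": "doc-2", "title": "Ops", "content": "rotating credentials server-side"},
]

def search_documents_using_vectorization(query: str, top_k: int = 5) -> str:
    """Return a JSON array of hit dicts -- the shape the agent re-validates."""
    words = query.lower().split()
    hits = [d for d in FAKE_INDEX if any(w in d["content"] for w in words)]
    return json.dumps(hits[:top_k])
```

The key design point: the tool serializes to the shared schema at the boundary, so the agent side can re-validate with the same Pydantic model it uses for REST.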
We want evidence, not vibes. Running both in production-like scenarios lets us compare:
| Dimension | REST (Rico/Rico Pro) | MCP (Dominic) |
|---|---|---|
| Latency | 🟢 Usually lower | 🟡 Slightly higher (tool-call overhead) |
| Determinism | 🟢 Full graph control | 🟡 Model decides when/what to call |
| Governance | 🟡 App-level | 🟢 Centralized in MCP server |
| Credential scope | 🟡 Distributed in agents | 🟢 Centralized & rotated server-side |
| Evolvability | 🟡 Code deploy to add a tool | 🟢 Tool discovery/versioning via MCP |
| Testing simplicity | 🟢 Pure Python unit tests | 🟡 Needs MCP client/mocks |
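The latency row is the easiest to turn into evidence, precisely because both paths expose the same call shape. A minimal timing harness (ours, not Fred's) that works against either path:

```python
import statistics
import time
from typing import Callable

def time_path(search: Callable[[str], list], queries: list[str]) -> dict[str, float]:
    """Run each query through one retrieval path; report latency in milliseconds."""
    samples = []
    for q in queries:
        start = time.perf_counter()
        search(q)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "p50_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

# Usage: run time_path(rest_client.search, queries) and time_path(mcp_invoke, queries)
# over the same query set, then compare the distributions.
```

Determinism and governance need scenario-level tests rather than timers, but the same "identical interface, swap the implementation" pattern applies.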
Because both emit VectorSearchHit, the UI, citations, and analytics are identical. That lowers the cost of experimentation.
# REST path: deterministic retrieval inside the graph, sources attached explicitly.
hits = search_client.search(query=question, top_k=5, tags=tags)
answer = await model.ainvoke([HumanMessage(content=prompt_from(hits))])
answer.response_metadata["sources"] = [h.model_dump() for h in hits]
# MCP path: the model may call the tool; harvest validated hits from tool messages.
# Assumes hits_adapter = TypeAdapter(list[VectorSearchHit]), built once at import.
response = model.invoke([base_prompt] + state["messages"])
for msg in state["messages"]:
    if isinstance(msg, ToolMessage) and msg.name == "search_documents_using_vectorization":
        data = msg.content if isinstance(msg.content, (list, dict)) else json.loads(msg.content)
        hits = hits_adapter.validate_python(data)  # -> list[VectorSearchHit]
        response.response_metadata["sources"] = (
            response.response_metadata.get("sources", [])
            + [h.model_dump() for h in hits]
        )
Same output shape, different control surface.
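"Same output shape" is itself checkable: one assertion over response_metadata["sources"] covers both agents. A sketch with plain dict checks (in real code this would be Pydantic validation against VectorSearchHit; the required-key set here is a deliberately minimal assumption):

```python
# Sketch: both agents' sources must carry the same minimal keys.
# Swap in VectorSearchHit validation for production use.

REQUIRED_KEYS = {"uid", "title", "content"}

def assert_same_shape(sources: list[dict]) -> None:
    for i, src in enumerate(sources):
        missing = REQUIRED_KEYS - src.keys()
        if missing:
            raise AssertionError(f"source {i} missing keys: {sorted(missing)}")

assert_same_shape([{"uid": "a", "title": "T", "content": "c", "score": 0.9}])  # ok
```

Running this over traffic from both paths is a cheap guardrail against one path silently drifting from the shared schema.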
We’ll publish our notes and any sample configs as we go.
If you try something similar, normalize on a single hit schema (VectorSearchHit) so switching paths doesn’t ripple into your UI or analytics. Running both approaches in parallel is the point: Fred is a lab for building serious, evaluable agent systems.
Whichever transport wins, everything still returns VectorSearchHit. Let’s keep the experiments honest and the abstractions thin.