July 20, 2025 in architecture, ops, orchestration by Alban Capitant, Simon Cariou, Florian Muller and Dimitri Tombroff · 5 minutes
Thanks to Temporal, Fred makes document ingestion as reliable as microservices — with automatic retries, fine-grained observability, and custom pipelines for push or pull ingestion. A true game-changer for AI apps.
Ingesting documents sounds simple — until you need to do it at scale, reliably, and with visibility.
What if you could retry failures automatically, observe every step, and plug in custom pipelines for push or pull ingestion? Fred now does exactly that. Thanks to Temporal, we’ve given our ingestion system a durable, observable, and developer-friendly architecture.
When you build GenAI apps that process user documents, ingestion is always the first bottleneck.
You need to fetch each document, extract its metadata, convert it into a searchable form, generate embeddings, and store the resulting artifacts.
And if any of this fails — especially in a batch — you want to know which file, why, and how to fix it.
Traditionally, teams solve this with brittle scripts, Celery queues, or one-off APIs. But that leads to poor observability, no retry logic, and painful debugging.
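To make that contrast concrete, here is the kind of hand-rolled retry loop teams end up writing around a brittle ingestion script (all names here are illustrative, not Fred's API). Temporal replaces this, plus the missing per-attempt visibility, with declarative retry policies:

```python
import time

def ingest_with_retries(ingest_fn, file, max_attempts=3, backoff_s=1.0):
    """Hand-rolled retry logic: the kind of code Temporal makes unnecessary."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return ingest_fn(file)
        except Exception as exc:  # no per-step visibility, no durable state
            last_error = exc
            time.sleep(backoff_s * attempt)  # naive linear backoff
    raise RuntimeError(f"{file}: failed after {max_attempts} attempts") from last_error
```

Every team reinvents this loop, and it still loses all state if the process crashes mid-batch; Temporal persists each attempt in the workflow history instead.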
Fred now uses Temporal workflows to define its ingestion pipeline in three reusable steps:
Extract Metadata
Input Process
Output Process
Each step is a Temporal activity, which means it gets automatic retries, fine-grained observability, and independent scaling across workers for free.
The architecture is quite simple and is illustrated next:
```mermaid
flowchart TD
    subgraph UI["🧑 User Interface"]
        User[Fred Web UI]
    end
    subgraph KnowledgeFlow["📚 Knowledge Flow Backend"]
        KnowledgeAPI[Ingestion API]
        TemporalClient[Temporal Client SDK]
    end
    subgraph Temporal["⏱️ Temporal Orchestration Layer"]
        TemporalServer[🧭 Temporal Server]
        Worker1[🔧 Knowledge Flow Worker 1]
        Worker2[🔧 Knowledge Flow Worker 2]
        ScaleHint[...auto-scale as needed...]
    end
    User -->|REST or Upload| KnowledgeAPI
    KnowledgeAPI -->|launch pipeline| TemporalClient
    TemporalClient --> TemporalServer
    TemporalServer --> Worker1
    TemporalServer --> Worker2
    subgraph Workers["📦 Ingestion Workers (Knowledge Flow Workers)"]
        Worker1
        Worker2
        ScaleHint
    end
    style User fill:#d0e1ff,stroke:#333,stroke-width:1.5px
    style KnowledgeAPI fill:#d7fada,stroke:#333,stroke-width:1.5px
    style TemporalClient fill:#bbf3ff,stroke:#333,stroke-dasharray: 3,3
    style TemporalServer fill:#ffd6e7,stroke:#333,stroke-width:1.5px
    style Worker1 fill:#f2f2f2,stroke:#333,stroke-width:1.5px
    style Worker2 fill:#f2f2f2,stroke:#333,stroke-width:1.5px
    style ScaleHint fill:#ffffff,stroke:#888,stroke-dasharray: 5,5
```
A closer look at the worker shows how its architecture is laser-focused on turning raw input — whether local or remote — into structured, searchable knowledge through a streamlined fetch → process → generate flow.
```mermaid
flowchart TD
    subgraph Source["📦 Input Sources"]
        Push[🗃️ S3 Fred Store]
        Pull[🌍 Remote Source]
    end
    subgraph Worker["⚙️ Knowledge Flow Worker"]
        Step2[🧠 LLM: Metadata + Markdown]
        Step3[🔎 LLM: Embeddings]
        Step4[💾 Save Artifacts]
        Step2 --> Step3 --> Step4
    end
    subgraph Output["📤 Generated Artifacts"]
        Markdown[📄 Fred Previews]
        Embeddings[🧬 Fred Vectors]
        Archive[🗃️ S3 Fred Store]
    end
    Push --> Step2
    Pull --> Step2
    Step2 --> Markdown
    Step3 --> Embeddings
    Step4 --> Archive
    style Step2 fill:#e6f7ff,stroke:#333,stroke-width:1.5px
    style Step3 fill:#e6f7ff,stroke:#333,stroke-width:1.5px
    style Step4 fill:#f9f9f9,stroke:#333,stroke-width:1.5px
    style Push fill:#f2f2f2,stroke:#333,stroke-width:1.5px
    style Pull fill:#f2f2f2,stroke:#888,stroke-dasharray: 5,5
    style Markdown fill:#f0f8ff,stroke:#333,stroke-width:1.5px
    style Embeddings fill:#e2ffe2,stroke:#333,stroke-width:1.5px
    style Archive fill:#eeeeee,stroke:#333,stroke-width:1.5px
```
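Stripped of the Temporal plumbing, the worker's fetch → process → generate flow can be sketched as three composed stages. All names and return shapes below are illustrative stand-ins, not Fred's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Artifacts:
    """Everything the worker produces for one document (illustrative)."""
    metadata: dict = field(default_factory=dict)
    markdown: str = ""
    embeddings: list = field(default_factory=list)

def fetch(source: str) -> bytes:
    # Fetch raw bytes from the push store (S3) or a remote source.
    return f"raw contents of {source}".encode()

def process(raw: bytes) -> Artifacts:
    # LLM steps: extract metadata and a Markdown preview, then embed.
    text = raw.decode()
    return Artifacts(
        metadata={"length": len(text)},
        markdown=f"# Preview\n\n{text}",
        embeddings=[float(len(text))],  # stand-in for a real vector
    )

def generate(artifacts: Artifacts) -> dict:
    # Persist previews, vectors, and the archived original.
    return {
        "previews": artifacts.markdown,
        "vectors": artifacts.embeddings,
        "archive": artifacts.metadata,
    }

def run_worker(source: str) -> dict:
    return generate(process(fetch(source)))
```

In the real worker each stage runs as a Temporal activity, so a failure in any stage is retried and recorded independently.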
Here’s how the full workflow looks in code:
```python
@workflow.defn
class Process:
    @workflow.run
    async def run(self, definition: PipelineDefinition) -> str:
        for file in definition.files:
            metadata = await workflow.execute_child_workflow(ExtractMetadata.run, args=[file])
            metadata = await workflow.execute_child_workflow(InputProcess.run, args=[file, metadata])
            await workflow.execute_child_workflow(OutputProcess.run, args=[file, metadata])
        return "success"
```
You can ingest 1 file or 1,000 — each goes through its own reliable workflow.
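Because each file gets its own workflow, one bad file never poisons the rest of the batch, and you always know which file failed and why. The isolation semantics can be simulated in plain Python (this is a sketch of the behavior, not Temporal code):

```python
def run_batch(files, process_one):
    """Simulate per-file isolation: each file succeeds or fails on its own,
    and failures are reported with the file and the reason. Temporal's
    per-workflow history gives you this (plus retries) for free."""
    succeeded, failed = [], {}
    for file in files:
        try:
            process_one(file)
            succeeded.append(file)
        except Exception as exc:
            failed[file] = str(exc)  # which file, and why
    return succeeded, failed
```

With Temporal the equivalent report lives in the server's workflow histories, browsable per file in the Web UI.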
This design changes the game.
We’re now prototyping pipelines that build on these reusable steps.
If you run Fred locally, start a Temporal server first (it will shortly be integrated into the devcontainers):
```shell
# Download the CLI
curl -LO https://github.com/temporalio/cli/releases/download/v1.4.1/temporal_cli_1.4.1_linux_amd64.tar.gz
tar xvzf temporal_cli_1.4.1_linux_amd64.tar.gz
sudo cp temporal /usr/local/bin/

# Start the Temporal server locally (for testing)
temporal server start-dev
```
You can also run it via Docker using the temporalio/auto-setup image: https://hub.docker.com/r/temporalio/auto-setup
Then, make sure to configure the temporal section in config/configuration.yaml of the Fred Knowledge Flow backend. This allows the ingestion API to act as a Temporal client and dispatch workflows to your Temporal server.
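As an illustration, a Temporal client typically needs the server address, a namespace, and a task queue. The exact keys below are hypothetical; check Fred's config/configuration.yaml for the real schema:

```yaml
# Hypothetical shape of the temporal section; the actual keys in
# Fred's config/configuration.yaml may differ.
temporal:
  host: localhost:7233        # Temporal frontend (default dev-server port)
  namespace: default          # the dev server creates "default" automatically
  task_queue: knowledge-flow  # must match the queue the workers poll
```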
To trigger an ingestion manually, visit the Fred UI and launch a Process on your files.
Once everything is running, you can monitor your workflows in the Temporal Web UI (http://localhost:8233).

With just a few lines, you’ll have a full ingestion pipeline running: durable, observable, and ready for scale.
Fred’s ingestion system is now ready to support push and pull ingestion at scale, and we’re exploring further extensions to the pipeline.
With Temporal in the picture, Fred’s ingestion story goes from “hope it works” to “I can see exactly what happened.” That’s a huge step for reliable AI pipelines.