April 16, 2026 in SQL, architecture by Simon Cariou · 7 minutes
Fred now treats tabular data like the rest of the corpus: document-scoped, team-aware, and authorization-first. This post explains why the old SQL-store model was limiting, how the new Parquet plus DuckDB runtime works, and what this changes for security, performance, and operations.
This new feature looks like a tabular refactor at first glance, but it actually reshapes a broader part of the platform.
Behind the implementation details, the improvement does something more important: it aligns tabular data with Fred’s main architectural principles.
In other words, CSV and spreadsheet-like datasets are no longer handled as a special case living beside the rest of the system. They are now treated like first-class Fred resources: document-scoped, team-aware, ReBAC-controlled, and served through the same object-storage-oriented runtime used elsewhere in the platform.
Tabular support is deceptively hard in a multi-user AI platform.
At first, loading CSV files into a shared SQL database feels natural. It gives agents a place to run queries and gives developers a familiar abstraction. But in a system like Fred, that model creates several tensions very quickly:

- access control drifts away from the ReBAC rules that govern every other resource
- datasets stop being scoped to the documents and teams they came from
- a long-lived shared database quietly becomes the de facto security boundary
That mismatch is exactly what this improvement addresses.
The new design, documented in docs/design/tabular_data_store/PARQUET_OBJECT_STORE_DUCKDB.md, makes a strong architectural choice: one Parquet artifact per document, stored in content_storage, queried on demand with DuckDB.
This matters because the control plane moves back to metadata and authorization, instead of being hidden inside a long-lived SQL database.
Fred now uses a dataset-centric tabular runtime:

- one Parquet artifact per document, written to content_storage at ingestion time
- a tabular_v1 metadata record that tells Fred how to expose that artifact safely
- short-lived DuckDB sessions that mount only the datasets a request is authorized to see

This is a clean shift in philosophy. The runtime is no longer “put everything in one database, then try to filter access later.” It becomes “select the authorized datasets first, then make only those datasets exist for the query.”
That is a much better fit for Fred.
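To make that ordering concrete, here is a minimal sketch of the “select first, then mount” flow. It is illustrative only: resolve_authorized_datasets and the alias scheme are assumptions standing in for Fred’s actual internals; the DuckDB calls themselves are real API.

```python
import duckdb

def run_tabular_query(user_id: str, sql: str) -> list[tuple]:
    # Authorization first: ask the ReBAC engine which tabular_v1 artifacts
    # this user may read. (Hypothetical helper, not Fred's actual API.)
    datasets = resolve_authorized_datasets(user_id)  # [(alias, parquet_uri), ...]

    # A fresh in-memory session per request: nothing exists in it except
    # what is deliberately mounted below.
    con = duckdb.connect(database=":memory:")
    for alias, parquet_uri in datasets:
        # Expose each authorized Parquet artifact under a generated alias.
        # Aliases are generated server-side, never taken from user input.
        safe_uri = parquet_uri.replace("'", "''")
        con.execute(f"CREATE VIEW {alias} AS SELECT * FROM read_parquet('{safe_uri}')")

    # Only now does the (already validated) SQL run. A query that names
    # anything outside the mounted views simply fails to bind.
    return con.execute(sql).fetchall()
```

Because each session starts empty, there is no leftover state to filter after the fact: the mount step is the scope.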
The most important part of this improvement is not DuckDB. It is authorization order.
Fred already uses ReBAC to express who can read what, who owns a team resource, and how private versus shared resources behave. The new tabular runtime finally makes tabular access follow the same rule.
Before a dataset is listed, previewed, or queried, Knowledge Flow resolves:

- which documents the requesting user can actually read, according to ReBAC
- whether each of those documents carries a tabular_v1 artifact

Only then are SQL aliases generated and exposed.
This is why the design doc insists on a subtle but critical point: DuckDB is not the security boundary. The real security boundary is the dataset-selection step that the ReBAC engine performs before any query executes.
That distinction is excellent architecture. It avoids turning SQL parsing alone into the whole defense model, and it keeps tabular access aligned with the same resource graph as documents and libraries.
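For illustration only, alias generation can be as simple as deriving a safe SQL identifier from the document id. The scheme below is hypothetical, not Fred’s actual naming rule; it is here to make “generated aliases” concrete.

```python
import re

def dataset_alias(document_id: str) -> str:
    # Reduce an arbitrary document id to a safe SQL identifier so that
    # user-controlled input never reaches the DDL that mounts the view.
    safe = re.sub(r"[^a-zA-Z0-9_]", "_", document_id).lower()
    return f"ds_{safe}"

print(dataset_alias("Q3-sales.csv"))  # ds_q3_sales_csv
```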
The improvement also modernizes the ingestion path itself.
Instead of reading the full CSV into pandas first and then pushing it into a tabular backend, the new TabularProcessor uses a more scalable path: the source file is converted to Parquet in a streaming fashion and written directly to content_storage, alongside the tabular_v1 metadata record that describes it.
That change brings two advantages at once.
First, it is a better fit for larger files because the pipeline avoids unnecessary full DataFrame materialization in the main conversion path. Second, it makes the Parquet artifact itself the durable source of truth for later queries, previews, and schema inspection.
The result is simpler and more coherent: the physical data lives in object storage, while the metadata record tells Fred how to expose it safely.
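As a sketch of the batch-streaming idea (not necessarily the exact code path TabularProcessor takes), a CSV can be converted to Parquet one record batch at a time with pyarrow, so the full table never has to fit in memory:

```python
import pyarrow.csv as pa_csv
import pyarrow.parquet as pq

def csv_to_parquet(csv_path: str, parquet_path: str) -> None:
    # open_csv returns a streaming reader: batches are decoded
    # incrementally instead of materializing a whole DataFrame.
    reader = pa_csv.open_csv(csv_path)
    writer = None
    try:
        for batch in reader:
            if writer is None:
                # The first batch fixes the Parquet schema.
                writer = pq.ParquetWriter(parquet_path, batch.schema)
            writer.write_batch(batch)
    finally:
        if writer is not None:
            writer.close()
```

The resulting Parquet file can then be uploaded to content_storage and referenced from the tabular_v1 record.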
This improvement is also a good example of choosing tools for their real strengths instead of forcing everything through one storage layer.
Parquet gives Fred a portable, compact, columnar representation of each dataset.
Object storage gives Fred a deployment-friendly persistence layer that already fits MinIO and S3-compatible infrastructures.
DuckDB gives Fred fast SQL over files without needing a permanently running tabular database.
That combination is especially compelling in an agentic platform:

- sessions are cheap enough to create per request and discard immediately afterward
- each session contains exactly the datasets the user is authorized to see
- there is no standing tabular database to operate, patch, or secure
The implementation even supports remote object access through presigned URLs and DuckDB httpfs, which makes the runtime practical in containerized and Kubernetes-style deployments.
This is a strong operational story: object-store friendly, cloud-friendly, and still lightweight at query time.
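As a rough sketch of the remote path, assuming a presigned URL has already been minted for the artifact (the function below is illustrative, not Fred’s API):

```python
import duckdb

def query_remote_parquet(presigned_url: str, sql: str) -> list[tuple]:
    con = duckdb.connect(database=":memory:")
    # httpfs lets DuckDB range-read Parquet over HTTPS, so only the column
    # chunks a query actually touches are fetched from object storage.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    safe_url = presigned_url.replace("'", "''")
    con.execute(f"CREATE VIEW dataset AS SELECT * FROM read_parquet('{safe_url}')")
    return con.execute(sql).fetchall()
```

A presigned URL keeps storage credentials out of the query engine entirely: MinIO or S3 authorizes the read, and the URL expires on its own.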
Another valuable part of this improvement is how much stricter query validation becomes.
The old style of query validation often relies on keyword deny-lists. That helps, but it is brittle. Here, the new tabular utilities move toward a much better model: parse the SQL with DuckDB itself, ensure there is exactly one read-only statement, and verify that the referenced relations belong to the authorized dataset aliases for the current request.
This means the protection is no longer just:

- scan the query text for forbidden keywords

It becomes:

- parse the SQL with DuckDB itself
- accept exactly one statement, SELECT or WITH only
- verify that every referenced relation resolves to an authorized dataset alias

That is the right level of rigor for LLM-generated SQL in production.
It also pairs nicely with the runtime model: even if a query tries to escape its intended scope, the session only contains the datasets Fred deliberately mounted for that user.
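A minimal sketch of that validation order, under the assumption that the caller supplies the allow-list of alias-to-artifact mappings (extract_statements and EXPLAIN are real DuckDB API; the surrounding shape is illustrative, not Fred’s actual validator):

```python
import duckdb

def validate_query(sql: str, authorized: dict[str, str]) -> None:
    """Raise ValueError unless sql is a single read-only statement that
    references only authorized dataset aliases."""
    con = duckdb.connect(database=":memory:")

    # Let DuckDB's own parser decide what the text contains.
    statements = con.extract_statements(sql)
    if len(statements) != 1:
        raise ValueError("exactly one statement is allowed")
    if statements[0].type != duckdb.StatementType.SELECT:
        raise ValueError("only SELECT/WITH queries are allowed")

    # Bind against a catalog that contains nothing but the authorized
    # aliases: any other relation simply fails to resolve.
    for alias, parquet_uri in authorized.items():
        safe_uri = parquet_uri.replace("'", "''")
        con.execute(f"CREATE VIEW {alias} AS SELECT * FROM read_parquet('{safe_uri}')")
    con.execute(f"EXPLAIN {sql}")  # a binder error means an unauthorized relation
```

A WITH query parses as a SELECT statement in DuckDB, so the single type check covers both allowed forms.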
This change also improves the platform story for operators.
The docs and configuration now converge on one recommended runtime instead of several competing tabular paths. The new guidance consistently points teams toward:

- content_storage for the Parquet artifacts themselves
- storage.tabular_store for artifact layout and query limits

That simplification matters.
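To give a feel for what those query limits can mean at the session level, here is a hedged illustration. The config keys below are hypothetical, chosen only to make the idea concrete; memory_limit is a real DuckDB setting, but nothing here is Fred’s actual schema.

```python
import duckdb

# Hypothetical values mirroring what a storage.tabular_store section
# might carry; the key names are illustrative, not Fred's schema.
TABULAR_STORE = {
    "artifact_prefix": "tabular_v1/",  # layout inside content_storage
    "max_result_rows": 10_000,         # cap on rows returned per query
    "memory_limit": "1GB",             # per-session DuckDB memory cap
}

def open_session(cfg: dict) -> duckdb.DuckDBPyConnection:
    con = duckdb.connect(database=":memory:")
    # Keep a runaway agent query from exhausting the worker's memory.
    con.execute(f"SET memory_limit = '{cfg['memory_limit']}'")
    return con

def capped(sql: str, cfg: dict) -> str:
    # Enforce the row limit by wrapping the already validated query.
    return f"SELECT * FROM ({sql}) LIMIT {cfg['max_result_rows']}"
```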
One of the recurring costs in platform engineering is not just maintaining features, but maintaining multiple mental models for the same feature. This improvement reduces that cost. It gives tabular data a single preferred architecture, explained in detail across the README, configuration docs, deployment guide, developer guide, and Helm values.
That is often the difference between a feature that technically exists and a feature that a team can confidently operate.
What makes this improvement worth highlighting is that it solves several problems with one coherent move.
It improves:

- security, because ReBAC-driven dataset selection happens before any SQL runs
- performance, because ingestion streams to Parquet and DuckDB reads it on demand
- operations, because one object-store-backed runtime replaces a standing tabular database
Most importantly, it makes tabular data feel native inside Fred.
That is the real achievement of this feature: not merely adding Parquet, not merely using presigned URLs, and not merely tightening SQL validation, but making structured data obey the same platform rules as the rest of the knowledge corpus.
For an AI system that wants to mix documents, agents, permissions, and analytics without losing control of scope, that is exactly the kind of refactor that matters.