

Migration Runbook
Source Cluster → Target Cluster

End-to-end operational guide for migrating the production kea deployment from Source Cluster to Target Cluster, then upgrading to swift on Target Cluster. Audience: platform engineers and technical management.

Status: in progress. 4 open items, see § Key findings. Schema and ownership questions fully resolved.

Scope & audience

This runbook covers the full migration of user-owned data (teams, agents, and knowledge-base documents) from a production kea deployment on Source Cluster to a fresh Target Cluster (cloud 2), followed by an in-place upgrade from kea to swift on Target Cluster. Conversation history is out of scope: chat sessions are not migrated.

Source Cluster and Target Cluster share no infrastructure. Every data store is explicitly exported, transferred, and verified. The guide is written so that management can track progress and risk at chapter level, while engineers have the detail needed to execute each step.

User experience goal: a Source Cluster kea user who logs in to Target Cluster finds their teams, agents, and knowledge-base documents exactly as they left them. Chat sessions are ephemeral and are intentionally not migrated.

Two-phase strategy

Source Cluster · kea → Chapter 1 (data copy · same schema) → Target Cluster · kea → Chapter 2 (schema transform · same platform) → Target Cluster · swift

Chapter 1 moves all data between two identical kea deployments. The schema is the same on both sides — no transformation is required, only transfer and verification. Users are cut over to Target Cluster kea at the end of this chapter.

Chapter 2 runs entirely on Target Cluster. Swift is deployed in a separate namespace alongside live kea. Migration scripts transform kea data into the swift schema. Users stay on kea until swift passes full validation; cutover is a single ingress rule change with no DNS propagation delay.

Do not attempt a single-hop Source Cluster → Target Cluster swift migration. It conflates platform risk (Chapter 1) with schema risk (Chapter 2). If anything goes wrong there is no clean rollback path.

Transfer logistics

Source Cluster and Target Cluster share no network, no storage, and no identity infrastructure. Every byte of user data must be explicitly exported from Source Cluster, physically transported or routed through an authorized intermediary, and imported into Target Cluster. This section describes the transfer approach and sizes the data so the right method can be chosen.

The two-platform problem

Because both platforms are highly secured, the only viable transfer intermediary is an authorized workstation that is permitted to connect to services on both sides. There is no direct network path between Source Cluster and Target Cluster.

Option A — Single authorized laptop (simultaneous access)

preferred if access is confirmed

The operator runs migration scripts from one laptop that holds a VPN or bastion session to both Source Cluster and Target Cluster at the same time. Structured data (Postgres, Keycloak, OpenFGA) flows through the laptop's memory — never written to disk unencrypted. Object storage is synced via rclone or mc mirror using credentials for both endpoints.

Requires: confirmation that security policy permits one workstation to hold simultaneous authenticated sessions to both platforms. See open item below.

Option B — Sequential: export → encrypted archive → import

fallback if simultaneous access is not permitted

All data is exported to encrypted archives on the operator's laptop, transported through an approved secure channel (encrypted USB, secure file transfer system, or equivalent), then imported on the Target Cluster side. No simultaneous connectivity to both platforms is required.

Drawback: for large object storage volumes (> 5 GB) this becomes slow and operationally cumbersome. The secure channel must support the full transfer size.

Data volume estimate — 500 users, 10 teams, ~3 agents per user

Extrapolated from direct measurement of a local kea instance (real byte counts per table row). The structured data is negligibly small. Object storage is the only unknown that matters.

Store	Basis	Estimated raw	Compressed
PostgreSQL — `agent`	12 rows = 90 KB → scale ×125 (1,500 user agents)	~11 MB	~3 MB
PostgreSQL — `metadata`	4 rows = 120 KB → scale ×3,000 documents	~45 MB	~10 MB
PostgreSQL — all other tables	`tag`, `resource`, `users`, `teammetadata`	~25 MB	~5 MB
Keycloak realm export	500 users × ~10 KB/user (credentials, groups, roles)	~5 MB	~2 MB
OpenFGA tuples	~3,000 tuples × 200 B	~600 KB	<1 MB
Total structured data		~87 MB	~21 MB
Object storage (MinIO / GCS)	Unknown, must be measured on Source Cluster. Light use (20 docs/team, 2 MB avg): ~400 MB; moderate use (100 docs/team, 3 MB avg): ~3 GB; heavy use (500 docs/team, 5 MB avg): ~25 GB

Structured data is always fast regardless of method. At 21 MB compressed, Postgres + Keycloak + OpenFGA transfers in seconds over any connection. The choice of Option A vs B only matters for object storage.

Split approach when object storage is large (> 5 GB): structured data goes through the laptop (seconds); the object storage sync runs from a bastion host in one of the clouds using rclone sync. rclone streams data between two different remotes through the host it runs on, so the bastion carries the traffic instead of the laptop and must be authorized to reach both object stores. This avoids routing gigabytes through a VPN tunnel unnecessarily.
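The sizing arithmetic behind the three usage tiers and the 5 GB threshold can be reproduced with a short sketch. The tier values and the 10-team assumption come from the table above; the function and constant names are illustrative only, and the real figure must still be measured on Source Cluster.

```python
# Sizing sketch: object-storage volume for the usage tiers quoted in the estimate table.
TEAMS = 10                   # the 10-team assumption from the estimate
SPLIT_THRESHOLD_MB = 5_000   # the "> 5 GB" threshold, expressed in decimal MB

# (docs per team, average document size in MB) for each tier
TIERS = {"light": (20, 2), "moderate": (100, 3), "heavy": (500, 5)}


def object_storage_mb(docs_per_team: int, avg_doc_mb: float, teams: int = TEAMS) -> float:
    """Estimated raw object-storage size in MB."""
    return teams * docs_per_team * avg_doc_mb


def needs_split_approach(total_mb: float) -> bool:
    """True when the object-storage sync should run from a bastion host, not the laptop."""
    return total_mb > SPLIT_THRESHOLD_MB


if __name__ == "__main__":
    for name, (docs, size) in TIERS.items():
        total = object_storage_mb(docs, size)
        print(f"{name}: ~{total:,.0f} MB, split approach: {needs_split_approach(total)}")
```

Only the heavy tier crosses the threshold under these assumptions, which is why the measured bucket size is the single number that decides the transfer method.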

Key findings & open items

All schema and ownership questions were resolved empirically by running a local kea instance and inspecting live data across every store. Four items remain open; each is marked [ ] below.

[✓]

Agent schema — kea stores agents as payload_json blobs. The definition_ref field (e.g. "v2.react.basic") is the template key that maps directly to swift's source_agent_id. Team ownership lives entirely in OpenFGA, not in Postgres. Field map: §2.3.

[✓]

Team ID stability — team IDs are Keycloak group UUIDs, used consistently in OpenFGA tuples and Postgres session.team_id. A realm export with ID preservation keeps them stable across platforms. The teammetadata Postgres table is effectively empty in kea — teams live in Keycloak and OpenFGA only.

[✓]

MCP server table — kea's mcp-server table contains platform-level deployment config (knowledge-flow vector search, tabular, opensearch ops…). It is re-seeded by deployment on every environment. No migration needed.

[✓]

OpenFGA user format — the factory seeds each user twice: as user:alice (username) and user:<keycloak-uuid>. The import must use only the UUID format on the target to match how swift resolves identities.

[✓]

Personal team model — kea uses a single shared team:personal string. Swift assigns each user a distinct personal team derived deterministically from their Keycloak UUID. Chapter 2 must create one personal team record per migrated user.

[ ]

Maintenance window → mgmt team
Acceptable downtime for the Chapter 1 cutover must be agreed before scheduling. This determines whether the final data sync runs live (short freeze at the end) or requires a full service stop.

[ ]

Cross-platform workstation access → ops team
Confirm that one authorized laptop can hold simultaneous authenticated sessions to both Source Cluster and Target Cluster (VPN or bastion access to both platforms at the same time). This determines whether the transfer uses Option A (preferred) or Option B (sequential encrypted archive). See § Transfer logistics for full context.

[ ]

Source Cluster data volume measurement → ops team
Measure actual sizes on the Source Cluster production instance before planning the transfer. Required numbers:

MinIO bucket total size: mc du minio/<bucket> or equivalent
Postgres fred database size: SELECT pg_size_pretty(pg_database_size('fred'))
Number of rows in agent, metadata, resource, tag
Number of active users (to validate the 500-user assumption)
Number of teams (to validate the 10-team assumption)

The object storage size is the only figure that changes the transfer strategy. See § Transfer logistics for the decision thresholds.

[ ]

Agent behavioral equivalence harness → mid-July 2026
Cutover from kea to swift requires a validated set of reference questions run against equivalent agents on both platforms, with automated quality metrics. This is a hard gate on Chapter 2 cutover and a standing requirement for any future model, prompt, or retrieval change — not specific to this migration.
kea side: lightweight JSON batch endpoint to be added (WebSocket protocol makes automated replay impractical from outside the UI).
swift side: evaluation UI + script already in progress; deepeval-based open-source tooling under development.

Chapter 1 covers five stores. Four are transferred from Source Cluster to Target Cluster (Keycloak, OpenFGA, PostgreSQL, object storage); the fifth, the OpenSearch vector index, is rebuilt on Target Cluster (§1.5). Because the application version is the same on both sides, no schema mapping is needed, only transfer and verification. At the end of this chapter, kea runs on Target Cluster with all user data intact and Source Cluster is placed in read-only mode.

Left behind intentionally: chat sessions, scheduler tasks, feedback records, execution checkpoints, logs, and KPI metrics. These are ephemeral or operational — users will not notice their absence.

1.1 Keycloak

Keycloak is the identity source for all users and groups. The full realm is exported from Source Cluster and imported into Target Cluster. Keycloak's export format preserves all internal UUIDs — including group IDs that serve as team IDs throughout the system — so no ID remapping is needed after import.

Realm export (users, groups, roles, clients)

copy as-is

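Export from Source Cluster with a full server-side realm export, for example kc.sh export --realm <realm> --file <path> on Quarkus-based Keycloak distributions. The resulting JSON file contains all users, hashed credentials, group memberships, roles, and client configurations. The Admin REST API partial-export endpoint (POST /admin/realms/{realm}/partial-export?exportClients=true&exportGroupsAndRoles=true) is not sufficient on its own: it omits users and masks secrets, and kcadm.sh does not provide a full-realm export.

Import to Target Cluster with kc.sh import --file <path>. Verify that group IDs in the imported realm match those in the Source Cluster OpenFGA export: they must be identical.

Hashed passwords migrate transparently. Each credential records its own hashing algorithm, so users can log in immediately without a password reset, provided Target Cluster Keycloak supports that algorithm. Keycloak's built-in default is a PBKDF2 variant (Argon2 in recent releases), not bcrypt; bcrypt is available only through a custom provider, which would then have to be installed on Target Cluster as well.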

MFA devices (TOTP / WebAuthn) must be checked before executing. TOTP secrets migrate within the realm export. WebAuthn / passkey credentials are hardware-bound and cannot be exported; affected users will need to re-enrol their devices on Target Cluster after login.
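The group-ID check against the OpenFGA export can be scripted. The sketch below assumes the realm export lists groups under a top-level "groups" array with nested "subGroups", and that the OpenFGA export is a list of tuples as returned by the Read API; the function names and the default ignore list (kea's shared team:personal) are illustrative.

```python
def group_ids(groups: list) -> set:
    """Collect group IDs recursively; realm exports nest child groups under 'subGroups'."""
    ids = set()
    for group in groups:
        ids.add(group["id"])
        ids |= group_ids(group.get("subGroups", []))
    return ids


def team_ids_in_tuples(tuples: list) -> set:
    """Extract X from every 'team:X' that appears as a tuple user or object."""
    ids = set()
    for item in tuples:
        key = item.get("key", item)  # the Read API wraps each tuple in {"key": {...}}
        for field in ("user", "object"):
            value = key.get(field, "")
            if value.startswith("team:"):
                # Strip the type prefix and any userset suffix such as '#member'.
                ids.add(value.split(":", 1)[1].split("#", 1)[0])
    return ids


def missing_teams(realm: dict, tuples: list, ignore=frozenset({"personal"})) -> set:
    """Team IDs referenced in OpenFGA that have no matching Keycloak group."""
    return team_ids_in_tuples(tuples) - group_ids(realm.get("groups", [])) - ignore
```

An empty result from missing_teams(realm, tuples), with both inputs loaded via json.load, indicates that every team referenced in OpenFGA still resolves to a Keycloak group after import.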

1.2 OpenFGA

OpenFGA holds all authorisation tuples: who is owner / manager / member of which team or organisation, and which user or team owns which agent, tag, or document. The tuple set is the authoritative access control state — if it is wrong, users will see incorrect team memberships and agent ownership.

OpenFGA authorization tuples

copy — UUID format only

Export from Source Cluster by reading all tuples from the store via the OpenFGA Read API with an empty filter (returns all tuples paginated). Collect the full list to a JSON file.

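Deduplication required. Source Cluster seeds each user in two formats: user:alice (username) and user:<keycloak-uuid>. Keep only the UUID format, because swift on Target Cluster resolves identities by UUID exclusively. Apply the filter only to tuples whose user field has the user: prefix, and drop those whose identifier is not a full UUID. Do not rely on the absence of hyphens alone, since usernames can contain hyphens. Tuples whose user field is a team: or organization: subject are kept unchanged.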

Import to Target Cluster by creating a new OpenFGA store, applying the same authorization model (FGA schema), then bulk-writing the filtered tuples via the Write API in batches. The Target Cluster store ID will be different from Source Cluster's — update the application configuration accordingly.

Verify by spot-checking several users: confirm they appear as members of the expected teams and that their agent ownership tuples are present.
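The deduplication and batching steps can be sketched as follows. This is a minimal illustration, not the project's migration script: the batch size of 100 is an assumption (the per-request write limit depends on the OpenFGA server configuration), and the function names are hypothetical.

```python
import re
from typing import Iterator

# Full UUID match; a username such as "jean-paul" contains a hyphen but is not a UUID.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)


def keep_tuple(key: dict) -> bool:
    """Keep non-user subjects as-is; keep 'user:' subjects only when the ID is a UUID."""
    subject = key["user"]
    if not subject.startswith("user:"):
        return True  # team:... and organization:... subjects are untouched
    return bool(UUID_RE.match(subject[len("user:"):]))


def filter_tuples(tuples: list) -> list:
    """Return flat tuple keys (user, relation, object) that survive the filter."""
    keys = [item.get("key", item) for item in tuples]  # Read API wraps keys in {"key": ...}
    return [key for key in keys if keep_tuple(key)]


def batches(items: list, size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items for the Write API."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Note that this filter would also drop a wildcard subject such as user:*; if the authorization model uses one, it needs an explicit exception.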

1.3 PostgreSQL

The fred database is transferred table by table. Schema is identical between kea instances. Only user-data tables are exported; platform-seeded and ephemeral tables are skipped.

Table	Action	Notes
`agent`	copy	All rows, including system agents (12 rows in the local reference instance), kept intact as the source for the Chapter 2 transformation
`tag`	copy	Knowledge-base tag hierarchy — identical schema
`metadata`	copy	Document metadata with tag_id references
`resource`	copy	Resource records with author and doc JSON
`teammetadata`	copy if non-empty	Team descriptions and banner keys — check row count on Source Cluster first; often empty
`users`	copy if non-empty	GCU (terms of use) acceptance records; only non-empty when users explicitly accepted the GCU on Source Cluster
`mcp-server`	skip	Platform deployment config — re-seeded automatically by the Target Cluster kea deployment
`session`, `session_history`, `session_attachments`, `session_purge_queue`	skip	Ephemeral — users resume fresh sessions on Target Cluster
`feedbacks`	skip	Operational data — not user-facing
`tasks`, `sched_workflow_tasks`	skip	Scheduler tasks restart cleanly on Target Cluster
`v2_langgraph_checkpoint*`	skip	In-flight execution state only — not meaningful after migration

Transfer approach: use pg_dump -t <table> --data-only per table on Source Cluster, transfer the dump files to Target Cluster, and restore with psql -d fred < dump.sql. Run Alembic migrations on the Target Cluster database first to create the schema, then load data. Import order: tag → metadata → resource → teammetadata → users → agent.
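The per-table dump and restore commands can be generated in the stated import order. The sketch below only builds the command lines; it assumes connection details (host, user, password) are supplied through the standard PG* environment variables, and the dump file naming is illustrative.

```python
import shlex

# Import order from the transfer approach: dumps are restored in this sequence.
IMPORT_ORDER = ["tag", "metadata", "resource", "teammetadata", "users", "agent"]


def dump_command(table: str, database: str = "fred") -> list:
    """pg_dump invocation for one table, data only, run against Source Cluster."""
    return ["pg_dump", "--data-only", "-t", table, "-f", f"{table}.sql", database]


def restore_command(table: str, database: str = "fred") -> list:
    """psql invocation for one dump file, run against Target Cluster.

    ON_ERROR_STOP makes psql abort on the first failed statement instead of continuing.
    """
    return ["psql", "-v", "ON_ERROR_STOP=1", "-d", database, "-f", f"{table}.sql"]


def plan(tables=IMPORT_ORDER) -> list:
    """Pairs of (dump, restore) commands in import order."""
    return [(dump_command(t), restore_command(t)) for t in tables]


if __name__ == "__main__":
    for dump, restore in plan():
        print(shlex.join(dump))
        print(shlex.join(restore))
```

Tables marked "copy if non-empty" (teammetadata, users) can be dropped from the list after checking their row counts on Source Cluster.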

1.4 Object storage

All content files and team banners are synced from Source Cluster MinIO to Target Cluster object storage. The key path structure must be preserved exactly so that existing database references (banner_object_storage_key, resource paths) remain valid without any update to Postgres records.

Target Cluster may use GCS, not MinIO. Verify the target storage backend before choosing a sync tool. Use mc mirror for MinIO-to-MinIO, or rclone sync for cross-provider transfer (MinIO → GCS or S3). Configure rclone with both source and target credentials before the maintenance window — the sync itself can run as a pre-step with a final incremental pass during the freeze.

Content bucket & banner bucket

sync — preserve key paths

Run a full sync before the maintenance window to transfer the bulk of data. During the freeze, run a final incremental sync to capture any last-minute uploads. Verify that the Target Cluster bucket names match what the kea application configuration expects (control-plane-content, app-content, etc.).

If the bucket names differ between Source Cluster and Target Cluster, update the kea application configuration on Target Cluster — do not rename files inside the buckets.
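Key-path preservation can be checked after each sync by comparing object listings from both sides. The sketch below assumes the listings were produced with rclone lsjson -R on each bucket (entries carrying Path, Size, and IsDir); the function names are illustrative.

```python
def listing_from_lsjson(entries: list) -> dict:
    """Map object key -> size in bytes, skipping directory entries."""
    return {e["Path"]: e["Size"] for e in entries if not e.get("IsDir", False)}


def compare_listings(source: dict, target: dict) -> dict:
    """Compare {object_key: size} listings from the source and target buckets."""
    shared = source.keys() & target.keys()
    return {
        "missing_on_target": sorted(source.keys() - target.keys()),
        "size_mismatch": sorted(k for k in shared if source[k] != target[k]),
        "extra_on_target": sorted(target.keys() - source.keys()),
    }
```

All three lists being empty after the final incremental pass suggests that existing banner_object_storage_key values and resource paths will resolve on Target Cluster without changes to Postgres.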

1.5 OpenSearch

The vector index is not copied from Source Cluster. Copying it would risk embedding model version mismatch and index corruption. Documents are re-vectorized on Target Cluster from the content files synced in §1.4. This is the correct and safe approach regardless of the knowledge-base size.

Re-vectorization is triggered after the kea application starts on Target Cluster. The knowledge-flow backend processes each document from object storage and populates the OpenSearch vector index. Duration is proportional to the number and size of documents ingested on Source Cluster.

The Source Cluster log store and KPI metrics index are also left behind — both are operational time-series data. Target Cluster starts a fresh observability baseline.

Knowledge-base search will be unavailable until re-vectorization completes. Plan for this gap in user communication. Agent chat (non-RAG) and team management are unaffected.

1.6 Validation checklist

Run all checks on Target Cluster kea before flipping traffic from Source Cluster. Do not cut over with any item failing.

Identity: log in as a migrated user

Log in as at least two users (one admin, one viewer). Confirm Keycloak accepts the Source Cluster password. Confirm group memberships are visible in the application.

Teams: verify membership and roles

For each team migrated: confirm the team appears in the user's team list with the correct role (owner / manager / member).

Agents: verify listing and ownership

Confirm user-created agents appear in the personal space and in each team they belong to. Confirm system agents (BankTransfer, Rico…) are present via the runtime catalog.

Knowledge base: verify tags and document listing

Confirm the tag hierarchy is intact. Confirm document metadata is listed. Vector search may still be incomplete at this point — note the gap explicitly and track re-vectorization progress separately.

Object storage: verify banner images

Open team pages that have banners set and confirm banner images load from Target Cluster storage.

Agent chat: start a basic conversation

Open a plain React agent and send a message. Confirm the streaming response arrives and the session is recorded.

1.7 Cutover procedure

The cutover transfers live traffic from Source Cluster to Target Cluster. It requires a maintenance window whose duration depends on the final incremental sync size (see the maintenance-window open item in § Key findings).

Announce the maintenance window

Notify users at least 24 hours in advance. Provide the expected duration and the Target Cluster endpoint they will use after cutover.

Put Source Cluster into read-only mode

Stop write operations: disable agent creation, document ingestion, and session writes. Read access can remain open during the final sync. Alternatively, perform a full service stop if the maintenance window allows.

Run the final incremental sync

Re-run the object storage sync for any changes since the pre-sync. Export any new Postgres rows written since the initial copy. This window should be under 30 minutes if the pre-sync was recent.

Run the validation checklist (§1.6)

All six checks must pass on Target Cluster before traffic is switched.

Switch traffic to Target Cluster

Update the DNS record or load-balancer rule to point the application hostname at the Target Cluster ingress. Lower the DNS TTL (to 60s) at least one full previous-TTL period before the window, so that resolvers have already dropped the long-lived record when the switch happens.

Keep Source Cluster in read-only for 72 hours

Do not decommission Source Cluster immediately. If a critical issue is found on Target Cluster within the rollback window, traffic can be flipped back to Source Cluster. After 72 hours without incident, Source Cluster can be shut down.

In Chapter 2, swift is deployed in a separate Kubernetes namespace on Target Cluster alongside the live kea deployment. Migration scripts read data from the kea database (fred) and write transformed records into the swift database (fred_swift). Users remain on kea throughout; the cutover to swift is a single ingress rule change with instant effect and no DNS propagation.

What is new in swift that has no kea equivalent: the structured agent_instance table (replaces kea's opaque agent blob), the prompt library (starts empty — users build it in swift), and per-user personal teams (created by the migration script).

2.1 PostgreSQL schema transforms

The swift database (fred_swift) is created fresh by Alembic migrations. Migration scripts then copy and transform rows from fred into fred_swift. Both databases share the same Postgres instance and the same fred user.

kea (fred)	swift (fred_swift)	Action	Notes
`tag`	`tag`	direct copy	Identical schema — no mapping needed
`metadata`	`metadata`	direct copy	Verify `tag_ids` array format is compatible
`resource`	`resource`	direct copy	No structural changes
`users`	`users`	copy + defaults	Add `current_resources_storage_size = 0` for any rows missing it
`teammetadata`	`teammetadata`	copy + drop columns	Exclude kea's storage-size fields; `banner_object_storage_key` is unchanged
`agent` (UUID ids only)	`agent_instance`	full transform — see §2.3	Different table name, different structure, ownership resolved from OpenFGA
—	`prompt`, `default_prompt_usage`	start empty	No kea equivalent; users build the prompt library in swift
`session*`	`session_metadata`	skip	Ephemeral — not migrated in either chapter
`mcp-server`	—	skip	Platform config — re-seeded by swift deployment
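The two "copy with adjustment" rows of the table can be expressed as small row transforms. This is a sketch under assumptions: rows are handled as plain dictionaries, and the names of kea's storage-size columns in teammetadata are not given in this runbook, so they are passed in by the caller rather than hard-coded.

```python
def transform_users_row(row: dict) -> dict:
    """Copy a kea users row, defaulting current_resources_storage_size to 0 when missing."""
    out = dict(row)
    if out.get("current_resources_storage_size") is None:
        out["current_resources_storage_size"] = 0
    return out


def transform_teammetadata_row(row: dict, dropped_columns: frozenset) -> dict:
    """Copy a kea teammetadata row without the columns that swift no longer has.

    dropped_columns must be filled in from the actual kea schema (the storage-size fields).
    """
    return {k: v for k, v in row.items() if k not in dropped_columns}
```

For example, transform_users_row({"id": "u1"}) returns the row with current_resources_storage_size set to 0, while a row that already carries a value is left unchanged.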

2.2 Personal team creation

Swift assigns every user a personal team whose ID is derived deterministically from the user's Keycloak UUID via personal_team_id(user_id). Kea has no equivalent — it uses a single shared literal string "personal" as a pseudo-team. The migration script must bridge this gap.

This step is easy to miss and causes a hard failure on first login. Without a personal team row in teammetadata and the corresponding OpenFGA tuples, any user who logs in to swift will find no personal space and the application will return errors.

For each Keycloak user in the migrated realm, the script must:

Compute personal_team_id(user_keycloak_uuid)
Insert a row in fred_swift.teammetadata with that ID
Write two OpenFGA tuples to the Target Cluster store: user:<uuid> owner team:<personal_team_id> and organization:fred organization team:<personal_team_id>

The personal_team_id function is defined in the swift codebase (libs/fred-core/fred_core/teams/) and must be called from the migration script to guarantee consistency.
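The three steps above can be sketched as one function that produces the row and the two tuples for a user. The personal_team_id below is a hypothetical stand-in (a UUIDv5 over a placeholder namespace) used only so the sketch runs on its own; the real function must be imported from the swift codebase. The teammetadata column name "id" is also an assumption.

```python
import uuid

# HYPOTHETICAL stand-in namespace. The real derivation lives in the swift codebase
# and must be used in the actual migration script to guarantee consistent IDs.
_STAND_IN_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def personal_team_id(user_id: str) -> str:
    """Stand-in only: deterministic ID derived from the user's Keycloak UUID."""
    return str(uuid.uuid5(_STAND_IN_NAMESPACE, user_id))


def personal_team_records(user_id: str, team_id_fn=personal_team_id) -> dict:
    """Build the teammetadata row and the two OpenFGA tuples for one migrated user."""
    team_id = team_id_fn(user_id)
    return {
        "teammetadata_row": {"id": team_id},  # column name is an assumption
        "tuples": [
            {"user": f"user:{user_id}", "relation": "owner", "object": f"team:{team_id}"},
            {"user": "organization:fred", "relation": "organization", "object": f"team:{team_id}"},
        ],
    }
```

Passing the real function as team_id_fn keeps the record-building logic testable locally while leaving the ID derivation to swift.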

2.3 Agent migration

Kea stores agents as payload_json blobs with no team_id in Postgres — ownership lives in OpenFGA. Swift's agent_instance is fully structured and team-scoped. The field mapping is fully known from live data inspection.

payload_json → agent_instance field map

kea source	swift agent_instance column	Notes
`payload_json.id`	`agent_instance_id`	Preserve UUID as-is
`payload_json.name`	`display_name`	Direct copy
`payload_json.definition_ref`	`source_agent_id`	e.g. `"v2.react.basic"`
`payload_json.enabled`	`enabled`	Direct copy
`payload_json.tuning` (minus `fields`)	`tuning_json`	Strip the `fields` array — it is a template definition, not instance config
OpenFGA lookup: `team:X owner agent:Y`	`team_id`	→ `team_id = X`
OpenFGA lookup: `user:X owner agent:Y`	`team_id`	→ `team_id = personal_team_id(X)`
OpenFGA owner (user UUID)	`created_by`	Keycloak UUID of the agent's OpenFGA owner
Known at migration time	`source_runtime_id`	The runtime ID serving `definition_ref` on Target Cluster swift
Derived	`template_id`	`source_runtime_id + ":" + source_agent_id`

Filter: migrate UUID-id agents only. Kea's agent table mixes user-created agents (UUID primary keys) with platform-seeded system agents (short string keys such as "BankTransfer"). Only UUID agents are user data. System agents are re-seeded by the swift runtime catalog. SQL filter: WHERE id ~ '^[0-9a-f]{8}-[0-9a-f]{4}-'.

Plain React agents migrate cleanly. Source Cluster users primarily use v2.react.basic agents. This template exists in swift's runtime catalog. The agent is recreated with the same name, and with the role and description taken from tuning.role and tuning.description. Conversation history is unaffected by this step: sessions are not migrated in either chapter.
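The field map and the UUID filter can be sketched as a single transform. The sketch assumes the caller has already resolved ownership from OpenFGA and passes it in; how created_by is determined for team-owned agents is not specified here, so the function simply passes through whatever user ID the caller resolved. The function signature is illustrative, and tuning_json is kept as a dictionary with serialization left to the database layer.

```python
import re
from typing import Callable, Optional

# Same prefix pattern as the SQL filter: UUID ids are user agents, short strings are seeded.
UUID_PREFIX = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-")


def is_user_agent(agent_id: str) -> bool:
    return bool(UUID_PREFIX.match(agent_id))


def to_agent_instance(
    payload: dict,
    source_runtime_id: str,
    owner_user_id: Optional[str],
    personal_team_id: Callable[[str], str],
    owner_team_id: Optional[str] = None,
) -> dict:
    """Map one kea payload_json blob to a swift agent_instance row."""
    if owner_team_id is not None:
        team_id = owner_team_id                    # team:X owner agent:Y
    elif owner_user_id is not None:
        team_id = personal_team_id(owner_user_id)  # user:X owner agent:Y
    else:
        raise ValueError("agent has no owner tuple in OpenFGA")
    # Strip 'fields': it is a template definition, not instance configuration.
    tuning = {k: v for k, v in payload.get("tuning", {}).items() if k != "fields"}
    source_agent_id = payload["definition_ref"]
    return {
        "agent_instance_id": payload["id"],
        "display_name": payload["name"],
        "source_agent_id": source_agent_id,
        "enabled": payload["enabled"],
        "tuning_json": tuning,
        "team_id": team_id,
        "created_by": owner_user_id,
        "source_runtime_id": source_runtime_id,
        "template_id": f"{source_runtime_id}:{source_agent_id}",
    }
```

Raising on a missing owner is a deliberate choice in this sketch: an agent without an ownership tuple would otherwise land in no team and be invisible to every user.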

2.4 OpenSearch — no re-vectorization needed

Swift uses the same embedding model as kea. The Chapter 1 OpenSearch index is therefore fully compatible with swift's knowledge-flow backend. No re-indexing is required: swift is configured to point at the existing index and search is available from the first request.

Action: configure swift's knowledge-flow backend with the Target Cluster OpenSearch endpoint and index name that were populated in Chapter 1. Verify search returns results before proceeding to §2.5.

If the embedding model ever changes in a future swift release, a full re-index from object storage will be required at that point — not as part of this migration.

2.5 Validation checklist

Run all checks on the swift namespace before switching ingress. The swift application must be reachable internally (not via public DNS) during this phase.

Personal team: verify each user has a personal space

Log in as multiple users and confirm that a personal team is visible, accessible, and correctly scoped to that user only.

Teams: verify membership and roles match kea

Compare team listings in swift against kea for at least two users. Roles (owner / manager / member) must match.

Agent instances: verify personal and team agents

Confirm that user-created agents appear in the correct personal space or team. Confirm display name, description, and template match the kea originals.

Prompt library: accessible and empty

Confirm the prompt library page loads without error and displays no entries (expected — it starts empty in swift).

Knowledge base: tags intact, search operational

Verify the tag tree is present. Confirm knowledge-base search returns results (this may require waiting for the Chapter 1 re-vectorization from §1.5 to complete).

Agent chat: end-to-end conversation

Start a conversation with a migrated agent instance and confirm the full SSE streaming response arrives correctly.

Agent behavioral equivalence required

Run the same set of reference questions against identical agents in kea and swift and compare response quality metrics. This is a hard gate: cutover must not proceed until equivalence is confirmed.

This check is not migration-specific — it is equally required whenever a new model version, prompt, or retrieval configuration is deployed. It is therefore treated as a standing validation capability, not a one-off migration step.

Tooling target: mid-July 2026. A replay harness will be available on both sides: a lightweight JSON batch endpoint on kea (pragmatic, given its WebSocket protocol), and a full evaluation UI + script on swift (deepeval-based). Until then, manual spot-checking via the respective UIs is the fallback. See open item in § Key findings.

2.6 Cutover procedure

Cutover from kea to swift on Target Cluster is a Kubernetes ingress rule change. Because both namespaces are on the same cluster, there is no DNS propagation delay — the switch takes effect in seconds.

Confirm validation checklist complete (§2.5)

All six checklist items must pass, and the agent behavioral equivalence gate must be confirmed. The Chapter 1 re-vectorization (§1.5) should be at or near 100% before cutover.

Update the Target Cluster ingress rule

Change the ingress backend from the kea namespace service to the swift namespace service. Confirm the change propagates within seconds by checking the application version banner or a health endpoint.

Keep kea namespace running for 72 hours

Do not delete the kea namespace or its database immediately. If a critical issue is found in swift, revert the ingress rule to restore kea instantly. After 72 hours without incident, decommission the kea namespace and drop the fred database.

Appendix A — Store decision table

Store / table	Chapter 1	Chapter 2	Notes
Keycloak realm	export / import	—	Preserve group UUIDs — they are team IDs everywhere
OpenFGA tuples	copy (UUID format only)	—	Deduplicate username vs UUID format
`agent` (UUID ids)	copy	→ agent_instance (§2.3)	Short-string ids are platform-seeded — skip those
`tag`	copy	copy (identical schema)
`metadata`	copy	copy
`resource`	copy	copy
`teammetadata`	copy if non-empty	drop storage_size fields	Often empty in practice
`users`	copy if non-empty	add storage_size default	Only non-empty if GCU was accepted on Source Cluster
Personal teams	—	create 1 per user (§2.2)	kea: shared literal "personal" · swift: per-user UUID-derived
`prompt`, `default_prompt_usage`	—	start empty	No kea equivalent
`mcp-server`	skip	skip	Platform config — re-seeded by deployment
`session`, `feedbacks`, `tasks`, checkpoints	skip	skip	Ephemeral or operational
Object storage (content + banners)	sync — preserve key paths	—	Final incremental sync during maintenance window
OpenSearch vectors	skip copy; re-vectorize from object storage (§1.5)	reuse the Chapter 1 index (§2.4)	Never copy the Source Cluster index: risk of embedding model version mismatch
OpenSearch logs / KPIs	skip	skip	Start fresh on Target Cluster

Appendix B — Rollback strategy

Both chapters have a clean, fast rollback path. The key principle is that no source is destroyed until the 72-hour hold period expires.

Chapter 1 rollback — Source Cluster kea ← Target Cluster kea

Source Cluster remains in read-only mode for 72 hours after cutover. To roll back, re-enable writes on Source Cluster and update the DNS record or load-balancer rule to point back at Source Cluster. No data restoration is needed — Source Cluster's data was never modified.

Data written to Target Cluster kea during the rollback window is lost. Users who created agents or uploaded documents on Target Cluster after cutover will need to redo that work on Source Cluster. Communicate this risk explicitly before announcing the rollback.

Chapter 2 rollback — Target Cluster kea ← Target Cluster swift

The kea namespace remains running on Target Cluster for 72 hours after the swift cutover. To roll back, revert the ingress rule to point at the kea namespace service. The effect is instant. The fred database is untouched by the Chapter 2 migration scripts (they write to fred_swift only).

Chapter 2 rollback is the lowest-risk rollback in the sequence. Both kea and swift run on the same cluster with separate databases. No network change, no DNS propagation — one ingress rule change reverts the cutover completely.

Appendix C — Developer local setup

This appendix is for engineers who want to run the migration scripts locally to validate them before executing on production. It is not required reading for management.

Infrastructure: fred-deployment-factory

The ignored/fred-deployment-factory repository provides a full Docker Compose stack with Postgres, Keycloak, MinIO, OpenSearch, OpenFGA, and Temporal. A single command brings up the entire infrastructure:

cd ignored/fred-deployment-factory && make docker-up

Two databases, one Postgres container

The factory creates both the kea database (fred) and the swift database (fred_swift) in the same Postgres container. Both are owned by the fred user. This mirrors the production topology where the two namespaces share a Postgres service.

Database	Owner	Used by	Config key
`fred`	`fred`	kea (all backends)	`database: fred`
`fred_swift`	`fred`	swift (all backends)	`database: fred_swift`

To switch between running kea and running swift locally, change the database: field in the backend's configuration_prod.yaml. Both share the same Postgres port (5432) and credentials — only the database name differs.

Port conflicts prevent running kea and swift simultaneously on a local laptop (both bind 8222, 8111, etc.). Run one at a time by stopping the other's processes. The infrastructure services (Postgres, Keycloak…) remain running continuously.

Creating the fred_swift schema

After make docker-up, fred_swift is an empty database. Run swift's Alembic migrations to create the schema before running any migration scripts:

cd apps/control-plane-backend && alembic upgrade head

Checkpoints for repeatable migration testing

The factory supports named checkpoints — snapshots of all Docker volumes. Once you have created test data in kea (agents, teams, documents), save a checkpoint before running any migration scripts:

cd ignored/fred-deployment-factory
make checkpoint-save NAME=kea-source

To reset and replay the migration from scratch:

make checkpoint-restore NAME=kea-source && make docker-up

Configuration reference

Variable / key	kea value	swift value
Postgres database	`fred`	`fred_swift`
Postgres user	`fred`	`fred`
Postgres password (env)	`FRED_POSTGRES_PASSWORD=<change-me>`	same as kea
Postgres host (local)	`localhost:5432`	same as kea
OpenFGA token (env)	`OPENFGA_API_TOKEN=<change-me>`	same as kea
MinIO access key (env)	`MINIO_ACCESS_KEY=admin`	same as kea
Factory env var for swift DB	`POSTGRES_FRED_SWIFT_DB=fred_swift`
