Document Agents

Agent Thesis

DataZoom's document agents govern the creation, maintenance, and operational use of every artifact in the docs/ hierarchy, the datazoom-qa/ QA corpus, and the living action-plan network under docs/action_plans/. Each agent treats these files as its primary working memory: consuming upstream documents to derive context, emitting structured outputs that downstream agents and human reviewers can act on, and retiring superseded material into docs/archive/superseded/ when a plan's lifecycle closes. The agent network covers five document domains: (1) architecture and infrastructure specs (docs/AI_SERVICES.md, docs/BACKEND_INFRASTRUCTURE.md, docs/RAG_SYSTEM.md), (2) feature action plans (e.g., docs/cap-table/action_plans/, docs/action_plans/e_sign/), (3) QA and test manifests (datazoom-qa/COMPLETE_MANIFEST.md, datazoom-qa/BUILD_READY_SUMMARY.md), (4) schema and data dictionaries (docs/archive/superseded/DATABASE_SCHEMA.md, migration SQL files), and (5) operational runbooks (docs/archive/start_system.sh, deployment guides). Agents do not generate content from imagination; every claim in an emitted document must be traceable to a source file, API route, database schema field, or CI artifact in the repository.
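
The traceability rule can be pictured as a small check that runs before a document is emitted. The sketch below is only an illustration: the `Claim` and `SourceRef` shapes and the `findUntraceableClaims` helper are hypothetical names, not part of the DataZoom codebase, and the second sample claim is invented to show a failure case.

```typescript
// Hypothetical shapes for illustration; not taken from the repository.
type SourceKind = "file" | "api_route" | "schema_field" | "ci_artifact";

interface SourceRef {
  kind: SourceKind;
  // Repository-relative locator, e.g. a file path or `table.column`.
  locator: string;
}

interface Claim {
  text: string;
  sources: SourceRef[];
}

// Returns the claims that cite no source with a non-empty locator.
function findUntraceableClaims(claims: Claim[]): Claim[] {
  return claims.filter(
    (claim) => !claim.sources.some((s) => s.locator.trim().length > 0)
  );
}

const draftClaims: Claim[] = [
  {
    text: "Chunks are embedded as 384-dimensional vectors.",
    sources: [{ kind: "schema_field", locator: "document_chunks.embedding" }],
  },
  // Invented example of a claim with nothing to trace it to.
  { text: "Retrieval latency is always low.", sources: [] },
];

const untraceable = findUntraceableClaims(draftClaims);
console.log(untraceable.map((c) => c.text));
```

A check of this kind only confirms that a locator is present; whether the cited file or field actually supports the claim still needs a reviewer.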

Agent Roles

Agent	Inputs	Outputs	Review Gate
Ingestion Pipeline Agent (BR-001)	Raw documents uploaded via `product/app/(app)/documents/` UI; Supabase `documents` table (`filename`, `document_type`, `full_text`, `page_count`, `char_count`); Docker worker image `ghcr.io/midwestco/datazoom-worker:latest` logs from `docker/Dockerfile.worker`	`summary` field written to `documents` table; `document_chunks` rows with `embedding VECTOR(384)` via `sentence-transformers all-MiniLM-L6-v2`; status update records consumed by `product/app/api/admin/pipeline/health/route.ts`	Human reviewer confirms chunk count and embedding integrity via `product/app/(app)/admin/pipeline/page.tsx` before document is surfaced in chat (UJ-001)
RAG Retrieval Agent (BR-002)	`document_chunks` table (`ivfflat` index for cosine similarity); user query from `product/app/(app)/context/components/strategic-input.tsx`; conversation history from `conversations` and `messages` tables	Ranked chunk set passed to LLM (Modal Qwen2.5:32B or local Ollama via `llm` service on port 8001); cited answer surfaced in `product/app/(app)/context/components/conversation-panel.tsx`; interaction record written via `product/app/api/ai/track-interaction/route.ts`	Citation system test `product/lib/__tests__/citation-system.test.ts` must pass; enhanced retrieval validated by `product/lib/__tests__/rag-retrieval-enhanced.test.ts` (TEST-001)
Cap-Table Extraction Agent (BR-003)	Equity documents from `documents` table where `document_type = 'equity'`; Modal endpoint triggered by `product/app/api/cap-table/extract/route.ts`; feature gate evaluated by `product/lib/cap-table/__tests__/feature-gate.test.ts`	Candidate transaction rows queued for review in `product/app/api/cap-table/review/route.ts`; observability metrics logged (see `product/lib/cap-table/__tests__/observability.test.ts`); auto-populated cap table via `product/app/api/cap-table/auto-populate/route.ts`	Human approves or rejects each candidate via `product/app/api/cap-table/review/[id]/approve/route.ts` and `/reject/route.ts`; smoke validated by `product/lib/__tests__/cap-table-extract-trigger.test.ts` (TEST-002)
Due Diligence Checklist Agent (BR-004)	Business type profile from `product/app/api/business-type-profiles/[profileKey]/route.ts`; checklist template from `product/app/api/business-types/[typeKey]/checklist/route.ts`; document metadata (`parties[]`, `key_terms[]`, `effective_date`) from `documents` table	Populated due-diligence checklist rendered in `product/app/(app)/[type]/[id]/views/dd-match-panel.tsx`; template events verified by `product/lib/due-diligence/__tests__/template-events-contract.test.ts` and `template-events-route.test.ts` (TEST-003)	Tech lead reviews template schema normalization (see PR #108) before checklist is locked; chat output validated via `product/lib/due-diligence/__tests__/chat-output.test.ts`
Advisor / Risk-Memo Agent (BR-005)	Strategic context from `product/app/(app)/context/components/strategic-overview.tsx`; advisor queue processed by `product/app/api/advisor/process-queue/route.ts`; batch jobs submitted via `product/app/api/advisor/batch/route.ts`	Risk memo written via `product/app/api/advisor/risk-memo/route.ts`; strategic options emitted via `product/app/api/advisor/strategic-options/route.ts`; decision log persisted via `product/app/(app)/context/components/save-decision-form.tsx`	Product owner reviews risk memo before it propagates to `product/app/(app)/advisor/page.tsx`; model routing validated by `product/lib/__tests__/model-router.test.ts` (TEST-004)
Activity & Analytics Agent (BR-006)	Events from `product/app/api/activity/route.ts`; unified feed from `product/app/api/activity/unified/feed/route.ts`; Mixpanel tracking plan in `docs/archive/MIXPANEL_TRACKING_PLAN.md`	Daily activity feed in `product/app/(app)/activity/activity-content.tsx`; calendar entries via `product/app/api/activity/unified/calendar/route.ts`; metrics exported via `product/app/api/activity/export/route.ts`; Mixpanel events fired through `product/app/api/ai/track-interaction/route.ts`	Metrics API response validated against schema in `product/app/api/activity/metrics/route.ts`; calendar implementation verified against `docs/activity_page/UNIFIED_CALENDAR_IMPLEMENTATION_COMPLETE.md` (MON-001)
QA Documentation Agent (BR-007)	`datazoom-qa/action_plans/` series (00–10); `datazoom-qa/BUILD_READY_SUMMARY.md`; test files under `product/lib/__tests__/` and `product/lib/cap-table/__tests__/`	Updated `datazoom-qa/COMPLETE_MANIFEST.md`; per-plan status annotations; regression report fed into `product/lib/__tests__/app-smoke-regression.test.ts`	Engineering lead approves manifest before a release tag is cut; review-workflow correctness confirmed by `product/lib/__tests__/review-workflow.test.ts` (TEST-005)
Infrastructure & Deployment Agent (BR-008)	`docker-compose.yml`; `docker/Dockerfile.base`, `docker/Dockerfile.gpu`, `docker/Dockerfile.worker`, `docker/cloud-worker/Dockerfile`; `.github/workflows/build-images.yml`; `fly/orchestrator/Dockerfile`; `product/collaboration-ws/Dockerfile`	Updated `docs/BACKEND_INFRASTRUCTURE.md`; deployment confirmation notes (pattern: `docs/archive/completed_action_plans/document_refinement_v1/08_DEPLOYMENT_CONFIRMATION.md`); CI image tags pushed to `ghcr.io/midwestco/datazoom-{base,worker,gpu}:latest`	CI `build-images.yml` workflow must pass (including retry logic added in commit `960f88c`) before any infra document is promoted from draft to current (TECH-001)
Schema Migration Agent (DBT-001)	Current DDL in `docs/archive/superseded/DATABASE_SCHEMA.md`; SQL migration files (`docs/archive/add_missing_tables_to_cloud.sql`, `docs/archive/create_company_settings.sql`, `docs/activity_page/VERIFY_DELETE_LOGGING.sql`); action plans under `docs/cap-table/action_plans/01_cap_table_schema_foundation.md` and `02_cap_table_rls_and_policies.md`	Versioned migration scripts applied to Supabase PostgreSQL instance; updated data-dictionary section in `docs/cap-table/OVERVIEW.md`; calibration report written to `docs/cap-table/calibration_report.md` using template `docs/cap-table/calibration_report_template.md`	DBA or senior engineer reviews RLS policies (plan `02_cap_table_rls_and_policies.md`) and threshold policy (`docs/cap-table/calibration_threshold_policy.md`) before migration runs in production (DBT-002)
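
One way to make the roles table queryable is to hold each row as a typed record. The sketch below is an assumption about how that could look: the `AgentRole` shape and `consumersOf` helper are hypothetical, and each row is abbreviated to a few of the inputs listed above.

```typescript
// Hypothetical record for one row of the roles table (abbreviated).
interface AgentRole {
  id: string;
  name: string;
  inputs: string[]; // tables or files the agent reads
  reviewGate: string; // check that must pass before output is used
}

const roles: AgentRole[] = [
  {
    id: "BR-001",
    name: "Ingestion Pipeline Agent",
    inputs: ["documents"],
    reviewGate: "product/app/(app)/admin/pipeline/page.tsx",
  },
  {
    id: "BR-002",
    name: "RAG Retrieval Agent",
    inputs: ["document_chunks", "conversations", "messages"],
    reviewGate: "product/lib/__tests__/citation-system.test.ts",
  },
  {
    id: "BR-003",
    name: "Cap-Table Extraction Agent",
    inputs: ["documents"],
    reviewGate: "product/lib/__tests__/cap-table-extract-trigger.test.ts",
  },
  {
    id: "BR-004",
    name: "Due Diligence Checklist Agent",
    inputs: ["documents"],
    reviewGate: "product/lib/due-diligence/__tests__/chat-output.test.ts",
  },
];

// Lists the IDs of agents that read a given table or file.
function consumersOf(artifact: string, all: AgentRole[]): string[] {
  return all.filter((role) => role.inputs.includes(artifact)).map((r) => r.id);
}

const documentConsumers = consumersOf("documents", roles);
console.log(documentConsumers);
```

A lookup like this makes it easier to see which agents are affected when a shared input such as the `documents` table changes.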

Document Loop

Generated documents become source material through a structured promotion cycle:

Draft → Active Action Plan. When an agent completes a task, it writes or updates the relevant action_plans/NN_<topic>.md file (e.g., docs/cap-table/action_plans/06_equity_extraction_candidates_pipeline.md). This file immediately becomes the operating context for the next agent in the sequence. The index file for each domain (e.g., docs/cap-table/action_plans/00_ACTION_PLAN_INDEX.md, docs/activity_page/action_plans/00_ACTION_PLAN_INDEX.md) acts as the agent's routing table.
Active → Complete. Upon human approval, a copy of the action plan is saved with a _COMPLETE suffix (a pattern used throughout docs/activity_page/action_plans/ and docs/archive/completed_action_plans/). The _COMPLETE copy is retained as an immutable record, while the base file remains a live reference for any agent that needs to reason about that feature's final state.
Complete → Archived. When a feature is superseded or refactored, the complete plan moves to docs/archive/completed_action_plans/<domain>/ or docs/archive/superseded/. The Infrastructure & Deployment Agent (BR-008) and Schema Migration Agent (DBT-001) are responsible for triggering this retirement when a replacement plan is activated.
Archived Material as Negative Context. Agents consult docs/archive/superseded/ (e.g., INTELLIGENT_SYSTEM_ARCHITECTURE.md, smart_model_routing.md, RAG_SYSTEM_OVERVIEW.md) to understand decisions that were reversed, avoiding re-proposing discarded approaches. The QA Documentation Agent (BR-007) cross-references datazoom-qa/COMPLETE_MANIFEST.md against the archive to ensure test coverage tracks current rather than superseded behavior.
Test Results Feeding Documentation. Passing test suites — particularly product/lib/__tests__/rag-retrieval-enhanced.test.ts, product/lib/cap-table/__tests__/observability.test.ts, and product/lib/__tests__/review-workflow.test.ts — are treated as binding evidence that an action plan's stated behavior is implemented. The QA Documentation Agent writes test outcome summaries back into the relevant action plan's status section, closing the feedback loop between implementation and specification.
Analytics Closing the Loop. Mixpanel event data described in docs/archive/MIXPANEL_EVENTS_REPORT.md and tracked through product/app/api/ai/track-interaction/route.ts feeds the Activity & Analytics Agent (BR-006), which surfaces usage patterns in product/app/(app)/activity/ and informs prioritization of new action plans under docs/action_plans/.
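
The promotion cycle above behaves like a small forward-only state machine. The following is a minimal sketch under stated assumptions: the `Plan` shape, `promote`, and `completeCopyPath` are hypothetical names, and placing `_COMPLETE` directly before the `.md` extension is a guess at the naming convention rather than a documented rule.

```typescript
type PlanState = "draft" | "active" | "complete" | "archived";

// Forward transitions described in the promotion cycle; archived is terminal.
const nextState: Record<PlanState, PlanState | null> = {
  draft: "active",
  active: "complete",
  complete: "archived",
  archived: null,
};

interface Plan {
  path: string;
  state: PlanState;
}

// Assumed naming: insert the suffix before the .md extension.
function completeCopyPath(planPath: string): string {
  return planPath.replace(/\.md$/, "_COMPLETE.md");
}

// Moves a plan one step forward. Active → Complete requires a named approver,
// mirroring the "upon human approval" condition.
function promote(plan: Plan, approvedBy?: string): Plan {
  const target = nextState[plan.state];
  if (target === null) {
    throw new Error(`${plan.path} is already archived`);
  }
  if (target === "complete" && !approvedBy) {
    throw new Error("Active → Complete requires a human approver");
  }
  return { ...plan, state: target };
}

const planPath =
  "docs/cap-table/action_plans/06_equity_extraction_candidates_pipeline.md";
const activePlan = promote({ path: planPath, state: "draft" });
const completePlan = promote(activePlan, "tech-lead");
console.log(completePlan.state, completeCopyPath(planPath));
```

Keeping the transitions in a single table means a plan cannot skip a stage or move backwards without the code being changed deliberately.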

Governance

Human Review Tiers

Tier	Trigger	Reviewer	Required Action
T1 — Schema Change	Any agent proposes a DDL migration or RLS policy update	DBA or senior engineer	Explicit approval in PR before `supabase db push`; references `docs/cap-table/action_plans/02_cap_table_rls_and_policies.md` and `calibration_threshold_policy.md`
T2 — Agent Output to Production UI	Cap-table candidates, risk memos, or due-diligence checklists ready for user display	Product owner	Approve/reject via `product/app/api/cap-table/review/[id]/approve/route.ts` or equivalent; must not be bypassed even in staging
T3 — CI / Infra Promotion	New Docker image or Fly deployment configuration	Engineering lead	`build-images.yml` workflow passes with no manual overrides; commit `960f88c` established retry logic that must not be disabled
T4 — Document Retirement	Action plan proposed for archival	Document owner (team lead per domain)	Move to `docs/archive/superseded/` or `docs/archive/completed_action_plans/` with a dated commit message; update the domain index
T5 — QA Release Gate	Pre-release regression run	Engineering lead	`product/lib/__tests__/app-smoke-regression.test.ts` must pass; `datazoom-qa/COMPLETE_MANIFEST.md` must reflect current feature set
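
The tier table can also be read as a lookup from the kind of change to the reviewer who must sign off. The sketch below assumes hypothetical `ChangeKind` labels and a `requiredReviewers` helper; only the tier numbers and reviewer roles come from the table.

```typescript
// Hypothetical labels for the five triggers in the tier table.
type ChangeKind =
  | "schema_change"
  | "agent_output_to_ui"
  | "infra_promotion"
  | "document_retirement"
  | "qa_release";

interface TierRule {
  tier: string;
  reviewer: string;
}

const tierRules: Record<ChangeKind, TierRule> = {
  schema_change: { tier: "T1", reviewer: "DBA or senior engineer" },
  agent_output_to_ui: { tier: "T2", reviewer: "Product owner" },
  infra_promotion: { tier: "T3", reviewer: "Engineering lead" },
  document_retirement: { tier: "T4", reviewer: "Document owner" },
  qa_release: { tier: "T5", reviewer: "Engineering lead" },
};

// Collects the distinct reviewer roles needed for a set of changes.
function requiredReviewers(kinds: ChangeKind[]): string[] {
  const reviewers = new Set<string>();
  for (const kind of kinds) {
    reviewers.add(tierRules[kind].reviewer);
  }
  return Array.from(reviewers).sort();
}

const neededReviewers = requiredReviewers([
  "infra_promotion",
  "qa_release",
  "schema_change",
]);
console.log(neededReviewers);
```

Because T3 and T5 share a reviewer, a change set that triggers both still yields a single engineering-lead sign-off in this sketch.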

Approval Workflow

Pull requests that touch any file under docs/, datazoom-qa/, or product/lib/__tests__/ require at least one human reviewer before merge. The .husky/pre-push hook enforces linting; the .github/workflows/knip.yml workflow enforces that no dead exports are introduced by documentation-adjacent code changes.
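
The path rule above amounts to a prefix match over the files a pull request changes. The helper below is a hypothetical sketch of that check, not the project's actual CI configuration; `requiresHumanReview` and `reviewedPrefixes` are illustrative names.

```typescript
// Path prefixes named in the approval rule.
const reviewedPrefixes = ["docs/", "datazoom-qa/", "product/lib/__tests__/"];

// True when any changed file falls under a prefix that needs a human reviewer.
function requiresHumanReview(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    reviewedPrefixes.some((prefix) => file.startsWith(prefix))
  );
}

console.log(requiresHumanReview(["docs/RAG_SYSTEM.md"]));
console.log(requiresHumanReview(["product/app/api/activity/route.ts"]));
```

Read literally, the rule does not cover nested test directories such as `product/lib/cap-table/__tests__/`, so a stricter policy would need an additional prefix or a pattern match.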

Retirement Policy

A document is eligible for retirement when: (a) its corresponding feature has been replaced and the replacement action plan is in _COMPLETE state, (b) the schema it describes no longer matches the live Supabase DDL, or (c) it has been superseded by a newer version with an explicit successor reference. Retired documents are never deleted; they are moved to docs/archive/superseded/ with their original filename intact so that historical agent reasoning can be audited. docs/DOCUMENTATION_INDEX.md is the single source of truth for which documents are current versus archived, and the Infrastructure & Deployment Agent (BR-008) is responsible for keeping it synchronized after each deployment cycle.
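
The three eligibility conditions and the move-without-rename rule can be sketched as two small functions. The `RetirementFacts` shape and both helper names are hypothetical; how each fact is actually determined (for example, comparing a document against the live DDL) is outside this sketch.

```typescript
// Hypothetical inputs, one field per condition in the retirement policy.
interface RetirementFacts {
  featureReplaced: boolean; // (a) feature has a replacement...
  replacementPlanComplete: boolean; // (a) ...whose plan is in _COMPLETE state
  describesStaleSchema: boolean; // (b) schema no longer matches live DDL
  successorPath?: string; // (c) explicit successor reference
}

function isEligibleForRetirement(facts: RetirementFacts): boolean {
  const replaced = facts.featureReplaced && facts.replacementPlanComplete;
  const superseded = Boolean(facts.successorPath);
  return replaced || facts.describesStaleSchema || superseded;
}

// Keeps the original filename and changes only the directory.
function archivePath(docPath: string): string {
  const filename = docPath.slice(docPath.lastIndexOf("/") + 1);
  return `docs/archive/superseded/${filename}`;
}

console.log(archivePath("docs/RAG_SYSTEM_OVERVIEW.md"));
```

Retirement in this model is a move, never a delete, so the audit trail depends on the filename surviving unchanged.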