Document agents
Last updated 5/24/2026
Agent Thesis
DataZoom's document agents govern the creation, maintenance, and operational use of every artifact in the docs/ hierarchy, the datazoom-qa/ QA corpus, and the living action-plan network under docs/action_plans/. Each agent treats these files as its primary working memory: consuming upstream documents to derive context, emitting structured outputs that downstream agents and human reviewers can act on, and retiring superseded material into docs/archive/superseded/ when a plan's lifecycle closes.

The agent network covers five document domains: (1) architecture and infrastructure specs (docs/AI_SERVICES.md, docs/BACKEND_INFRASTRUCTURE.md, docs/RAG_SYSTEM.md), (2) feature action plans (e.g., docs/cap-table/action_plans/, docs/action_plans/e_sign/), (3) QA and test manifests (datazoom-qa/COMPLETE_MANIFEST.md, datazoom-qa/BUILD_READY_SUMMARY.md), (4) schema and data dictionaries (docs/archive/superseded/DATABASE_SCHEMA.md, migration SQL files), and (5) operational runbooks (docs/archive/start_system.sh, deployment guides).

Agents do not generate content from imagination; every claim in an emitted document must be traceable to a source file, API route, database schema field, or CI artifact in the repository.
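
The traceability rule can be pictured as a check that runs before a document is emitted. The sketch below is only an illustration: the `SourceRef` and `DocClaim` shapes and the `untraceableClaims` helper are hypothetical names, not types that exist in the repository. The four source kinds mirror the ones listed above.

```typescript
// Hypothetical shapes; the document states the rule, not a data model.
type SourceKind = "source_file" | "api_route" | "schema_field" | "ci_artifact";

interface SourceRef {
  kind: SourceKind;
  ref: string; // e.g. a repository path or a table.column name
}

interface DocClaim {
  text: string;
  sources: SourceRef[];
}

// Return every claim that has no source, or that has a source with an empty reference.
function untraceableClaims(claims: DocClaim[]): DocClaim[] {
  return claims.filter(
    (c) => c.sources.length === 0 || c.sources.some((s) => s.ref.trim() === "")
  );
}
```

An agent following this pattern would refuse to emit a document while `untraceableClaims` returns a non-empty list.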
Agent Roles
| Agent | Inputs | Outputs | Review Gate |
|---|---|---|---|
| Ingestion Pipeline Agent (BR-001) | Raw documents uploaded via product/app/(app)/documents/ UI; Supabase documents table (filename, document_type, full_text, page_count, char_count); Docker worker image ghcr.io/midwestco/datazoom-worker:latest logs from docker/Dockerfile.worker | summary field written to documents table; document_chunks rows with embedding VECTOR(384) via sentence-transformers all-MiniLM-L6-v2; status update records consumed by product/app/api/admin/pipeline/health/route.ts | Human reviewer confirms chunk count and embedding integrity via product/app/(app)/admin/pipeline/page.tsx before document is surfaced in chat (UJ-001) |
| RAG Retrieval Agent (BR-002) | document_chunks table (cosine similarity index ivfflat); user query from product/app/(app)/context/components/strategic-input.tsx; conversation history from conversations & messages tables | Ranked chunk set passed to LLM (Modal Qwen2.5:32B or local Ollama via llm service on port 8001); cited answer surfaced in product/app/(app)/context/components/conversation-panel.tsx; interaction record written via product/app/api/ai/track-interaction/route.ts | Citation system test product/lib/__tests__/citation-system.test.ts must pass; enhanced retrieval validated by product/lib/__tests__/rag-retrieval-enhanced.test.ts (TEST-001) |
| Cap-Table Extraction Agent (BR-003) | Equity documents from documents table where document_type = 'equity'; Modal endpoint triggered by product/app/api/cap-table/extract/route.ts; feature gate evaluated by product/lib/cap-table/__tests__/feature-gate.test.ts | Candidate transaction rows queued for review in product/app/api/cap-table/review/route.ts; observability metrics logged (see product/lib/cap-table/__tests__/observability.test.ts); auto-populated cap table via product/app/api/cap-table/auto-populate/route.ts | Human approves or rejects each candidate via product/app/api/cap-table/review/[id]/approve/route.ts and /reject/route.ts; smoke validated by product/lib/__tests__/cap-table-extract-trigger.test.ts (TEST-002) |
| Due Diligence Checklist Agent (BR-004) | Business type profile from product/app/api/business-type-profiles/[profileKey]/route.ts; checklist template from product/app/api/business-types/[typeKey]/checklist/route.ts; document metadata (parties[], key_terms[], effective_date) from documents table | Populated due-diligence checklist rendered in product/app/(app)/[type]/[id]/views/dd-match-panel.tsx; template events verified by product/lib/due-diligence/__tests__/template-events-contract.test.ts and template-events-route.test.ts (TEST-003) | Tech lead reviews template schema normalization (see PR #108) before checklist is locked; chat output validated via product/lib/due-diligence/__tests__/chat-output.test.ts |
| Advisor / Risk-Memo Agent (BR-005) | Strategic context from product/app/(app)/context/components/strategic-overview.tsx; advisor queue processed by product/app/api/advisor/process-queue/route.ts; batch jobs submitted via product/app/api/advisor/batch/route.ts | Risk memo document written via product/app/api/advisor/risk-memo/route.ts; strategic options emitted via product/app/api/advisor/strategic-options/route.ts; decision log persisted via product/app/(app)/context/components/save-decision-form.tsx | Product owner reviews risk memo before it propagates to product/app/(app)/advisor/page.tsx; model routing validated by product/lib/__tests__/model-router.test.ts (TEST-004) |
| Activity & Analytics Agent (BR-006) | Events from product/app/api/activity/route.ts; unified feed from product/app/api/activity/unified/feed/route.ts; Mixpanel tracking plan in docs/archive/MIXPANEL_TRACKING_PLAN.md | Daily activity feed in product/app/(app)/activity/activity-content.tsx; calendar entries via product/app/api/activity/unified/calendar/route.ts; metrics exported via product/app/api/activity/export/route.ts; Mixpanel events fired through product/app/api/ai/track-interaction/route.ts | Metrics API response validated against schema in product/app/api/activity/metrics/route.ts; calendar implementation verified against docs/activity_page/UNIFIED_CALENDAR_IMPLEMENTATION_COMPLETE.md (MON-001) |
| QA Documentation Agent (BR-007) | datazoom-qa/action_plans/ series (00–10); datazoom-qa/BUILD_READY_SUMMARY.md; test files under product/lib/__tests__/ and product/lib/cap-table/__tests__/ | Updated datazoom-qa/COMPLETE_MANIFEST.md; per-plan status annotations; regression report fed into product/lib/__tests__/app-smoke-regression.test.ts | Engineering lead approves manifest before a release tag is cut; review-workflow correctness confirmed by product/lib/__tests__/review-workflow.test.ts (TEST-005) |
| Infrastructure & Deployment Agent (BR-008) | docker-compose.yml; docker/Dockerfile.base, docker/Dockerfile.gpu, docker/Dockerfile.worker, docker/cloud-worker/Dockerfile; .github/workflows/build-images.yml; fly/orchestrator/Dockerfile; product/collaboration-ws/Dockerfile | Updated docs/BACKEND_INFRASTRUCTURE.md; deployment confirmation notes (pattern: docs/archive/completed_action_plans/document_refinement_v1/08_DEPLOYMENT_CONFIRMATION.md); CI image tags pushed to ghcr.io/midwestco/datazoom-{base,worker,gpu}:latest | CI build-images.yml workflow must pass (including retry logic added in commit 960f88c) before any infra document is promoted from draft to current (TECH-001) |
| Schema Migration Agent (DBT-001) | Current DDL in docs/archive/superseded/DATABASE_SCHEMA.md; SQL migration files (docs/archive/add_missing_tables_to_cloud.sql, docs/archive/create_company_settings.sql, docs/activity_page/VERIFY_DELETE_LOGGING.sql); action plans under docs/cap-table/action_plans/01_cap_table_schema_foundation.md and 02_cap_table_rls_and_policies.md | Versioned migration scripts applied to Supabase PostgreSQL instance; updated data-dictionary section in docs/cap-table/OVERVIEW.md; calibration report written to docs/cap-table/calibration_report.md using template docs/cap-table/calibration_report_template.md | DBA or senior engineer reviews RLS policies (plan 02_cap_table_rls_and_policies.md) and threshold policy (docs/cap-table/calibration_threshold_policy.md) before migration runs in production (DBT-002) |
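
The RAG Retrieval Agent row describes ranking document_chunks by cosine similarity over their embeddings. In the table this ranking is done by the database index (ivfflat); the sketch below only illustrates the scoring idea in memory. `RankableChunk`, `cosineSimilarity`, and `rankChunks` are hypothetical names, and the example vectors are much shorter than the 384 dimensions used by the pipeline.

```typescript
// Hypothetical in-memory illustration; the table describes ranking done by the database index.
interface RankableChunk {
  id: string;
  embedding: number[]; // VECTOR(384) in the documents pipeline
}

// Cosine similarity: dot(a, b) / (|a| * |b|), defined here as 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the topK highest scores.
function rankChunks(
  query: number[],
  chunks: RankableChunk[],
  topK: number
): Array<{ id: string; score: number }> {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

An exhaustive scan like this is exact but linear in the number of chunks; an approximate index such as ivfflat trades a little recall for speed on large tables.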
Document Loop
Generated documents become source material through a structured promotion cycle:
- Draft → Active Action Plan. When an agent completes a task, it writes or updates the relevant action_plans/NN_<topic>.md file (e.g., docs/cap-table/action_plans/06_equity_extraction_candidates_pipeline.md). This file immediately becomes the operating context for the next agent in the sequence. The index file for each domain (e.g., docs/cap-table/action_plans/00_ACTION_PLAN_INDEX.md, docs/activity_page/action_plans/00_ACTION_PLAN_INDEX.md) acts as the agent's routing table.
- Active → Complete. Upon human approval, the action plan receives a _COMPLETE suffix copy (pattern observed throughout docs/activity_page/action_plans/ and docs/archive/completed_action_plans/). The complete copy is retained as an immutable record; the base file continues to serve as a live reference for any agent that needs to reason about that feature's final state.
- Complete → Archived. When a feature is superseded or refactored, the complete plan migrates to docs/archive/completed_action_plans/<domain>/ or docs/archive/superseded/. The Infrastructure Agent (BR-008) and Schema Migration Agent (DBT-001) are responsible for triggering this retirement when a replacement plan is activated.
- Archived Material as Negative Context. Agents consult docs/archive/superseded/ (e.g., INTELLIGENT_SYSTEM_ARCHITECTURE.md, smart_model_routing.md, RAG_SYSTEM_OVERVIEW.md) to understand decisions that were reversed, avoiding re-proposing discarded approaches. The QA Documentation Agent (BR-007) cross-references datazoom-qa/COMPLETE_MANIFEST.md against the archive to ensure test coverage tracks current rather than superseded behavior.
- Test Results Feeding Documentation. Passing test suites, particularly product/lib/__tests__/rag-retrieval-enhanced.test.ts, product/lib/cap-table/__tests__/observability.test.ts, and product/lib/__tests__/review-workflow.test.ts, are treated as binding evidence that an action plan's stated behavior is implemented. The QA Documentation Agent writes test outcome summaries back into the relevant action plan's status section, closing the feedback loop between implementation and specification.
- Analytics Closing the Loop. Mixpanel event data described in docs/archive/MIXPANEL_EVENTS_REPORT.md and tracked through product/app/api/ai/track-interaction/route.ts feeds the Activity & Analytics Agent (BR-006), which surfaces usage patterns in product/app/(app)/activity/ and informs prioritization of new action plans under docs/action_plans/.
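
The promotion cycle above behaves like a small state machine: a plan moves forward one stage at a time, and the Active → Complete step is gated on human approval. The following is a minimal sketch under that reading. `PlanState`, `promotePlan`, and `completeCopyName` are hypothetical names, and placing the _COMPLETE suffix directly before the .md extension is an assumption about the naming pattern rather than a documented rule.

```typescript
// Hypothetical model of the promotion cycle; state names mirror the list above.
type PlanState = "draft" | "active" | "complete" | "archived";

// Each state may only advance to the next stage; archived is terminal.
const ALLOWED_TRANSITIONS: Record<PlanState, PlanState[]> = {
  draft: ["active"],
  active: ["complete"],
  complete: ["archived"],
  archived: [],
};

function promotePlan(current: PlanState, next: PlanState, humanApproved: boolean): PlanState {
  if (ALLOWED_TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`illegal transition: ${current} -> ${next}`);
  }
  // Active -> Complete happens only upon human approval.
  if (current === "active" && next === "complete" && !humanApproved) {
    throw new Error("human approval is required to mark a plan complete");
  }
  return next;
}

// Assumption: the suffix goes immediately before the .md extension.
function completeCopyName(filename: string): string {
  return filename.replace(/\.md$/, "_COMPLETE.md");
}
```

Keeping the transition table as data makes it easy to audit against the written policy, and rejecting skipped stages (for example draft straight to archived) preserves the immutable complete copy as a record.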
Governance
Human Review Tiers
| Tier | Trigger | Reviewer | Required Action |
|---|---|---|---|
| T1: Schema Change | Any agent proposes a DDL migration or RLS policy update | DBA or senior engineer | Explicit approval in the PR before running supabase db push; references docs/cap-table/action_plans/02_cap_table_rls_and_policies.md and docs/cap-table/calibration_threshold_policy.md |
| T2: Agent Output to Production UI | Cap-table candidates, risk memos, or due-diligence checklists ready for user display | Product owner | Approve/reject via product/app/api/cap-table/review/[id]/approve/route.ts or equivalent; must not be bypassed even in staging |
| T3: CI / Infra Promotion | New Docker image or Fly deployment configuration | Engineering lead | build-images.yml workflow passes with no manual overrides; commit 960f88c established retry logic that must not be disabled |
| T4: Document Retirement | Action plan proposed for archival | Document owner (team lead per domain) | Move to docs/archive/superseded/ or docs/archive/completed_action_plans/ with a dated commit message; update the domain index |
| T5: QA Release Gate | Pre-release regression run | Engineering lead | product/lib/__tests__/app-smoke-regression.test.ts must pass; datazoom-qa/COMPLETE_MANIFEST.md must reflect current feature set |
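
The tier table can also be read as a lookup from the kind of change to the reviewer who must sign off. The sketch below encodes it that way for illustration. The `ChangeKind` keys, the `REVIEW_TIERS` constant, and `requiredReviewers` are hypothetical names; only the tier labels and reviewer roles come from the table.

```typescript
// Hypothetical encoding of the review-tier table; keys and field names are illustrative.
type ChangeKind =
  | "schema_change"
  | "agent_output_to_ui"
  | "ci_infra_promotion"
  | "document_retirement"
  | "qa_release_gate";

interface ReviewTier {
  tier: "T1" | "T2" | "T3" | "T4" | "T5";
  reviewer: string;
}

const REVIEW_TIERS: Record<ChangeKind, ReviewTier> = {
  schema_change: { tier: "T1", reviewer: "DBA or senior engineer" },
  agent_output_to_ui: { tier: "T2", reviewer: "Product owner" },
  ci_infra_promotion: { tier: "T3", reviewer: "Engineering lead" },
  document_retirement: { tier: "T4", reviewer: "Document owner" },
  qa_release_gate: { tier: "T5", reviewer: "Engineering lead" },
};

// Collect the distinct reviewers a set of changes needs, ordered by tier.
function requiredReviewers(kinds: ChangeKind[]): string[] {
  const tiers = kinds
    .map((k) => REVIEW_TIERS[k])
    .sort((a, b) => a.tier.localeCompare(b.tier));
  const reviewers: string[] = [];
  for (const t of tiers) {
    if (reviewers.indexOf(t.reviewer) === -1) reviewers.push(t.reviewer);
  }
  return reviewers;
}
```

A release that combines a migration, a new worker image, and the regression run would therefore need two sign-offs, since T3 and T5 share the engineering lead.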
Approval Workflow
Pull requests that touch any file under docs/, datazoom-qa/, or product/lib/__tests__/ require at least one human reviewer before merge. The .husky/pre-push hook enforces linting; the .github/workflows/knip.yml workflow enforces that no dead exports are introduced by documentation-adjacent code changes.
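
The path rule in the approval workflow reduces to a prefix check over the files a pull request changes. The following is a sketch of that check, not the repository's actual enforcement mechanism. `REVIEWED_PREFIXES` and `requiresHumanReview` are hypothetical names, and the input is assumed to be repository-relative paths with forward slashes.

```typescript
// Path prefixes taken from the approval workflow; the helper itself is illustrative.
const REVIEWED_PREFIXES: string[] = ["docs/", "datazoom-qa/", "product/lib/__tests__/"];

// True when at least one changed path falls under a reviewed prefix.
function requiresHumanReview(changedPaths: string[]): boolean {
  return changedPaths.some((path) => {
    // Tolerate a leading "./" so "./docs/x.md" and "docs/x.md" are treated alike.
    const normalized = path.replace(/^\.\//, "");
    return REVIEWED_PREFIXES.some((prefix) => normalized.startsWith(prefix));
  });
}
```

Read literally, the rule does not cover product/lib/cap-table/__tests__/, because that directory is not under product/lib/__tests__/; this sketch follows the literal reading.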
Retirement Policy
A document is eligible for retirement when any of the following holds: (a) its corresponding feature has been replaced and the replacement action plan is in _COMPLETE state, (b) the schema it describes no longer matches the live Supabase DDL, or (c) it has been superseded by a newer version with an explicit successor reference. Retired documents are never deleted; they are moved to docs/archive/superseded/ with their original filename intact so that historical agent reasoning can be audited. docs/DOCUMENTATION_INDEX.md is the single source of truth for which documents are current versus archived, and the Infrastructure Agent (BR-008) is responsible for keeping it synchronized after each deployment cycle.
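
The three eligibility conditions and the move-with-filename-intact rule can be sketched as two small functions. `RetirementFacts`, `isEligibleForRetirement`, and `archivePathFor` are hypothetical names introduced for illustration; how each fact is gathered (for example, how the live DDL is compared) is outside the sketch.

```typescript
// Hypothetical inputs for the retirement decision; field names are illustrative.
interface RetirementFacts {
  featureReplaced: boolean;
  replacementPlanComplete: boolean; // replacement action plan is in _COMPLETE state
  schemaMatchesLiveDdl?: boolean;   // leave undefined for documents that describe no schema
  successorReference?: string;      // explicit pointer to the newer version, if any
}

function isEligibleForRetirement(facts: RetirementFacts): boolean {
  const conditionA = facts.featureReplaced && facts.replacementPlanComplete;
  const conditionB = facts.schemaMatchesLiveDdl === false;
  const conditionC = Boolean(facts.successorReference);
  return conditionA || conditionB || conditionC;
}

// Retired documents move to docs/archive/superseded/ and keep their original filename.
function archivePathFor(originalPath: string): string {
  const segments = originalPath.split("/");
  const filename = segments[segments.length - 1];
  return `docs/archive/superseded/${filename}`;
}
```

Because only the directory changes, a retired file can still be found by name when auditing why an agent made a past decision.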