DataZoom

Document agents

Last updated 5/24/2026

Agent Thesis

DataZoom's document agents govern the creation, maintenance, and operational use of every artifact in three places: the docs/ hierarchy, the datazoom-qa/ QA corpus, and the living action-plan network under docs/action_plans/. Each agent treats these files as its primary working memory. It consumes upstream documents to derive context, emits structured outputs that downstream agents and human reviewers can act on, and retires superseded material into docs/archive/superseded/ when a plan's lifecycle closes. The agent network covers five document domains: (1) architecture and infrastructure specs (docs/AI_SERVICES.md, docs/BACKEND_INFRASTRUCTURE.md, docs/RAG_SYSTEM.md), (2) feature action plans (e.g., docs/cap-table/action_plans/, docs/action_plans/e_sign/), (3) QA and test manifests (datazoom-qa/COMPLETE_MANIFEST.md, datazoom-qa/BUILD_READY_SUMMARY.md), (4) schema and data dictionaries (docs/archive/superseded/DATABASE_SCHEMA.md, migration SQL files), and (5) operational runbooks (docs/archive/start_system.sh, deployment guides). Agents do not invent content: every claim in an emitted document must be traceable to a source file, API route, database schema field, or CI artifact in the repository.
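
The traceability rule could be checked mechanically. This minimal sketch assumes each emitted claim carries a list of source references, each tagged with one of the four source kinds named above. The Claim and SourceRef types, the findUntraceableClaims helper, and the path patterns it accepts are all hypothetical. They illustrate the rule and are not part of the DataZoom codebase.

```typescript
// Hypothetical types: the four source kinds mirror the traceability rule;
// field names are illustrative assumptions.
type SourceKind = "file" | "api_route" | "schema_field" | "ci_artifact";

interface SourceRef {
  kind: SourceKind;
  ref: string; // a repo path, a "table.column" name, or a CI artifact identifier
}

interface Claim {
  text: string;
  sources: SourceRef[];
}

// Assumed convention: API handlers live at product/app/api/**/route.ts.
const API_ROUTE = /^product\/app\/api\/.+\/route\.ts$/;
// Assumed convention: schema fields are written as "table.column".
const SCHEMA_FIELD = /^[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_]*$/;

// Checks only the shape of a reference, not whether the target exists.
function isWellFormed(src: SourceRef): boolean {
  if (src.kind === "api_route") return API_ROUTE.test(src.ref);
  if (src.kind === "schema_field") return SCHEMA_FIELD.test(src.ref);
  // "file" and "ci_artifact" only need a non-empty reference in this sketch.
  return src.ref.trim().length > 0;
}

// Returns every claim that lacks at least one well-formed source reference.
function findUntraceableClaims(all: Claim[]): Claim[] {
  return all.filter((c) => !c.sources.some(isWellFormed));
}

const claims: Claim[] = [
  {
    text: "Chunks store 384-dimensional embeddings.",
    sources: [{ kind: "schema_field", ref: "document_chunks.embedding" }],
  },
  {
    text: "Extraction is triggered through an API route.",
    sources: [{ kind: "api_route", ref: "product/app/api/cap-table/extract/route.ts" }],
  },
  { text: "Users prefer the new layout.", sources: [] },
];

console.log(findUntraceableClaims(claims).map((c) => c.text));
// [ 'Users prefer the new layout.' ]
```

Only the shape of each reference is checked here. Confirming that a referenced file or route actually exists in the repository would need a separate filesystem or CI lookup.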


Agent Roles

Ingestion Pipeline Agent (BR-001)
  Inputs: Raw documents uploaded through the product/app/(app)/documents/ UI; the Supabase documents table (filename, document_type, full_text, page_count, char_count); logs from the Docker worker image ghcr.io/midwestco/datazoom-worker:latest, built from docker/Dockerfile.worker
  Outputs: summary field written to the documents table; document_chunks rows with a VECTOR(384) embedding produced by sentence-transformers all-MiniLM-L6-v2; status update records consumed by product/app/api/admin/pipeline/health/route.ts
  Review gate: A human reviewer confirms chunk count and embedding integrity in product/app/(app)/admin/pipeline/page.tsx before the document is surfaced in chat (UJ-001)

RAG Retrieval Agent (BR-002)
  Inputs: document_chunks table (ivfflat index for cosine similarity); user query from product/app/(app)/context/components/strategic-input.tsx; conversation history from the conversations and messages tables
  Outputs: Ranked chunk set passed to the LLM (Qwen2.5:32B on Modal, or local Ollama via the llm service on port 8001); cited answer surfaced in product/app/(app)/context/components/conversation-panel.tsx; interaction record written via product/app/api/ai/track-interaction/route.ts
  Review gate: The citation system test product/lib/__tests__/citation-system.test.ts must pass; enhanced retrieval is validated by product/lib/__tests__/rag-retrieval-enhanced.test.ts (TEST-001)

Cap-Table Extraction Agent (BR-003)
  Inputs: Equity documents from the documents table where document_type = 'equity'; Modal endpoint triggered by product/app/api/cap-table/extract/route.ts; feature gate, with its behavior covered by product/lib/cap-table/__tests__/feature-gate.test.ts
  Outputs: Candidate transaction rows queued for review in product/app/api/cap-table/review/route.ts; observability metrics logged (see product/lib/cap-table/__tests__/observability.test.ts); auto-populated cap table via product/app/api/cap-table/auto-populate/route.ts
  Review gate: A human approves or rejects each candidate via product/app/api/cap-table/review/[id]/approve/route.ts and /reject/route.ts; smoke-tested by product/lib/__tests__/cap-table-extract-trigger.test.ts (TEST-002)

Due Diligence Checklist Agent (BR-004)
  Inputs: Business type profile from product/app/api/business-type-profiles/[profileKey]/route.ts; checklist template from product/app/api/business-types/[typeKey]/checklist/route.ts; document metadata (parties[], key_terms[], effective_date) from the documents table
  Outputs: Populated due-diligence checklist rendered in product/app/(app)/[type]/[id]/views/dd-match-panel.tsx; template events verified by product/lib/due-diligence/__tests__/template-events-contract.test.ts and template-events-route.test.ts (TEST-003)
  Review gate: The tech lead reviews template schema normalization (see PR #108) before the checklist is locked; chat output is validated by product/lib/due-diligence/__tests__/chat-output.test.ts

Advisor / Risk-Memo Agent (BR-005)
  Inputs: Strategic context from product/app/(app)/context/components/strategic-overview.tsx; advisor queue processed by product/app/api/advisor/process-queue/route.ts; batch jobs submitted via product/app/api/advisor/batch/route.ts
  Outputs: Risk memo produced via product/app/api/advisor/risk-memo/route.ts; strategic options emitted via product/app/api/advisor/strategic-options/route.ts; decision log persisted via product/app/(app)/context/components/save-decision-form.tsx
  Review gate: The product owner reviews the risk memo before it propagates to product/app/(app)/advisor/page.tsx; model routing is validated by product/lib/__tests__/model-router.test.ts (TEST-004)

Activity & Analytics Agent (BR-006)
  Inputs: Events from product/app/api/activity/route.ts; unified feed from product/app/api/activity/unified/feed/route.ts; Mixpanel tracking plan in docs/archive/MIXPANEL_TRACKING_PLAN.md
  Outputs: Daily activity feed in product/app/(app)/activity/activity-content.tsx; calendar entries via product/app/api/activity/unified/calendar/route.ts; metrics exported via product/app/api/activity/export/route.ts; Mixpanel events fired through product/app/api/ai/track-interaction/route.ts
  Review gate: Metrics API responses are validated against the schema in product/app/api/activity/metrics/route.ts; the calendar implementation is verified against docs/activity_page/UNIFIED_CALENDAR_IMPLEMENTATION_COMPLETE.md (MON-001)

QA Documentation Agent (BR-007)
  Inputs: datazoom-qa/action_plans/ series (00–10); datazoom-qa/BUILD_READY_SUMMARY.md; test files under product/lib/__tests__/ and product/lib/cap-table/__tests__/
  Outputs: Updated datazoom-qa/COMPLETE_MANIFEST.md; per-plan status annotations; regression report fed into product/lib/__tests__/app-smoke-regression.test.ts
  Review gate: The engineering lead approves the manifest before a release tag is cut; review-workflow correctness is confirmed by product/lib/__tests__/review-workflow.test.ts (TEST-005)

Infrastructure & Deployment Agent (BR-008)
  Inputs: docker-compose.yml; docker/Dockerfile.base, docker/Dockerfile.gpu, docker/Dockerfile.worker, docker/cloud-worker/Dockerfile; .github/workflows/build-images.yml; fly/orchestrator/Dockerfile; product/collaboration-ws/Dockerfile
  Outputs: Updated docs/BACKEND_INFRASTRUCTURE.md; deployment confirmation notes (pattern: docs/archive/completed_action_plans/document_refinement_v1/08_DEPLOYMENT_CONFIRMATION.md); CI image tags pushed to ghcr.io/midwestco/datazoom-{base,worker,gpu}:latest
  Review gate: The CI build-images.yml workflow must pass, including the retry logic added in commit 960f88c, before any infra document is promoted from draft to current (TECH-001)

Schema Migration Agent (DBT-001)
  Inputs: DDL as last documented in docs/archive/superseded/DATABASE_SCHEMA.md; SQL migration files (docs/archive/add_missing_tables_to_cloud.sql, docs/archive/create_company_settings.sql, docs/activity_page/VERIFY_DELETE_LOGGING.sql); action plans docs/cap-table/action_plans/01_cap_table_schema_foundation.md and 02_cap_table_rls_and_policies.md
  Outputs: Versioned migration scripts applied to the Supabase PostgreSQL instance; updated data-dictionary section in docs/cap-table/OVERVIEW.md; calibration report written to docs/cap-table/calibration_report.md using the template docs/cap-table/calibration_report_template.md
  Review gate: A DBA or senior engineer reviews the RLS policies (plan 02_cap_table_rls_and_policies.md) and the threshold policy (docs/cap-table/calibration_threshold_policy.md) before the migration runs in production (DBT-002)
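
The RAG Retrieval Agent (BR-002) ranks chunks by cosine similarity to the query embedding. In production, that ranking runs inside PostgreSQL through the ivfflat index, which is an approximate nearest-neighbor index. The sketch below is an exact brute-force version that only shows what the score means. The Chunk type and the topKByCosine helper are hypothetical, and the 3-dimensional vectors stand in for the 384-dimensional all-MiniLM-L6-v2 embeddings.

```typescript
// Hypothetical chunk shape; production embeddings have 384 dimensions.
interface Chunk {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k: score every chunk, sort by descending similarity, keep k.
function topKByCosine(
  query: number[],
  candidates: Chunk[],
  k: number,
): Array<{ id: string; score: number }> {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const chunks: Chunk[] = [
  { id: "a", embedding: [1, 0, 0] },
  { id: "b", embedding: [0.7, 0.7, 0] },
  { id: "c", embedding: [0, 0, 1] },
];

console.log(topKByCosine([1, 0.1, 0], chunks, 2).map((r) => r.id));
// [ 'a', 'b' ]
```

An ivfflat index probes only a subset of its lists, so its results can differ slightly from this exact ranking.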

Document Loop

Generated documents become source material through a structured promotion cycle:

  1. Draft → Active Action Plan. When an agent completes a task, it writes or updates the relevant action_plans/NN_<topic>.md file (e.g., docs/cap-table/action_plans/06_equity_extraction_candidates_pipeline.md). This file immediately becomes the operating context for the next agent in the sequence. The index file for each domain (e.g., docs/cap-table/action_plans/00_ACTION_PLAN_INDEX.md, docs/activity_page/action_plans/00_ACTION_PLAN_INDEX.md) acts as the agents' routing table.

  2. Active → Complete. Upon human approval, a copy of the action plan is saved with a _COMPLETE suffix (a pattern used throughout docs/activity_page/action_plans/ and docs/archive/completed_action_plans/). The _COMPLETE copy is retained as an immutable record. The base file remains a live reference for any agent that needs to reason about that feature's final state.

  3. Complete → Archived. When a feature is superseded or refactored, the complete plan migrates to docs/archive/completed_action_plans/<domain>/ or docs/archive/superseded/. The Infrastructure Agent (BR-008) and Schema Migration Agent (DBT-001) are responsible for triggering this retirement when a replacement plan is activated.

  4. Archived Material as Negative Context. Agents consult docs/archive/superseded/ (e.g., INTELLIGENT_SYSTEM_ARCHITECTURE.md, smart_model_routing.md, RAG_SYSTEM_OVERVIEW.md) to understand decisions that were reversed, avoiding re-proposing discarded approaches. The QA Documentation Agent (BR-007) cross-references datazoom-qa/COMPLETE_MANIFEST.md against the archive to ensure test coverage tracks current rather than superseded behavior.

  5. Test Results Feeding Documentation. Passing test suites — particularly product/lib/__tests__/rag-retrieval-enhanced.test.ts, product/lib/cap-table/__tests__/observability.test.ts, and product/lib/__tests__/review-workflow.test.ts — are treated as binding evidence that an action plan's stated behavior is implemented. The QA Documentation Agent writes test outcome summaries back into the relevant action plan's status section, closing the feedback loop between implementation and specification.

  6. Analytics Closing the Loop. Mixpanel event data described in docs/archive/MIXPANEL_EVENTS_REPORT.md and tracked through product/app/api/ai/track-interaction/route.ts feeds the Activity & Analytics Agent (BR-006), which surfaces usage patterns in product/app/(app)/activity/ and informs prioritization of new action plans under docs/action_plans/.
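
Steps 1 through 4 behave like a small state machine over a plan's lifecycle. The sketch below encodes the allowed transitions and derives the two file paths the text describes: the _COMPLETE copy and the archive destination. Several pieces are illustrative assumptions rather than the agents' actual implementation: the state names, the advance, completeCopyPath, and archivePath helpers, and the placement of the suffix before the .md extension (as in UNIFIED_CALENDAR_IMPLEMENTATION_COMPLETE.md). The "cap-table" archive domain in the usage lines is a hypothetical example.

```typescript
type PlanState = "draft" | "active" | "complete" | "archived";

// Each state may only advance to the next one; "archived" is terminal.
const NEXT: Record<PlanState, PlanState | null> = {
  draft: "active",
  active: "complete", // step 2: requires human approval
  complete: "archived", // step 3: triggered when a replacement plan is activated
  archived: null,
};

function advance(state: PlanState): PlanState {
  const next = NEXT[state];
  if (next === null) throw new Error(`"${state}" is a terminal state`);
  return next;
}

// Step 2 (assumed naming): the immutable record adds _COMPLETE before ".md".
function completeCopyPath(planPath: string): string {
  if (!planPath.endsWith(".md")) throw new Error("expected a Markdown plan");
  return planPath.slice(0, -".md".length) + "_COMPLETE.md";
}

// Step 3: completed plans move under docs/archive/completed_action_plans/<domain>/,
// keeping their filename.
function archivePath(planPath: string, domain: string): string {
  const filename = planPath.split("/").pop() ?? planPath;
  return `docs/archive/completed_action_plans/${domain}/${filename}`;
}

const plan = "docs/cap-table/action_plans/06_equity_extraction_candidates_pipeline.md";
console.log(advance("active")); // complete
console.log(completeCopyPath(plan));
// docs/cap-table/action_plans/06_equity_extraction_candidates_pipeline_COMPLETE.md
console.log(archivePath(completeCopyPath(plan), "cap-table"));
// docs/archive/completed_action_plans/cap-table/06_equity_extraction_candidates_pipeline_COMPLETE.md
```

Making "archived" terminal mirrors step 4, where archived plans only serve as negative context and never re-enter the active cycle.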


Governance

Human Review Tiers

T1 (Schema Change)
  Trigger: Any agent proposes a DDL migration or RLS policy update
  Reviewer: DBA or senior engineer
  Required action: Explicit approval in the PR before supabase db push is run; the approval references docs/cap-table/action_plans/02_cap_table_rls_and_policies.md and docs/cap-table/calibration_threshold_policy.md

T2 (Agent Output to Production UI)
  Trigger: Cap-table candidates, risk memos, or due-diligence checklists are ready for user display
  Reviewer: Product owner
  Required action: Approve or reject via product/app/api/cap-table/review/[id]/approve/route.ts or the equivalent route for that output type; this gate must not be bypassed, even in staging

T3 (CI / Infra Promotion)
  Trigger: A new Docker image or Fly deployment configuration
  Reviewer: Engineering lead
  Required action: The build-images.yml workflow passes with no manual overrides; the retry logic established in commit 960f88c must not be disabled

T4 (Document Retirement)
  Trigger: An action plan is proposed for archival
  Reviewer: Document owner (team lead per domain)
  Required action: Move the plan to docs/archive/superseded/ or docs/archive/completed_action_plans/ with a dated commit message, then update the domain index

T5 (QA Release Gate)
  Trigger: Pre-release regression run
  Reviewer: Engineering lead
  Required action: product/lib/__tests__/app-smoke-regression.test.ts must pass, and datazoom-qa/COMPLETE_MANIFEST.md must reflect the current feature set
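
Tiers T1, T3, and T4 are tied to the files a change touches. T2 (runtime agent output) and T5 (release) are triggered by events rather than paths. The sketch below is a hypothetical classifier that a CI check might use to label a pull request with the path-triggered tiers it needs. The path rules are assumptions inferred from the file locations named in this document, not an existing DataZoom check.

```typescript
// Only the path-triggered tiers are modeled; T2 and T5 are event-driven.
type Tier = "T1" | "T3" | "T4";

const REVIEWER: Record<Tier, string> = {
  T1: "DBA or senior engineer",
  T3: "Engineering lead",
  T4: "Document owner (team lead per domain)",
};

// Hypothetical heuristics; a single path may trigger more than one tier.
function requiredTiers(changedPaths: string[]): Tier[] {
  const found = new Set<Tier>();
  for (const p of changedPaths) {
    const name = p.split("/").pop() ?? p;
    // T1: SQL files (RLS policies are assumed to live in .sql files too).
    if (p.endsWith(".sql")) found.add("T1");
    // T3: container images, Fly configuration, and the image-build workflow.
    if (
      p.startsWith("docker/") ||
      p.startsWith("fly/") ||
      name === "Dockerfile" ||
      p === "docker-compose.yml" ||
      p === ".github/workflows/build-images.yml"
    ) {
      found.add("T3");
    }
    // T4: files landing in one of the two retirement destinations.
    if (p.startsWith("docs/archive/superseded/") || p.startsWith("docs/archive/completed_action_plans/")) {
      found.add("T4");
    }
  }
  return Array.from(found).sort();
}

const tiers = requiredTiers([
  "product/collaboration-ws/Dockerfile",
  "docs/archive/superseded/RAG_SYSTEM_OVERVIEW.md",
  "README.md",
]);
for (const t of tiers) console.log(`${t} -> ${REVIEWER[t]}`);
// T3 -> Engineering lead
// T4 -> Document owner (team lead per domain)
```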

Approval Workflow

Pull requests that touch any file under docs/, datazoom-qa/, or product/lib/__tests__/ require at least one human reviewer before merge. The .husky/pre-push hook enforces linting; the .github/workflows/knip.yml workflow enforces that no dead exports are introduced by documentation-adjacent code changes.
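
The approval rule reduces to a prefix test over a pull request's changed files. The sketch below is a hypothetical merge gate. The PullRequest shape, the humanApprovals count, and the needsHumanReview and mergeBlocked helpers are assumptions for illustration, not an existing DataZoom check.

```typescript
// Path prefixes that require a human reviewer, taken from the approval rule.
// Taken literally, they do not cover product/lib/cap-table/__tests__/.
const REVIEWED_PREFIXES = ["docs/", "datazoom-qa/", "product/lib/__tests__/"];

// Hypothetical PR summary; humanApprovals excludes bot approvals.
interface PullRequest {
  changedPaths: string[];
  humanApprovals: number;
}

function needsHumanReview(paths: string[]): boolean {
  return paths.some((p) => REVIEWED_PREFIXES.some((prefix) => p.startsWith(prefix)));
}

// Merge is blocked when a reviewed path changed and no human has approved yet.
function mergeBlocked(pr: PullRequest): boolean {
  return needsHumanReview(pr.changedPaths) && pr.humanApprovals < 1;
}

console.log(mergeBlocked({ changedPaths: ["docs/RAG_SYSTEM.md"], humanApprovals: 0 })); // true
console.log(mergeBlocked({ changedPaths: ["docs/RAG_SYSTEM.md"], humanApprovals: 1 })); // false
console.log(mergeBlocked({ changedPaths: ["product/app/(app)/advisor/page.tsx"], humanApprovals: 0 })); // false
```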

Retirement Policy

A document is eligible for retirement when: (a) its corresponding feature has been replaced and the replacement action plan is in _COMPLETE state, (b) the schema it describes no longer matches the live Supabase DDL, or (c) it has been superseded by a newer version with an explicit successor reference. Retired documents are never deleted; they are moved to docs/archive/superseded/ with their original filename intact so that historical agent reasoning can be audited. The docs/DOCUMENTATION_INDEX.md is the single source of truth for which documents are current versus archived, and the Infrastructure Agent (BR-008) is responsible for keeping it synchronized after each deployment cycle.
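
The three retirement conditions are alternatives, so eligibility is a logical OR, and the destination path keeps the original filename. The DocRecord fields and both helpers below are hypothetical. The example paths are placeholders rather than real DataZoom documents.

```typescript
// Hypothetical record; each field corresponds to one retirement condition.
interface DocRecord {
  path: string;
  replacedByCompletedPlan: boolean; // (a) replacement action plan is in _COMPLETE state
  matchesLiveDdl: boolean | null; // (b) null when the document describes no schema
  successorPath: string | null; // (c) explicit successor reference, if any
}

// Any one condition is sufficient.
function isEligibleForRetirement(doc: DocRecord): boolean {
  return doc.replacedByCompletedPlan || doc.matchesLiveDdl === false || doc.successorPath !== null;
}

// Retired documents are moved, never deleted, and keep their original filename.
function retiredPath(doc: DocRecord): string {
  const filename = doc.path.split("/").pop() ?? doc.path;
  return `docs/archive/superseded/${filename}`;
}

// Placeholder paths for illustration only.
const doc: DocRecord = {
  path: "docs/EXAMPLE_SPEC.md",
  replacedByCompletedPlan: false,
  matchesLiveDdl: null,
  successorPath: "docs/EXAMPLE_SPEC_V2.md",
};

console.log(isEligibleForRetirement(doc)); // true
console.log(retiredPath(doc)); // docs/archive/superseded/EXAMPLE_SPEC.md
```

Updating docs/DOCUMENTATION_INDEX.md after the move is left out of this sketch.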