Currently

Context engine

Last updated 5/13/2026

Context Engine: Currently Healthcare Data Platform

Document ID: TECH-CE-001
Version: v0.4 (Build)
Domain: Healthcare Data Interoperability / Clinical Document Processing
Scope: AI/LLM context architecture, prompt engineering patterns, knowledge flow, and agent coordination across the Currently platform
Last Updated: Derived from repository state at commit fc207dd


Table of Contents

  1. Domain Profile
  2. Context Architecture
  3. Knowledge Graph Structure
  4. Prompt Engineering Patterns
  5. Memory Management
  6. Context Density Strategies
  7. Agent Context Protocols
  8. Cross-Session Persistence
  9. Context Window Budget Allocation
  10. Operational Constraints

1. Domain Profile

1.1 Product Identity

Currently is a healthcare data interoperability platform that ingests raw clinical documents from Health Information Exchanges (HIEs) and transforms them into standardized FHIR R4 resources and OMOP CDM 5.4 records. The platform operates at the intersection of clinical standards compliance, PHI data governance, and multi-tenant SaaS architecture.

Any AI agent operating within this context must treat this domain profile as its primary grounding layer. Clinical document processing is not a general-purpose task — every transformation decision has downstream effects on care analytics, research cohorts, and regulatory reporting.

1.2 Core Entity Taxonomy

| Entity Class | Canonical ID Prefix | Description | PHI Classification |
|---|---|---|---|
| Organization (HIE) | org_ | Top-level HIE entity; maps to organizations table | No PHI |
| Hospital / Facility | hosp_ | Healthcare facility within an HIE; maps to hospitals table | No PHI |
| User (Platform) | usr_ | Provider, HIE admin, or platform admin; maps to users table | No PHI |
| Patient | pat_ | Patient portal user; maps to patient records | PHI (PHI DB only) |
| Medical Record | rec_ | Raw uploaded file; tracked in file_registry | PHI (PHI DB only) |
| FHIR Bundle | fhir_ | Transformed FHIR R4 resource bundle | PHI (PHI DB only) |
| OMOP Record | omop_ | Mapped OMOP CDM 5.4 row | PHI (PHI DB only) |
| Pipeline Job | job_ | Celery task tracking a document through processing stages | No PHI (metadata only) |
| Processing Concept | concept_ | Athena vocabulary concept (6.3M entries, OMOP schema) | Reference data |
| API Key | key_ | Organization or patient-level API credential | Sensitive, not PHI |
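
The prefix-to-classification mapping in the taxonomy above lends itself to a small lookup that an agent or reviewer could use to decide whether an identifier points at PHI-bearing data. The following is a minimal sketch only: `PhiClass`, `classify_id`, and `is_phi` are hypothetical names introduced for illustration, not part of the Currently codebase.

```python
from enum import Enum


class PhiClass(Enum):
    NO_PHI = "no_phi"
    PHI = "phi"
    REFERENCE = "reference_data"
    SENSITIVE = "sensitive_not_phi"


# Prefix -> classification, transcribed from the entity taxonomy.
PREFIX_CLASSIFICATION = {
    "org_": PhiClass.NO_PHI,
    "hosp_": PhiClass.NO_PHI,
    "usr_": PhiClass.NO_PHI,
    "pat_": PhiClass.PHI,
    "rec_": PhiClass.PHI,
    "fhir_": PhiClass.PHI,
    "omop_": PhiClass.PHI,
    "job_": PhiClass.NO_PHI,
    "concept_": PhiClass.REFERENCE,
    "key_": PhiClass.SENSITIVE,
}


def classify_id(entity_id: str) -> PhiClass:
    """Return the PHI classification implied by an ID's canonical prefix."""
    for prefix, phi_class in PREFIX_CLASSIFICATION.items():
        if entity_id.startswith(prefix):
            return phi_class
    raise ValueError(f"unknown ID prefix: {entity_id!r}")


def is_phi(entity_id: str) -> bool:
    """True only for entity classes that live in the PHI DB."""
    return classify_id(entity_id) is PhiClass.PHI
```

Raising on an unknown prefix, rather than defaulting to "No PHI", keeps the lookup fail-closed: an unrecognized identifier is never silently treated as safe.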

1.3 Controlled Vocabulary

AI agents must use the following terminology consistently. Deviations create ambiguity in cross-agent communication.

Clinical Standards

  • HL7 v2 — Legacy pipe-delimited message format (ADT, ORU, ORM, etc.); profiles stored in config/hl7_profiles/*.json
  • CDA/CCDA — Clinical Document Architecture XML; the consolidated variant (CCDA) is the primary inbound format from most HIEs
  • FHIR R4 — HL7 Fast Healthcare Interoperability Resources, Release 4; the outbound standard format
  • OMOP CDM 5.4 — Observational Medical Outcomes Partnership Common Data Model; the outbound analytics format
  • Athena Vocabulary — OHDSI's vocabulary database (6.3M concepts) mapping to LOINC, SNOMED CT, ICD-10-CM, RxNorm; stored in omop_cdm54.concept

Pipeline Stages (in processing order)

  1. parse — Raw HL7 or CDA/CCDA text → structured segment objects
  2. clean — Normalization, deduplication, datetime standardization (see commit b955f51)
  3. fhir — Structured segments → FHIR R4 resource bundle
  4. omop — FHIR R4 bundle → OMOP CDM 5.4 rows with Athena concept mapping
  5. validate — FHIR conformance validation, OMOP referential integrity checks
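
As an illustration, the stage order can be captured in an enum so that "next stage" and "adjacent stage" questions have a single answer. This is a sketch under the assumption that the five stage names above are used verbatim as identifiers; `Stage`, `next_stage`, and `adjacent_stages` are hypothetical helpers, not existing pipeline code.

```python
from enum import Enum
from typing import List, Optional


class Stage(str, Enum):
    PARSE = "parse"
    CLEAN = "clean"
    FHIR = "fhir"
    OMOP = "omop"
    VALIDATE = "validate"


# Enum iteration preserves definition order, which is the processing order.
STAGE_ORDER: List[Stage] = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that consumes this stage's output, or None at the end."""
    idx = STAGE_ORDER.index(stage)
    return STAGE_ORDER[idx + 1] if idx + 1 < len(STAGE_ORDER) else None


def adjacent_stages(stage: Stage) -> List[Stage]:
    """Return the immediate upstream and downstream neighbours of a stage."""
    idx = STAGE_ORDER.index(stage)
    return [STAGE_ORDER[i] for i in (idx - 1, idx + 1) if 0 <= i < len(STAGE_ORDER)]
```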

Infrastructure Terms

  • PHI Cloud SQL — GCP PostgreSQL instance holding all clinical data; strictly isolated
  • Platform Cloud SQL — GCP PostgreSQL instance holding organizational/user data; no PHI
  • Celery Workers — Async task processors running on GKE; handle pipeline stages
  • Redis Broker — Message queue connecting FastAPI to Celery workers
  • DLQ (Dead Letter Queue) — Routing destination for messages that fail processing; see tests/e2e/test_dlq_routing.py

Role Taxonomy

  • platform_admin — Currently staff; access to /admin shell
  • hie_admin — HIE organization administrator; access to /dashboard/network
  • provider — Facility-level clinical user; access to /dashboard
  • patient — Patient portal user; access to /portal

1.4 Regulatory and Compliance Constraints

Every AI agent operating on this codebase must internalize these constraints as non-negotiable context boundaries:

  • HIPAA PHI Isolation: No PHI-bearing data may be referenced, suggested, or processed outside the PHI Cloud SQL instance. The Platform Cloud SQL database intentionally contains zero patient data. This separation is architectural, not optional.
  • FHIR Conformance: Output FHIR resources must conform to R4 profiles. Validation is tested in tests/e2e/test_fhir_full_pipeline.py and toggled via tests/e2e/test_conformance_toggle.py.
  • Terminology Binding: Code systems must bind to Athena vocabulary entries. Local/proprietary codes are not acceptable in OMOP output without concept mapping.
  • Audit Trail: PHI access is audited via /api/web/v1/admin/audit/phi-access. Any AI-assisted feature touching patient records must preserve audit semantics.
  • De-identification Rules: De-identification logic is governed by config/deid/default_rules.csv. AI agents must not suggest alternate de-identification approaches without referencing this baseline.

2. Context Architecture

2.1 Structural Overview

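Currently's context architecture is distributed across five distinct layers. Each layer draws on the layers beneath it, so the agent execution context at Layer 5 is assembled at runtime from Layers 1 through 4. The result is a pipeline of project knowledge that mirrors the clinical data pipeline itself.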

┌─────────────────────────────────────────────────────────────┐
│  LAYER 5: AGENT EXECUTION CONTEXT                           │
│  Runtime context assembled per-task from layers below       │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│  LAYER 4: RULE PROFILES (.cursor/rules/*.mdc)               │
│  Agent behavior constraints per domain                      │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│  LAYER 3: CONFIGURATION KNOWLEDGE (config/**)               │
│  HL7 profiles, FHIR mappings, vocabulary crosswalks,        │
│  de-identification rules                                    │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│  LAYER 2: STRUCTURAL SCHEMA (Database + API definitions)    │
│  Drizzle ORM schemas, FastAPI route contracts,              │
│  Clerk org/role model                                       │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│  LAYER 1: CANONICAL DOCUMENTATION                           │
│  README.md, CURRENTLY_STATUS_REPORT.md,                     │
│  this Context Engine document                               │
└─────────────────────────────────────────────────────────────┘

2.2 Rule Profile System

The .cursor/rules/ directory contains four Markdown-based rule profiles that function as scoped system prompts. These files define the behavioral constraints for AI-assisted development within each subsystem.

| File | Domain Scope | Key Constraints Encoded |
|---|---|---|
| .cursor/rules/project-conventions.mdc | Platform-wide | Naming conventions, ID systems, commit format |
| .cursor/rules/python-pipeline.mdc | FastAPI + Celery pipeline | Python 3.11 patterns, FHIR/OMOP output contracts, PHI handling rules |
| .cursor/rules/testing.mdc | All test layers | Hypothesis property-based testing requirements, e2e fixture patterns |
| .cursor/rules/web-frontend.mdc | Next.js web app | TypeScript conventions, Clerk auth patterns, scope shell boundaries |

These rule files are the primary mechanism for injecting persistent, domain-specific context into any AI coding session without requiring the agent to re-derive constraints from first principles.

2.3 Configuration as Context

The config/ directory is a rich, structured knowledge base that serves as grounding context for pipeline agents. It is not merely runtime configuration — it encodes clinical knowledge that agents must treat as authoritative.

HL7 Message Profiles (config/hl7_profiles/)

Eighteen JSON files defining the expected segment structure for each message type. Key profiles:

  • oru_r01.json — Observation Result (lab results, the most common inbound type)
  • adt_a01.json through adt_a31.json — ADT admit/discharge/transfer events
  • base_profile.json — Shared segment definitions inherited by all message types
  • field_definitions.json — Canonical field metadata used during parsing

When an agent generates or modifies parsing code, it must validate proposed logic against the relevant profile JSON before suggesting changes.

FHIR Mapping Tables (config/fhir_mappings/)

  • message_type_registry.json — Maps HL7 message type codes to target FHIR resource types
  • vocabulary_maps.json — Code system crosswalk from HL7 table values to FHIR/OMOP terminology

HL7 Table Definitions (config/hl7_tables/0001.json through 0091.json)

Lookup tables for coded HL7 fields. Agents writing parsing or validation logic must reference these tables rather than hardcoding values.

Context Crosswalk (config/crosswalk/hl7_table_contexts.json)

Maps HL7 table codes to semantic contexts, enabling agents to understand what a coded value means in clinical terms.

De-identification Rules (config/deid/default_rules.csv)

The authoritative de-identification ruleset. Agents must treat this as a read-only constraint unless explicitly tasked with modifying de-identification policy.

2.4 Database Schema as Context

The Platform Cloud SQL schema (documented in the Database Schema section and managed via Drizzle ORM) provides structural context for any agent working on organizational data flows. Key context boundaries encoded in the schema:

  • organizations table: HIE-level entities; settings JSONB field holds per-org configuration that pipeline agents may need to honor
  • file_registry table: File processing metadata, including the status lifecycle (pending → transferred → processing → completed or failed) and file_hash for deduplication
  • users table: Role context required for access control decisions; clerk_role field is the authoritative role source

The PHI Cloud SQL schema (not reproduced here for security reasons) holds medical_records, FHIR bundles, and omop_cdm54.concept. Agents must not generate code that queries PHI tables without explicit PHI-scoped context.


3. Knowledge Graph Structure

3.1 Document Node Registry

The following documents form the primary nodes of Currently's knowledge graph. Cross-references between them constitute the edges.

| Node ID | Document | Type | Authority Level |
|---|---|---|---|
| TECH-CE-001 | This document | Context Engine | Highest (meta-document) |
| TECH-ARCH-001 | README.md | Architecture overview | High |
| TECH-ARCH-002 | documentation/CURRENTLY_STATUS_REPORT.md | Status + evaluation | High |
| TECH-RULE-001 | .cursor/rules/project-conventions.mdc | Agent rule profile | High |
| TECH-RULE-002 | .cursor/rules/python-pipeline.mdc | Agent rule profile | High |
| TECH-RULE-003 | .cursor/rules/testing.mdc | Agent rule profile | High |
| TECH-RULE-004 | .cursor/rules/web-frontend.mdc | Agent rule profile | High |
| TECH-SCHEMA-001 | Platform DB schema (Drizzle) | Structural schema | High |
| TECH-CFG-001 | config/hl7_profiles/ (18 files) | Configuration knowledge | High |
| TECH-CFG-002 | config/fhir_mappings/ (2 files) | Configuration knowledge | High |
| TECH-CFG-003 | config/hl7_tables/ (80+ files) | Configuration knowledge | Medium |
| TECH-CFG-004 | config/deid/default_rules.csv | Policy | Immutable (policy) |
| TECH-CFG-005 | config/crosswalk/hl7_table_contexts.json | Configuration knowledge | Medium |

3.2 Cross-Reference Map

The following cross-references represent the highest-value edges in the knowledge graph. When an agent navigates from one node to another via these references, it should carry the semantic context of both nodes.

TECH-CE-001 (this document)
  ├── references → TECH-ARCH-001 (README.md) [architecture grounding]
  ├── references → TECH-RULE-001..004 (rule profiles) [behavioral constraints]
  ├── references → TECH-SCHEMA-001 (DB schema) [entity definitions]
  └── references → TECH-CFG-001..005 (config files) [clinical knowledge]

TECH-RULE-002 (python-pipeline.mdc)
  ├── governs → src/api/ (FastAPI routes)
  ├── governs → src/pipeline/ (processing stages)
  ├── references → TECH-CFG-001 (HL7 profiles) [parsing behavior]
  ├── references → TECH-CFG-002 (FHIR mappings) [transformation targets]
  └── references → TECH-CFG-004 (deid rules) [PHI handling]

TECH-SCHEMA-001 (Platform DB)
  ├── defines → organizations [HIE entity model]
  ├── defines → hospitals [facility model]
  ├── defines → file_registry [processing lifecycle]
  └── constrains → TECH-RULE-004 (web frontend) [API contract shape]

TECH-CFG-001 (HL7 profiles)
  ├── constrains → src/pipeline/parse/ [parsing logic]
  ├── references → TECH-CFG-003 (HL7 tables) [coded field values]
  └── feeds → TECH-CFG-002 (FHIR mappings) [transformation path]

3.3 Test File Knowledge Edges

The test suite encodes implicit domain knowledge that agents must treat as authoritative validation criteria.

| Test File | Knowledge Encoded | Agent Relevance |
|---|---|---|
| tests/e2e/test_fhir_full_pipeline.py | End-to-end FHIR conformance criteria | Any agent modifying transformation logic |
| tests/e2e/test_hl7_full_pipeline.py | HL7 parsing correctness for full message set | Any agent modifying parsing logic |
| tests/e2e/test_omop_full_pipeline.py | OMOP CDM output correctness | Any agent touching concept mapping |
| tests/e2e/test_canonical_consistency.py | Cross-format canonical representation | Agents working on multi-format output |
| tests/conformance/test_fuzz.py | Edge case and adversarial input handling | Agents modifying input validation |
| tests/conformance/test_property_based.py | Hypothesis-generated property tests | Agents modifying core transformation logic |
| tests/e2e/test_dlq_routing.py | Dead letter queue failure semantics | Agents modifying error handling |
| tests/e2e/test_y1_y2_edge_cases.py | Year-range datetime edge cases | Agents touching date normalization (see commit b955f51) |
| tests/e2e/test_omop_csv_round_trip.py | OMOP export round-trip fidelity | Agents modifying OMOP serialization |
| tests/e2e/test_rule_audit.py | De-identification rule audit trail | Agents touching PHI processing |

3.4 API Route Knowledge Graph

API routes encode the contract between the web application and downstream services. The following route clusters carry distinct semantic context that agents must preserve:

Admin Cluster (/api/web/admin/**, /api/web/v1/admin/**)

  • Context: Platform staff operations; full organizational visibility
  • Key routes: /api/web/v1/admin/audit/phi-access — audit semantics must be preserved in any changes
  • Agent constraint: Changes to admin routes require considering both platform admin UX (/admin shell) and security audit implications

Facility/HIE Cluster (/api/web/facilities/**, /api/web/hies/**)

  • Context: Multi-tenant organizational data; organization scoping is mandatory
  • Key routes: /api/web/facilities/:id/medical-records — bridges platform DB (metadata) to PHI DB (records)
  • Agent constraint: Must enforce organization_id scoping on all queries

Authentication Cluster (/api/web/auth/**, /api/web/session)

  • Context: Clerk webhook synchronization; sync-user route maintains consistency between Clerk and Platform DB
  • Agent constraint: Changes must preserve Clerk ↔ Platform DB bidirectional sync semantics

Patient Portal Cluster (/api/web/onboarding/patient, /api/web/invitations/)

  • Context: Patient-facing flows; invitation token lifecycle
  • Agent constraint: PHI access controls apply from first contact

4. Prompt Engineering Patterns

4.1 Pipeline Stage Decomposition Pattern

The most important prompt pattern for Currently is stage-scoped context injection. The clinical document pipeline has five discrete stages (parse → clean → fhir → omop → validate). When an agent is tasked with work on any stage, it must receive context scoped to that stage's inputs, outputs, and contracts — not the full pipeline.

Pattern: Stage Context Frame

STAGE: {stage_name}
INPUT CONTRACT: {description of expected input format}
OUTPUT CONTRACT: {description of required output format}
GOVERNING CONFIG: {relevant config files}
VALIDATION CRITERIA: {relevant test files}
CONSTRAINT: {PHI handling requirement for this stage}

TASK: {specific change or question}

Example instantiation for the FHIR transformation stage:

STAGE: fhir
INPUT CONTRACT: Cleaned segment object dict from HL7 parser or 
                parsed CDA structure; all datetimes normalized to 
                FHIR dashed format per commit b955f51
OUTPUT CONTRACT: Valid FHIR R4 Bundle JSON; resource types per 
                 config/fhir_mappings/message_type_registry.json
GOVERNING CONFIG: config/fhir_mappings/vocabulary_maps.json,
                  config/hl7_profiles/base_profile.json
VALIDATION CRITERIA: tests/e2e/test_fhir_full_pipeline.py,
                     tests/e2e/test_canonical_consistency.py
CONSTRAINT: No PHI written to stdout, logs, or non-PHI DB

TASK: Add support for MedicationAdministration resources from 
      RDE_O11 messages
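
One way to keep Stage Context Frames consistent is to render them from a typed structure instead of hand-editing the text. The sketch below is illustrative only: `StageContextFrame` and its `render` method are hypothetical, and the only facts taken from this document are the field labels and the five stage names.

```python
from dataclasses import dataclass
from typing import List

VALID_STAGES = ("parse", "clean", "fhir", "omop", "validate")


@dataclass
class StageContextFrame:
    stage: str
    input_contract: str
    output_contract: str
    governing_config: List[str]
    validation_criteria: List[str]
    constraint: str
    task: str

    def __post_init__(self) -> None:
        # Reject frames scoped to a stage the pipeline does not define.
        if self.stage not in VALID_STAGES:
            raise ValueError(f"unknown pipeline stage: {self.stage!r}")

    def render(self) -> str:
        """Produce the frame text in the field order used by the pattern."""
        lines = [
            f"STAGE: {self.stage}",
            f"INPUT CONTRACT: {self.input_contract}",
            f"OUTPUT CONTRACT: {self.output_contract}",
            f"GOVERNING CONFIG: {', '.join(self.governing_config)}",
            f"VALIDATION CRITERIA: {', '.join(self.validation_criteria)}",
            f"CONSTRAINT: {self.constraint}",
            "",
            f"TASK: {self.task}",
        ]
        return "\n".join(lines)
```

Validating the stage name at construction time means a mistyped stage fails before any prompt is sent, rather than producing a frame with no matching config or tests.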

4.2 Multi-Tenant Scoping Pattern

Any prompt involving organizational data must include explicit multi-tenancy context. Without it, an agent has no signal that queries need an organization filter, and the resulting code risks leaking data across organizations.

Pattern: Tenant Scope Frame

TENANT MODEL:
  - Organizations (HIEs) own Hospitals (Facilities)
  - Users belong to one Organization
  - All queries must scope to organization_id
  - Clerk org_id maps to organizations.clerk_org_id

ACTIVE TENANT: {organization context for this task}
ROLE: {user role — platform_admin | hie_admin | provider | patient}
SCOPE SHELL: {/admin | /dashboard/network | /dashboard | /portal}
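
The role-to-shell pairing in the frame follows directly from the role taxonomy in Section 1.3, so the last three fields can be derived rather than typed by hand. A minimal sketch, assuming every task names an active organization; `tenant_scope_frame` is a hypothetical helper.

```python
# Role -> scope shell, as listed in the role taxonomy (Section 1.3).
ROLE_TO_SHELL = {
    "platform_admin": "/admin",
    "hie_admin": "/dashboard/network",
    "provider": "/dashboard",
    "patient": "/portal",
}


def tenant_scope_frame(organization_id: str, role: str) -> str:
    """Render the task-specific tail of the Tenant Scope Frame."""
    if role not in ROLE_TO_SHELL:
        raise ValueError(f"unknown role: {role!r}")
    if not organization_id:
        # Assumption for this sketch: a frame without a tenant is never valid.
        raise ValueError("organization_id is required for tenant scoping")
    return "\n".join(
        [
            f"ACTIVE TENANT: {organization_id}",
            f"ROLE: {role}",
            f"SCOPE SHELL: {ROLE_TO_SHELL[role]}",
        ]
    )
```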

4.3 PHI Boundary Assertion Pattern

Before any prompt that touches data retrieval, this assertion must be prepended to prevent context contamination:

Pattern: PHI Boundary Declaration

PHI BOUNDARY ASSERTION:
  Platform DB (orchestrator, 192.168.0.251): organizations, 
    hospitals, users, file_registry — NO PHI
  PHI DB (GCP Cloud SQL): medical_records, FHIR bundles, 
    omop_cdm54 — ALL PHI, HIPAA-governed

This task operates on: {PLATFORM_DB | PHI_DB | BOTH}
If BOTH: cross-references must use only non-PHI identifiers 
         (file_hash, job_id) as join keys
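
The join-key rule in the assertion can also be checked mechanically before a cross-database task is handed to an agent. The sketch below treats file_hash and job_id as the complete allowlist, which is an assumption: the assertion names them as examples and the real set may be larger. `check_phi_boundary` is a hypothetical name.

```python
from typing import Iterable

VALID_TARGETS = {"PLATFORM_DB", "PHI_DB", "BOTH"}

# Assumption: only the two identifiers named in the assertion are allowed.
NON_PHI_JOIN_KEYS = frozenset({"file_hash", "job_id"})


def check_phi_boundary(target: str, join_keys: Iterable[str] = ()) -> None:
    """Raise ValueError if a task declaration violates the PHI boundary."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    if target != "BOTH":
        return  # A single-database task has no cross-reference to check.
    keys = set(join_keys)
    if not keys:
        raise ValueError("a BOTH task must declare its join keys")
    disallowed = keys - NON_PHI_JOIN_KEYS
    if disallowed:
        raise ValueError(f"join keys may carry PHI: {sorted(disallowed)}")
```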

4.4 Conformance Test Anchoring Pattern

When generating or modifying transformation logic, agents must anchor their output to the conformance test suite. This prevents generating valid-looking code that fails domain-specific correctness criteria.

Pattern: Test Anchor Frame

CONFORMANCE ANCHOR:
  This change must pass:
  - tests/e2e/test_{stage}_full_pipeline.py
  - tests/conformance/test_property_based.py (Hypothesis)
  - tests/conformance/test_fuzz.py (adversarial inputs)
  
  Known edge cases in scope:
  - tests/e2e/test_y1_y2_edge_cases.py (datetime normalization)
  - tests/e2e/test_omop_csv_round_trip.py (if OMOP output affected)
  
  Do not generate code that achieves the nominal case by 
  hardcoding or special-casing test inputs.
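
The anchor list can be assembled from the stage and two flags instead of being written out each time. This is a sketch with one notable assumption: the parse stage is mapped to tests/e2e/test_hl7_full_pipeline.py because Section 3.3 describes that file as covering parsing correctness, and no end-to-end file is listed for the clean or validate stages. `conformance_anchor` is a hypothetical helper.

```python
from typing import List

# Stage -> end-to-end test file. The "parse" entry is an assumption (see lead-in).
STAGE_E2E_TESTS = {
    "parse": "tests/e2e/test_hl7_full_pipeline.py",
    "fhir": "tests/e2e/test_fhir_full_pipeline.py",
    "omop": "tests/e2e/test_omop_full_pipeline.py",
}

ALWAYS_REQUIRED = [
    "tests/conformance/test_property_based.py",
    "tests/conformance/test_fuzz.py",
]


def conformance_anchor(
    stage: str,
    *,
    touches_datetimes: bool = False,
    affects_omop_output: bool = False,
) -> List[str]:
    """Return the test files a change to the given stage must pass."""
    if stage not in STAGE_E2E_TESTS:
        raise ValueError(f"no end-to-end test file known for stage {stage!r}")
    paths = [STAGE_E2E_TESTS[stage], *ALWAYS_REQUIRED]
    if touches_datetimes:
        paths.append("tests/e2e/test_y1_y2_edge_cases.py")
    if affects_omop_output:
        paths.append("tests/e2e/test_omop_csv_round_trip.py")
    return paths
```

The returned paths can be passed straight to pytest as positional arguments, so the same list serves both the prompt and the verification run.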

4.5 Configuration-Grounded Suggestion Pattern

When an agent proposes changes to parsing or mapping logic, it must ground suggestions in the actual configuration files rather than general HL7/FHIR knowledge. The config files encode the platform's specific interpretation of standards.

Pattern: Config Grounding Frame

GROUNDING SOURCE: config/hl7_profiles/{message_type}.json
FIELD REFERENCE: {segment}.{field_index} as defined in 
                 config/hl7_profiles/field_definitions.json
TABLE REFERENCE: config/hl7_tables/{table_number}.json 
                 for coded values
CONTEXT: config/crosswalk/hl7_table_contexts.json for 
         semantic meaning

Propose changes consistent with these definitions. If a 
proposed change requires modifying a config file, state 
the config change explicitly and separately from the 
code change.

4.6 Rig Agent Pattern (Observed from PR History)

Pull requests #51–#60 reveal a recurring pattern where a "Rig" agent generates targeted copy and code changes against specific source files. This pattern has characteristics of a retrieval-augmented generation workflow where the agent receives:

  1. A specific source file path (e.g., landing-hero.tsx)
  2. A change specification ("Replace 'Get Started' with 'Start now'")
  3. No broader codebase context

This is a narrow-context, high-precision pattern appropriate for isolated copy or UI changes. However, the volume of near-duplicate PRs (#51–#60 all address similar copy changes) suggests the Rig agent lacks cross-PR deduplication context. The recommended fix is to inject a "pending changes" summary into Rig agent context before each PR generation.
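
A "pending changes" summary of the kind recommended above could be as simple as a list of open copy changes plus a duplicate check run before a new PR is generated. The sketch below is hypothetical throughout: `CopyChange`, `find_duplicates`, and `pending_summary` are illustrative names, and how open PRs are enumerated is left out because this document does not describe it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CopyChange:
    path: str
    old_text: str
    new_text: str


def _key(change: CopyChange) -> Tuple[str, str]:
    # Two changes collide when they rewrite the same text in the same file,
    # ignoring case and surrounding whitespace.
    return (change.path, change.old_text.strip().casefold())


def find_duplicates(proposed: CopyChange, pending: List[CopyChange]) -> List[CopyChange]:
    """Return pending changes that target the same text as the proposed one."""
    return [p for p in pending if _key(p) == _key(proposed)]


def pending_summary(pending: List[CopyChange]) -> str:
    """Render pending changes as a compact block for injection into context."""
    lines = ["PENDING CHANGES:"]
    for change in pending:
        lines.append(f"  - {change.path}: '{change.old_text}' -> '{change.new_text}'")
    return "\n".join(lines)
```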


5. Memory Management

5.1 Context Persistence Layers

Currently's architecture provides three levels of context persistence, each with different durability and retrieval characteristics.

Level 1: Ephemeral Session Context

Lives only within a single agent session. Includes: current file contents, local variable state, conversation history. Lost when the session ends. Maximum useful window: the context window of the underlying model minus reserved space for configuration context (see Section 9).

Level 2: Committed Code Context

Survives across sessions via git. The commit history is a compressed memory of past decisions. Agents should treat recent commits as implicit memory of recent state changes:

  • b955f51 encodes the decision that compact HL7 dates must be normalized to FHIR dashed format
  • 9df7a3c (Y7/Y8/Y9/R2) encodes a pipeline schema expansion decision
  • 02461e4 encodes a decision about production FHIR validation authentication

When an agent encounters an unfamiliar behavior in the codebase, commit messages are the first retrieval target.

Level 3: Configuration and Rule Persistence

The .cursor/rules/*.mdc files and config/**/*.json files are the most durable form of project memory. They encode decisions that have been elevated from ephemeral session context into permanent project knowledge.

5.2 Hypothesis Database as Implicit Memory

The .hypothesis/ directory contains 50+ stored constants and examples from property-based test runs. These represent a learned corpus of edge cases that have been discovered and retained across test sessions. Agents modifying code covered by Hypothesis tests must treat these stored examples as accumulated domain memory — failing a stored example is evidence that a proposed change violates a previously-discovered invariant.

5.3 Summarization Strategy for Long Pipelines

When a full pipeline context exceeds available context window budget, apply the following summarization hierarchy:

  1. Keep verbatim: The specific stage being modified, its governing config files, and its test files
  2. Summarize to contract: Adjacent pipeline stages reduced to input/output contracts only
  3. Reference by ID: Distant stages and documentation reduced to document node IDs (e.g., TECH-CFG-001)
  4. Drop entirely: Unrelated scope shells (e.g., /portal patient UI when working on OMOP mapping)

Summarization template for adjacent pipeline stages:

ADJACENT STAGE [{stage_name}]: 
  Receives {input_type}, produces {output_type}. 
  Contract defined in {test_file}. Not modified in this task.
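
Applied to the five pipeline stages, the hierarchy reduces to a distance calculation from the stage being modified. The following sketch assumes "adjacent" means exactly one step away and "distant" means anything further; `summarization_plan` and `adjacent_summary` are hypothetical helpers.

```python
from typing import Dict

STAGES = ["parse", "clean", "fhir", "omop", "validate"]


def summarization_plan(target: str) -> Dict[str, str]:
    """Assign each stage a treatment based on its distance from the target."""
    if target not in STAGES:
        raise ValueError(f"unknown pipeline stage: {target!r}")
    target_idx = STAGES.index(target)
    plan = {}
    for idx, stage in enumerate(STAGES):
        distance = abs(idx - target_idx)
        if distance == 0:
            plan[stage] = "verbatim"   # stage being modified
        elif distance == 1:
            plan[stage] = "contract"   # adjacent: input/output contract only
        else:
            plan[stage] = "reference"  # distant: node ID only
    return plan


def adjacent_summary(stage: str, input_type: str, output_type: str, test_file: str) -> str:
    """Fill the adjacent-stage template for one neighbouring stage."""
    return (
        f"ADJACENT STAGE [{stage}]:\n"
        f"  Receives {input_type}, produces {output_type}.\n"
        f"  Contract defined in {test_file}. Not modified in this task."
    )
```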

5.4 Cross-Session State via file_registry

The file_registry table's status field (pending → transferred → processing → completed or failed) functions as persistent cross-session state for pipeline jobs. Agents working on pipeline resumption, retry logic, or error handling must read this status lifecycle as the ground truth for where a job stands, rather than inferring state from log output.
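
Retry and resumption logic is easier to reason about when the lifecycle is written down as an explicit transition table. The sketch below makes two assumptions that this document does not confirm: completed and failed are both terminal, and failed is reachable only from processing. `ALLOWED_TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical names.

```python
# Status -> statuses it may move to next. Transitions are an assumption (see lead-in).
ALLOWED_TRANSITIONS = {
    "pending": {"transferred"},
    "transferred": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def can_transition(current: str, new: str) -> bool:
    """True if a file_registry row may move from `current` to `new`."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    return new in ALLOWED_TRANSITIONS[current]


def is_terminal(status: str) -> bool:
    """True if no further transitions are possible from this status."""
    if status not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {status!r}")
    return not ALLOWED_TRANSITIONS[status]
```

If the platform does allow retries from failed, that would appear here as one extra entry in the table, which is the main reason to keep the transitions as data.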


6. Context Density Strategies

6.1 Signal-to-Noise Principles

In a healthcare interoperability domain, low-density context is dangerous — a misunderstood HL7 field or missing PHI constraint can produce clinically incorrect output or a compliance violation. The following strategies maximize signal density.

Strategy 1: Vocabulary Pre-loading

Before any clinical domain task, inject the entity taxonomy (Section 1.2) and controlled vocabulary (Section 1.3) as a compressed header. This costs approximately 400 tokens but eliminates the need for the agent to infer terminology from context, saving 2,000–5,000 tokens of clarification overhead per session.

Strategy 2: Config-First Grounding

Reference specific config file entries rather than describing their content. Instead of "The ORU_R01 message has an OBX segment with observation values", use "Per config/hl7_profiles/oru_r01.json, OBX fields 3/4/5." The latter is unambiguous and costs fewer tokens.

Strategy 3: Negative Space Declaration

Explicitly state what is out of scope. In a multi-tenant, multi-standard platform, scope ambiguity is the primary source of context pollution. Declaring "This task does not touch PHI DB, patient portal, or OMOP output" at the start of a context window eliminates an entire category of potentially incorrect suggestions.

Strategy 4: Test File as Specification

Rather than describing desired behavior in prose, point to the test file that encodes it. tests/e2e/test_fhir_full_pipeline.py contains more precise FHIR output specification than any natural language description. Agents should be instructed to read the test as the specification, not as validation after the fact.

Strategy 5: Role-Scoped API Context

The 50 API routes are organized into clear clusters. When an agent works on a route, inject only the routes in the same cluster plus the shared auth middleware context. Injecting all 50 routes adds token overhead without adding relevant signal.

6.2 Context Compression for HL7 Table Values

The config/hl7_tables/ directory contains 80+ JSON files. Loading all of them into context is prohibitively expensive. The recommended compression strategy:

  1. Identify which tables are referenced by the fields in scope (via config/crosswalk/hl7_table_contexts.json)
  2. Load only the relevant table files
  3. For referenced tables, include only the coded values actually present in the input data being processed, not the full table

This reduces HL7 table context from ~50,000 tokens to typically under 500 tokens for a specific parsing task.
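
Step 3 of the strategy amounts to intersecting a table with the codes seen in the input. The sketch below assumes each table can be viewed as a flat code-to-description mapping; the actual JSON layout of the files under config/hl7_tables/ is not shown in this document, so the loader is omitted and the table is given inline. The sample values follow HL7 v2 Table 0001 (Administrative Sex); `compress_table` and `unknown_codes` are hypothetical helpers.

```python
from typing import Dict, Iterable, List

# Illustrative stand-in for one table file; the on-disk shape is an assumption.
TABLE_0001 = {
    "A": "Ambiguous",
    "F": "Female",
    "M": "Male",
    "N": "Not applicable",
    "O": "Other",
    "U": "Unknown",
}


def compress_table(table: Dict[str, str], observed_codes: Iterable[str]) -> Dict[str, str]:
    """Keep only the table entries whose codes appear in the input data."""
    observed = set(observed_codes)
    return {code: meaning for code, meaning in table.items() if code in observed}


def unknown_codes(table: Dict[str, str], observed_codes: Iterable[str]) -> List[str]:
    """Codes present in the input but absent from the table, sorted for stable output."""
    return sorted(set(observed_codes) - set(table))
```

Reporting unknown codes alongside the compressed table keeps the reduction honest: a code missing from the table is surfaced to the agent instead of silently disappearing from context.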

6.3 Dockerfile Layer as Context Signal

The four Dockerfiles encode implicit architectural separation:

  • Dockerfile — Main API server (src/api/main:app)
  • Dockerfile.pipeline — Pipeline processing container
  • Dockerfile.worker — Celery worker container
  • Dockerfile.nlp — NLP processing container (separate Python environment)

The existence of Dockerfile.nlp as a separate image indicates that NLP/ML model loading is isolated from the core pipeline. Agents must not assume NLP capabilities are available in the main API or worker containers.


7. Agent Context Protocols

7.1 Agent Taxonomy

Based on the repository's structure and the observed PR patterns, the following agent types operate against this codebase:

| Agent ID | Name | Domain | Context Profile |
|---|---|---|---|
| AGENT-001 | Pipeline Engineer | FastAPI + Celery pipeline | .cursor/rules/python-pipeline.mdc + relevant HL7/FHIR config |
| AGENT-002 | Web Engineer | Next.js + TypeScript | .cursor/rules/web-frontend.mdc + API route contracts |
| AGENT-003 | Test Engineer | All test layers | .cursor/rules/testing.mdc + conformance criteria |
| AGENT-004 | Rig Agent | UI copy + targeted changes | Specific source file + change spec (narrow context) |
| AGENT-005 | Platform Admin | Infrastructure + CI/CD | Dockerfile set + GitHub Actions workflows |
| AGENT-006 | Schema Engineer | Database + Drizzle ORM | TECH-SCHEMA-001 + multi-tenant constraints |

7.2 Context Handoff Protocol

When work transitions between agents (e.g., AGENT-001 completes a pipeline change that AGENT-002 must expose in the UI), a context handoff packet must be constructed:

Handoff Packet Structure

handoff_from: AGENT-001
handoff_to: AGENT-002
change_summary: |
  Added MedicationAdministration FHIR resource output from 
  RDE_O11 pipeline stage. Output shape: {resource_type, 
  medication_reference, subject_reference, effective_period}
api_impact:
  - route: /api/web/facilities/:id/medical-records
  - new_field: medication_administrations[] in FHIR bundle response
  - breaking_change: false
db_impact: none (PHI DB change, no Platform DB schema change)
test_files_updated:
  - tests/e2e/test_fhir_full_pipeline.py
  - tests/e2e/test_canonical_consistency.py
constraints_for_recipient:
  - FHIR bundle shape is defined by test_fhir_full_pipeline.py
  - PHI boundary: bundle contents are PHI; only metadata in 
    Platform DB responses

7.3 Rig Agent Context Protocol

The Rig agent (observed in PRs