Operator console

Last updated 5/3/2026

DataZoom Operator Console

Document ID: OPS-001
Version: 1.0
Classification: Internal — Operations
Audience: Platform engineers, SRE, support engineers, on-call responders

1. System Overview
2. Admin Dashboard & Routes
3. Internal Tooling Inventory
4. Operational Workflows
5. Monitoring & Alerting
6. Support Workflows
7. Environment Management
8. Secret Management
9. Health Check Endpoints & Status

1. System Overview

DataZoom (midwestco/datazoom) is an AI-powered document analysis platform built on Next.js 15 (App Router), Supabase (PostgreSQL + pgvector), Clerk multi-tenant authentication, and a distributed Python worker fleet for document ingestion and LLM inference. The operational surface spans three compute tiers: the Next.js web application, the Docker-based worker fleet (embedding, reranking, ingest), and the cloud inference layer (Modal/RunPod via an orchestrator on Fly.io).

Production topology:

Tier	Runtime	Image or Dockerfile
Web application	Next.js 15 / Vercel	—
Collaboration WebSocket	Node.js	`product/collaboration-ws/Dockerfile`
Base worker (LLM proxy)	Python	`ghcr.io/midwestco/datazoom-base:latest`
Ingest worker	Python	`ghcr.io/midwestco/datazoom-worker:latest`
GPU worker (embed + rerank)	Python	`ghcr.io/midwestco/datazoom-gpu:latest`
Cloud worker	Python	`docker/cloud-worker/Dockerfile`
LLM runtime	Ollama	`ollama/ollama:latest`
Orchestrator	Go/Python	`fly/orchestrator/Dockerfile` (Fly.io)

2. Admin Dashboard & Routes

2.1 Admin UI Pages

The admin surface is scoped under the (app) route group and protected by Clerk organization authentication. The primary operator page is:

Route	File	Purpose
`/admin/pipeline`	`product/app/(app)/admin/pipeline/page.tsx`	Document processing pipeline monitor — displays job queue depth, worker status, cloud routing decisions, and retry controls

2.2 Admin API Routes

All admin API routes live under /api/admin/ and require Clerk authentication. Routes are organization-scoped through the withOrgAuth middleware, with one exception: the cloud proxy endpoint (POST /api/admin/pipeline/cloud) calls Clerk auth directly instead of using withOrgAuth (commit d037000).

Route ID	Path	File	Function
API-ADM-001	`GET/POST /api/admin/pipeline`	`product/app/api/admin/pipeline/route.ts`	Read pipeline queue state; trigger manual pipeline operations
API-ADM-002	`POST /api/admin/pipeline/cloud`	`product/app/api/admin/pipeline/cloud/route.ts`	Proxy commands to cloud inference workers (RunPod/Modal); cloud action error handling enforced
API-ADM-003	`GET /api/admin/pipeline/health`	`product/app/api/admin/pipeline/health/route.ts`	Returns fleet health from orchestrator (not stale serverless endpoint); RunPod status derived from `cloudStatus`
API-ADM-004	`POST /api/admin/pipeline/retry`	`product/app/api/admin/pipeline/retry/route.ts`	Retry failed or stuck pipeline jobs
API-ADM-005	`GET /api/admin/routing-stats`	`product/app/api/admin/routing-stats/route.ts`	Model routing statistics: local vs. cloud dispatch counts, latency percentiles

2.3 Supporting Operational Routes

These routes are used by operators for data inspection and corrective actions:

Route ID	Path	Purpose
API-OPS-001	`GET /api/activity/metrics`	Aggregate activity metrics across org
API-OPS-002	`GET /api/activity/export`	Export activity log as CSV/JSON for audit
API-OPS-003	`GET /api/activity/unified/refresh`	Force-refresh materialized activity views
API-OPS-004	`GET /api/cap-table/health`	Cap table data integrity check
API-OPS-005	`POST /api/cap-table/review/{id}/approve`	Operator approval of pending equity extraction candidates
API-OPS-006	`POST /api/cap-table/review/{id}/reject`	Operator rejection of extraction candidates
API-OPS-007	`POST /api/cap-table/transactions/{id}/void`	Void a cap table transaction (requires operator role)
API-OPS-008	`GET /api/company-settings/linkage-mismatches`	Surface document-to-company linkage integrity errors
API-OPS-009	`POST /api/analysis/regenerate`	Force-regenerate AI analysis for a document
API-OPS-010	`POST /api/advisor/process-queue`	Manually trigger advisor queue processing

3. Internal Tooling Inventory

3.1 Shell Scripts

Script	Location	Purpose
`setup.sh`	`/setup.sh`	First-time developer environment bootstrap: installs Python deps, configures Supabase, seeds initial schema
`start_system.sh`	`docs/archive/start_system.sh`	Legacy startup script for local full-stack launch (archived; superseded by `docker compose`)
`entrypoint.sh`	`docker/cloud-worker/entrypoint.sh`	Container entrypoint for cloud worker: initializes BullMQ Redis connection over TLS (Upstash, `ssl_cert_reqs=None`), starts worker process

3.2 Docker Compose Profiles

The docker-compose.yml supports four named profiles for selective service startup:

Profile	Command	Services Started
`full`	`COMPOSE_PROFILES=full docker compose up -d`	All services including Ollama LLM runtime and LLM proxy
`gpu`	`COMPOSE_PROFILES=gpu docker compose up -d`	Embedding service, reranker, Ollama
`worker`	`COMPOSE_PROFILES=worker docker compose up -d`	Ingest worker only
`infra`	`COMPOSE_PROFILES=infra docker compose up -d`	Monitoring stack (reserved for future use)

docker-compose.module1.yml and docker/docker-compose.module1-ports.yml provide port-exposed variants for module 1 development.

3.3 CI/CD Tooling

Located in .github/workflows/:

Workflow	File	Trigger	Function
`build-images.yml`	`.github/workflows/build-images.yml`	Push to main/release branches	Builds `datazoom-base`, `datazoom-worker`, `datazoom-gpu`, and `cloud-worker` Docker images; pushes to `ghcr.io/midwestco/`; uses `crane copy` for tag promotion with retry logic
`knip.yml`	`.github/workflows/knip.yml`	PR and push	Dead code detection via Knip; fails build on unused exports

Image registry: ghcr.io/midwestco/

ghcr.io/midwestco/datazoom-base:latest
ghcr.io/midwestco/datazoom-worker:latest
ghcr.io/midwestco/datazoom-gpu:latest

The cloud worker image is built directly on GitHub Actions and pushed to ghcr.io (commit 07c91b9).
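build-images.yml retries crane copy when promoting tags. The workflow's own retry code is not reproduced here; the Python sketch below only illustrates the retry-with-exponential-backoff pattern such a step typically uses. The function name, attempt count, and delays are assumptions, and the commented invocation uses an illustrative tag.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 4,
    base_delay_s: float = 2.0,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(list(c)).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits 0, doubling the delay after each failure.

    Returns the number of attempts used; raises RuntimeError if all fail.
    `run` and `sleep` are injectable so the logic can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < attempts:
            sleep(base_delay_s * 2 ** (attempt - 1))  # 2 s, 4 s, 8 s, ...
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts")


# Hypothetical promotion of a commit tag to :latest (tag names are illustrative):
# run_with_retry(["crane", "copy",
#                 "ghcr.io/midwestco/datazoom-worker:<sha>",
#                 "ghcr.io/midwestco/datazoom-worker:latest"])
```

Doubling the delay gives a transient registry error time to clear without stalling the job for long when the failure is permanent.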

3.4 Database Utilities (SQL Scripts)

Operational SQL scripts for manual execution against Supabase are kept under docs/archive/, with the activity-logging check under docs/activity_page/:

Script	Path	Purpose
`check_embedding_status.sql`	`docs/archive/check_embedding_status.sql`	Verify embedding coverage across `document_chunks`
`check_timeline.sql`	`docs/archive/check_timeline.sql`	Inspect `timeline_events` for gaps or anomalies
`add_missing_tables_to_cloud.sql`	`docs/archive/add_missing_tables_to_cloud.sql`	Migration script for adding tables to cloud Supabase instance
`create_company_settings.sql`	`docs/archive/create_company_settings.sql`	Bootstrap company settings table
`VERIFY_DELETE_LOGGING.sql`	`docs/activity_page/VERIFY_DELETE_LOGGING.sql`	Validate delete events are being captured in activity log

3.5 Git Hooks

Hook	Location	Action
`pre-commit`	`.husky/pre-commit`	Lint and format checks before commit
`pre-push`	`.husky/pre-push`	Runs test suite before push to remote

3.6 Queue Infrastructure

Document processing jobs are managed via BullMQ backed by Upstash Redis. Workers connect using redis.asyncio with ssl_cert_reqs=None for Upstash TLS compatibility (fixed in commit b718932). The cloud worker entrypoint initializes this connection on container start.
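A minimal sketch of how a worker might derive its Redis connection settings from REDIS_URL, assuming the URL uses the rediss:// scheme that Upstash issues for TLS. redis_connection_kwargs is a hypothetical helper; the keyword argument it returns is one that redis.asyncio.from_url accepts. Note that ssl_cert_reqs=None tells redis-py not to verify the server certificate: traffic is encrypted, but the server's identity is not checked.

```python
from urllib.parse import urlparse


def redis_connection_kwargs(redis_url: str) -> dict:
    """Build keyword arguments for redis.asyncio.from_url (hypothetical helper).

    Assumes Upstash-style URLs, where the rediss:// scheme means TLS.
    """
    scheme = urlparse(redis_url).scheme
    if scheme == "rediss":
        # TLS on, certificate verification off: mirrors the ssl_cert_reqs=None
        # setting the workers use for Upstash compatibility.
        return {"ssl_cert_reqs": None}
    if scheme == "redis":
        # Plain TCP, e.g. a local Redis during development.
        return {}
    raise ValueError(f"unsupported Redis URL scheme: {scheme!r}")


# Usage inside a worker (requires redis-py 4.2 or later for redis.asyncio):
#   import os
#   import redis.asyncio as aioredis
#   url = os.environ["REDIS_URL"]
#   client = aioredis.from_url(url, **redis_connection_kwargs(url))
```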

4. Operational Workflows

4.1 Deployment

Next.js Application (Vercel)

Merge pull request to main branch.
Vercel auto-deploys on merge. Preview deployments are generated for all PRs.
.vercelignore controls which files are excluded from the deployment bundle.
Environment variables must be configured in the Vercel project dashboard before deployment (see Section 8).

Docker Worker Fleet

# 1. Pull updated images (CI builds and pushes them automatically; see build-images.yml)
docker pull ghcr.io/midwestco/datazoom-base:latest
docker pull ghcr.io/midwestco/datazoom-worker:latest
docker pull ghcr.io/midwestco/datazoom-gpu:latest

# 2. Deploy with appropriate profile
COMPOSE_PROFILES=full docker compose up -d

# 3. Verify health
docker compose ps
curl http://localhost:8001/health   # LLM proxy
curl http://localhost:11434/api/tags  # Ollama

Cloud Worker (Fly.io)

The cloud worker is deployed to Fly.io using the config at docker/cloud-worker/fly.toml. The orchestrator is a separate deployment, built from fly/orchestrator/Dockerfile.

# Deploy orchestrator (run from the repository root)
(cd fly/orchestrator && fly deploy)

# Deploy cloud worker (run from the repository root)
(cd docker/cloud-worker && fly deploy)

Note: The orchestrator requires 2048 MB of memory when run on a performance CPU (see commit 4e7cd14).

4.2 Rollback

Vercel Rollback

Navigate to Vercel project → Deployments.
Identify the last known-good deployment.
Click Promote to Production.
Verify health check endpoints respond (see Section 9).

Docker Worker Rollback

# Pull previous image by digest or tag
docker pull ghcr.io/midwestco/datazoom-worker:<previous-sha>

# Update docker-compose.yml image tag, then redeploy
COMPOSE_PROFILES=worker docker compose up -d

# Confirm old container is replaced
docker compose ps
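The "update docker-compose.yml image tag" step above can be scripted. A minimal sketch, assuming the worker service pins its image on an image: line of the form ghcr.io/midwestco/datazoom-worker:<tag>. pin_image_tag is a hypothetical helper that edits the text directly rather than parsing YAML, so comments and formatting survive.

```python
import re


def pin_image_tag(compose_text: str, image: str, new_tag: str) -> str:
    """Replace the tag on every `image: <image>:<tag>` line (hypothetical helper).

    Raises ValueError if the image is not referenced, so a typo in the image
    name cannot silently leave the old tag in place.
    """
    pattern = re.compile(
        r"^([ \t]*image:[ \t]*[\"']?" + re.escape(image) + r"):[^\s\"'@]+",
        re.MULTILINE,
    )
    # A function replacement avoids backslash surprises in new_tag.
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", compose_text)
    if count == 0:
        raise ValueError(f"no image: line references {image}")
    return updated
```

Write the returned text back to docker-compose.yml, then run the docker compose up -d step above. Raising on zero matches is deliberate: a mistyped image name should fail loudly rather than redeploy the old tag.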

Database Migration Rollback

DataZoom uses Supabase migrations. If a migration must be reversed:

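Connect to the database through the Supabase Studio SQL editor, or with psql using the project's Postgres connection string. (The service role key authenticates Supabase API calls; it is not a database password and cannot be used by psql.)
Execute the inverse SQL manually (no automated down-migration scripts are present in the repository).
Update the migration tracking table (supabase_migrations.schema_migrations) so it matches the database, for example with supabase migration repair --status reverted <version>.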

4.3 Feature Flags

Feature flags are controlled via the Supabase company_settings table and the cap table feature gate system, whose expected behavior is covered by product/lib/cap-table/__tests__/feature-gate.test.ts. The cap table feature is the primary gated feature as of the current version.

Cap Table Feature Gate:

Gate is evaluated per organization.
Operators can enable/disable via the company_settings record for a given org.
The API endpoint GET /api/company-settings returns the current gate state.
POST /api/company-settings updates settings including feature flag state.

Rollout procedure (per docs/cap-table/action_plans/12_rollout_feature_flag_and_rollback.md):

Enable for internal test org first.
Monitor GET /api/cap-table/health for data integrity issues.
Expand to pilot customer orgs.
Full rollout by toggling the default in company_settings.

To disable a feature for a specific org:

UPDATE company_settings
SET cap_table_enabled = false
WHERE org_id = '<clerk-org-id>';
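The per-organization gate evaluation can be sketched as follows, assuming an org's own company_settings value overrides a global default and that a missing row or NULL value falls back to that default. The function name and org IDs are hypothetical; the actual gate is exercised by feature-gate.test.ts and may differ.

```python
from typing import Mapping, Optional


def cap_table_enabled(
    org_id: str,
    org_overrides: Mapping[str, Optional[bool]],
    default: bool,
) -> bool:
    """Evaluate the cap table gate for one Clerk organization (hypothetical).

    org_overrides maps Clerk org IDs to that org's cap_table_enabled value;
    None (or no entry) means "no explicit setting, use the default".
    """
    value = org_overrides.get(org_id)
    return default if value is None else value


# The rollout procedure above, expressed as data:
overrides = {"org_internal_test": True}           # 1. internal test org only
assert cap_table_enabled("org_pilot", overrides, default=False) is False
overrides["org_pilot"] = True                     # 3. pilot customer orgs
overrides["org_opted_out"] = False                # explicit disable (the UPDATE above)
assert cap_table_enabled("org_new", overrides, default=True) is True        # 4. full rollout
assert cap_table_enabled("org_opted_out", overrides, default=True) is False
```

Keeping explicit per-org values separate from the default is what lets the final rollout step flip one value without overwriting orgs that were deliberately disabled.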

4.4 User Management

User and organization management is handled through Clerk. DataZoom does not maintain a separate user table for auth — Clerk is the system of record.

Common operator actions:

Action	Method
View organization members	Clerk Dashboard → Organizations
Remove a user from an org	Clerk Dashboard → Organization → Members → Remove
Reset user session	Clerk Dashboard → Users → Sessions → Revoke
Proxy Clerk API calls internally	`POST /api/clerk/proxy`

Multi-tenant isolation: All data operations are scoped by Clerk organization ID. The withOrgAuth middleware enforces this at the API layer. The cloud proxy endpoint is an exception and uses Clerk auth directly without withOrgAuth (commit d037000).

5. Monitoring & Alerting

5.1 Observability Stack

Signal	Tool	Configuration
User analytics	Mixpanel	Tracked via `POST /api/ai/track-interaction`; event catalog in `docs/archive/MIXPANEL_EVENTS_REPORT.md`
Error tracking	Sentry	`SENTRY_DSN` configured in `.env.services` (optional but recommended for production)
Container logs	Docker json-file driver	Max 50 MB per file, 5 files retained (configured in `docker-compose.yml` `x-logging`)
Cap table observability	Internal	`product/lib/cap-table/__tests__/observability.test.ts` validates instrumentation

5.2 What Is Observed

Pipeline health:

Job queue depth (BullMQ / Upstash Redis)
Worker pod state: provisioning → warming → ready (pod warmup tracking added in commit 580b80b)
Cloud inference routing decisions: local vs. Modal/RunPod dispatch (visible at GET /api/admin/routing-stats)
RunPod fleet status, derived from the orchestrator's cloudStatus rather than the stale serverless endpoint (commit 088f7b4)

Document processing:

Embedding coverage across document_chunks (via check_embedding_status.sql)
Failed and stuck jobs visible at GET /api/admin/pipeline
Extraction candidate review queue depth at GET /api/cap-table/review

Activity tracking:

User actions logged to the activity log table; materialized views refresh via POST /api/activity/unified/refresh
Daily metrics accessible at GET /api/activity/metrics

Cap table integrity:

GET /api/cap-table/health returns data integrity status
GET /api/company-settings/linkage-mismatches surfaces document-to-company linkage errors

5.3 Ollama Service Health

Ollama is health-checked by Docker Compose:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
  interval: 30s
  timeout: 10s
  retries: 5
  start_period: 30s

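If Ollama fails 5 consecutive checks, Docker marks the container unhealthy. With a 30 s interval and 10 s timeout, that happens roughly 2 to 3.5 minutes after Ollama stops responding, once the 30 s start period has passed. Docker only reports the unhealthy state; it does not restart the container on its own. The LLM proxy on port 8001 depends on this service.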
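The detection window follows from Docker's health-check schedule: each check starts one interval after the previous check finishes, a check that exceeds the timeout counts as failed, and the container turns unhealthy after the configured number of consecutive failures. A small calculation, assuming the start period has already elapsed:

```python
def unhealthy_window_s(interval_s: float, timeout_s: float, retries: int) -> tuple[float, float]:
    """Seconds from an outage starting to Docker reporting 'unhealthy'.

    Models Docker's schedule: each check starts `interval_s` after the previous
    one finishes, and `retries` consecutive failures are needed.
    """
    # Best case: the outage begins just as a check starts, and every failing
    # check returns immediately (e.g. connection refused).
    best = (retries - 1) * interval_s
    # Worst case: the outage begins just after a passing check finishes, and
    # every failing check runs until it hits the timeout.
    worst = retries * interval_s + retries * timeout_s
    return best, worst


print(unhealthy_window_s(30, 10, 5))  # (120, 200) for the compose settings above
```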

5.4 Alerting Escalation Path

DataZoom does not yet have a formalized PagerDuty or OpsGenie integration in the repository. Escalation is currently manual:

Severity	Condition	Response
P1	Web application returns 5xx for >5% of requests	Immediate Vercel rollback; notify engineering lead
P1	All worker pods unreachable (`/api/admin/pipeline/health` returns error)	Restart Docker worker fleet; check Fly.io orchestrator status
P2	More than 100 stuck jobs in the BullMQ queue	Use `POST /api/admin/pipeline/retry` to replay; investigate worker logs
P2	RunPod fleet shows no `ready` pods	Check orchestrator on Fly.io; inspect cloud worker logs
P3	Embedding coverage <95%	Run `check_embedding_status.sql`; trigger re-embedding via worker
P3	Cap table linkage mismatches detected	Inspect `GET /api/company-settings/linkage-mismatches`; run corrective SQL
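If these checks are ever automated, the escalation table maps naturally onto an ordered rule list evaluated from P1 down. A sketch only: the Signals fields are hypothetical names for the conditions in the table, and the thresholds are copied from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical snapshot of the conditions in the escalation table."""
    error_rate_5xx: float       # fraction of web requests returning 5xx
    pipeline_health_ok: bool    # GET /api/admin/pipeline/health succeeded
    stuck_jobs: int             # stuck jobs in the BullMQ queue
    ready_runpod_pods: int      # pods in the `ready` state
    embedding_coverage: float   # fraction of document_chunks with embeddings
    linkage_mismatches: int     # rows from the linkage-mismatches endpoint


def severity(s: Signals) -> Optional[str]:
    """Return the highest severity whose condition holds, or None."""
    if s.error_rate_5xx > 0.05 or not s.pipeline_health_ok:
        return "P1"
    if s.stuck_jobs > 100 or s.ready_runpod_pods == 0:
        return "P2"
    if s.embedding_coverage < 0.95 or s.linkage_mismatches > 0:
        return "P3"
    return None
```

Checking P1 first means a single snapshot that trips several rows is always escalated at its most severe level.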

6. Support Workflows

6.1 Ticket Triage

All inbound support issues should be categorized on receipt:

Category	Indicators	Owning Team
Auth / Access	User cannot log in, org not visible, permission denied	Platform (Clerk config)
Document Processing	Document stuck in processing, no embeddings, missing analysis	Data pipeline
AI / RAG Quality	Incorrect answers, missing citations, hallucinations	ML / RAG
Cap Table	Wrong ownership percentages, extraction errors, voiding needed	Cap table feature team
Billing / Wallet	Credits not deducted or over-deducted	Billing (`docs/billing_wallet/`)
Performance	Slow queries, high latency on chat	Platform / DB

6.2 Common Issues and Resolution Playbooks

ISSUE-001: Document stuck in processing state

Symptoms: Document uploaded but no chunks appear in document_chunks; no embedding visible.

Resolution:

Check GET /api/admin/pipeline for the job in the queue.
If job is in failed state, use POST /api/admin/pipeline/retry with the job ID.
If retry fails, inspect worker container logs: docker compose logs worker --tail=100.
Verify Redis/BullMQ connection: confirm Upstash TLS credentials in .env.services.
If embedding service is unhealthy, restart GPU profile: COMPOSE_PROFILES=gpu docker compose restart.

ISSUE-002: AI chat returning no results / empty context

Symptoms: Chat responds with no citations; RAG retrieval returns zero chunks.

Resolution:

Run check_embedding_status.sql against Supabase to confirm document_chunks.embedding is non-null for the document.
If embeddings are missing, the ingest worker did not complete — follow ISSUE-001 steps.
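If embeddings are present, rebuild the ivfflat index in case it is corrupted or was created before most rows were loaded (ivfflat computes its lists from the rows present at build time): REINDEX INDEX CONCURRENTLY document_chunks_embedding_idx;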
Test vector search directly in Supabase SQL editor with a sample query vector.
If search is operational but chat fails, check Modal/Ollama routing via GET /api/admin/routing-stats.
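The first steps above amount to measuring embedding coverage. A sketch, assuming coverage means the share of a document's document_chunks rows with a non-null embedding, and that rows are fetched as dictionaries with document_id and embedding keys; check_embedding_status.sql may compute it differently. The 95% default mirrors the P3 threshold in Section 5.4, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def embedding_coverage(rows: Iterable[Mapping]) -> dict:
    """Per-document fraction of chunks that have a non-null embedding."""
    totals: dict = defaultdict(int)
    embedded: dict = defaultdict(int)
    for row in rows:
        doc = row["document_id"]
        totals[doc] += 1
        if row.get("embedding") is not None:
            embedded[doc] += 1
    return {doc: embedded[doc] / totals[doc] for doc in totals}


def below_threshold(coverage: dict, threshold: float = 0.95) -> list:
    """Documents under the threshold; these need the ISSUE-001 steps."""
    return sorted(doc for doc, frac in coverage.items() if frac < threshold)
```

A document with no rows at all does not appear in the result; that is the ISSUE-001 case.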

ISSUE-003: Cloud inference unavailable (Modal/RunPod)

Symptoms: POST /api/admin/pipeline/cloud returns errors; advisor queue stalls.

Resolution:

Check GET /api/admin/pipeline/health — inspect cloudStatus field.
Verify orchestrator on Fly.io is running: fly status -a <orchestrator-app-name>.
Check pod warmup state: pods cycle through provisioning → warming → ready (commit 580b80b). Allow up to 5 minutes for cold start.
If RunPod is the issue, confirm RunPod API key in secrets is valid (see Section 8).
As a fallback, ensure local Ollama is running (COMPOSE_PROFILES=gpu docker compose up -d); the model router will then fall back to local inference.
Monitor model routing via GET /api/admin/routing-stats to confirm fallback is active.
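The warmup sequence and the 5-minute cold-start allowance suggest a simple stuck-pod check. The state names come from this document; PodState, is_stuck, fleet_has_capacity, and applying one 5-minute budget to the whole warmup are assumptions, and the exact shape of cloudStatus is not documented here.

```python
from enum import Enum


class PodState(Enum):
    PROVISIONING = "provisioning"
    WARMING = "warming"
    READY = "ready"


COLD_START_BUDGET_S = 5 * 60  # "allow up to 5 minutes for cold start"


def is_stuck(state: PodState, seconds_since_created: float) -> bool:
    """True if a pod has used up the cold-start budget without becoming ready."""
    return state is not PodState.READY and seconds_since_created > COLD_START_BUDGET_S


def fleet_has_capacity(states: list) -> bool:
    """Inverse of the P2 'no ready pods' condition: at least one pod is ready."""
    return any(s is PodState.READY for s in states)
```

Raw status strings can be converted with PodState("warming") before calling these helpers.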

ISSUE-004: Cap table shows incorrect ownership

Symptoms: Org reports wrong percentages or missing shareholders in cap table view.

Resolution:

Review pending extraction candidates at GET /api/cap-table/review.
Approve correct candidates via POST /api/cap-table/review/{id}/approve.
Reject incorrect candidates via POST /api/cap-table/review/{id}/reject.
If a confirmed transaction is wrong, void it: POST /api/cap-table/transactions/{id}/void.
Re-run extraction if source document was recently updated: POST /api/cap-table/extract with the document ID.
Check for linkage mismatches: GET /api/company-settings/linkage-mismatches.
Verify calculation engine output, using the test fixtures in product/lib/cap-table/__tests__/calc.test.ts as a reference.
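When checking a reported percentage by hand, basic (non-diluted) ownership is each holder's shares divided by total outstanding shares. A sketch using exact fractions, so the percentages sum to exactly 100 with no rounding drift. The input shape and function name are hypothetical, and the production calc engine may report fully diluted figures (including options, warrants, or convertibles), so a mismatch is a prompt to compare against calc.test.ts rather than proof of a bug.

```python
from fractions import Fraction
from typing import Mapping


def basic_ownership(shares_by_holder: Mapping[str, int]) -> dict:
    """Percentage of outstanding shares held by each holder, as exact Fractions."""
    total = sum(shares_by_holder.values())
    if total <= 0:
        raise ValueError("no outstanding shares")
    return {holder: Fraction(100 * n, total) for holder, n in shares_by_holder.items()}


pct = basic_ownership({"founder_a": 6_000_000, "founder_b": 3_000_000, "seed_investor": 1_000_000})
print({holder: float(p) for holder, p in pct.items()})
# {'founder_a': 60.0, 'founder_b': 30.0, 'seed_investor': 10.0}
```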

ISSUE-005: Activity feed not updating

Symptoms: Activity page (/activity) shows stale data; recent actions not reflected.

Resolution:

Force refresh materialized views: POST /api/activity/unified/refresh.
If feed is still stale, verify delete logging is functioning: run docs/activity_page/VERIFY_DELETE_LOGGING.sql.
Check that the activity_log table is being written to by inspecting recent rows in Supabase.
Review calendar and day views specifically: GET /api/activity/unified/calendar and GET /api/activity/unified/day.

ISSUE-006: Collaboration WebSocket disconnects

Symptoms: Real-time document collaboration drops; users see stale state.

Resolution:

Check the collaboration WebSocket service: docker compose logs collaboration-ws --tail=50.
Verify the @tiptap/y-tiptap dependency is present (PR #112 added this fix).
Check GET /api/collaboration/token is returning valid tokens.
Restart the service: docker compose restart collaboration-ws.

6.3 Regression Testing

The smoke regression suite at product/lib/__tests__/app-smoke-regression.test.ts should be run after any production incident resolution to confirm system integrity:

cd product
npm test -- app-smoke-regression

7. Environment Management

7.1 Environment Configuration

DataZoom operates across three environments. Configuration is driven by environment variables in .env.services (worker fleet) and Vercel project settings (Next.js application).

Environment	Next.js	Workers	Database
Development	`localhost:3000`	`docker compose` local	`supabase start` (local)
Staging	Vercel preview deployment	Docker on staging host	Supabase staging project
Production	Vercel production deployment	Docker on production host	Supabase production project

7.2 Environment Variables

The canonical template is .env.services.example. Operators must copy this to .env.services before starting the worker fleet.

Required variables:

Variable	Used By	Description
`SUPABASE_URL`	All services	Supabase project URL
`SUPABASE_SERVICE_KEY`	All services	Service role key (bypasses RLS)
`CLERK_SECRET_KEY`	Next.js API	Clerk backend SDK authentication
`NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`	Next.js frontend	Clerk frontend SDK
`OLLAMA_HOST`	LLM proxy, worker	Ollama service URL (default: `http://ollama:11434`)
`REDIS_URL`	BullMQ workers	Upstash Redis TLS URL
`SENTRY_DSN`	All services	Sentry error ingestion (optional)
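Before starting the worker fleet, it is worth confirming that .env.services actually defines the required variables. A minimal sketch, assuming the file uses simple KEY=VALUE lines (blank lines and # comments allowed, optional export prefix and surrounding quotes); it is not a full dotenv parser, and the worker-side required list is an assumption to adjust per host.

```python
from typing import Iterable


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines; skips blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.startswith("export "):
            key = key[len("export "):]
        env[key.strip()] = value.strip().strip("'\"")
    return env


# Worker-side required variables from the table above; SENTRY_DSN is optional
# and OLLAMA_HOST has a default, so neither is listed here.
WORKER_REQUIRED = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY", "REDIS_URL")


def missing_vars(env: dict, required: Iterable[str] = WORKER_REQUIRED) -> list:
    """Required variables that are absent or empty."""
    return [name for name in required if not env.get(name)]
```

Typical use: missing_vars(parse_env_file(open(".env.services").read())) should return an empty list before docker compose up is run.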

Supabase local development:

supabase start   # starts local Postgres + pgvector + Auth + Storage
supabase stop    # stop local stack
supabase db reset  # reset schema to latest migrations

7.3 Docker Image Configuration

Images are sourced from ghcr.io/midwestco/ in production. For local development, images can be built directly:

docker build -f docker/Dockerfile.base -t datazoom-base .
docker build -f docker/Dockerfile.worker -t datazoom-worker .
docker build -f docker/Dockerfile.gpu -t datazoom-gpu .

The OLLAMA_NUM_PARALLEL: "4" (parallel requests served per loaded model) and OLLAMA_MAX_LOADED_MODELS: "3" (models kept in memory at once) settings in docker-compose.yml should be tuned to the production host's available RAM and GPU memory.

7.4 WireGuard VPN (Worker Networking)

The worker fleet can be configured to communicate over WireGuard for secure inter-service networking. Configuration template is at docker/wireguard/wg0.conf.example. See docker/wireguard/README.md for setup instructions.

8. Secret Management

8.1 Secret Inventory

Secret ID	Name	Location	Consumer	Rotation Frequency
SEC-001	`SUPABASE_URL`	`.env.services` / Vercel	All services	On project migration
SEC-002	`SUPABASE_SERVICE_KEY`	`.env.services` / Vercel	Server-side API, workers	Quarterly
SEC-003	`CLERK_SECRET_KEY`	Vercel env / host env	Next.js API routes	On compromise or quarterly
SEC-004	`NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`	Vercel env	Next.js frontend	On Clerk key rotation
SEC-005	`REDIS_URL`	`.env.services`	BullMQ workers	On Upstash key rotation
SEC-006	`SENTRY_DSN`	`.env.services` / Vercel	Sentry SDK	On project recreation
SEC-007	RunPod API Key	`.env.services` / Fly secrets	Cloud worker, orchestrator	Quarterly
SEC-008	Modal API Token	`.env.services` / Fly secrets	Cloud inference	Quarterly
SEC-009	Fly.io Deploy Token	GitHub Actions secrets	`build-images.yml` workflow	Annually or on compromise
SEC-010	GitHub Container Registry Token	GitHub Actions	`build-images.yml` workflow	Managed by GitHub

8.2 Secret Storage by Environment

Environment	Storage Method
Development	`.env.services` file (gitignored per `.gitignore`)
CI/CD	GitHub Actions encrypted secrets
Production (Next.js)	Vercel project environment variables
Production (Workers)	Host environment or Docker secrets passed at runtime
Production (Fly.io)	`fly secrets set` — encrypted at rest in Fly vault

8.3 Rotation Procedures

Rotating SUPABASE_SERVICE_KEY (SEC-002):

Generate new service role key in Supabase Dashboard → Settings → API.
Update the Vercel environment variable, then redeploy; Vercel applies environment variable changes only to new deployments.
Update .env.services on all worker hosts.
Restart worker fleet: COMPOSE_PROFILES=full docker compose down && COMPOSE_PROFILES=full docker compose up -d.
Confirm GET /api/admin/pipeline/health returns healthy.

Rotating CLERK_SECRET_KEY (SEC-003):

Generate new secret key in Clerk Dashboard → API Keys.
Update the Vercel environment variable.
Redeploy the application; existing deployments keep the old value until a new deployment is created.
Revoke the old key in Clerk Dashboard only after confirming new deployment is healthy.

Rotating Fly.io secrets:

fly secrets set RUNPOD_API_KEY=<new-key> -a <app-name>
fly secrets set MODAL_TOKEN_ID=<id> MODAL_TOKEN_SECRET=<secret> -a <app-name>

Fly.io automatically restarts the application after secret update.

8.4 Security Notes

.env.services.example is the only secrets-related file committed to the repository. The actual .env.services file must never be committed.
The .claude/settings.local.json and .mcp.json files present in the repository should be reviewed to ensure no credentials are embedded before any repository access is shared externally.
SUPABASE_SERVICE_KEY bypasses Row Level Security. Its use must be restricted to server-side API routes and workers only — never exposed to the client.

9. Health Check Endpoints & Status

9.1 API Health Endpoints

Endpoint ID	Path	Method	Returns	Normal Response
HC-001	`/api/admin/pipeline/health`	GET	Worker fleet status, cloud inference state, queue depth	`{ status: "healthy", cloudStatus: "ready", queueDepth: 0 }`
HC-002	`/api/cap-table/health`	GET	Cap table data integrity report	`{ status: "ok", pendingReview: 0, mismatches: 0 }`
HC-003	`http://localhost:11434/api/tags`	GET	Models pulled to the local Ollama instance	`{ "models": [...] }` with one entry per model
HC-004	`http://localhost:8001/health`	GET	LLM proxy service status	`{ status: "ok" }`
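A post-deploy probe for the two local endpoints (HC-003 and HC-004) can be written with the standard library alone. Ollama's /api/tags responds with an object of the form { "models": [...] }. Two assumptions: HC-004 returns the { status: "ok" } body listed above, and an empty Ollama model list counts as a failure because the LLM proxy needs at least one pulled model. HC-001 is an admin route and requires Clerk authentication, which this sketch does not handle. Function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable

Fetch = Callable[[str], bytes]


def http_get(url: str, timeout_s: float = 5.0) -> bytes:
    """GET url and return the body; raises on connection errors and HTTP >= 400."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def check_local_services(fetch: Fetch = http_get) -> dict:
    """Return {check_id: (ok, detail)} for HC-003 and HC-004."""
    results = {}
    try:
        body = json.loads(fetch("http://localhost:11434/api/tags"))
        models = [m.get("name") for m in body.get("models", [])]
        results["HC-003"] = (bool(models), f"{len(models)} model(s) available")
    except Exception as exc:  # unreachable, bad JSON, HTTP error, wrong shape
        results["HC-003"] = (False, repr(exc))
    try:
        body = json.loads(fetch("http://localhost:8001/health"))
        results["HC-004"] = (body.get("status") == "ok", body.get("status"))
    except Exception as exc:
        results["HC-004"] = (False, repr(exc))
    return results
```

Injecting fetch keeps the parsing logic testable without running containers; in production the default http_get is used.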

9.2 Infrastructure Health Checks

Docker Compose service health:

docker compose ps                    # shows health status for all services
docker compose logs ollama --tail=20  # Ollama logs
docker inspect "$(docker compose ps -q ollama)" --format='{{.State.Health.Status}}'  # Ollama health status

Supabase local stack:

supabase status   # shows all local Supabase services and their URLs

Fly.io orchestrator:

fly status -a <orchestrator-app-name>
fly logs -a <orchestrator-app-name>

9.3 Vercel Deployment Status

Vercel deployment status is accessible at the Vercel project dashboard. DataZoom does not maintain a public status page as of the current version. The README badge [![Status](https://img.shields.io/badge/status-production-green)]() is static and not connected to a live status endpoint.

Recommended operator checks after any deployment:

GET /api/admin/pipeline/health — confirms backend workers are reachable from the Next.js layer.
GET /api/cap-table/health — confirms database read path is intact.
GET /api/activity/metrics — confirms Supabase query path for aggregated data.
GET /api/admin/routing-stats — confirms model routing is operating and recording decisions.
Execute smoke regression: cd product && npm test -- app-smoke-regression.

9.4 Queue Health (BullMQ / Upstash)

There is no dedicated BullMQ dashboard endpoint in the repository. Queue state is observable through:

GET /api/admin/pipeline — returns queue snapshot including failed, waiting, and active job counts.
Upstash Redis console — direct inspection of queue keys if credentials are available.

Jobs that remain in the active state for more than 10 minutes without completing should be considered stuck and retried via POST /api/admin/pipeline/retry.
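The 10-minute rule can be applied mechanically to a queue snapshot. A sketch, assuming each active job in the GET /api/admin/pipeline response carries an ID and a processing start time in epoch milliseconds. BullMQ jobs expose such a timestamp as processedOn, but the shape this endpoint actually returns is not documented here, so the field names and function name are hypothetical.

```python
import time
from typing import Iterable, Mapping, Optional

STUCK_AFTER_S = 10 * 60  # active for more than 10 minutes => treat as stuck


def stuck_job_ids(active_jobs: Iterable[Mapping], now_ms: Optional[int] = None) -> list:
    """IDs of active jobs whose processing started more than 10 minutes ago.

    Each job is assumed to look like {"id": ..., "processedOn": <epoch ms>}.
    Jobs without a start time are skipped rather than guessed at.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - STUCK_AFTER_S * 1000
    return [
        job["id"]
        for job in active_jobs
        if job.get("processedOn") is not None and job["processedOn"] < cutoff_ms
    ]
```

Each returned ID is a candidate for POST /api/admin/pipeline/retry.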

Last updated: derived from repository state as of commits through 960f88c. Maintain this document in sync with changes to admin routes, Docker infrastructure, and secret requirements.
