Last updated 5/3/2026

DataZoom Operator Console

Document ID: OPS-001
Version: 1.0
Classification: Internal — Operations
Audience: Platform engineers, SRE, support engineers, on-call responders


Table of Contents

  1. System Overview
  2. Admin Dashboard & Routes
  3. Internal Tooling Inventory
  4. Operational Workflows
  5. Monitoring & Alerting
  6. Support Workflows
  7. Environment Management
  8. Secret Management
  9. Health Check Endpoints & Status

1. System Overview

DataZoom (midwestco/datazoom) is an AI-powered document analysis platform built on Next.js 15 (App Router), Supabase (PostgreSQL + pgvector), Clerk multi-tenant authentication, and a distributed Python worker fleet for document ingestion and LLM inference. The operational surface spans three distinct compute tiers: the Next.js web application, the Docker-based worker fleet (embedding, reranking, ingest), and the cloud inference layer (Modal/RunPod via orchestrator on Fly.io).

Production topology:

| Tier | Runtime | Image |
|---|---|---|
| Web application | Next.js 15 / Vercel | |
| Collaboration WebSocket | Node.js | product/collaboration-ws/Dockerfile |
| Base worker (LLM proxy) | Python | ghcr.io/midwestco/datazoom-base:latest |
| Ingest worker | Python | ghcr.io/midwestco/datazoom-worker:latest |
| GPU worker (embed + rerank) | Python | ghcr.io/midwestco/datazoom-gpu:latest |
| Cloud worker | Python | docker/cloud-worker/Dockerfile |
| LLM runtime | Ollama | ollama/ollama:latest |
| Orchestrator | Go/Python | fly/orchestrator/Dockerfile (Fly.io) |

2. Admin Dashboard & Routes

2.1 Admin UI Pages

The admin surface is scoped under the (app) route group and protected by Clerk organization authentication. The primary operator page is:

| Route | File | Purpose |
|---|---|---|
| /admin/pipeline | product/app/(app)/admin/pipeline/page.tsx | Document processing pipeline monitor: displays job queue depth, worker status, cloud routing decisions, and retry controls |

2.2 Admin API Routes

All admin API routes live under /api/admin/ and require Clerk authentication with organization-scoped tokens. The cloud proxy endpoint uses Clerk auth directly (not withOrgAuth) per commit d037000.

| Route ID | Path | File | Function |
|---|---|---|---|
| API-ADM-001 | GET/POST /api/admin/pipeline | product/app/api/admin/pipeline/route.ts | Read pipeline queue state; trigger manual pipeline operations |
| API-ADM-002 | POST /api/admin/pipeline/cloud | product/app/api/admin/pipeline/cloud/route.ts | Proxy commands to cloud inference workers (RunPod/Modal); cloud action error handling enforced |
| API-ADM-003 | GET /api/admin/pipeline/health | product/app/api/admin/pipeline/health/route.ts | Returns fleet health from orchestrator (not stale serverless endpoint); RunPod status derived from cloudStatus |
| API-ADM-004 | POST /api/admin/pipeline/retry | product/app/api/admin/pipeline/retry/route.ts | Retry failed or stuck pipeline jobs |
| API-ADM-005 | GET /api/admin/routing-stats | product/app/api/admin/routing-stats/route.ts | Model routing statistics: local vs. cloud dispatch counts, latency percentiles |
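
For scripted access to the routes above, a request can be assembled as in the following minimal sketch. It assumes the Clerk session token is accepted as an Authorization: Bearer header and that the retry route takes a JSON body with a jobId field; neither detail is confirmed by this document. The helper names build_admin_request and call_admin_route are hypothetical.

```python
import json
import urllib.request


def build_admin_request(base_url, path, session_token, payload=None):
    """Build a request for an admin route: POST when a payload is given, else GET."""
    url = base_url.rstrip("/") + path
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    method = "POST" if data is not None else "GET"
    req = urllib.request.Request(url, data=data, method=method)
    # Assumption: the org-scoped Clerk token is sent as a bearer token.
    req.add_header("Authorization", f"Bearer {session_token}")
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def call_admin_route(req, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like call_admin_route(build_admin_request(base_url, "/api/admin/routing-stats", token)), with base_url and token supplied by the operator.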

2.3 Supporting Operational Routes

These routes are used by operators for data inspection and corrective actions:

| Route ID | Path | Purpose |
|---|---|---|
| API-OPS-001 | GET /api/activity/metrics | Aggregate activity metrics across org |
| API-OPS-002 | GET /api/activity/export | Export activity log as CSV/JSON for audit |
| API-OPS-003 | POST /api/activity/unified/refresh | Force-refresh materialized activity views |
| API-OPS-004 | GET /api/cap-table/health | Cap table data integrity check |
| API-OPS-005 | POST /api/cap-table/review/{id}/approve | Operator approval of pending equity extraction candidates |
| API-OPS-006 | POST /api/cap-table/review/{id}/reject | Operator rejection of extraction candidates |
| API-OPS-007 | POST /api/cap-table/transactions/{id}/void | Void a cap table transaction (requires operator role) |
| API-OPS-008 | GET /api/company-settings/linkage-mismatches | Surface document-to-company linkage integrity errors |
| API-OPS-009 | POST /api/analysis/regenerate | Force-regenerate AI analysis for a document |
| API-OPS-010 | POST /api/advisor/process-queue | Manually trigger advisor queue processing |

3. Internal Tooling Inventory

3.1 Shell Scripts

| Script | Location | Purpose |
|---|---|---|
| setup.sh | /setup.sh | First-time developer environment bootstrap: installs Python deps, configures Supabase, seeds initial schema |
| start_system.sh | docs/archive/start_system.sh | Legacy startup script for local full-stack launch (archived; superseded by docker compose) |
| entrypoint.sh | docker/cloud-worker/entrypoint.sh | Container entrypoint for cloud worker: initializes BullMQ Redis connection over TLS (Upstash, ssl_cert_reqs=None), starts worker process |

3.2 Docker Compose Profiles

The docker-compose.yml supports four named profiles for selective service startup:

| Profile | Command | Services Started |
|---|---|---|
| full | COMPOSE_PROFILES=full docker compose up -d | All services including Ollama LLM runtime and LLM proxy |
| gpu | COMPOSE_PROFILES=gpu docker compose up -d | Embedding service, reranker, Ollama |
| worker | COMPOSE_PROFILES=worker docker compose up -d | Ingest worker only |
| infra | COMPOSE_PROFILES=infra docker compose up -d | Monitoring stack (reserved for future use) |

docker-compose.module1.yml and docker/docker-compose.module1-ports.yml provide port-exposed variants for module 1 development.

3.3 CI/CD Tooling

Located in .github/workflows/:

| Workflow | File | Trigger | Function |
|---|---|---|---|
| build-images.yml | .github/workflows/build-images.yml | Push to main/release branches | Builds datazoom-base, datazoom-worker, datazoom-gpu, and cloud-worker Docker images; pushes to ghcr.io/midwestco/; uses crane copy for tag promotion with retry logic |
| knip.yml | .github/workflows/knip.yml | PR and push | Dead code detection via Knip; fails build on unused exports |

Image registry: ghcr.io/midwestco/

ghcr.io/midwestco/datazoom-base:latest
ghcr.io/midwestco/datazoom-worker:latest
ghcr.io/midwestco/datazoom-gpu:latest

Cloud worker image is built directly on GitHub Actions and pushed to ghcr.io (commit 07c91b9).

3.4 Database Utilities (SQL Scripts)

Operational SQL scripts are maintained under docs/archive/ for manual execution against Supabase:

| Script | Path | Purpose |
|---|---|---|
| check_embedding_status.sql | docs/archive/check_embedding_status.sql | Verify embedding coverage across document_chunks |
| check_timeline.sql | docs/archive/check_timeline.sql | Inspect timeline_events for gaps or anomalies |
| add_missing_tables_to_cloud.sql | docs/archive/add_missing_tables_to_cloud.sql | Migration script for adding tables to cloud Supabase instance |
| create_company_settings.sql | docs/archive/create_company_settings.sql | Bootstrap company settings table |
| VERIFY_DELETE_LOGGING.sql | docs/activity_page/VERIFY_DELETE_LOGGING.sql | Validate delete events are being captured in activity log |

3.5 Git Hooks

| Hook | Location | Action |
|---|---|---|
| pre-commit | .husky/pre-commit | Lint and format checks before commit |
| pre-push | .husky/pre-push | Runs test suite before push to remote |

3.6 Queue Infrastructure

Document processing jobs are managed via BullMQ backed by Upstash Redis. Workers connect using redis.asyncio with ssl_cert_reqs=None for Upstash TLS compatibility (fixed in commit b718932). Note that this setting keeps the connection encrypted but disables TLS certificate verification. The cloud worker entrypoint initializes this connection on container start.
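
The connection settings described above can be sketched as follows. This is a minimal illustration and not the worker's actual code: the helper names upstash_tls_kwargs and connect are hypothetical, and the sketch assumes REDIS_URL uses the rediss:// scheme that redis-py maps to a TLS connection.

```python
from urllib.parse import urlparse


def upstash_tls_kwargs(redis_url):
    """Return the extra client kwargs for a TLS (rediss://) Redis URL."""
    if urlparse(redis_url).scheme != "rediss":
        # ssl_cert_reqs is only valid on TLS connections.
        raise ValueError("expected a rediss:// URL for Upstash TLS")
    # None turns off certificate verification; traffic is still encrypted.
    return {"ssl_cert_reqs": None}


def connect(redis_url):
    """Create an asyncio Redis client (requires the redis package)."""
    import redis.asyncio as redis  # imported lazily so the helper above has no dependency

    return redis.from_url(redis_url, **upstash_tls_kwargs(redis_url))
```

Rejecting non-TLS URLs up front avoids a less obvious failure later, since a plain redis:// connection does not accept the ssl_cert_reqs argument.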


4. Operational Workflows

4.1 Deployment

Next.js Application (Vercel)

  1. Merge pull request to main branch.
  2. Vercel auto-deploys on merge. Preview deployments are generated for all PRs.
  3. .vercelignore controls which files are excluded from the deployment bundle.
  4. Environment variables must be configured in the Vercel project dashboard before deployment (see Section 8).

Docker Worker Fleet

# 1. Pull updated images (CI builds and pushes them automatically on push)
docker pull ghcr.io/midwestco/datazoom-base:latest
docker pull ghcr.io/midwestco/datazoom-worker:latest
docker pull ghcr.io/midwestco/datazoom-gpu:latest

# 2. Deploy with appropriate profile
COMPOSE_PROFILES=full docker compose up -d

# 3. Verify health
docker compose ps
curl http://localhost:8001/health   # LLM proxy
curl http://localhost:11434/api/tags  # Ollama
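
The two curl checks in step 3 can also be scripted. The sketch below probes the same URLs and reports which services answered with a 2xx status; the names LOCAL_HEALTH_URLS, http_ok, and check_services are hypothetical, and the probe is injectable so the logic can be exercised without a running fleet.

```python
import http.client
import urllib.request

# URLs taken from the verification step above.
LOCAL_HEALTH_URLS = {
    "llm-proxy": "http://localhost:8001/health",
    "ollama": "http://localhost:11434/api/tags",
}


def http_ok(url, timeout=5.0):
    """Return True when the URL answers with a 2xx status, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts, and refused connections are all OSError subclasses.
        return False


def check_services(urls=None, probe=http_ok):
    """Probe every service and return a name -> reachable mapping."""
    urls = LOCAL_HEALTH_URLS if urls is None else urls
    return {name: probe(url) for name, url in urls.items()}
```

Calling check_services() on the worker host returns a mapping such as {"llm-proxy": True, "ollama": True} when both services are up.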

Cloud Worker (Fly.io)

The cloud worker is deployed to Fly.io using the config at docker/cloud-worker/fly.toml. Orchestrator is deployed separately from fly/orchestrator/Dockerfile.

# Deploy orchestrator (the subshell keeps the working directory at the repository root)
(cd fly/orchestrator && fly deploy)

# Deploy cloud worker
(cd docker/cloud-worker && fly deploy)

Note: The orchestrator must be allocated 2048 MB of memory to run on a performance CPU (see commit 4e7cd14).

4.2 Rollback

Vercel Rollback

  1. Navigate to Vercel project → Deployments.
  2. Identify the last known-good deployment.
  3. Click Promote to Production.
  4. Verify health check endpoints respond (see Section 9).

Docker Worker Rollback

# Pull previous image by digest or tag
docker pull ghcr.io/midwestco/datazoom-worker:<previous-sha>

# Update docker-compose.yml image tag, then redeploy
COMPOSE_PROFILES=worker docker compose up -d

# Confirm old container is replaced
docker compose ps

Database Migration Rollback

DataZoom uses Supabase migrations. If a migration must be reversed:

  1. Connect to the database through Supabase Studio, or through psql using the project's database connection string.
  2. Execute the inverse SQL manually (no automated down-migration scripts are present in the repository).
  3. Update the migration tracking table as appropriate.

4.3 Feature Flags

Feature flags are controlled via the Supabase company_settings table and the cap table feature gate system documented in product/lib/cap-table/__tests__/feature-gate.test.ts. The cap table feature is the primary gated feature as of the current version.

Cap Table Feature Gate:

  • Gate is evaluated per organization.
  • Operators can enable/disable via the company_settings record for a given org.
  • The API endpoint GET /api/company-settings returns the current gate state.
  • POST /api/company-settings updates settings including feature flag state.

Rollout procedure (per docs/cap-table/action_plans/12_rollout_feature_flag_and_rollback.md):

  1. Enable for internal test org first.
  2. Monitor GET /api/cap-table/health for data integrity issues.
  3. Expand to pilot customer orgs.
  4. Full rollout by toggling the default in company_settings.

To disable a feature for a specific org:

UPDATE company_settings
SET cap_table_enabled = false
WHERE org_id = '<clerk-org-id>';

4.4 User Management

User and organization management is handled through Clerk. DataZoom does not maintain a separate user table for auth — Clerk is the system of record.

Common operator actions:

| Action | Method |
|---|---|
| View organization members | Clerk Dashboard → Organizations |
| Remove a user from an org | Clerk Dashboard → Organization → Members → Remove |
| Reset user session | Clerk Dashboard → Users → Sessions → Revoke |
| Proxy Clerk API calls internally | POST /api/clerk/proxy |

Multi-tenant isolation: All data operations are scoped by Clerk organization ID. The withOrgAuth middleware enforces this at the API layer. The cloud proxy endpoint is an exception and uses Clerk auth directly without withOrgAuth (commit d037000).


5. Monitoring & Alerting

5.1 Observability Stack

| Signal | Tool | Configuration |
|---|---|---|
| User analytics | Mixpanel | Tracked via POST /api/ai/track-interaction; event catalog in docs/archive/MIXPANEL_EVENTS_REPORT.md |
| Error tracking | Sentry | SENTRY_DSN configured in .env.services (optional but recommended for production) |
| Container logs | Docker json-file driver | Max 50 MB per file, 5 files retained (configured in docker-compose.yml x-logging) |
| Cap table observability | Internal | product/lib/cap-table/__tests__/observability.test.ts validates instrumentation |

5.2 What Is Observed

Pipeline health:

  • Job queue depth (BullMQ / Upstash Redis)
  • Worker pod state: provisioning → warming → ready (pod warmup tracking added in commit 580b80b)
  • Cloud inference routing decisions: local vs. Modal/RunPod dispatch (visible at GET /api/admin/routing-stats)
  • RunPod fleet status derived from orchestrator cloudStatus (not stale serverless endpoint — commit 088f7b4)

Document processing:

  • Embedding coverage across document_chunks (via check_embedding_status.sql)
  • Failed and stuck jobs visible at GET /api/admin/pipeline
  • Extraction candidate review queue depth at GET /api/cap-table/review

Activity tracking:

  • User actions logged to activity log table; materialized views refresh via POST /api/activity/unified/refresh
  • Daily metrics accessible at GET /api/activity/metrics

Cap table integrity:

  • GET /api/cap-table/health returns data integrity status
  • GET /api/company-settings/linkage-mismatches surfaces document-to-company linkage errors

5.3 Ollama Service Health

Ollama is health-checked by Docker Compose:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
  interval: 30s
  timeout: 10s
  retries: 5
  start_period: 30s

Docker marks the container unhealthy after 5 consecutive failed checks, which is roughly 2.5 minutes at the 30-second interval (failures during the 30-second start period are not counted). The LLM proxy on port 8001 depends on this service.
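
The time before a failing Ollama container is flagged follows from the healthcheck values above. The sketch below computes a lower and upper bound, assuming each check is scheduled one interval after the previous check finishes and that a hung check runs for the full timeout; the function name unhealthy_window_seconds is hypothetical.

```python
def unhealthy_window_seconds(interval, timeout, retries):
    """Return (fastest, slowest) seconds from the last good check to 'unhealthy'.

    fastest: every failing check returns immediately.
    slowest: every failing check runs until the timeout expires.
    """
    fastest = retries * interval
    slowest = retries * (interval + timeout)
    return fastest, slowest


# Values from the Compose healthcheck: interval 30s, timeout 10s, retries 5.
fastest, slowest = unhealthy_window_seconds(interval=30, timeout=10, retries=5)
print(f"unhealthy after {fastest} to {slowest} seconds")
```

With the configured values this gives 150 to 200 seconds. Failures during start_period do not count toward the retries, so a container that never comes up is flagged somewhat later than this window suggests.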

5.4 Alerting Escalation Path

DataZoom does not yet have a formalized PagerDuty or OpsGenie integration in the repository. Escalation is currently manual:

| Severity | Condition | Response |
|---|---|---|
| P1 | Web application returns 5xx for >5% of requests | Immediate Vercel rollback; notify engineering lead |
| P1 | All worker pods unreachable (/api/admin/pipeline/health returns error) | Restart Docker worker fleet; check Fly.io orchestrator status |
| P2 | BullMQ queue depth >100 stuck jobs | Use POST /api/admin/pipeline/retry to replay; investigate worker logs |
| P2 | RunPod fleet shows no ready pods | Check orchestrator on Fly.io; inspect cloud worker logs |
| P3 | Embedding coverage <95% | Run check_embedding_status.sql; trigger re-embedding via worker |
| P3 | Cap table linkage mismatches detected | Inspect GET /api/company-settings/linkage-mismatches; run corrective SQL |
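
Because escalation is manual, it can help to encode the thresholds once so that on-call responders apply them consistently. The following sketch maps a set of observed values to the severities in the table; FleetSnapshot and its field names are hypothetical and do not correspond to any existing endpoint payload.

```python
from dataclasses import dataclass


@dataclass
class FleetSnapshot:
    """Hypothetical roll-up of the signals named in the escalation table."""
    error_rate_5xx: float      # fraction of web requests returning 5xx (0.0 to 1.0)
    workers_reachable: bool    # False when /api/admin/pipeline/health returns an error
    stuck_jobs: int            # stuck jobs in the BullMQ queue
    ready_runpod_pods: int     # RunPod pods in the ready state
    embedding_coverage: float  # fraction of chunks with an embedding (0.0 to 1.0)
    linkage_mismatches: int    # count of document-to-company linkage errors


def classify(s):
    """Return (severity, condition) pairs for every threshold the snapshot crosses."""
    findings = []
    if s.error_rate_5xx > 0.05:
        findings.append(("P1", "5xx rate above 5%"))
    if not s.workers_reachable:
        findings.append(("P1", "all worker pods unreachable"))
    if s.stuck_jobs > 100:
        findings.append(("P2", "more than 100 stuck jobs"))
    if s.ready_runpod_pods == 0:
        findings.append(("P2", "no ready RunPod pods"))
    if s.embedding_coverage < 0.95:
        findings.append(("P3", "embedding coverage below 95%"))
    if s.linkage_mismatches > 0:
        findings.append(("P3", "cap table linkage mismatches"))
    return findings


def highest_severity(findings):
    """P1 sorts before P2 and P3, so the smallest label is the most urgent."""
    return min((sev for sev, _ in findings), default=None)
```

A snapshot with 150 stuck jobs and 90% embedding coverage yields one P2 and one P3 finding, and highest_severity returns "P2".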

6. Support Workflows

6.1 Ticket Triage

All inbound support issues should be categorized on receipt:

| Category | Indicators | Owning Team |
|---|---|---|
| Auth / Access | User cannot log in, org not visible, permission denied | Platform (Clerk config) |
| Document Processing | Document stuck in processing, no embeddings, missing analysis | Data pipeline |
| AI / RAG Quality | Incorrect answers, missing citations, hallucinations | ML / RAG |
| Cap Table | Wrong ownership percentages, extraction errors, voiding needed | Cap table feature team |
| Billing / Wallet | Credits not deducted or over-deducted | Billing (docs/billing_wallet/) |
| Performance | Slow queries, high latency on chat | Platform / DB |

6.2 Common Issues and Resolution Playbooks


ISSUE-001: Document stuck in processing state

Symptoms: Document uploaded but no chunks appear in document_chunks; no embedding visible.

Resolution:

  1. Check GET /api/admin/pipeline for the job in the queue.
  2. If job is in failed state, use POST /api/admin/pipeline/retry with the job ID.
  3. If retry fails, inspect worker container logs: docker compose logs worker --tail=100.
  4. Verify Redis/BullMQ connection: confirm Upstash TLS credentials in .env.services.
  5. If embedding service is unhealthy, restart GPU profile: COMPOSE_PROFILES=gpu docker compose restart.

ISSUE-002: AI chat returning no results / empty context

Symptoms: Chat responds with no citations; RAG retrieval returns zero chunks.

Resolution:

  1. Run check_embedding_status.sql against Supabase to confirm document_chunks.embedding is non-null for the document.
  2. If embeddings are missing, the ingest worker did not complete — follow ISSUE-001 steps.
  3. If embeddings are present but the ivfflat index is suspected to be corrupted, rebuild it: REINDEX INDEX CONCURRENTLY document_chunks_embedding_idx;
  4. Test vector search directly in Supabase SQL editor with a sample query vector.
  5. If search is operational but chat fails, check Modal/Ollama routing via GET /api/admin/routing-stats.

ISSUE-003: Cloud inference unavailable (Modal/RunPod)

Symptoms: POST /api/admin/pipeline/cloud returns errors; advisor queue stalls.

Resolution:

  1. Check GET /api/admin/pipeline/health — inspect cloudStatus field.
  2. Verify orchestrator on Fly.io is running: fly status -a <orchestrator-app-name>.
  3. Check pod warmup state: pods cycle through provisioning → warming → ready (commit 580b80b). Allow up to 5 minutes for cold start.
  4. If RunPod is the issue, confirm RunPod API key in secrets is valid (see Section 8).
  5. As fallback, ensure local Ollama is running (COMPOSE_PROFILES=gpu docker compose up -d) — the model router will fall back to local inference.
  6. Monitor model routing via GET /api/admin/routing-stats to confirm fallback is active.
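
The cold-start wait in step 3 can be automated with a bounded poll. The sketch below assumes the health route exposes a cloudStatus field that takes the pod states named above (provisioning, warming, ready); the function name wait_for_cloud_ready and the injectable fetch_health, sleep, and clock parameters are hypothetical.

```python
import time


def wait_for_cloud_ready(fetch_health, timeout_s=300.0, poll_s=15.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until cloudStatus is 'ready' or the 5-minute cold-start budget runs out.

    fetch_health is any callable returning the decoded health payload as a dict.
    Returns True when the fleet became ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_health().get("cloudStatus")
        if status == "ready":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

If the function returns False, continue with steps 4 and 5 instead of waiting longer.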

ISSUE-004: Cap table shows incorrect ownership

Symptoms: Org reports wrong percentages or missing shareholders in cap table view.

Resolution:

  1. Review pending extraction candidates at GET /api/cap-table/review.
  2. Approve correct candidates via POST /api/cap-table/review/{id}/approve.
  3. Reject incorrect candidates via POST /api/cap-table/review/{id}/reject.
  4. If a confirmed transaction is wrong, void it: POST /api/cap-table/transactions/{id}/void.
  5. Re-run extraction if source document was recently updated: POST /api/cap-table/extract with the document ID.
  6. Check for linkage mismatches: GET /api/company-settings/linkage-mismatches.
  7. Verify calculation engine output using the test fixtures in product/lib/cap-table/__tests__/calc.test.ts as reference.

ISSUE-005: Activity feed not updating

Symptoms: Activity page (/activity) shows stale data; recent actions not reflected.

Resolution:

  1. Force refresh materialized views: POST /api/activity/unified/refresh.
  2. If feed is still stale, verify delete logging is functioning: run docs/activity_page/VERIFY_DELETE_LOGGING.sql.
  3. Check that the activity_log table is being written to by inspecting recent rows in Supabase.
  4. Review calendar and day views specifically: GET /api/activity/unified/calendar and GET /api/activity/unified/day.

ISSUE-006: Collaboration WebSocket disconnects

Symptoms: Real-time document collaboration drops; users see stale state.

Resolution:

  1. Check the collaboration WebSocket service: docker compose logs collaboration-ws --tail=50.
  2. Verify the @tiptap/y-tiptap dependency is present (PR #112 added this fix).
  3. Check GET /api/collaboration/token is returning valid tokens.
  4. Restart the service: docker compose restart collaboration-ws.

6.3 Regression Testing

The smoke regression suite at product/lib/__tests__/app-smoke-regression.test.ts should be run after any production incident resolution to confirm system integrity:

cd product
npm test -- app-smoke-regression

7. Environment Management

7.1 Environment Configuration

DataZoom operates across three environments. Configuration is driven by environment variables in .env.services (worker fleet) and Vercel project settings (Next.js application).

| Environment | Next.js | Workers | Database |
|---|---|---|---|
| Development | localhost:3000 | docker compose local | supabase start (local) |
| Staging | Vercel preview deployment | Docker on staging host | Supabase staging project |
| Production | Vercel production deployment | Docker on production host | Supabase production project |

7.2 Environment Variables

The canonical template is .env.services.example. Operators must copy this to .env.services before starting the worker fleet.

Required variables:

| Variable | Used By | Description |
|---|---|---|
| SUPABASE_URL | All services | Supabase project URL |
| SUPABASE_SERVICE_KEY | All services | Service role key (bypasses RLS) |
| CLERK_SECRET_KEY | Next.js API | Clerk backend SDK authentication |
| NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY | Next.js frontend | Clerk frontend SDK |
| OLLAMA_HOST | LLM proxy, worker | Ollama service URL (default: http://ollama:11434) |
| REDIS_URL | BullMQ workers | Upstash Redis TLS URL |
| SENTRY_DSN | All services | Sentry error ingestion (optional) |
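
A preflight check can catch a missing variable before the fleet starts. The sketch below is one possible approach and not an existing script in the repository: the names REQUIRED, DEFAULTS, missing_variables, and resolve are hypothetical, OLLAMA_HOST is treated as optional because the table gives it a default, and SENTRY_DSN is skipped because it is marked optional.

```python
import os

# Variables from the table that have neither a default nor an "optional" marker.
REQUIRED = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_KEY",
    "CLERK_SECRET_KEY",
    "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
    "REDIS_URL",
)
DEFAULTS = {"OLLAMA_HOST": "http://ollama:11434"}


def missing_variables(env=None):
    """Return the required names that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not (env.get(name) or "").strip()]


def resolve(name, env=None):
    """Return the configured value, falling back to the documented default."""
    env = os.environ if env is None else env
    return env.get(name) or DEFAULTS.get(name)
```

Running missing_variables() after loading .env.services should return an empty list on a correctly configured host.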

Supabase local development:

supabase start   # starts local Postgres + pgvector + Auth + Storage
supabase stop    # stop local stack
supabase db reset  # reset schema to latest migrations

7.3 Docker Image Configuration

Images are sourced from ghcr.io/midwestco/ in production. For local development, images can be built directly:

docker build -f docker/Dockerfile.base -t datazoom-base .
docker build -f docker/Dockerfile.worker -t datazoom-worker .
docker build -f docker/Dockerfile.gpu -t datazoom-gpu .

The docker-compose.yml OLLAMA_NUM_PARALLEL: "4" and OLLAMA_MAX_LOADED_MODELS: "3" settings should be tuned to the available host hardware in production.

7.4 WireGuard VPN (Worker Networking)

The worker fleet can be configured to communicate over WireGuard for secure inter-service networking. Configuration template is at docker/wireguard/wg0.conf.example. See docker/wireguard/README.md for setup instructions.


8. Secret Management

8.1 Secret Inventory

| Secret ID | Name | Location | Consumer | Rotation Frequency |
|---|---|---|---|---|
| SEC-001 | SUPABASE_URL | .env.services / Vercel | All services | On project migration |
| SEC-002 | SUPABASE_SERVICE_KEY | .env.services / Vercel | Server-side API, workers | Quarterly |
| SEC-003 | CLERK_SECRET_KEY | Vercel env / host env | Next.js API routes | On compromise or quarterly |
| SEC-004 | NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY | Vercel env | Next.js frontend | On Clerk key rotation |
| SEC-005 | REDIS_URL | .env.services | BullMQ workers | On Upstash key rotation |
| SEC-006 | SENTRY_DSN | .env.services / Vercel | Sentry SDK | On project recreation |
| SEC-007 | RunPod API Key | .env.services / Fly secrets | Cloud worker, orchestrator | Quarterly |
| SEC-008 | Modal API Token | .env.services / Fly secrets | Cloud inference | Quarterly |
| SEC-009 | Fly.io Deploy Token | GitHub Actions secrets | build-images.yml workflow | Annually or on compromise |
| SEC-010 | GitHub Container Registry Token | GitHub Actions | build-images.yml workflow | Managed by GitHub |

8.2 Secret Storage by Environment

| Environment | Storage Method |
|---|---|
| Development | .env.services file (gitignored per .gitignore) |
| CI/CD | GitHub Actions encrypted secrets |
| Production (Next.js) | Vercel project environment variables |
| Production (Workers) | Host environment or Docker secrets passed at runtime |
| Production (Fly.io) | fly secrets set (encrypted at rest in Fly vault) |

8.3 Rotation Procedures

Rotating SUPABASE_SERVICE_KEY (SEC-002):

  1. Generate new service role key in Supabase Dashboard → Settings → API.
  2. Update Vercel environment variable (triggers re-deployment).
  3. Update .env.services on all worker hosts.
  4. Restart worker fleet: COMPOSE_PROFILES=full docker compose down && COMPOSE_PROFILES=full docker compose up -d.
  5. Confirm GET /api/admin/pipeline/health returns healthy.

Rotating CLERK_SECRET_KEY (SEC-003):

  1. Generate new secret key in Clerk Dashboard → API Keys.
  2. Update Vercel environment variable.
  3. Vercel will redeploy automatically.
  4. Revoke the old key in Clerk Dashboard only after confirming new deployment is healthy.

Rotating Fly.io secrets:

fly secrets set RUNPOD_API_KEY=<new-key> -a <app-name>
fly secrets set MODAL_TOKEN_ID=<id> MODAL_TOKEN_SECRET=<secret> -a <app-name>

Fly.io automatically restarts the application after secret update.

8.4 Security Notes

  • .env.services.example is the only secrets-related file committed to the repository. The actual .env.services file must never be committed.
  • The .claude/settings.local.json and .mcp.json files present in the repository should be reviewed to ensure no credentials are embedded before any repository access is shared externally.
  • SUPABASE_SERVICE_KEY bypasses Row Level Security. Its use must be restricted to server-side API routes and workers only — never exposed to the client.

9. Health Check Endpoints & Status

9.1 API Health Endpoints

| Endpoint ID | Path | Method | Returns | Normal Response |
|---|---|---|---|---|
| HC-001 | /api/admin/pipeline/health | GET | Worker fleet status, cloud inference state, queue depth | { status: "healthy", cloudStatus: "ready", queueDepth: 0 } |
| HC-002 | /api/cap-table/health | GET | Cap table data integrity report | { status: "ok", pendingReview: 0, mismatches: 0 } |
| HC-003 | http://localhost:11434/api/tags | GET | Ollama loaded models | JSON array of model names |
| HC-004 | http://localhost:8001/health | GET | LLM proxy service status | { status: "ok" } |
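
A health payload can be compared against the normal responses listed above to highlight what differs. The sketch below is illustrative only: NORMAL and deviations are hypothetical names, and the comparison assumes the endpoints return flat JSON objects with exactly the field names shown in the table. A deviation is a prompt to investigate and not necessarily a fault (a non-zero queueDepth is expected under load).

```python
# Normal responses copied from the health endpoint table (HC-003 returns a list and is omitted).
NORMAL = {
    "/api/admin/pipeline/health": {"status": "healthy", "cloudStatus": "ready", "queueDepth": 0},
    "/api/cap-table/health": {"status": "ok", "pendingReview": 0, "mismatches": 0},
    "http://localhost:8001/health": {"status": "ok"},
}


def deviations(path, payload):
    """Return {field: (expected, actual)} for every field that differs from normal."""
    expected = NORMAL[path]
    return {
        field: (want, payload.get(field))
        for field, want in expected.items()
        if payload.get(field) != want
    }
```

For example, a cap table payload with pendingReview set to 3 produces {"pendingReview": (0, 3)}, while a fully normal payload produces an empty dict.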

9.2 Infrastructure Health Checks

Docker Compose service health:

docker compose ps                    # shows health status for all services
docker compose logs ollama --tail=20  # Ollama logs
docker inspect "$(docker compose ps -q ollama)" --format='{{.State.Health.Status}}'  # resolves the container ID regardless of naming scheme

Supabase local stack:

supabase status   # shows all local Supabase services and their URLs

Fly.io orchestrator:

fly status -a <orchestrator-app-name>
fly logs -a <orchestrator-app-name>

9.3 Vercel Deployment Status

Vercel deployment status is accessible at the Vercel project dashboard. DataZoom does not maintain a public status page as of the current version. The README badge [![Status](https://img.shields.io/badge/status-production-green)]() is static and not connected to a live status endpoint.

Recommended operator checks after any deployment:

  1. GET /api/admin/pipeline/health — confirms backend workers are reachable from the Next.js layer.
  2. GET /api/cap-table/health — confirms database read path is intact.
  3. GET /api/activity/metrics — confirms Supabase query path for aggregated data.
  4. GET /api/admin/routing-stats — confirms model routing is operating and recording decisions.
  5. Execute smoke regression: cd product && npm test -- app-smoke-regression.

9.4 Queue Health (BullMQ / Upstash)

There is no dedicated BullMQ dashboard endpoint in the repository. Queue state is observable through:

  • GET /api/admin/pipeline — returns queue snapshot including failed, waiting, and active job counts.
  • Upstash Redis console — direct inspection of queue keys if credentials are available.

Jobs that remain in active state for >10 minutes without completion should be considered stuck and retried via POST /api/admin/pipeline/retry.
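
The 10-minute rule can be applied to a queue snapshot as in the sketch below. The shape of the snapshot is an assumption: each job is taken to be a dict with id, state, and a processedOn timestamp in epoch milliseconds (the field BullMQ sets when a worker picks up a job). Whether GET /api/admin/pipeline exposes these fields is not confirmed by this document, and stuck_job_ids is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=10)


def stuck_job_ids(jobs, now=None):
    """Return ids of jobs that have been active for longer than STUCK_AFTER."""
    now = now or datetime.now(timezone.utc)
    stuck = []
    for job in jobs:
        if job.get("state") != "active":
            continue  # only active jobs can be stuck; failed jobs are retried separately
        started = datetime.fromtimestamp(job["processedOn"] / 1000, tz=timezone.utc)
        if now - started > STUCK_AFTER:
            stuck.append(job["id"])
    return stuck
```

The returned ids are the candidates to pass to POST /api/admin/pipeline/retry.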


Last updated: derived from repository state as of commits through 960f88c. Maintain this document in sync with changes to admin routes, Docker infrastructure, and secret requirements.