Cloud Infrastructure Strategy - Microsoft Azure

This document covers the target production architecture for Evergrn on Azure, from a single local instance to a scalable, cloud-hosted deployment. It addresses load balancing, containerization, database concurrency, connection pooling, and the code changes required before launch.


Dev Environment - Currently Deployed

Resource group: evergrn-dev | Region: Canada Central (App Service), East US 2 (PostgreSQL)

Component | Azure Resource | Details | Status
Source control | Azure DevOps repo | https://dev.azure.com/evergrn (all code in a single private repo) | ✅ Live
Coming soon page | Azure Static Web Apps (Free) | www.evergrn.co → green-flower-03302980f.7.azurestaticapps.net | ✅ Live
API | Azure App Service (B1 Basic) | evergrn-api-dev.azurewebsites.net (canadacentral) | ✅ Deployed
Database | Azure Database for PostgreSQL Flexible Server | evergrn-db.postgres.database.azure.com (Burstable B1ms, East US 2) | ✅ Live
File storage | Azure Blob Storage (Cool tier) | evergrnuploads; containers: jobs, logos, headshots, id-docs, addresses | ✅ Live
Observability | Azure Application Insights | evergrn-api-insights (canadacentral); auto-instruments Express, Prisma, Stripe | ✅ Live
Secrets | Azure Key Vault (Standard, RBAC) | evergrn-vault-dev; DATABASE-URL, JWT-SECRET, STRIPE-*, AZURE-STORAGE-CONNECTION-STRING, APPLICATIONINSIGHTS-CONNECTION-STRING | ✅ Live
Email | Azure Communication Services | evergrn-comms (East US); custom domain noreply@evergrn.co verified with DKIM/SPF; ACS_CONNECTION_STRING + ACS_SENDER set as App Service env vars | ✅ Live
Pkg storage | Azure Blob Storage (evergrnpkgstore) | deployments/ container; App Service runs from timestamped zip blob | ✅ Live
Document Intelligence | Azure AI Document Intelligence | evergrn-doc-intel-46238.cognitiveservices.azure.com (East US 2); driver's license OCR (prebuilt-idDocument) + insurance certificate OCR (prebuilt-read); DOC_INTEL_ENDPOINT + DOC_INTEL_KEY env vars | ✅ Live
CI/CD pipeline | Azure DevOps Pipeline (Static Web Apps) | Auto-deploys coming-soon/ on push to main | ✅ Live

Dev API endpoint

https://evergrn-api-dev-c7dxhkf3ctcgdqby.canadacentral-01.azurewebsites.net

Dev App Service config

What's NOT yet in dev Azure (still local or not built)

Concern | Current | Next step
Web frontend | Vite dev server (localhost:5173) | Included in staging/prod deploys via client/dist/ in zip
Notification queue | In-memory Map | Azure Cache for Redis (pre-launch requirement)
Rate limit store | In-memory (per-instance) | Same Redis instance as notification queue
iOS app | EAS build pending Apple Developer approval | EAS submit after account approved
DMARC record | Not configured for evergrn.co | Add _dmarc.evergrn.co TXT record (low priority)

Staging Environment - Currently Deployed

Resource group: evergrn-stage | Region: Canada Central (App Service), East US 2 (PostgreSQL)

Architecture (as of 2026-06-28): Web/API Split

The staging environment is split into two separate services. This was done to allow the web UI to be protected by Entra Conditional Access (MFA) without interfering with the iOS app's direct API access.

Component | Azure Resource | URL | Status
Web frontend | Azure Static Web App (Standard) evergrn-web-staging | web.staging.evergrn.co | ✅ Live
API | Azure App Service (B1 Basic) evergrn-api-stage | staging.evergrn.co | ✅ Live
Database | Azure PostgreSQL Flexible Server evergrn-db-stage | evergrn-db-stage.postgres.database.azure.com (Standard_B1ms) | ✅ Live
File storage | Azure Blob Storage | Shares evergrnuploads with dev | ✅ Shared
Email | Azure Communication Services | Shares evergrn-comms with dev | ✅ Shared
Pkg storage | Azure Blob Storage (evergrnpkgstore) | Shares deployments/ with dev | ✅ Shared
SSL | Azure Managed Certificate + SWA cert | Both domains HTTPS | ✅ Live
CI/CD (API) | Azure DevOps Pipeline deploy-staging (ID: 2) | Manual trigger + approval gate | ✅ Live

Access control

Web frontend (web.staging.evergrn.co) - Protected by Entra Conditional Access:

API (staging.evergrn.co) - Open (no IP restriction, no Easy Auth):

Web frontend (SWA) config

To deploy the web frontend (SWA)

# Build with staging mode (sets VITE_API_BASE=https://staging.evergrn.co)
Set-Location c:\Repos\evergrn\client
npx vite build --mode staging

# Deploy to SWA
npx @azure/static-web-apps-cli deploy .\dist `
  --deployment-token "<token from SWA secrets>" `
  --env production

Deployment token: retrieve with az staticwebapp secrets list --name evergrn-web-staging --resource-group evergrn-stage --query "properties.apiKey" -o tsv

To release to staging (API)

  1. Push changes to main in ADO
  2. Go to https://dev.azure.com/evergrn/evergrn/_build?definitionId=2
  3. Click Run pipeline → approve the manual gate when prompted

Staging App Service config

Staging database


Current State vs. Target State

Concern | Current (dev) | Target (Azure production)
API server | Azure App Service B1 (evergrn-api-dev) ✅ | Azure Container Apps (multiple containerized instances)
Frontend | Vite dev server on port 5173 (local) | Static build in Azure Blob Storage, served via Azure Front Door
Load balancing | None | Azure Front Door (global CDN + load balancer + WAF in one)
Database | Azure PostgreSQL Flexible Server (evergrn-db, East US 2) ✅ | Same service, zone-redundant HA, PgBouncer on port 6432
DB connections | App Service → PostgreSQL direct (port 5432) | Container Apps → built-in PgBouncer → PostgreSQL Flexible Server
File uploads | Azure Blob Storage (evergrnuploads, Cool tier) ✅ | Same; add Front Door CDN in front
Notification queue | In-memory Map (per-instance, lost on restart) | Azure Cache for Redis
Secrets | Azure Key Vault (evergrn-vault-dev) + Managed Identity ✅ | Same pattern: Key Vault + Managed Identity
TLS/HTTPS | App Service managed cert (HTTPS) ✅ | Terminated at Azure Front Door; all traffic HTTPS
Payout cron | Not yet built | Azure Functions (Timer Trigger), runs once daily at 8 AM
CI/CD | Azure DevOps: Pipeline ID 2 (API deploy) + Pipeline ID 4 (SWA web deploy) ✅ | Full pipeline: API container build + push + frontend deploy
Monitoring | Azure Application Insights (evergrn-api-insights) ✅ | Same; add custom alerts and dashboards

Target Architecture

                      ┌────────────────────────────────────────┐
                      │         Azure DNS                      │
                      │  evergrn.co → Azure Front Door         │
                      └───────────────┬────────────────────────┘
                                      │
                                      ▼
              ┌─────────────────────────────────────────────────┐
              │            Azure Front Door (Premium)           │
              │                                                 │
              │  • Global anycast routing (150+ edge nodes)     │
              │  • SSL/TLS termination (free managed certs)     │
              │  • CDN caching for static assets                │
              │  • Web Application Firewall (WAF)               │
              │  • Path-based routing rules                     │
              │  • Health probes to origin groups               │
              └──────────────┬────────────────┬─────────────────┘
                             │                │
               /api/* routes │                │ /* all other routes
                             │                │
                             ▼                ▼
              ┌──────────────────────┐  ┌─────────────────────────────┐
              │  Azure Container     │  │  Azure Blob Storage         │
              │  Apps Environment    │  │  (Static Website hosting)   │
              │                      │  │                             │
              │  ┌────┐ ┌────┐ ┌────┐│  │  client/dist/ build output  │
              │  │API │ │API │ │API ││  │  index.html + assets        │
              │  │ #1 │ │ #2 │ │ #N ││  └─────────────────────────────┘
              │  └──┬─┘ └──┬─┘ └──┬─┘│
              │     └──────┴──────┘  │
              │     Auto-scaling     │
              └──────────┬───────────┘
                         │
         ┌───────────────┼──────────────────┐
         │               │                  │
         ▼               ▼                  ▼
┌────────────────┐ ┌───────────────┐ ┌──────────────────────┐
│ Azure Database │ │ Azure Cache   │ │ Azure Blob Storage   │
│ for PostgreSQL │ │ for Redis     │ │ (uploads container)  │
│ Flexible Server│ │               │ │                      │
│                │ │ • Notif queue │ │ jobs/{jobId}-*.jpg   │
│ Built-in       │ │ • Rate limit  │ │ logos/{providerId}   │
│ PgBouncer ✓    │ │   state       │ │                      │
│                │ └───────────────┘ │ Served via Front Door│
│ Zone-redundant │                   └──────────────────────┘
│ HA (2 AZs)     │
└────────────────┘

         ┌───────────────────────────────────────┐
         │         Supporting Services           │
         │                                       │
         │  Azure Container Registry (ACR)       │
         │    → stores Docker images             │
         │                                       │
         │  Azure Key Vault                      │
         │    → DATABASE_URL, JWT_SECRET, STRIPE │
         │                                       │
         │  Azure Managed Identity               │
         │    → Container Apps → Key Vault/ACR   │
         │      with no stored credentials       │
         │                                       │
         │  Azure Monitor + Application Insights │
         │    → logs, metrics, alerts, traces    │
         │                                       │
         │  Azure Functions (Timer Trigger)      │
         │    → daily 8 AM payout cron           │
         └───────────────────────────────────────┘

Component Breakdown

1. Azure Front Door - The Front Door Load Balancer

Azure Front Door is Microsoft's global entry point service; it is literally named for the role you described. It combines what would otherwise require multiple services (CDN, load balancer, WAF) into one:

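Tier recommendation: Front Door Standard for MVP (~$35/month base). Upgrade to Premium when managed WAF rule sets or Private Link origins are needed. Note that the Container App below is configured with internal-only ingress, and reaching an internal origin from Front Door relies on Private Link, which is a Premium-tier feature; with Standard, the API origin would need to be publicly reachable and locked down by other means.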

2. Frontend β€” Azure Blob Storage (Static Website)

The React app builds to static files β€” plain HTML, JS, and CSS. No container is needed to serve them.

npm run build          # produces client/dist/
az storage blob upload-batch \
  --source client/dist/ \
  --destination '$web' \
  --account-name evergrnweb

Azure Blob Storage's Static Website feature serves the $web container directly. Front Door sits in front of it and handles HTTPS, caching, and compression.

Single-page app routing: in the Blob Storage static website settings, set index.html as both the index document and the error (404) document. This ensures React Router handles all client-side routes instead of the storage layer returning its own 404 page.

Why not a container for the frontend? A Container App running Nginx to serve static files costs ~$15-30/month and adds a container to manage. Blob Storage + Front Door costs under $5/month and delivers files from the nearest edge node. Containerize the frontend only if you adopt server-side rendering (Next.js). Not applicable here.

3. API β€” Azure Container Apps

Azure Container Apps is Azure's serverless container platform, comparable to AWS Fargate. You define what container to run and how to scale it; Azure manages the underlying infrastructure.

Advantages over AKS for this use case:

Container App configuration:

# containerapp.yaml (simplified)
name: evergrn-api
image: evergrnacr.azurecr.io/evergrn-api:latest
resources:
  cpu: 0.5
  memory: 1Gi
scale:
  minReplicas: 2      # always 2 for HA across zones
  maxReplicas: 10
  rules:
    - name: http-scaling
      http:
        metadata:
          concurrentRequests: 50   # scale out when >50 concurrent requests per replica
ingress:
  external: false     # traffic comes only from Front Door, not directly from internet
  targetPort: 3000
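
To make the scaling rule concrete, the sketch below estimates how many replicas the configuration above would settle on for a given load. It is a simplification under stated assumptions: `desiredReplicas` is a hypothetical helper, and the real Container Apps autoscaler also applies polling intervals and cool-down periods that are not modelled here.

```javascript
// Rough model of the HTTP scaling rule: one replica per 50 concurrent requests,
// clamped to the configured minReplicas / maxReplicas.
function desiredReplicas(concurrentRequests, { perReplica = 50, min = 2, max = 10 } = {}) {
  const needed = Math.ceil(concurrentRequests / perReplica)
  return Math.min(max, Math.max(min, needed))
}

console.log(desiredReplicas(30))   // 2  (never below minReplicas)
console.log(desiredReplicas(260))  // 6  (ceil(260 / 50))
console.log(desiredReplicas(900))  // 10 (capped at maxReplicas)
```

Beyond roughly 500 concurrent requests the replica count stops growing, so maxReplicas is the setting to revisit if sustained load approaches that level.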

Dockerfile (to be created at project root):

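FROM node:20-alpine
WORKDIR /app
# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Generate the Prisma client. The prisma CLI must be resolvable in the image,
# for example by listing it under "dependencies" rather than "devDependencies".
RUN npx prisma generate
EXPOSE 3000
CMD ["node", "server.js"]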

Azure Container Registry (ACR) stores built images. CI/CD pushes a new image tag on every merge to main, then triggers a Container Apps revision update.

4. Database β€” Azure Database for PostgreSQL Flexible Server

Azure's managed PostgreSQL service with two features critical to this architecture. Both assume a General Purpose or Memory Optimized compute tier: Flexible Server does not offer high availability or built-in PgBouncer on Burstable (B-series) SKUs, so the sizing and cost estimates should be checked against that constraint.

Zone-redundant high availability: The Flexible Server maintains a synchronous standby in a second Availability Zone. If the primary fails, Azure automatically promotes the standby and updates the connection endpoint, typically within 60-120 seconds and with zero data loss (synchronous replication).

Built-in PgBouncer: Flexible Server includes PgBouncer as a managed toggle, with no separate service to deploy or maintain. Enable it in the portal or via CLI:

az postgres flexible-server parameter set \
  --resource-group evergrn-rg \
  --server-name evergrn-db \
  --name pgbouncer.enabled \
  --value on

Once enabled, your DATABASE_URL points to port 6432 (PgBouncer) instead of 5432 (direct PostgreSQL). PgBouncer multiplexes all Container App connections onto a small fixed pool of real database connections:

10 API containers × 10 Prisma connections = 100 client connections → PgBouncer
PgBouncer → 20-30 actual connections to PostgreSQL
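
The same arithmetic can be expressed as a small helper for capacity planning. This is an illustrative sketch: `poolPressure` is a hypothetical function, and the `postgresMaxConnections` value of 50 is an assumed figure for demonstration, not a measured limit of any particular SKU.

```javascript
// Compare the client connections opened by all replicas with what the database
// accepts directly, and with the server-side pool PgBouncer keeps open.
function poolPressure({ replicas, prismaPoolSize, postgresMaxConnections, pgbouncerServerPool }) {
  const clientConnections = replicas * prismaPoolSize
  return {
    clientConnections,
    // Would every replica fit if it connected straight to PostgreSQL on port 5432?
    directFits: clientConnections <= postgresMaxConnections,
    // With PgBouncer, PostgreSQL only ever sees the pooled server connections.
    serverConnectionsWithPgBouncer: Math.min(clientConnections, pgbouncerServerPool),
  }
}

const atMaxScale = poolPressure({
  replicas: 10,
  prismaPoolSize: 10,
  postgresMaxConnections: 50,   // assumed limit, for illustration only
  pgbouncerServerPool: 30,
})
console.log(atMaxScale)
// { clientConnections: 100, directFits: false, serverConnectionsWithPgBouncer: 30 }
```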

Instance sizing:

Storage: Start at 128 GB with auto-grow enabled up to 16 TB.

Read replica (Phase 2): Add one read replica for provider insights, admin analytics, and GodMode queries. Route all writes to the primary endpoint; route read-heavy queries to the replica endpoint.


Data Layer β€” Concurrency & Transaction Safety

This is the most critical section. With multiple Container App replicas handling requests simultaneously, several operations in the current codebase are unsafe without proper transaction isolation.

The core problem

Each Container App replica maintains its own Prisma connection pool. Without explicit transactions, two replicas can interleave reads and writes on the same rows:

Replica 1:  SELECT job WHERE id=X  →  status=QUOTED  ✓ (ok to accept)
Replica 2:  SELECT job WHERE id=X  →  status=QUOTED  ✓ (ok to accept)
Replica 1:  UPDATE job SET status=ACCEPTED            ← both succeed
Replica 2:  UPDATE job SET status=ACCEPTED            ← job double-accepted

Solution: Prisma transactions + SELECT FOR UPDATE

PostgreSQL's SELECT FOR UPDATE acquires a row-level lock. Any other transaction attempting the same lock on that row blocks until the first transaction commits or rolls back. Combined with prisma.$transaction(), critical sections become atomic.

Operations that require transaction protection (to be fixed before launch):

Quote acceptance - POST /quotes/:id/accept

// Current (unsafe: race condition with multiple replicas):
const quote = await prisma.quote.findUnique({ where: { id } })
const job = await prisma.job.findUnique({ where: { id: quote.jobId } })
if (job.status !== 'QUOTED') throw new Error('Job no longer available')
await prisma.quote.update(...)
await prisma.quote.updateMany(...)
await prisma.job.update(...)

// Required (safe: row lock prevents concurrent acceptance):
await prisma.$transaction(async (tx) => {
  // Look up the quote to learn which job and provider it belongs to
  // (quote.providerId is an assumed field name)
  const quote = await tx.quote.findUnique({ where: { id } })
  if (!quote) throw new Error('Quote not found')
  const { jobId, providerId } = quote
  // Lock the job row; all other concurrent transactions block here until this commits
  const [job] = await tx.$queryRaw`
    SELECT * FROM "Job" WHERE id = ${jobId} FOR UPDATE
  `
  if (!job || job.status !== 'QUOTED') throw new Error('Job no longer available')
  await tx.quote.update({ where: { id }, data: { status: 'ACCEPTED' } })
  await tx.quote.updateMany({
    where: { jobId, id: { not: id } },
    data: { status: 'REJECTED' }
  })
  await tx.job.update({ where: { id: jobId }, data: { status: 'ACCEPTED', providerId } })
})

Job status transitions - PATCH /jobs/:id/status

Two replicas must not both advance the same job status simultaneously:

await prisma.$transaction(async (tx) => {
  const [job] = await tx.$queryRaw`
    SELECT * FROM "Job" WHERE id = ${id} FOR UPDATE
  `
  if (!validTransition(job.status, newStatus)) throw new Error('Invalid transition')
  await tx.job.update({ where: { id }, data: { status: newStatus } })
})
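
The snippet above relies on a `validTransition` helper that is not shown. One possible shape is a lookup table of allowed moves, sketched below. Only QUOTED and ACCEPTED appear in this document; the other status names (IN_PROGRESS, COMPLETED, CANCELLED) and the specific edges are assumptions for illustration and should be replaced with the project's real state machine.

```javascript
// Allowed job status transitions. A Map avoids accidental matches on
// inherited object properties such as "constructor".
const ALLOWED_TRANSITIONS = new Map([
  ['QUOTED', ['ACCEPTED', 'CANCELLED']],
  ['ACCEPTED', ['IN_PROGRESS', 'CANCELLED']],   // assumed statuses
  ['IN_PROGRESS', ['COMPLETED']],               // assumed statuses
  ['COMPLETED', []],
  ['CANCELLED', []],
])

// Returns true only when moving from `from` to `to` is explicitly listed.
function validTransition(from, to) {
  return (ALLOWED_TRANSITIONS.get(from) ?? []).includes(to)
}

console.log(validTransition('QUOTED', 'ACCEPTED'))    // true
console.log(validTransition('ACCEPTED', 'ACCEPTED'))  // false (a repeated request is rejected)
console.log(validTransition('COMPLETED', 'QUOTED'))   // false
```

Because the check runs after the row lock is taken, a second replica that was blocked on the same job sees the already-updated status and fails the transition instead of applying it twice.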

Payout cron - payment capture

The Azure Function Timer Trigger must be idempotent: if it crashes and restarts mid-run, it must not double-capture. Lock the payment row before capturing, and skip the capture entirely when another run has already claimed the payment:

const claimed = await prisma.$transaction(async (tx) => {
  const [payment] = await tx.$queryRaw`
    SELECT * FROM "Payment"
    WHERE id = ${id} AND status = 'HELD'
    FOR UPDATE
  `
  if (!payment) return false  // another run already processed this
  await tx.payment.update({ where: { id }, data: { status: 'PROCESSING' } })
  return true
})
if (claimed) {
  // Stripe capture happens outside the transaction (external HTTP call).
  // The idempotency key lets Stripe deduplicate a retried capture request.
  await stripe.paymentIntents.capture(paymentIntentId, {}, { idempotencyKey: `capture-${id}` })
  await prisma.payment.update({ where: { id }, data: { status: 'PAID' } })
}

PgBouncer mode compatibility

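Azure's built-in PgBouncer runs in transaction pooling mode by default: a connection is returned to the pool after each transaction, not held for the lifetime of the client. This is compatible with the SELECT FOR UPDATE patterns above, because each FOR UPDATE lock lives and dies inside a single $transaction() call. The lock does not need to persist across multiple roundtrips, so transaction pooling works correctly. On the Prisma side, depending on the Prisma and PgBouncer versions in use, the connection string may need the pgbouncer=true parameter, and schema migrations should run over a direct connection (port 5432) rather than through the pooler.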
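
Switching an existing connection string over to the pooler is a mechanical change, sketched below. `toPgBouncerUrl` is a hypothetical helper, the credentials and database name in the example are placeholders, and the `pgbouncer=true` flag follows Prisma's documented convention for transaction-mode poolers; whether it is required depends on the Prisma and PgBouncer versions in use.

```javascript
// Rewrite a direct PostgreSQL URL (port 5432) to go through PgBouncer (port 6432),
// keeping existing query parameters such as sslmode.
function toPgBouncerUrl(directUrl) {
  const url = new URL(directUrl)
  url.port = '6432'
  url.searchParams.set('pgbouncer', 'true')
  return url.toString()
}

// Placeholder credentials and database name, for illustration only.
const direct = 'postgresql://app_user:secret@evergrn-db.postgres.database.azure.com:5432/appdb?sslmode=require'
const pooled = toPgBouncerUrl(direct)
console.log(new URL(pooled).port)                            // "6432"
console.log(new URL(pooled).searchParams.get('pgbouncer'))   // "true"
```

Keeping the direct URL available alongside the pooled one is useful for migrations and other session-level operations that should bypass the pooler.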

Notification Queue - Azure Cache for Redis

The current notificationQueue.js uses an in-memory Map. With multiple Container App replicas this breaks: if Replica 1 enqueues a notification and Replica 2 handles the poll from that user, Replica 2's Map is empty and the notification is never delivered.

Replace with Azure Cache for Redis:

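// src/config/notificationQueue.js (updated for production)
const { createClient } = require('redis')
const redis = createClient({ url: process.env.REDIS_URL })
redis.on('error', (err) => console.error('Redis error', err))
// CommonJS has no top-level await, so keep the connect() promise and await it before each command
const ready = redis.connect()

async function enqueue(role, userId, notification) {
  await ready
  // TTL of 1 hour: the notification expires if never polled
  await redis.setEx(`notif:${role}:${userId}`, 3600, JSON.stringify(notification))
}

async function dequeue(role, userId) {
  await ready
  // GETDEL is atomic: read and delete in one operation (requires Redis 6.2 or later,
  // so verify the version the cache runs before relying on it)
  // Two concurrent polls cannot both receive the same notification
  const val = await redis.getDel(`notif:${role}:${userId}`)
  return val ? JSON.parse(val) : null
}

module.exports = { enqueue, dequeue }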

Redis also replaces the express-rate-limit in-memory store. Rate limit counters must be shared across all replicas; otherwise each replica keeps its own independent counter, which defeats the purpose.
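
The difference between per-replica and shared counters is easy to quantify. The sketch below is a simulation, not the express-rate-limit API: `createLimiter` is a hypothetical fixed-window counter, a plain `Map` stands in for Redis, and the limit of 100 requests is an assumed value.

```javascript
// A limiter that counts requests per client key in whatever store it is given.
function createLimiter(store, limit) {
  return function allow(clientKey) {
    const count = (store.get(clientKey) ?? 0) + 1
    store.set(clientKey, count)
    return count <= limit
  }
}

// Send requests from one client through the replicas round-robin, as a load balancer would.
function allowedRequests(limiters, totalRequests, clientKey = 'ip:203.0.113.7') {
  let allowed = 0
  for (let i = 0; i < totalRequests; i++) {
    if (limiters[i % limiters.length](clientKey)) allowed++
  }
  return allowed
}

const LIMIT = 100   // assumed per-window limit
const perReplica = [0, 1, 2].map(() => createLimiter(new Map(), LIMIT))   // three private stores
const sharedStore = new Map()                                             // stands in for Redis
const shared = [0, 1, 2].map(() => createLimiter(sharedStore, LIMIT))

console.log(allowedRequests(perReplica, 600))  // 300: the effective limit is multiplied by the replica count
console.log(allowedRequests(shared, 600))      // 100: the intended limit holds
```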

Azure Cache for Redis tier: Basic C0 (250 MB), ~$17/month. Sufficient for notification queue and rate limiting at MVP scale. Upgrade to Standard C1 (replicated, ~$110/month) for production HA when needed.


File Storage - Azure Blob Storage + Lifecycle Management

The uploads/ directory is local to each container replica. When Azure Container Apps replaces a replica (deploy, crash, scale-in), uploaded files are lost. With multiple replicas, a file uploaded to Replica 1 is invisible to Replica 2.

Storage container structure

Two Blob Storage containers, each with its own access and lifecycle policy:

evergrn-assets (Storage Account)
│
├── jobs/          ← all job photos (before, after, general)
│   └── {jobId}-{timestamp}.jpg
│
└── logos/         ← provider logos (permanent, always Hot)
    └── {providerId}.jpg

Files are served at https://cdn.evergrn.co/jobs/... and Front Door caches them at edge nodes. Blob Storage is set to private; all access goes through Front Door, never directly to the storage URL.

Access control: Container Apps reads/writes using a Managed Identity with Storage Blob Data Contributor role, so no storage account keys are stored anywhere.

Code migration

Replace Multer disk storage with direct Azure Blob SDK streaming, so files go to Blob Storage and never touch the container filesystem:

// src/middleware/upload.js
const { BlobServiceClient } = require('@azure/storage-blob')
const { DefaultAzureCredential } = require('@azure/identity')
const multer = require('multer')

const blobService = new BlobServiceClient(
  `https://${process.env.STORAGE_ACCOUNT}.blob.core.windows.net`,
  new DefaultAzureCredential()   // uses Container App's Managed Identity, no keys
)

// In-memory storage (multer holds file in buffer, we push to Blob)
const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 10 * 1024 * 1024 } })

async function uploadToBlob(container, blobName, buffer, mimetype) {
  const client = blobService.getContainerClient(container).getBlockBlobClient(blobName)
  await client.uploadData(buffer, { blobHTTPHeaders: { blobContentType: mimetype } })
  return `https://cdn.evergrn.co/${container}/${blobName}`
}

module.exports = { upload, uploadToBlob }
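
The `container` and `blobName` arguments passed to `uploadToBlob` follow the naming scheme from the storage structure above. A minimal sketch of how those names might be built is shown here; `jobPhotoBlobName` and `cdnUrl` are hypothetical helpers, and using epoch milliseconds for `{timestamp}` is an assumption, since the document does not specify the timestamp format.

```javascript
// Build the blob name for a job photo: {jobId}-{timestamp}.jpg
// (timestamp format assumed to be epoch milliseconds)
function jobPhotoBlobName(jobId, uploadedAt = new Date()) {
  return `${jobId}-${uploadedAt.getTime()}.jpg`
}

// Build the public URL served through Front Door for a stored blob.
function cdnUrl(container, blobName) {
  return `https://cdn.evergrn.co/${container}/${encodeURIComponent(blobName)}`
}

const name = jobPhotoBlobName('job42', new Date('2025-01-15T12:00:00Z'))
console.log(name)                   // job42-1736942400000.jpg
console.log(cdnUrl('jobs', name))   // https://cdn.evergrn.co/jobs/job42-1736942400000.jpg
```

Including the timestamp in the name means a re-upload produces a new URL, which is what makes the long, immutable cache lifetime on photos safe.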

Blob lifecycle management policy

Azure Blob Storage lifecycle rules automatically move blobs between access tiers on a schedule. There is no charge for the policy itself; cost differences come from the tier rates and retrieval fees described below.

Access tiers and what they mean:

Tier | Storage cost (per GB/month) | Retrieval cost | Access speed | Use for
Hot | $0.018 | Free | Instant | Active, frequently accessed files
Cool | $0.010 | $0.01/GB | Instant | Infrequent access, files > 30 days old
Cold | $0.0036 | $0.02/GB | Instant | Rare access, files > 90 days old
Archive | $0.00099 | $0.02/GB + rehydration | Hours (rehydration) | Compliance hold only

Lifecycle policy for job photos (jobs/ prefix):

{
  "rules": [
    {
      "name": "job-photos-lifecycle",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["jobs/"]
        },
        "actions": {
          "baseBlob": {
            "tierToCool":    { "daysAfterModificationGreaterThan": 90  },
            "tierToCold":    { "daysAfterModificationGreaterThan": 365 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 730 },
            "delete":        { "daysAfterModificationGreaterThan": 1095 }
          }
        }
      }
    },
    {
      "name": "logos-permanent",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["logos/"]
        },
        "actions": {
          "baseBlob": {}
        }
      }
    }
  ]
}
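
To see what the job-photo rule does to a given file, the thresholds can be mirrored in a few lines. This is an illustrative model only: `tierForAge` is a hypothetical helper, it assumes photos start in the Hot tier, and the actual transitions are performed by the storage service on its own schedule rather than at the exact day boundary.

```javascript
// Thresholds from the job-photos-lifecycle rule, checked from oldest to newest.
const JOB_PHOTO_RULES = [
  { afterDays: 1095, tier: 'deleted' },
  { afterDays: 730, tier: 'Archive' },
  { afterDays: 365, tier: 'Cold' },
  { afterDays: 90, tier: 'Cool' },
]

// Returns the tier a job photo is expected to be in, given days since last modification.
// "GreaterThan" in the policy is strict, so day 90 itself is still in the initial tier.
function tierForAge(daysSinceModified, initialTier = 'Hot') {
  const rule = JOB_PHOTO_RULES.find((r) => daysSinceModified > r.afterDays)
  return rule ? rule.tier : initialTier
}

console.log(tierForAge(30))    // Hot
console.log(tierForAge(90))    // Hot
console.log(tierForAge(91))    // Cool
console.log(tierForAge(400))   // Cold
console.log(tierForAge(800))   // Archive
console.log(tierForAge(1100))  // deleted
```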

Why these thresholds:

Apply the policy via Azure CLI:

az storage account management-policy create \
  --account-name evergrnassets \
  --resource-group evergrn-rg \
  --policy @lifecycle-policy.json

Front Door caching for photos

Front Door caches photos at edge nodes after the first request. Subsequent requests for the same photo are served from cache, with no round-trip to Blob Storage and no retrieval fee. Set long cache TTLs on job photos since they never change after upload:

Cache-Control: public, max-age=31536000, immutable

This means a before-photo uploaded today can stay cached at the edge nearest each user for up to a year. The first request costs an origin read; later requests incur no Blob Storage read or retrieval charge until the CDN TTL expires, although Front Door's own egress charges still apply.


Secrets - Azure Key Vault + Managed Identity

Azure Key Vault stores all secrets. Azure Managed Identity gives the Container Apps service a cryptographically verified Azure identity: no passwords, no keys, no .env files in the container image.

# Store secrets in Key Vault
az keyvault secret set --vault-name evergrn-kv --name DATABASE-URL    --value "..."
az keyvault secret set --vault-name evergrn-kv --name JWT-SECRET       --value "..."
az keyvault secret set --vault-name evergrn-kv --name STRIPE-SECRET    --value "..."
az keyvault secret set --vault-name evergrn-kv --name REDIS-URL        --value "..."
az keyvault secret set --vault-name evergrn-kv --name STRIPE-WEBHOOK-SECRET --value "..."

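# Grant the Container App's Managed Identity read access to secrets.
# The dev vault uses Azure RBAC, so this assigns a role; `az keyvault set-policy`
# applies only to vaults that use the access-policy permission model.
az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee-object-id <container-app-managed-identity-id> \
  --assignee-principal-type ServicePrincipal \
  --scope <key-vault-resource-id>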

At runtime, Container Apps fetches secrets from Key Vault and injects them as environment variables. No human ever handles the raw values after initial setup.


Payout Cron - Azure Functions (Timer Trigger)

Rather than running the cron inside an API container (which requires one designated replica to handle it, complicating scaling), use an Azure Function with a Timer Trigger. Azure Functions are purpose-built for scheduled jobs: they run on demand, cost nothing when idle, and have no container to keep running.

// src/functions/payoutCron.js
const { app } = require('@azure/functions')
const { PrismaClient } = require('@prisma/client')

const prisma = new PrismaClient()
// captureAndRecordPayout (lock, capture, ledger entry) is a project helper defined elsewhere

app.timer('dailyPayout', {
  // NCRONTAB uses six fields: second minute hour day month day-of-week.
  // Schedules are evaluated in UTC by default, so adjust the hour for the business time zone.
  schedule: '0 0 8 * * *',   // 08:00:00 daily
  handler: async (myTimer, context) => {
    const cutoff = new Date()
    cutoff.setUTCHours(8, 0, 0, 0)   // today at 08:00:00 UTC, matching the schedule

    const eligiblePayments = await prisma.payment.findMany({
      where: { status: 'HELD', releaseAt: { lte: cutoff } },
      include: { job: { include: { report: true, provider: true } } }
    })

    for (const payment of eligiblePayments) {
      if (payment.job.report) {
        // Dispute filed: block payout, flag for admin
        await prisma.payment.update({
          where: { id: payment.id },
          data: { status: 'DISPUTED' }
        })
        continue
      }
      // Lock, capture, create ProviderPayout ledger entry
      await captureAndRecordPayout(payment, cutoff)
    }
  }
})

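Azure Functions runs in the same Resource Group and VNet as Container Apps, so it has private access to the database (no public internet exposure). VNet integration is not available on the classic Consumption plan, so this assumes a Flex Consumption, Premium, or Dedicated plan for the function app.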
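
The selection logic in the handler can be separated from the database and tested on its own. The sketch below is one way to do that under stated assumptions: `payoutCutoffUtc` and `partitionPayments` are hypothetical helpers, payments are plain objects, and `hasReport` is an assumed flag standing in for the `payment.job.report` relation.

```javascript
// Today's 08:00:00 UTC, derived from the current time without mutating the input.
function payoutCutoffUtc(now = new Date()) {
  const cutoff = new Date(now.getTime())
  cutoff.setUTCHours(8, 0, 0, 0)
  return cutoff
}

// Split due payments into those blocked by a dispute and those ready to capture.
function partitionPayments(payments, cutoff) {
  const due = payments.filter((p) => p.status === 'HELD' && p.releaseAt <= cutoff)
  return {
    disputed: due.filter((p) => p.hasReport),
    payable: due.filter((p) => !p.hasReport),
  }
}

const cutoff = payoutCutoffUtc(new Date('2025-03-10T08:00:03Z'))
const result = partitionPayments([
  { id: 'p1', status: 'HELD', releaseAt: new Date('2025-03-09T20:00:00Z'), hasReport: false },
  { id: 'p2', status: 'HELD', releaseAt: new Date('2025-03-09T21:00:00Z'), hasReport: true },
  { id: 'p3', status: 'HELD', releaseAt: new Date('2025-03-10T09:00:00Z'), hasReport: false },
  { id: 'p4', status: 'PAID', releaseAt: new Date('2025-03-01T00:00:00Z'), hasReport: false },
], cutoff)
console.log(result.payable.map((p) => p.id))    // [ 'p1' ]
console.log(result.disputed.map((p) => p.id))   // [ 'p2' ]
```

Payment p3 is not yet due and p4 is no longer HELD, so neither is touched; a rerun after a crash would therefore select only payments still in the HELD state.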


CI/CD - Azure DevOps Pipelines

Every merge to main triggers the pipeline:

# azure-pipelines.yml
trigger:
  branches:
    include: [main]

stages:
  - stage: BuildAPI
    jobs:
      - job: Docker
        steps:
          - task: Docker@2
            inputs:
              containerRegistry: evergrn-acr
              repository: evergrn-api
              command: buildAndPush
              tags: $(Build.BuildId)

          - task: AzureContainerApps@1
            inputs:
              azureSubscription: evergrn-service-connection
              containerAppName: evergrn-api
              imageToDeploy: evergrnacr.azurecr.io/evergrn-api:$(Build.BuildId)

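  - stage: BuildFrontend
    jobs:
      - job: StaticSite
        pool:
          vmImage: windows-latest   # AzureFileCopy@4 runs on Windows agents only
        steps:
          - script: cd client && npm ci && npm run build
          - task: AzureFileCopy@4
            inputs:
              azureSubscription: evergrn-service-connection
              SourcePath: client/dist/
              Destination: AzureBlob
              storage: evergrnweb
              ContainerName: $web
          - task: AzureCLI@2
            inputs:
              azureSubscription: evergrn-service-connection
              scriptType: pscore
              scriptLocation: inlineScript
              # Folded scalar: the lines below are joined into a single command
              inlineScript: >
                az afd endpoint purge
                --resource-group evergrn-rg
                --profile-name evergrn-fd
                --endpoint-name evergrn
                --content-paths "/*"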

Container Apps rolls out each deploy as a new revision. In the default single-revision mode, the new revision starts receiving traffic only after it passes its health checks, and the old revision is then deactivated, so deploys incur no downtime.


Monitoring - Azure Monitor + Application Insights

Add the Application Insights SDK to the Node.js API:

// server.js: add at the very top, before all other requires
const appInsights = require('applicationinsights')
appInsights.setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true)
  .setAutoCollectExceptions(true)
  .setAutoCollectDependencies(true)  // automatically traces PostgreSQL queries
  .start()

This provides out-of-the-box:


Migration Phases

Phase 1 - Launch

Minimum viable Azure setup. Single region (East US 2 recommended for East Coast rural focus), zone-redundant database, 2 Container App replicas.

Component | Azure Service | Est. Monthly Cost
Front door + CDN | Azure Front Door Standard | ~$35
API containers | Container Apps (2 replicas, 0.5 vCPU / 1 GB) | ~$30
Container registry | Azure Container Registry Basic | ~$5
Database | PostgreSQL Flexible Server Standard_B2ms, zone-redundant HA | ~$130
Redis | Azure Cache for Redis Basic C0 (250 MB) | ~$17
File storage | Azure Blob Storage + Front Door | ~$5-15
Secrets | Azure Key Vault | ~$5
Functions | Azure Functions Consumption Plan (payout cron) | ~$1
Monitoring | Application Insights (5 GB/month free tier) | ~$0-10
DNS | Azure DNS | ~$1
Total (sum of the rows above) | | ~$229-249/month
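
Summing the itemized rows is a useful sanity check on the quoted total, and it is worth rerunning whenever a row changes. The sketch below copies the low and high estimates from the table; with the figures as listed, the rows add up to roughly $229-249 per month.

```javascript
// [component, low estimate, high estimate] in USD per month, copied from the Phase 1 table.
const phase1Costs = [
  ['Front Door Standard', 35, 35],
  ['Container Apps (2 replicas)', 30, 30],
  ['Container Registry Basic', 5, 5],
  ['PostgreSQL Flexible Server', 130, 130],
  ['Redis Basic C0', 17, 17],
  ['Blob Storage + Front Door', 5, 15],
  ['Key Vault', 5, 5],
  ['Functions (payout cron)', 1, 1],
  ['Application Insights', 0, 10],
  ['Azure DNS', 1, 1],
]

// Add up the low and high ends of every row.
function totalRange(rows) {
  return rows.reduce(([lo, hi], [, rowLo, rowHi]) => [lo + rowLo, hi + rowHi], [0, 0])
}

const [low, high] = totalRange(phase1Costs)
console.log(`~$${low}-${high}/month`)   // ~$229-249/month
```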

Phase 2 - Early Growth

Phase 3 - Scale


Code Changes Required Before Go-Live

See current-documentation/go-live-checklist.md for the authoritative status of all launch blockers.


What Does NOT Need to Change


Cost Scaling Analysis - Flat Rate vs. Consumption

Not all Azure services scale the same way. Understanding which costs are fixed, which grow linearly with users, and which grow with data volume is essential for financial planning and avoiding surprise bills.

Service-by-service breakdown

Service | Pricing model | What drives growth | Growth curve
Azure Front Door Standard | Base fee + per GB egress + per 10k requests | Traffic volume, data served | Linear with users
Azure Container Apps | Per vCPU-second + per GiB-second | Concurrent active users | Linear with traffic
PostgreSQL Flexible Server | Flat per instance size | Nothing, until you resize | Step-function
Azure Cache for Redis | Flat per cache tier | Nothing, until you upgrade tier | Step-function
Azure Blob Storage (storage) | Per GB stored | Total photos accumulated over time | Linear with jobs
Azure Blob Storage (egress) | Per GB served to internet | Photo views by customers and providers | Linear with activity
Azure Container Registry | Flat (Basic $5/month) | Nothing at this scale | Flat
Azure Key Vault | Per 10k operations | Negligible at any scale | Effectively flat
Azure Functions (payout cron) | Per execution (first 1M free) | Nothing (31 executions/month) | Free indefinitely
Azure DNS | Per zone + per million queries | Negligible | Effectively flat
Application Insights | First 5 GB/month free, then per GB | Request volume, log verbosity | Linear with traffic

The two cost growth patterns

Pattern 1 - Linear with traffic (the variable costs)

These services charge for actual consumption and scale directly with userbase size:

Azure Container Apps is the clearest signal. It charges per vCPU-second and per GiB-second of actual runtime across all replicas. As auto-scaling spins up more replicas to handle concurrent users, the bill grows proportionally. This is the most direct cost signal available: if Container Apps spend doubles, traffic has roughly doubled. Budget roughly:

~$0.04 per vCPU-hour per replica
2 replicas minimum: ~$30/month base
10 replicas at peak: ~$145/month
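
The budget figures above line up if each replica is allocated 0.5 vCPU and runs for the full month (about 730 hours). Both values are assumptions inferred from the arithmetic, not settings stated in this document, and memory (GiB-second) charges are left out. A minimal TypeScript sketch of that estimate:

```typescript
// Rough Container Apps compute estimate. The constants are illustrative
// assumptions; confirm against current Azure pricing before budgeting.
const USD_PER_VCPU_HOUR = 0.04; // rate quoted in this section
const VCPU_PER_REPLICA = 0.5;   // ASSUMED: makes the quoted totals line up
const HOURS_PER_MONTH = 730;    // ASSUMED: replicas running all month

function monthlyComputeUsd(replicas: number): number {
  return replicas * VCPU_PER_REPLICA * HOURS_PER_MONTH * USD_PER_VCPU_HOUR;
}

console.log(monthlyComputeUsd(2).toFixed(2));  // 29.20
console.log(monthlyComputeUsd(10).toFixed(2)); // 146.00
```

The 10-replica figure is an upper bound: it assumes peak scale is held for the entire month, which auto-scaling normally avoids.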

Azure Front Door charges for data transferred out to end users. Every API response and every photo served through Front Door costs:

API responses are small (kilobytes of JSON). The meaningful egress driver is photos: every time a customer views their job history, or a provider's logo loads, that's data out. Front Door's CDN cache significantly reduces origin egress: a photo cached at the edge is served without another fetch from Blob Storage, so only Front Door's own egress to the user is billed after the first request. For photo-heavy pages, expect most reads to hit cache within the first week of a job's existence.

Application Insights ingestion grows with request volume. The free 5 GB/month covers a comfortable number of requests. At scale, set a daily ingestion cap so that runaway logging during an unexpected traffic spike cannot bloat the bill.

Pattern 2: Step-function (the fixed costs that jump on upgrade)

These services have a flat monthly rate until you outgrow the tier, at which point you upgrade and the cost jumps to the next tier's flat rate:

PostgreSQL Flexible Server is the biggest step. The instance size is fixed regardless of query volume (within its capacity). You will not see the DB bill creep up gradually. It stays flat until you hit a resource ceiling (CPU, RAM, connections, IOPS), then you upgrade and the cost jumps:

Standard_B2ms (launch):   ~$130/month
Standard_D2ds_v4:         ~$180/month  (+$50 jump)
Standard_D4ds_v4:         ~$350/month  (+$170 jump)

Monitor CPU utilization and connection count in Azure Monitor. A sustained CPU average over 70% or connection count over 80% of the instance's max is your signal to resize.
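
The resize rule above can be written as a small check. This is a sketch only: the `DbMetrics` shape and the `shouldResize` name are hypothetical, and the values would have to be fed in from Azure Monitor by whatever tooling you use.

```typescript
// Hypothetical helper, not an Azure API. Thresholds come from the text above.
interface DbMetrics {
  cpuAvgPercent: number;     // sustained average CPU over the chosen window
  activeConnections: number; // current connection count
  maxConnections: number;    // connection limit of the current instance size
}

const CPU_THRESHOLD_PERCENT = 70;
const CONNECTION_THRESHOLD_RATIO = 0.8;

function shouldResize(m: DbMetrics): boolean {
  const connectionRatio = m.activeConnections / m.maxConnections;
  return (
    m.cpuAvgPercent > CPU_THRESHOLD_PERCENT ||
    connectionRatio > CONNECTION_THRESHOLD_RATIO
  );
}

// Example: CPU is fine, but connections are at 85% of the limit.
console.log(
  shouldResize({ cpuAvgPercent: 45, activeConnections: 85, maxConnections: 100 })
); // true
```

Either condition alone triggers the signal, which matches the "CPU or connections" wording; a stricter policy could require both.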

Azure Cache for Redis follows the same pattern. Basic C1 (1 GB, ~$55/month) handles the notification queue and rate limiting comfortably until you need high availability or memory use passes roughly 800 MB.

The Redis upgrade is predictable: you'll see it coming in the Azure Monitor memory metric.

Storage cost growth: the compound risk

Blob Storage costs grow with the total volume of photos stored, not just new uploads. Without lifecycle management, every photo ever taken is permanently billed at the Hot tier rate. This is the one cost that compounds indefinitely and can quietly become significant:

Assumptions:
  Average photos per completed job: 8 (2 general + 3 before + 3 after)
  Average photo size (after sharp compression): 1.5 MB
  Storage per completed job: ~12 MB

At 100 jobs/month:    1.2 GB new storage/month  →  year 1 total: ~14 GB   →  ~$0.25/month
At 500 jobs/month:    6 GB new storage/month    →  year 1 total: ~72 GB   →  ~$1.30/month
At 2,000 jobs/month:  24 GB new storage/month   →  year 1 total: ~290 GB  →  ~$5.20/month

These numbers are low even before lifecycle management is applied: the monthly figures above correspond to roughly the Hot rate (290 GB at ~$5.22/month implies about $0.018/GB). With lifecycle management, photos move to Cool ($0.010/GB) and Cold ($0.0036/GB) tiers as they age, so the average cost per GB across the whole corpus falls below the Hot rate. Without it, the same 290 GB stays at ~$5.22/month. That is not catastrophic, but the total grows every month and never comes down.
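
The projection above can be reproduced with a few lines of TypeScript. The Hot rate of $0.018/GB is an assumption inferred from the ~$5.22 figure for 290 GB, and the sketch uses 1 GB = 1,000 MB to match the section's rounding; it prices the whole year-end corpus at the Hot rate and does not model tiering.

```typescript
// Year-1 storage projection using the assumptions listed in this section.
const PHOTOS_PER_JOB = 8;
const MB_PER_PHOTO = 1.5;
const HOT_USD_PER_GB = 0.018; // ASSUMED: implied by ~$5.22 for 290 GB

function yearOneStorage(jobsPerMonth: number) {
  const gbPerMonth = (jobsPerMonth * PHOTOS_PER_JOB * MB_PER_PHOTO) / 1000;
  const totalGb = gbPerMonth * 12;
  // Monthly bill at the end of year 1 if nothing leaves the Hot tier.
  const monthlyUsdAtHot = totalGb * HOT_USD_PER_GB;
  return { gbPerMonth, totalGb, monthlyUsdAtHot };
}

for (const jobs of [100, 500, 2000]) {
  const r = yearOneStorage(jobs);
  console.log(
    `${jobs} jobs/month: ${r.totalGb.toFixed(1)} GB, $${r.monthlyUsdAtHot.toFixed(2)}/month`
  );
}
```

For 2,000 jobs/month this gives 288 GB and about $5.18/month, which the section rounds to ~290 GB and ~$5.20.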

The more meaningful storage cost at scale is egress, not rest. If customers regularly browse job history with photos:

1,000 monthly active users each loading 10 photos/session at 1.5 MB each:
= 15 GB egress/month from Blob Storage origin

Front Door caches aggressively, so assume 20% cache miss rate:
= 3 GB actual origin egress billed  →  ~$0.024/month
+ Front Door CDN egress to users:
= 15 GB × $0.008/GB  →  ~$0.12/month

Front Door's CDN makes photo egress economics very favorable. The cost only becomes notable at tens of thousands of active users.
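
The egress estimate above generalizes to other user counts and cache-miss rates. In this sketch the $0.008/GB rate is taken from the worked numbers above and applied to both origin and edge egress, and one photo session per user per month is assumed; the `EgressInputs` shape is hypothetical.

```typescript
// Monthly photo egress estimate, mirroring the worked example above.
interface EgressInputs {
  monthlyActiveUsers: number;
  photosPerSession: number; // ASSUMED: one session per user per month
  mbPerPhoto: number;
  cacheMissRate: number;    // fraction of reads that reach Blob Storage
  usdPerGb: number;         // rate used for both origin and edge egress
}

function monthlyEgress(i: EgressInputs) {
  const servedGb =
    (i.monthlyActiveUsers * i.photosPerSession * i.mbPerPhoto) / 1000;
  const originGb = servedGb * i.cacheMissRate;
  return {
    servedGb,
    originGb,
    originUsd: originGb * i.usdPerGb,
    edgeUsd: servedGb * i.usdPerGb,
  };
}

const example = monthlyEgress({
  monthlyActiveUsers: 1000,
  photosPerSession: 10,
  mbPerPhoto: 1.5,
  cacheMissRate: 0.2,
  usdPerGb: 0.008,
});
console.log(example.servedGb, example.originGb); // 15 3
console.log(example.originUsd.toFixed(3), example.edgeUsd.toFixed(2)); // 0.024 0.12
```

Because both terms scale linearly with users, ten times the audience means ten times the cost, which is still only a few dollars a month at these rates.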

Summary: where to watch the bill

| Cost driver | When it becomes significant | Monitoring signal |
| --- | --- | --- |
| Container Apps compute | Early growth: first scaling event | Replica count in Azure Monitor |
| Front Door egress | Mid-scale: thousands of daily active users | GB transferred in Front Door metrics |
| PostgreSQL instance resize | When you hit CPU/connection ceiling | DB CPU avg > 70%, connections > 80% of max |
| Blob Storage accumulation | Gradual; managed by lifecycle policy | Total stored GB in Storage metrics |
| Redis tier upgrade | When HA or memory > 800 MB | Redis memory percentage used |
| Application Insights ingestion | High traffic with verbose logging | Daily ingestion GB in App Insights |

The two costs to set Azure Monitor alerts on from day one:

  1. Container Apps replica count: if you're hitting max replicas (10), you need to either increase the max or investigate why traffic is that high (it could be a bot or abuse scenario)
  2. PostgreSQL CPU: sustained high CPU on the database is the most common chokepoint and the most expensive fix if caught late