Cloud Infrastructure Strategy: Microsoft Azure
This document covers the target production architecture for Evergrn on Azure, from a single local instance to a scalable, cloud-hosted deployment. It addresses load balancing, containerization, database concurrency, connection pooling, and the code changes required before launch.
Dev Environment: Currently Deployed
Resource group: evergrn-dev | Region: Canada Central (App Service), East US 2 (PostgreSQL)
| Component | Azure Resource | Details | Status |
|---|---|---|---|
| Source control | Azure DevOps repo | https://dev.azure.com/evergrn (all code in a single private repo) | ✅ Live |
| Coming soon page | Azure Static Web Apps (Free) | www.evergrn.co → green-flower-03302980f.7.azurestaticapps.net | ✅ Live |
| API | Azure App Service (B1 Basic) | evergrn-api-dev.azurewebsites.net (canadacentral) | ✅ Deployed |
| Database | Azure Database for PostgreSQL Flexible Server | evergrn-db.postgres.database.azure.com; Burstable B1ms, East US 2 | ✅ Live |
| File storage | Azure Blob Storage (Cool tier) | evergrnuploads; containers: jobs, logos, headshots, id-docs, addresses | ✅ Live |
| Observability | Azure Application Insights | evergrn-api-insights (canadacentral); auto-instruments Express, Prisma, Stripe | ✅ Live |
| Secrets | Azure Key Vault (Standard, RBAC) | evergrn-vault-dev; DATABASE-URL, JWT-SECRET, STRIPE-*, AZURE-STORAGE-CONNECTION-STRING, APPLICATIONINSIGHTS-CONNECTION-STRING | ✅ Live |
| Email | Azure Communication Services | evergrn-comms (East US); custom domain noreply@evergrn.co verified with DKIM/SPF; ACS_CONNECTION_STRING + ACS_SENDER set as App Service env vars | ✅ Live |
| Pkg storage | Azure Blob Storage (evergrnpkgstore) | deployments/ container; App Service runs from timestamped zip blob | ✅ Live |
| Document Intelligence | Azure AI Document Intelligence | evergrn-doc-intel-46238.cognitiveservices.azure.com (East US 2); driver's license OCR (prebuilt-idDocument) + insurance certificate OCR (prebuilt-read); DOC_INTEL_ENDPOINT + DOC_INTEL_KEY env vars | ✅ Live |
| CI/CD pipeline | Azure DevOps Pipeline (Static Web Apps) | Auto-deploys coming-soon/ on push to main | ✅ Live |
Dev API endpoint
https://evergrn-api-dev-c7dxhkf3ctcgdqby.canadacentral-01.azurewebsites.net
Dev App Service config
- Plan: B1 Basic (~$13/month), Linux, Node 22 LTS
- Deployment method: Run from Package. `WEBSITE_RUN_FROM_PACKAGE` points to a zip blob in `evergrnpkgstore`. To redeploy: `.\deploy.ps1`
- NODE_ENV: `development`
- GodMode (`/godmode` route, `/admin` API routes): enabled (dev only)
What's NOT yet in dev Azure (still local or not built)
| Concern | Current | Next step |
|---|---|---|
| Web frontend | Vite dev server (localhost:5173) | Included in staging/prod deploys via client/dist/ in zip |
| Notification queue | In-memory Map | Azure Cache for Redis (pre-launch requirement) |
| Rate limit store | In-memory (per-instance) | Same Redis instance as notification queue |
| iOS app | EAS build pending Apple Developer approval | EAS submit after account approved |
| DMARC record | Not configured for evergrn.co | Add _dmarc.evergrn.co TXT record (low priority) |
Staging Environment: Currently Deployed
Resource group: evergrn-stage | Region: Canada Central (App Service), East US 2 (PostgreSQL)
Architecture (as of 2026-06-28): Web/API Split
The staging environment is split into two separate services. This was done to allow the web UI to be protected by Entra Conditional Access (MFA) without interfering with the iOS app's direct API access.
| Component | Azure Resource | URL | Status |
|---|---|---|---|
| Web frontend | Azure Static Web App (Standard) `evergrn-web-staging` | web.staging.evergrn.co | ✅ Live |
| API | Azure App Service (B1 Basic) `evergrn-api-stage` | staging.evergrn.co | ✅ Live |
| Database | Azure PostgreSQL Flexible Server `evergrn-db-stage` | evergrn-db-stage.postgres.database.azure.com; Standard_B1ms | ✅ Live |
| File storage | Azure Blob Storage | Shares evergrnuploads with dev | ✅ Shared |
| Email | Azure Communication Services | Shares evergrn-comms with dev | ✅ Shared |
| Pkg storage | Azure Blob Storage (evergrnpkgstore) | Shares deployments/ with dev | ✅ Shared |
| SSL | Azure Managed Certificate + SWA cert | Both domains HTTPS | ✅ Live |
| CI/CD (API) | Azure DevOps Pipeline `deploy-staging` (ID: 2) | Manual trigger + approval gate | ✅ Live |
Access control
Web frontend (web.staging.evergrn.co): protected by Entra Conditional Access
- CA policy: `Require MFA - Evergrn Staging` (enabled 2026-06-28)
- Targets Entra app: `Evergrn Staging Access` (appId: `eb7e1feb-8290-4d6d-ac20-5f71d89da306`)
- Entra Security Defaults disabled (required for CA policies to work)
- Entra ID P1 license assigned to `kdavis@evergrn.co`
- Auth provider configured in `client/public/staticwebapp.config.json` (custom OIDC → Entra)
- SWA app settings: `ENTRA_CLIENT_ID`, `ENTRA_CLIENT_SECRET`
API (staging.evergrn.co): open (no IP restriction, no Easy Auth)
- Easy Auth remains disabled on the App Service; the B1 tier causes VNETFailure when the auth sidecar tries to set up VNet
- The iOS app authenticates via the Express API's own JWT flow; CA policy does not apply to it
- Access control is handled at the web layer only
Web frontend (SWA) config
- SKU: Standard ($9/month), required for the custom OIDC auth provider
- Default hostname: `red-dune-043d1ce0f.7.azurestaticapps.net`
- Custom domain: `web.staging.evergrn.co` (CNAME in GoDaddy → SWA default hostname)
- Auth: custom OIDC provider pointing to `https://login.microsoftonline.com/93a4d22f-b942-4a6d-a3e6-be71843021a3/v2.0`
- All routes require the `authenticated` role; unauthenticated users are redirected to `/.auth/login/evergrn`
- `navigationFallback` serves `index.html` for React Router routes; API paths are excluded from the fallback
- Fetch interceptor in `client/src/main.jsx` prepends `VITE_API_BASE` (`https://staging.evergrn.co`) to all relative API calls
To deploy the web frontend (SWA)
# Build with staging mode (sets VITE_API_BASE=https://staging.evergrn.co)
Set-Location c:\Repos\evergrn\client
npx vite build --mode staging
# Deploy to SWA
npx @azure/static-web-apps-cli deploy .\dist `
--deployment-token "<token from SWA secrets>" `
--env production
Deployment token: retrieve with az staticwebapp secrets list --name evergrn-web-staging --resource-group evergrn-stage --query "properties.apiKey" -o tsv
To release to staging (API)
- Push changes to `main` in ADO
- Go to `https://dev.azure.com/evergrn/evergrn/_build?definitionId=2`
- Click Run pipeline, then approve the manual gate when prompted
Staging App Service config
- Plan: B1 Basic (~$13/month), Linux, Node 22 LTS
- Deployment method: Run from Package, same `evergrnpkgstore` as dev
- NODE_ENV: `staging`
- GodMode: disabled. `/admin` API routes are not registered, and the `/godmode` React route is suppressed by `import.meta.env.DEV`
- CORS allowlist includes `web.staging.evergrn.co` and the SWA default hostname
Staging database
- App user: `evergrnstage` (no DDL; SELECT/INSERT/UPDATE/DELETE only)
- Admin user: `evergrn_admin` / `3vergrn!` (DDL, for migrations)
- Host: `evergrn-db-stage.postgres.database.azure.com`
- All migrations applied; isolated from dev data
Current State vs. Target State
| Concern | Current (dev) | Target (Azure production) |
|---|---|---|
| API server | Azure App Service B1 (evergrn-api-dev) ✅ | Azure Container Apps (multiple containerized instances) |
| Frontend | Vite dev server on port 5173 (local) | Static build in Azure Blob Storage, served via Azure Front Door |
| Load balancing | None | Azure Front Door (global CDN + load balancer + WAF in one) |
| Database | Azure PostgreSQL Flexible Server (evergrn-db, East US 2) ✅ | Same service, zone-redundant HA, PgBouncer on port 6432 |
| DB connections | App Service → PostgreSQL direct (port 5432) | Container Apps → built-in PgBouncer → PostgreSQL Flexible Server |
| File uploads | Azure Blob Storage (evergrnuploads, Cool tier) ✅ | Same; add Front Door CDN in front |
| Notification queue | In-memory Map (per-instance, lost on restart) | Azure Cache for Redis |
| Secrets | Azure Key Vault (evergrn-vault-dev) + Managed Identity ✅ | Same pattern: Key Vault + Managed Identity |
| TLS/HTTPS | App Service managed cert (HTTPS) ✅ | Terminated at Azure Front Door; all traffic HTTPS |
| Payout cron | Not yet built | Azure Functions (Timer Trigger), runs once daily at 8 AM |
| CI/CD | Azure DevOps: Pipeline ID 2 (API deploy) + Pipeline ID 4 (SWA web deploy) ✅ | Full pipeline: API container build + push + frontend deploy |
| Monitoring | Azure Application Insights (evergrn-api-insights) ✅ | Same; add custom alerts and dashboards |
Target Architecture
+------------------------------------------+
|                Azure DNS                 |
|      evergrn.co -> Azure Front Door      |
+---------------------+--------------------+
                      |
                      v
+---------------------------------------------------+
|            Azure Front Door (Premium)             |
|                                                   |
|  * Global anycast routing (150+ edge nodes)       |
|  * SSL/TLS termination (free managed certs)       |
|  * CDN caching for static assets                  |
|  * Web Application Firewall (WAF)                 |
|  * Path-based routing rules                       |
|  * Health probes to origin groups                 |
+-------------+---------------------+---------------+
              |                     |
/api/* routes |                     | /* all other routes
              v                     v
+--------------------------+  +-------------------------------+
| Azure Container Apps     |  | Azure Blob Storage            |
| Environment              |  | (Static Website hosting)      |
|                          |  |                               |
| +----+ +----+ +----+     |  | client/dist/ build output     |
| |API | |API | |API |     |  | index.html + assets           |
| | #1 | | #2 | | #N |     |  +-------------------------------+
| +----+ +----+ +----+     |
|       Auto-scaling       |
+-------------+------------+
              |
         +----+---------------+----------------------+
         |                    |                      |
         v                    v                      v
+-----------------+  +----------------+  +------------------------+
| Azure Database  |  | Azure Cache    |  | Azure Blob Storage     |
| for PostgreSQL  |  | for Redis      |  | (uploads container)    |
| Flexible Server |  |                |  |                        |
|                 |  | * Notif queue  |  | jobs/{jobId}-*.jpg     |
| Built-in        |  | * Rate limit   |  | logos/{providerId}     |
| PgBouncer       |  |   state        |  |                        |
|                 |  +----------------+  | Served via Front Door  |
| Zone-redundant  |                      +------------------------+
| HA (2 AZs)      |
+-----------------+
+-----------------------------------------+
| Supporting Services                     |
|                                         |
| Azure Container Registry (ACR)          |
|   -> stores Docker images               |
|                                         |
| Azure Key Vault                         |
|   -> DATABASE_URL, JWT_SECRET, STRIPE   |
|                                         |
| Azure Managed Identity                  |
|   -> Container Apps to Key Vault/ACR    |
|      with no stored credentials         |
|                                         |
| Azure Monitor + Application Insights    |
|   -> logs, metrics, alerts, traces      |
|                                         |
| Azure Functions (Timer Trigger)         |
|   -> daily 8 AM payout cron             |
+-----------------------------------------+
Component Breakdown
1. Azure Front Door: The Front Door Load Balancer
Azure Front Door is Microsoft's global entry point service; it is literally named for the role you described. It combines what would otherwise require multiple services (CDN, load balancer, WAF) into one:
- Global anycast routing: requests are answered from the nearest of 150+ edge locations worldwide, minimizing latency regardless of where the customer is
- SSL/TLS termination: Front Door holds the TLS certificate (Azure-managed, auto-renewed, free). All traffic from clients to Front Door is HTTPS. Traffic from Front Door to Container Apps runs over Microsoft's private backbone inside the Azure network
- Path-based routing: requests to `/api/*` are forwarded to the Container Apps origin group; all other requests are served from Blob Storage
- Web Application Firewall: blocks OWASP Top 10 attacks (SQLi, XSS, etc.) at the edge before they reach your API
- Health probes: Front Door polls `GET /health` on the Container Apps origin and automatically stops routing to an origin that fails its probes; health of individual replicas is handled by Container Apps' own probes
- CDN caching: static assets (JS, CSS, images) are cached at edge nodes and served without touching the origin
Tier recommendation: Front Door Standard for MVP (~$35/month base). Upgrade to Premium when WAF policy tuning and advanced routing are needed.
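The health-probe behaviour above relies on the API exposing a cheap `GET /health` route. A minimal sketch using only Node's built-in `http` module is shown here; `checkDependencies`, `handleHealth`, `startServer`, and the response shape are illustrative assumptions, not the existing Evergrn code (which uses Express).

```javascript
const http = require('node:http')

// Hypothetical dependency check; a real one might ping PostgreSQL or Redis.
async function checkDependencies() {
  return { database: true, redis: true }
}

// Responds 200 when every dependency reports healthy and 503 otherwise,
// so a probe can take the origin out of rotation.
async function handleHealth(req, res, check = checkDependencies) {
  if (req.method !== 'GET' || req.url !== '/health') {
    res.writeHead(404, { 'Content-Type': 'application/json' })
    res.end(JSON.stringify({ error: 'not found' }))
    return
  }
  const deps = await check()
  const healthy = Object.values(deps).every(Boolean)
  res.writeHead(healthy ? 200 : 503, { 'Content-Type': 'application/json' })
  res.end(JSON.stringify({ status: healthy ? 'ok' : 'degraded', deps }))
}

// Call startServer(3000) from the entry point; nothing listens on import.
function startServer(port) {
  return http.createServer((req, res) => handleHealth(req, res)).listen(port)
}
```

Keeping the handler free of expensive work matters because probes arrive from many edge locations at once.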
2. Frontend: Azure Blob Storage (Static Website)
The React app builds to static files: plain HTML, JS, and CSS. No container is needed to serve them.
npm run build # produces client/dist/
az storage blob upload-batch \
  --source client/dist/ \
  --destination '$web' \
  --account-name evergrnweb
Azure Blob Storage's Static Website feature serves the $web container directly. Front Door sits in front of it and handles HTTPS, caching, and compression.
Single-page app routing: in the Blob Storage static website settings, set index.html as both the index document and the 404 (error) document. This ensures React Router handles all client-side routes instead of getting a 404 from the storage layer.
Why not a container for the frontend? A Container App running Nginx to serve static files costs ~$15-30/month and adds a container to manage. Blob Storage + Front Door costs under $5/month and delivers files from the nearest edge node. Containerize the frontend only if you adopt server-side rendering (Next.js). Not applicable here.
3. API: Azure Container Apps
Azure Container Apps is Azure's serverless container platform, equivalent to AWS Fargate. You define what container to run and how to scale it; Azure manages the underlying infrastructure.
Advantages over AKS for this use case:
- No Kubernetes cluster to manage or pay for when idle
- Built-in KEDA-based auto-scaling (scale to zero in dev/staging, 2 to N in production)
- Native integration with Azure Container Registry, Key Vault, and Managed Identity
- Simpler operations for a team that isn't running a dedicated platform team
Container App configuration:
# containerapp.yaml (simplified)
name: evergrn-api
image: evergrnacr.azurecr.io/evergrn-api:latest
resources:
  cpu: 0.5
  memory: 1Gi
scale:
  minReplicas: 2 # always 2 for HA across zones
  maxReplicas: 10
  rules:
    - name: http-scaling
      http:
        metadata:
          concurrentRequests: 50 # scale out when >50 concurrent requests per replica
ingress:
  # Not reachable from the internet. Front Door can reach internal ingress only
  # through Private Link (a Premium-tier feature); with Front Door Standard the
  # ingress has to be external and locked down to Front Door traffic instead.
  external: false
  targetPort: 3000
Dockerfile (to be created at project root):
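# Node 22 matches the runtime used by the dev and staging App Services
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY . .
# Assumes the prisma CLI is installed as a production dependency
RUN npx prisma generate
EXPOSE 3000
CMD ["node", "server.js"]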
Azure Container Registry (ACR) stores built images. CI/CD pushes a new image tag on every merge to main, then triggers a Container Apps revision update.
4. Database: Azure Database for PostgreSQL Flexible Server
Azure's managed PostgreSQL service with two features critical to this architecture:
Zone-redundant high availability: The Flexible Server maintains a synchronous standby in a second Availability Zone. If the primary fails, Azure automatically promotes the standby and updates the connection endpoint, typically within 60-120 seconds and with zero data loss (synchronous replication).
Built-in PgBouncer: Flexible Server includes PgBouncer as a managed toggle, so there is no separate service to deploy or maintain. Enable it in the portal or via CLI:
az postgres flexible-server parameter set \
--resource-group evergrn-rg \
--server-name evergrn-db \
--name pgbouncer.enabled \
--value on
Once enabled, your DATABASE_URL points to port 6432 (PgBouncer) instead of 5432 (direct PostgreSQL). PgBouncer multiplexes all Container App connections onto a small fixed pool of real database connections:
10 API containers × 10 Prisma connections = 100 client connections → PgBouncer
PgBouncer → 20-30 actual connections to PostgreSQL
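The port switch and the connection arithmetic above can be sketched as a small helper. This is a hedged illustration: `toPooledUrl` and `clientConnections` are hypothetical names, the example host and credentials are placeholders, and the `pgbouncer=true` and `connection_limit` query parameters are Prisma connection-string options that may or may not be needed depending on the Prisma and PgBouncer versions in use.

```javascript
// Derive the pooled (PgBouncer) connection string from the direct one.
function toPooledUrl(directUrl, connectionLimit = 10) {
  const url = new URL(directUrl)
  url.port = '6432' // PgBouncer listens here; 5432 stays the direct port
  // Tells Prisma it is behind a transaction-mode pooler (assumed to be required here)
  url.searchParams.set('pgbouncer', 'true')
  // Caps the Prisma pool size per replica
  url.searchParams.set('connection_limit', String(connectionLimit))
  return url.toString()
}

// Client-side connections PgBouncer must accept across all replicas.
function clientConnections(replicas, connectionLimit) {
  return replicas * connectionLimit
}

const pooled = toPooledUrl(
  'postgresql://app_user:placeholder@evergrn-db.postgres.database.azure.com:5432/evergrn?sslmode=require'
)
console.log(pooled)
console.log(clientConnections(10, 10)) // 100, matching the figure above
```

Schema migrations are usually pointed at the direct port 5432 rather than the pooled one, since they need a session-level connection.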
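Instance sizing:
- MVP launch: `Standard_B2ms` (2 vCPU, 8 GB RAM), ~170 max connections, zone-redundant HA, ~$130/month. Verify tier support before committing: Azure's documentation lists the Burstable tier as supporting neither high availability nor the built-in PgBouncer, in which case launch needs a General Purpose SKU
- Early growth: `Standard_D2ds_v4` (2 vCPU, 8 GB RAM, faster storage), ~$180/month
- Scale: `Standard_D4ds_v4` (4 vCPU, 16 GB RAM); upgrade without downtime via portal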
Storage: Start at 128 GB with auto-grow enabled up to 16 TB.
Read replica (Phase 2): Add one read replica for provider insights, admin analytics, and GodMode queries. Route all writes to the primary endpoint; route read-heavy queries to the replica endpoint.
Data Layer: Concurrency & Transaction Safety
This is the most critical section. With multiple Container App replicas handling requests simultaneously, several operations in the current codebase are unsafe without proper transaction isolation.
The core problem
Each Container App replica maintains its own Prisma connection pool. Without explicit transactions, two replicas can interleave reads and writes on the same rows:
Replica 1: SELECT job WHERE id=X → status=QUOTED (ok to accept)
Replica 2: SELECT job WHERE id=X → status=QUOTED (ok to accept)
Replica 1: UPDATE job SET status=ACCEPTED → both succeed
Replica 2: UPDATE job SET status=ACCEPTED → job double-accepted
Solution: Prisma transactions + SELECT FOR UPDATE
PostgreSQL's SELECT FOR UPDATE acquires a row-level lock. Any other transaction attempting the same lock on that row blocks until the first transaction commits or rolls back. Combined with prisma.$transaction(), critical sections become atomic.
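The effect of the row lock can be reproduced without a database. The sketch below is an in-memory simulation only: `makeRowLock` is a hypothetical stand-in for `SELECT ... FOR UPDATE` inside a transaction, and the `job` object stands in for a row. It shows the same check-then-update code double-accepting without the lock and accepting exactly once with it.

```javascript
// Yield to the event loop, standing in for a network round-trip to the database.
const tick = () => new Promise((resolve) => setImmediate(resolve))

// Serializes callers: each fn runs only after all earlier holders have finished.
function makeRowLock() {
  let tail = Promise.resolve()
  return (fn) => {
    const run = tail.then(fn)
    tail = run.catch(() => {})
    return run
  }
}

// Read status, wait, then write: the unsafe pattern from the interleaving above.
async function acceptJob(job, replica) {
  const status = job.status // SELECT
  await tick()              // the other replica gets to run here
  if (status !== 'QUOTED') return false
  job.status = 'ACCEPTED'   // UPDATE
  job.acceptedBy.push(replica)
  return true
}

async function simulate(withLock) {
  const job = { status: 'QUOTED', acceptedBy: [] }
  const lock = makeRowLock()
  const attempt = (replica) =>
    withLock ? lock(() => acceptJob(job, replica)) : acceptJob(job, replica)
  const results = await Promise.all([attempt('replica-1'), attempt('replica-2')])
  return { results, acceptedBy: job.acceptedBy }
}

simulate(false).then((r) => console.log('no lock:', r.acceptedBy.length, 'acceptances'))
simulate(true).then((r) => console.log('with lock:', r.acceptedBy.length, 'acceptance'))
```

In PostgreSQL the waiting happens inside the database rather than in application code, but the ordering guarantee is the same: the second transaction reads the row only after the first has committed.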
Operations that require transaction protection (to be fixed before launch):
Quote acceptance: POST /quotes/:id/accept
// Current (unsafe: race condition with multiple replicas):
const quote = await prisma.quote.findUnique({ where: { id } })
if (job.status !== 'QUOTED') throw error
await prisma.quote.update(...)
await prisma.quote.updateMany(...)
await prisma.job.update(...)
// Required (safe: row lock prevents concurrent acceptance).
// jobId and providerId are taken from the quote loaded earlier in the handler.
await prisma.$transaction(async (tx) => {
  // Lock the job row: all other concurrent transactions block here until this commits
  const [job] = await tx.$queryRaw`
    SELECT * FROM "Job" WHERE id = ${jobId} FOR UPDATE
  `
  if (!job || job.status !== 'QUOTED') throw new Error('Job no longer available')
  await tx.quote.update({ where: { id }, data: { status: 'ACCEPTED' } })
  await tx.quote.updateMany({
    where: { jobId, id: { not: id } },
    data: { status: 'REJECTED' }
  })
  await tx.job.update({ where: { id: jobId }, data: { status: 'ACCEPTED', providerId } })
})
Job status transitions: PATCH /jobs/:id/status
Two replicas must not both advance the same job status simultaneously:
await prisma.$transaction(async (tx) => {
  const [job] = await tx.$queryRaw`
    SELECT * FROM "Job" WHERE id = ${id} FOR UPDATE
  `
  if (!validTransition(job.status, newStatus)) throw new Error('Invalid transition')
  await tx.job.update({ where: { id }, data: { status: newStatus } })
})
Payout cron: payment capture
The Azure Function Timer Trigger must be idempotent: if it crashes and restarts mid-run, it must not double-capture. Lock the payment row before capturing, and capture only if this run claimed the payment:
const claimed = await prisma.$transaction(async (tx) => {
  const [payment] = await tx.$queryRaw`
    SELECT * FROM "Payment"
    WHERE id = ${id} AND status = 'HELD'
    FOR UPDATE
  `
  if (!payment) return false // another run already processed this
  await tx.payment.update({ where: { id }, data: { status: 'PROCESSING' } })
  return true
})
// Stripe capture happens outside the transaction (external HTTP call).
// Returning early from the callback above only exits the callback, so the
// capture must be guarded by the claim result.
if (claimed) {
  await stripe.paymentIntents.capture(paymentIntentId)
  await prisma.payment.update({ where: { id }, data: { status: 'PAID' } })
}
PgBouncer mode compatibility
Azure's built-in PgBouncer runs in transaction pooling mode by default: a connection is returned to the pool after each transaction, not held for the lifetime of the client. This is compatible with the SELECT FOR UPDATE patterns above, because each FOR UPDATE lock lives and dies inside a single $transaction() call. The lock does not need to persist across multiple roundtrips, so transaction pooling works correctly. Depending on the Prisma and PgBouncer versions, the connection string may also need `?pgbouncer=true` so that Prisma avoids named prepared statements, which transaction pooling does not preserve across connections.
Notification Queue: Azure Cache for Redis
The current notificationQueue.js uses an in-memory Map. With multiple Container App replicas this breaks: if Replica 1 enqueues a notification and Replica 2 handles the poll from that user, Replica 2's Map is empty and the notification is never delivered.
Replace with Azure Cache for Redis:
// src/config/notificationQueue.js (updated for production)
const { createClient } = require('redis')
const redis = createClient({ url: process.env.REDIS_URL })
redis.on('error', (err) => console.error('Redis client error', err))
// CommonJS has no top-level await, so keep the connect promise and await it per call
const ready = redis.connect()
async function enqueue(role, userId, notification) {
  await ready
  // TTL of 1 hour: notification expires if never polled
  await redis.setEx(`notif:${role}:${userId}`, 3600, JSON.stringify(notification))
}
async function dequeue(role, userId) {
  await ready
  // GETDEL is atomic: read and delete in one operation, so two concurrent
  // polls cannot both receive the same notification.
  // GETDEL needs Redis 6.2+; confirm the cache's server version supports it.
  const val = await redis.getDel(`notif:${role}:${userId}`)
  return val ? JSON.parse(val) : null
}
Redis also replaces the express-rate-limit in-memory store. Rate limit counters must be shared across all replicas; otherwise each replica keeps its own independent counter, defeating the purpose.
Azure Cache for Redis tier: Basic C0 (250 MB), ~$17/month. Sufficient for notification queue and rate limiting at MVP scale. Upgrade to Standard C1 (replicated, ~$110/month) for production HA when needed.
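Why per-replica counters defeat rate limiting can be shown with a small fixed-window limiter. This is a sketch under stated assumptions: `MemoryCounterStore` is a hypothetical in-memory stand-in for a store that would wrap Redis `INCR` plus an expiry in production, and `makeLimiter` is not the express-rate-limit API.

```javascript
// Stand-in for a shared counter store; production would use Redis instead.
class MemoryCounterStore {
  constructor() {
    this.counts = new Map()
  }
  // Increment the counter for key, starting a fresh window once the old one expires.
  async incr(key, ttlMs, now) {
    const entry = this.counts.get(key)
    if (!entry || entry.expiresAt <= now) {
      this.counts.set(key, { value: 1, expiresAt: now + ttlMs })
      return 1
    }
    entry.value += 1
    return entry.value
  }
}

// Fixed-window limiter: allows at most `limit` calls per client per window.
function makeLimiter(store, { limit, windowMs }) {
  return async function isAllowed(clientId, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs
    const count = await store.incr(`rl:${clientId}:${windowStart}`, windowMs, now)
    return count <= limit
  }
}

async function demo() {
  const options = { limit: 3, windowMs: 60000 }
  const shared = new MemoryCounterStore()
  // Two "replicas" sharing one store enforce a single combined limit.
  const replicaA = makeLimiter(shared, options)
  const replicaB = makeLimiter(shared, options)
  const now = 1000
  const outcomes = []
  for (const limiter of [replicaA, replicaB, replicaA, replicaB]) {
    outcomes.push(await limiter('client-1', now))
  }
  return outcomes // the fourth request exceeds the shared limit of 3
}

demo().then((outcomes) => console.log(outcomes))
```

If each replica were given its own store, all four requests would pass, which is the failure mode described above.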
File Storage: Azure Blob Storage + Lifecycle Management
The uploads/ directory is local to each container replica. When Azure Container Apps replaces a replica (deploy, crash, scale-in), uploaded files are lost. With multiple replicas, a file uploaded to Replica 1 is invisible to Replica 2.
Storage container structure
Two Blob Storage containers, each with its own access and lifecycle policy:
evergrn-assets (Storage Account)
|
+-- jobs/     all job photos (before, after, general)
|   +-- {jobId}-{timestamp}.jpg
|
+-- logos/    provider logos (permanent, always Hot)
    +-- {providerId}.jpg
Files are served at https://cdn.evergrn.co/jobs/... and Front Door caches them at edge nodes. Blob Storage is set to private; all access goes through Front Door, never directly to the storage URL.
Access control: Container Apps reads/writes using a Managed Identity with the Storage Blob Data Contributor role, so no storage account keys are stored anywhere.
Code migration
Replace Multer disk storage with direct Azure Blob SDK uploads. Files go to Blob Storage and never touch the container filesystem:
// src/middleware/upload.js
const { BlobServiceClient } = require('@azure/storage-blob')
const { DefaultAzureCredential } = require('@azure/identity')
const multer = require('multer')
const blobService = new BlobServiceClient(
  `https://${process.env.STORAGE_ACCOUNT}.blob.core.windows.net`,
  new DefaultAzureCredential() // uses Container App's Managed Identity, no keys
)
// In-memory storage (multer holds file in buffer, we push to Blob)
const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 10 * 1024 * 1024 } })
async function uploadToBlob(container, blobName, buffer, mimetype) {
  const client = blobService.getContainerClient(container).getBlockBlobClient(blobName)
  await client.uploadData(buffer, { blobHTTPHeaders: { blobContentType: mimetype } })
  return `https://cdn.evergrn.co/${container}/${blobName}`
}
module.exports = { upload, uploadToBlob }
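A caller still has to choose the blob name passed to `uploadToBlob`. The sketch below follows the `{jobId}-{timestamp}.jpg` convention from the container structure above; `jobPhotoBlobName` and the MIME-type allowlist are assumptions for illustration, not existing code.

```javascript
// Assumed allowlist of upload types and the extension each one maps to.
const EXTENSION_BY_TYPE = new Map([
  ['image/jpeg', 'jpg'],
  ['image/png', 'png']
])

// Builds "{jobId}-{timestamp}.{ext}" and rejects types outside the allowlist,
// so the stored name never depends on a client-supplied filename.
function jobPhotoBlobName(jobId, mimetype, now = new Date()) {
  const ext = EXTENSION_BY_TYPE.get(mimetype)
  if (!ext) throw new Error(`Unsupported upload type: ${mimetype}`)
  return `${jobId}-${now.getTime()}.${ext}`
}

console.log(jobPhotoBlobName('job42', 'image/jpeg', new Date(0))) // job42-0.jpg
```

In a route handler this would be combined with the middleware above, for example `uploadToBlob('jobs', jobPhotoBlobName(jobId, req.file.mimetype), req.file.buffer, req.file.mimetype)`.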
Blob lifecycle management policy
Azure Blob Storage lifecycle rules automatically move blobs between access tiers on a schedule. There is no charge for the policy itself; cost differences come from the tier rates and retrieval fees described below.
Access tiers and what they mean:
| Tier | Storage cost (per GB/month) | Retrieval cost | Access speed | Use for |
|---|---|---|---|---|
| Hot | $0.018 | Free | Instant | Active, frequently accessed files |
| Cool | $0.010 | $0.01/GB | Instant | Infrequent access, files > 30 days old |
| Cold | $0.0036 | $0.02/GB | Instant | Rare access, files > 90 days old |
| Archive | $0.00099 | $0.02/GB + rehydration | Hours (rehydration) | Compliance hold only |
Lifecycle policy for job photos (jobs/ prefix). No rule targets logos/, so logos stay in Hot indefinitely; a rule needs at least one action, so an empty "keep forever" rule is not defined:
{
  "rules": [
    {
      "name": "job-photos-lifecycle",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["jobs/"]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 90 },
            "tierToCold": { "daysAfterModificationGreaterThan": 365 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 730 },
            "delete": { "daysAfterModificationGreaterThan": 1095 }
          }
        }
      }
    }
  ]
}
Why these thresholds:
- Day 0-90 (Hot): Covers the 72-hour dispute window, the review period, and any near-term job history the customer or provider checks. Instant access, no retrieval fee. Photos are actively useful in this window.
- Day 90-365 (Cool): Occasional lookups, such as a customer checking last summer's lawn job. Still instant access but cheaper storage. The retrieval fee of $0.01/GB is negligible for occasional lookups.
- Day 365-730 (Cold): Files are over a year old and accessed almost never, possibly if a dispute resurfaces during a legal matter. Storage is 80% cheaper than Hot.
- Day 730-1095 (Archive): Two-year-old files kept for legal/compliance hold only. Rehydration takes hours, so this is not suitable for any user-facing access. Keep for audit trail.
- Day 1095+ (Delete): Three years. No reasonable business need to retain job photos beyond this. Deleting keeps the storage cost from compounding indefinitely.
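The saving from these thresholds can be estimated with a short calculation. The sketch below uses the per-GB prices from the tier table and the day thresholds from the policy; it assumes a 30-day billing month and ignores retrieval, transaction, and early-deletion charges, so treat the result as a rough comparison rather than a bill forecast. The function names are illustrative.

```javascript
// Storage price per GB per month, copied from the tier table above.
const PRICE_PER_GB_MONTH = { Hot: 0.018, Cool: 0.010, Cold: 0.0036, Archive: 0.00099 }
const RETENTION_DAYS = 1095 // blobs are deleted after three years

// Mirrors the "daysAfterModificationGreaterThan" thresholds in the policy.
function tierForAge(ageDays) {
  if (ageDays > 730) return 'Archive'
  if (ageDays > 365) return 'Cold'
  if (ageDays > 90) return 'Cool'
  return 'Hot'
}

// Sums the daily storage cost of 1 GB over its whole retention period.
function lifetimeStorageCostPerGb(tierFor = tierForAge) {
  let total = 0
  for (let day = 1; day <= RETENTION_DAYS; day++) {
    total += PRICE_PER_GB_MONTH[tierFor(day)] / 30
  }
  return total
}

const tiered = lifetimeStorageCostPerGb()
const allHot = lifetimeStorageCostPerGb(() => 'Hot')
console.log(`tiered: $${tiered.toFixed(2)} per GB, all Hot: $${allHot.toFixed(2)} per GB`)
```

Under these assumptions the tiered policy stores a gigabyte for three years at roughly $0.20, against roughly $0.66 if it stayed in Hot the whole time.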
Apply the policy via Azure CLI:
az storage account management-policy create \
  --account-name evergrnassets \
  --resource-group evergrn-rg \
  --policy @lifecycle-policy.json
Front Door caching for photos
Front Door caches photos at edge nodes after the first request. Subsequent requests for the same photo are served from cache, with no round-trip to Blob Storage and no retrieval fee. Set long cache TTLs on job photos since they never change after upload:
Cache-Control: public, max-age=31536000, immutable
This means a before-photo uploaded today is cached at the edge nearest each user for up to a year. The first request costs an origin read; every request after that is free from cache until the CDN TTL expires.
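One way to attach that header is at upload time, through the blob's HTTP headers. The helper below is a sketch: `immutablePhotoHeaders` is a hypothetical name, while `blobContentType` and `blobCacheControl` are the property names the `@azure/storage-blob` SDK uses for blob HTTP headers.

```javascript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60 // 31536000

// Headers for a photo that never changes after upload.
function immutablePhotoHeaders(mimetype) {
  return {
    blobContentType: mimetype,
    blobCacheControl: `public, max-age=${ONE_YEAR_SECONDS}, immutable`
  }
}

console.log(immutablePhotoHeaders('image/jpeg'))
```

The returned object would be passed as `blobHTTPHeaders` in the upload options, so the storage origin sends the Cache-Control value to Front Door with every response. Logos, which can be replaced under the same name, would need a shorter TTL or a versioned blob name.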
Secrets: Azure Key Vault + Managed Identity
Azure Key Vault stores all secrets. Azure Managed Identity gives the Container Apps service a cryptographically verified Azure identity: no passwords, no keys, no .env files in the container image.
# Store secrets in Key Vault
az keyvault secret set --vault-name evergrn-kv --name DATABASE-URL --value "..."
az keyvault secret set --vault-name evergrn-kv --name JWT-SECRET --value "..."
az keyvault secret set --vault-name evergrn-kv --name STRIPE-SECRET --value "..."
az keyvault secret set --vault-name evergrn-kv --name REDIS-URL --value "..."
az keyvault secret set --vault-name evergrn-kv --name STRIPE-WEBHOOK-SECRET --value "..."
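# Grant Container App's Managed Identity access (vault using access policies)
az keyvault set-policy --name evergrn-kv \
  --object-id <container-app-managed-identity-id> \
  --secret-permissions get list
# If the vault uses Azure RBAC instead (as evergrn-vault-dev does), assign a role
az role assignment create --role "Key Vault Secrets User" \
  --assignee <container-app-managed-identity-id> \
  --scope <key-vault-resource-id>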
At runtime, Container Apps fetches secrets from Key Vault and injects them as environment variables. No human ever handles the raw values after initial setup.
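Because the secrets arrive as environment variables, the API can verify at startup that none are missing and fail fast instead of failing on the first request. A minimal sketch, assuming the environment variable names are the underscore forms of the Key Vault secret names above; `loadConfig` is a hypothetical helper.

```javascript
// Assumed environment variable names, derived from the Key Vault secret names.
const REQUIRED_SETTINGS = [
  'DATABASE_URL',
  'JWT_SECRET',
  'STRIPE_SECRET',
  'REDIS_URL',
  'STRIPE_WEBHOOK_SECRET'
]

// Returns a frozen config object, or throws naming every missing setting.
function loadConfig(env = process.env) {
  const missing = REQUIRED_SETTINGS.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(', ')}`)
  }
  return Object.freeze(
    Object.fromEntries(REQUIRED_SETTINGS.map((name) => [name, env[name]]))
  )
}
```

Calling `loadConfig()` at the top of the entry point means a replica with a broken Key Vault reference never passes its health check and never receives traffic.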
Payout Cron: Azure Functions (Timer Trigger)
Rather than running the cron inside an API container (which requires one designated replica to handle it, complicating scaling), use an Azure Function with a Timer Trigger. Azure Functions are purpose-built for scheduled jobs: they run on demand, cost nothing when idle, and have no container to keep running.
// src/functions/payoutCron.js
const { app } = require('@azure/functions')
const { PrismaClient } = require('@prisma/client')
const prisma = new PrismaClient()
app.timer('dailyPayout', {
  // NCRONTAB uses six fields (seconds first): fires at 08:00:00 daily, in UTC by default
  schedule: '0 0 8 * * *',
  handler: async (myTimer, context) => {
    const cutoff = new Date()
    cutoff.setHours(8, 0, 0, 0) // today at 08:00:00 (server time; account for the UTC offset)
    const eligiblePayments = await prisma.payment.findMany({
      where: { status: 'HELD', releaseAt: { lte: cutoff } },
      include: { job: { include: { report: true, provider: true } } }
    })
    for (const payment of eligiblePayments) {
      if (payment.job.report) {
        // Dispute filed: block payout, flag for admin
        await prisma.payment.update({
          where: { id: payment.id },
          data: { status: 'DISPUTED' }
        })
        continue
      }
      // Lock, capture, create ProviderPayout ledger entry
      // (captureAndRecordPayout is defined elsewhere in the codebase)
      await captureAndRecordPayout(payment, cutoff)
    }
  }
})
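Azure Functions runs in the same Resource Group as Container Apps. Reaching the database privately (no public internet exposure) requires VNet integration, which the classic Consumption plan does not offer; plan on Flex Consumption or Premium if the function must join the same VNet.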
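The handler's decision logic can be separated from its database and Stripe calls, which makes it testable without either. The sketch below is an illustration under assumptions: `partitionPayments` is a hypothetical helper, and the payment objects carry only the fields the handler above reads (`status`, `releaseAt`, `job.report`).

```javascript
// Splits payments into those to pay out, those blocked by a dispute,
// and those not yet due, using the same rules as the timer handler.
function partitionPayments(payments, cutoff) {
  const payable = []
  const disputed = []
  const notDue = []
  for (const payment of payments) {
    if (payment.status !== 'HELD' || payment.releaseAt > cutoff) {
      notDue.push(payment)
    } else if (payment.job.report) {
      disputed.push(payment)
    } else {
      payable.push(payment)
    }
  }
  return { payable, disputed, notDue }
}

const cutoff = new Date('2026-07-01T08:00:00Z')
const sample = [
  { id: 'p1', status: 'HELD', releaseAt: new Date('2026-06-30T12:00:00Z'), job: { report: null } },
  { id: 'p2', status: 'HELD', releaseAt: new Date('2026-06-30T12:00:00Z'), job: { report: { id: 'r1' } } },
  { id: 'p3', status: 'HELD', releaseAt: new Date('2026-07-02T12:00:00Z'), job: { report: null } }
]
const groups = partitionPayments(sample, cutoff)
console.log(groups.payable.map((p) => p.id), groups.disputed.map((p) => p.id), groups.notDue.map((p) => p.id))
```

Keeping this logic pure also makes the cutoff explicit, which helps when checking how the 8 AM boundary behaves across time zones.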
CI/CD: Azure DevOps Pipelines
Every merge to main triggers the pipeline:
# azure-pipelines.yml
trigger:
  branches:
    include: [main]
stages:
  - stage: BuildAPI
    jobs:
      - job: Docker
        steps:
          - task: Docker@2
            inputs:
              containerRegistry: evergrn-acr
              repository: evergrn-api
              command: buildAndPush
              tags: $(Build.BuildId)
          - task: AzureContainerApps@1
            inputs:
              azureSubscription: evergrn-service-connection
              containerAppName: evergrn-api
              resourceGroup: evergrn-rg
              imageToDeploy: evergrnacr.azurecr.io/evergrn-api:$(Build.BuildId)
  - stage: BuildFrontend
    jobs:
      - job: StaticSite
        steps:
          - script: cd client && npm ci && npm run build
          - task: AzureFileCopy@4
            inputs:
              SourcePath: client/dist/
              azureSubscription: evergrn-service-connection
              Destination: AzureBlob
              storage: evergrnweb
              ContainerName: $web
          - task: AzureCLI@2
            inputs:
              # The service connection and script type are required by this task
              azureSubscription: evergrn-service-connection
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                az afd endpoint purge \
                  --resource-group evergrn-rg \
                  --profile-name evergrn-fd \
                  --endpoint-name evergrn \
                  --content-paths "/*"
Container Apps performs a rolling update: new revisions receive traffic incrementally after passing health checks, and old revisions are decommissioned. Zero downtime during deploys.
Monitoring: Azure Monitor + Application Insights
Add the Application Insights SDK to the Node.js API:
// server.js: add at the very top, before all other requires
const appInsights = require('applicationinsights')
appInsights.setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true)
  .setAutoCollectExceptions(true)
  .setAutoCollectDependencies(true) // automatically traces PostgreSQL queries
  .start()
This provides out-of-the-box:
- Request rate, response times, error rate per endpoint
- Full distributed tracing (API → PostgreSQL → external calls)
- Exception tracking with stack traces
- Custom alerts: API error rate > 1%, DB CPU > 80%, Redis memory > 80%
Migration Phases
Phase 1: Launch
Minimum viable Azure setup. Single region (East US 2 recommended for East Coast rural focus), zone-redundant database, 2 Container App replicas.
| Component | Azure Service | Est. Monthly Cost |
|---|---|---|
| Front door + CDN | Azure Front Door Standard | ~$35 |
| API containers | Container Apps (2 replicas, 0.5 vCPU / 1 GB) | ~$30 |
| Container registry | Azure Container Registry Basic | ~$5 |
| Database | PostgreSQL Flexible Server Standard_B2ms, zone-redundant HA | ~$130 |
| Redis | Azure Cache for Redis Basic C0 (250 MB) | ~$17 |
| File storage | Azure Blob Storage + Front Door | ~$5-15 |
| Secrets | Azure Key Vault | ~$5 |
| Functions | Azure Functions Consumption Plan (payout cron) | ~$1 |
| Monitoring | Application Insights (5 GB/month free tier) | ~$0-10 |
| DNS | Azure DNS | ~$1 |
| Total | | ~$229-249/month |
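Summing the line items is a quick check on the total row. The sketch below copies the estimates from the Phase 1 table; the `phase1Items` structure and `totalRange` helper are illustrative only, and the figures remain the document's rough estimates rather than quoted Azure prices.

```javascript
// Monthly estimates in USD, copied from the Phase 1 table (low and high ends).
const phase1Items = [
  { name: 'Front Door Standard', low: 35, high: 35 },
  { name: 'Container Apps (2 replicas)', low: 30, high: 30 },
  { name: 'Container Registry Basic', low: 5, high: 5 },
  { name: 'PostgreSQL Flexible Server', low: 130, high: 130 },
  { name: 'Redis Basic C0', low: 17, high: 17 },
  { name: 'Blob Storage + Front Door', low: 5, high: 15 },
  { name: 'Key Vault', low: 5, high: 5 },
  { name: 'Functions (payout cron)', low: 1, high: 1 },
  { name: 'Application Insights', low: 0, high: 10 },
  { name: 'Azure DNS', low: 1, high: 1 }
]

// Adds up the low and high ends of every line item.
function totalRange(items) {
  return items.reduce(
    (sum, item) => ({ low: sum.low + item.low, high: sum.high + item.high }),
    { low: 0, high: 0 }
  )
}

console.log(totalRange(phase1Items)) // { low: 229, high: 249 }
```

Only two rows carry a range, so the spread between the low and high totals comes entirely from file storage and monitoring.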
Phase 2: Early Growth
- Enable Container Apps auto-scaling (2-10 replicas based on concurrent HTTP requests)
- Add PostgreSQL read replica; route provider insights and admin queries to replica
- Upgrade Redis to Standard C1 (replicated for HA, no data loss on failover)
- Upgrade Front Door to Premium tier for managed WAF rule sets
- Add Azure Monitor alerts: replicas at max count, DB failover events, Redis evictions
- Upgrade Container App replicas from burstable to dedicated vCPU for predictable latency
Phase 3: Scale
- Migrate PostgreSQL from Flexible Server to Azure Cosmos DB for PostgreSQL (the managed Citus offering), a distributed PostgreSQL that shards data across nodes for horizontal write scaling. The application keeps talking to PostgreSQL through Prisma, but expect schema work beyond a new connection string: tables need distribution columns, and queries should filter on them to avoid cross-shard fan-out.
- Add secondary Azure region (e.g., West US 2) with Front Door routing for active/passive failover
- Move payout cron to an Azure Durable Function for reliable long-running orchestration with built-in retry and checkpoint logic
Code Changes Required Before Go-Live
See current-documentation/go-live-checklist.md for the authoritative status of all launch blockers.
What Does NOT Need to Change
- Prisma ORM: works identically against local PostgreSQL and Azure Database for PostgreSQL. Only `DATABASE_URL` changes (to PgBouncer port 6432 instead of 5432 direct).
- JWT auth strategy: stateless JWTs are already correct for horizontal scaling across replicas. No session store needed.
- Stripe integration: all Stripe API calls are stateless. No changes required. Webhook signature verification is already implemented.
- React Native mobile app: connects to the API via HTTPS. Only `BASE_URL` in `api.js` changes (from the localhost.run tunnel to `https://api.evergrn.co`).
- Expo push notifications: `expo-server-sdk` calls Expo's external servers. No infrastructure change needed.
- Password hashing, bcrypt rounds: unchanged.
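The `DATABASE_URL` change described in the Prisma item can be sketched as a small helper. This is a minimal illustration, not the project's actual configuration: the credentials, database name, and the `toPooledDatabaseUrl` helper are hypothetical, and adding Prisma's `pgbouncer=true` flag is an assumption about how the pooler will be run (transaction mode).

```javascript
// Sketch: derive the pooled connection string from the direct one.
// Credentials and database name below are placeholders, not real values.
function toPooledDatabaseUrl(directUrl, pgBouncerPort = 6432) {
  const url = new URL(directUrl);
  // Switch from the direct PostgreSQL port (5432) to the PgBouncer port.
  url.port = String(pgBouncerPort);
  // Assumption: Prisma's `pgbouncer=true` flag is wanted for transaction-mode pooling.
  url.searchParams.set("pgbouncer", "true");
  return url.toString();
}

const directUrl =
  "postgresql://app_user:secret@evergrn-db.postgres.database.azure.com:5432/evergrn?sslmode=require";
const pooledUrl = toPooledDatabaseUrl(directUrl);
console.log(pooledUrl);
```

Only the environment variable changes; no Prisma schema or query code is touched by this switch.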
Cost Scaling Analysis: Flat Rate vs. Consumption
Not all Azure services scale the same way. Understanding which costs are fixed, which grow linearly with users, and which grow with data volume is essential for financial planning and avoiding surprise bills.
Service-by-service breakdown
| Service | Pricing model | What drives growth | Growth curve |
|---|---|---|---|
| Azure Front Door Standard | Base fee + per GB egress + per 10k requests | Traffic volume, data served | Linear with users |
| Azure Container Apps | Per vCPU-second + per GiB-second | Concurrent active users | Linear with traffic |
| PostgreSQL Flexible Server | Flat per instance size | Nothing, until you resize | Step-function |
| Azure Cache for Redis | Flat per cache tier | Nothing, until you upgrade tier | Step-function |
| Azure Blob Storage (storage) | Per GB stored | Total photos accumulated over time | Linear with jobs |
| Azure Blob Storage (egress) | Per GB served to internet | Photo views by customers and providers | Linear with activity |
| Azure Container Registry | Flat (Basic $5/month) | Nothing at this scale | Flat |
| Azure Key Vault | Per 10k operations | Negligible at any scale | Effectively flat |
| Azure Functions (payout cron) | Per execution (first 1M free) | Nothing: 31 executions/month | Free indefinitely |
| Azure DNS | Per zone + per million queries | Negligible | Effectively flat |
| Application Insights | First 5 GB/month free, then per GB | Request volume, log verbosity | Linear with traffic |
The two cost growth patterns
Pattern 1: Linear with traffic (the variable costs)
These services charge for actual consumption and scale directly with the size of the user base:
Azure Container Apps is the clearest signal. It charges per vCPU-second and per GiB-second of actual runtime across all replicas. As auto-scaling spins up more replicas to handle concurrent users, the bill grows proportionally. This is the most honest cost signal you have: if Container Apps spend doubles, your traffic has roughly doubled. Budget roughly:
~$0.04 per vCPU-hour (each replica is allocated 0.5 vCPU)
2 replicas minimum: ~$30/month base
10 replicas sustained all month: ~$145/month
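The budget figures above follow from simple arithmetic, sketched below. The rate and replica size come from this document; the 730-hour month and the `monthlyComputeCost` helper are assumptions for illustration, and memory (GiB-second) charges are left out for simplicity.

```javascript
// Assumption: an average month is about 730 hours.
const HOURS_PER_MONTH = 730;

// Estimate monthly compute cost for always-on replicas.
// Defaults use this document's figures: 0.5 vCPU per replica, ~$0.04 per vCPU-hour.
function monthlyComputeCost(replicas, vcpuPerReplica = 0.5, ratePerVcpuHour = 0.04) {
  return replicas * vcpuPerReplica * ratePerVcpuHour * HOURS_PER_MONTH;
}

console.log(monthlyComputeCost(2).toFixed(0));  // about 29 (2 replicas, the baseline)
console.log(monthlyComputeCost(10).toFixed(0)); // about 146 (10 replicas running all month)
```

In practice the 10-replica figure is a ceiling: auto-scaling only bills the extra replicas for the seconds they actually run.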
Azure Front Door charges for data transferred out to end users. Every API response and every photo served through Front Door costs:
- $0.008/GB for the first 10 TB/month
- $0.006/GB after that
API responses are small (kilobytes of JSON). The meaningful egress driver is photos: every time a customer views their job history, or a provider's logo loads, that's data out. Front Door's CDN cache significantly reduces origin traffic: once a photo is cached at the edge, repeat views are served from cache and incur no further Blob Storage origin egress, only Front Door's own transfer to the user. For photo-heavy pages, expect most reads to hit cache within the first week of a job's existence.
Application Insights ingestion grows with request volume. The free 5 GB/month should cover launch-level traffic. At scale, set a daily ingestion cap so that runaway logging during an unexpected traffic spike cannot inflate the bill.
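As a rough illustration of the tiered egress rates quoted above, the sketch below prices a month of Front Door transfer. The rates are the ones stated in this document; treating 10 TB as 10,000 GB and the `frontDoorEgressCost` helper name are assumptions.

```javascript
// Assumption: the first pricing tier ends at 10 TB, counted here as 10,000 GB.
const TIER_1_LIMIT_GB = 10 * 1000;

// Price a month of egress using the two rates quoted in this document.
function frontDoorEgressCost(egressGb, tier1Rate = 0.008, tier2Rate = 0.006) {
  const tier1Gb = Math.min(egressGb, TIER_1_LIMIT_GB);
  const tier2Gb = Math.max(egressGb - TIER_1_LIMIT_GB, 0);
  return tier1Gb * tier1Rate + tier2Gb * tier2Rate;
}

console.log(frontDoorEgressCost(15).toFixed(2));    // 15 GB, all in the first tier
console.log(frontDoorEgressCost(12000).toFixed(2)); // 12 TB, 2 TB billed at the lower rate
```

At launch-scale volumes the second tier never applies, so the cost is effectively a flat per-GB rate.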
Pattern 2: Step-function (the fixed costs that jump on upgrade)
These services have a flat monthly rate until you outgrow the tier, at which point you upgrade and the cost jumps to the next tier's flat rate:
PostgreSQL Flexible Server is the biggest step. The instance size is fixed regardless of query volume (within its capacity). You will not see the DB bill creep up gradually: it stays flat until you hit a resource ceiling (CPU, RAM, connections, IOPS), then you upgrade and the cost jumps:
Standard_B2ms (launch): ~$130/month
Standard_D2ds_v4: ~$180/month (+$50 jump)
Standard_D4ds_v4: ~$350/month (+$170 jump)
Monitor CPU utilization and connection count in Azure Monitor. A sustained CPU average over 70% or connection count over 80% of the instance's max is your signal to resize.
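The resize rule above can be expressed as a small check. This is a sketch only: the `shouldResizeDatabase` helper and its input shape are hypothetical, and the caller is assumed to pass values already averaged over a sustained window (for example, from Azure Monitor metrics).

```javascript
// Thresholds from this document: CPU above 70%, connections above 80% of max.
const CPU_THRESHOLD_PERCENT = 70;
const CONNECTION_THRESHOLD_RATIO = 0.8;

// Returns true when either sustained metric crosses its threshold.
function shouldResizeDatabase({ avgCpuPercent, activeConnections, maxConnections }) {
  const connectionUsage = activeConnections / maxConnections;
  return avgCpuPercent > CPU_THRESHOLD_PERCENT || connectionUsage > CONNECTION_THRESHOLD_RATIO;
}

// Example: CPU is fine, but 85 of 100 connections are in use.
console.log(shouldResizeDatabase({ avgCpuPercent: 50, activeConnections: 85, maxConnections: 100 }));
```

Either signal alone is enough to plan the upgrade, since the jump to the next instance size is a single step rather than a gradual increase.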
Azure Cache for Redis is the same pattern. Basic C1 (1 GB, ~$55/month) handles the notification queue and rate limiting comfortably until:
- The 1 GB memory limit is approached (rate limit keys + session data accumulate)
- You need HA (failover), which requires Standard tier (~$110/month)
The Redis upgrade is predictable: you'll see it coming in the Azure Monitor memory metric.
Storage cost growth: the compound risk
Blob Storage costs grow with the total volume of photos stored, not just new uploads. Without lifecycle management, every photo ever taken is permanently billed at the Hot tier rate. This is the one cost that compounds indefinitely and can quietly become significant:
Assumptions:
Average photos per completed job: 8 (2 general + 3 before + 3 after)
Average photo size (after sharp compression): 1.5 MB
Storage per completed job: ~12 MB
At 100 jobs/month: 1.2 GB new storage/month → year 1 total: ~14 GB → ~$0.25/month
At 500 jobs/month: 6 GB new storage/month → year 1 total: ~72 GB → ~$1.30/month
At 2,000 jobs/month: 24 GB new storage/month → year 1 total: ~290 GB → ~$5.20/month
These numbers are low even though they are priced at roughly the Hot tier rate (about $0.018/GB, which is what ~$5.22/month for 290 GB works out to), so treat them as an upper bound. With lifecycle management, photos move to Cool ($0.010/GB) and Cold ($0.0036/GB) tiers as they age, and the average cost per GB across the whole corpus falls well below the Hot rate. Without lifecycle management the cost is not catastrophic, but it grows every month and never comes down.
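The storage projections can be reproduced with the sketch below. The per-job assumptions and the Cool rate come from this document; the Hot rate of $0.018/GB is inferred from the ~$5.22 for 290 GB figure, 1 GB is counted as 1,000 MB, and both helper names are hypothetical.

```javascript
// Accumulated storage (GB) after a number of months at a steady job rate.
// Defaults use this document's assumptions: 8 photos per job at 1.5 MB each.
function storedGbAfter(jobsPerMonth, months = 12, photosPerJob = 8, photoMb = 1.5) {
  const gbPerJob = (photosPerJob * photoMb) / 1000; // assumption: 1 GB = 1,000 MB
  return jobsPerMonth * months * gbPerJob;
}

// Monthly bill for a given stored volume at a flat per-GB rate.
function monthlyStorageCost(storedGb, ratePerGb) {
  return storedGb * ratePerGb;
}

const HOT_RATE = 0.018;  // inferred from this document, not an official price
const COOL_RATE = 0.010; // Cool tier rate quoted in this document

const yearOneGb = storedGbAfter(2000); // 2,000 jobs/month for 12 months
console.log(yearOneGb.toFixed(0));
console.log(monthlyStorageCost(yearOneGb, HOT_RATE).toFixed(2));  // everything left in Hot
console.log(monthlyStorageCost(yearOneGb, COOL_RATE).toFixed(2)); // everything moved to Cool
```

The gap between the Hot and Cool results is the saving lifecycle rules can capture, and it widens each month as the corpus grows.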
The more meaningful storage cost at scale is egress, not storage at rest. If customers regularly browse job history with photos:
1,000 monthly active users each loading 10 photos/session at 1.5 MB each:
= 15 GB egress/month from Blob Storage origin
Front Door caches aggressively, so assume 20% cache miss rate:
= 3 GB actual origin egress billed → ~$0.024/month
+ Front Door CDN egress to users:
= 15 GB × $0.008/GB → ~$0.12/month
Front Door's CDN makes photo egress economics very favorable. The cost only becomes notable at tens of thousands of active users.
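To see how the egress estimate responds to different user counts or cache hit rates, the calculation above can be parameterized. This is a sketch under this document's assumptions (one session per user per month, $0.008/GB for both origin and edge transfer); the `photoEgressEstimate` helper is hypothetical.

```javascript
// Estimate monthly photo egress volume and cost.
// Assumption: 1 GB = 1,000 MB and one photo-browsing session per user per month.
function photoEgressEstimate({ users, photosPerSession, photoMb, cacheMissRate, ratePerGb }) {
  const totalGb = (users * photosPerSession * photoMb) / 1000;
  const originGb = totalGb * cacheMissRate; // only cache misses reach Blob Storage
  return {
    totalGb,
    originGb,
    originCost: originGb * ratePerGb, // origin egress on cache misses
    edgeCost: totalGb * ratePerGb,    // Front Door transfer to users
  };
}

const estimate = photoEgressEstimate({
  users: 1000,
  photosPerSession: 10,
  photoMb: 1.5,
  cacheMissRate: 0.2,
  ratePerGb: 0.008,
});
console.log(estimate);
```

Raising `users` to the tens of thousands is what moves the total from cents to a figure worth tracking.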
Summary: where to watch the bill
| Cost driver | When it becomes significant | Monitoring signal |
|---|---|---|
| Container Apps compute | Early growth, at the first scaling event | Replica count in Azure Monitor |
| Front Door egress | Mid-scale, at thousands of daily active users | GB transferred in Front Door metrics |
| PostgreSQL instance resize | When you hit CPU/connection ceiling | DB CPU avg > 70%, connections > 80% of max |
| Blob Storage accumulation | Gradual, managed by lifecycle policy | Total stored GB in Storage metrics |
| Redis tier upgrade | When HA is needed or memory exceeds 800 MB | Redis memory percentage used |
| Application Insights ingestion | High traffic with verbose logging | Daily ingestion GB in App Insights |
The two costs to set Azure Monitor alerts on from day one:
- Container Apps replica count: if you're hitting max replicas (10), you need to either increase the max or investigate why traffic is that high (could be a bot/abuse scenario)
- PostgreSQL CPU: sustained high CPU on the database is the most common chokepoint and the most expensive fix if caught late