Every capability on this page is included on every plan. We don't
tier-gate diagnostics, backups, observability, or compliance — the
product is the product. The plan tier picks resource size, not
feature set.
CRITICAL · finding-7a3c2 · 14:32:18 UTC
Cart query reaching 14s p99 in checkout
Evidence
trace 4e9a82c: browser → nginx → php-fpm → mysql
slowest span: SELECT * FROM cart_items WHERE cart_id = ?
19,840 rows · 14.2s · no index used
1. Add index: cart_items (cart_id, created_at)
2. Eager-load product.images to drop N+1
Expected p99: 14s → ~120ms
Diagnostics
Detect, diagnose, report, recommend, contain. Five steps, in that order.
Other hosts kill first and explain never. We diagnose first, capture
evidence, and contain only when there's no other option.
Diagnostic reports with finding ID, evidence, code location, and
recommended fix — delivered via dashboard, email, API, and MCP.
Per-tenant code index traces SQL queries back to the function that
issued them.
Catch the bug before it ships. Pre-deploy static
analysis runs in the deploy pipeline — the same code index flags N+1
query patterns before the code reaches production, not after it slows
down checkout.
Our agent reads three surfaces at once: telemetry, your code, and the
live runtime state.
Want us to ship the fix? Optional code-patch service — every patch
goes through human review before merge.
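To make the N+1 pattern concrete, here is a minimal sketch of how repeated queries could be spotted in a single trace. It is a runtime heuristic for illustration only, not the static analysis described above. The QuerySpan type, its caller field, and the threshold of 10 are all assumptions.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QuerySpan:
    sql: str
    caller: str  # function that issued the query (hypothetical field)


def normalize(sql: str) -> str:
    """Collapse literals so queries that differ only in values share a template."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def find_n_plus_one(spans, threshold=10):
    """Return (template, caller, count) for templates repeated within one trace."""
    counts = Counter((normalize(s.sql), s.caller) for s in spans)
    return [(tpl, caller, n) for (tpl, caller), n in counts.items() if n >= threshold]
```

Grouping by caller as well as by template is what lets a finding point at a code location rather than only at a query.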
WARNING · finding-9b1f4 · 02:15:44 UTC
PHP Fatal in Stripe webhook handler
Evidence
trace 8c7e3a1: stripe webhook → handler::process
exception: Undefined index "payment_intent"
occurred: 18 times in last 24h · 0.4% of webhooks
side effect: 7 orders left in "awaiting_capture"
1. Guard with array_key_exists() before access
2. Verify Stripe signature before parsing payload
3. Replay the 18 affected events from the audit trail
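The first two recommendations can be sketched as follows, in Python rather than the handler's PHP. The signature check follows Stripe's documented scheme (HMAC-SHA256 over "timestamp.payload", sent in the Stripe-Signature header as t=...,v1=...). The function names and the location of payment_intent under data.object are assumptions about this handler, and a production handler would more likely use Stripe's official library.

```python
import hashlib
import hmac
import json
import time


def verify_signature(payload: bytes, header: str, secret: str,
                     tolerance_s: int = 300, now=None) -> bool:
    """Check a Stripe-Signature header before the payload is parsed."""
    parts = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((v for k, v in parts if k == "t"), None)
    candidates = [v for k, v in parts if k == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance_s:
        return False  # stale or future-dated event, possible replay
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)


def extract_payment_intent(payload: bytes):
    """Return the payment_intent ID, or None when the key is absent."""
    event = json.loads(payload)
    data = event.get("data") if isinstance(event, dict) else None
    obj = data.get("object") if isinstance(data, dict) else None
    # Assumed location of the field; a missing key yields None, not an error.
    return obj.get("payment_intent") if isinstance(obj, dict) else None
```

Verifying before parsing means an unsigned or tampered payload never reaches the code path that raised the original exception.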
Included on every plan.
Observability
100% trace capture. No sampling. Per-tenant cost attribution from day
one.
ClickHouse-backed unified store: metrics, logs, traces, business events,
and cost data all SQL-queryable in one place.
Every Temporal workflow logs to the store for process mining.
Per-service cost attribution — every byte tagged to a tenant and a
workload.
Agents query the telemetry store with SQL — same surface our ops team
uses.
12-month audit log retention, per PCI DSS Requirement 10.
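As a rough illustration of what SQL-queryable cost attribution could look like, the sketch below sums bytes per tenant and workload. SQLite stands in for the ClickHouse store, and the usage_events table and its columns are hypothetical names, not the real schema.

```python
import sqlite3


def cost_by_tenant(rows):
    """Sum bytes per (tenant, workload) from (tenant, workload, bytes) rows."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the telemetry store
    conn.execute("CREATE TABLE usage_events (tenant TEXT, workload TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO usage_events VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT tenant, workload, SUM(bytes) AS total_bytes "
        "FROM usage_events GROUP BY tenant, workload ORDER BY tenant, workload"
    ).fetchall()
    conn.close()
    return result
```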
Included on every plan.
Self-Healing
A restart destroys the evidence. So we diagnose first — capture the
forensics, find the root cause, then take the least destructive action —
and we hold both your application and the platform underneath it to that
rule. The difference is who pulls the trigger: on your app, you do; on our
own infrastructure, we do.
Most hosts kill first and explain never. You get a ticket that says "high
CPU detected, we killed your pod" — no diagnosis, no root cause, no record
of what went wrong. We do the opposite: nothing is restarted until the
diagnostic state is captured and written down.
Your application: you hold the switch.
When something in your app degrades, we diagnose it and tell you exactly
what we'd do to fix it. Whether we actually do it is your call — with one
exception.
Security detection is always on. Threat detection
cannot be disabled. Reverse shells, container escapes, crypto mining: we
see them on every tenant, every container, no opt-in. The response
starts human-in-the-loop and graduates to automatic only for patterns
proven safe over time.
Everything else is opt-in, off by default. Restarting
PHP-FPM, killing a stuck MySQL query, pausing a runaway cron, clearing a
cache — business-logic fixes that bridge the gap until the real fix
ships. You enable them one action type at a time, or leave them all off
and we just tell you what we'd do.
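The opt-in rule can be pictured as a per-tenant policy where every action type starts disabled. This is a sketch under assumptions: the action names and the TenantPolicy type are hypothetical, and threat detection is left out on purpose because it is not configurable.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the action types named above.
ACTION_TYPES = {"restart_php_fpm", "kill_stuck_query", "pause_cron", "clear_cache"}


@dataclass
class TenantPolicy:
    enabled_actions: set = field(default_factory=set)  # everything off by default

    def enable(self, action: str) -> None:
        """Opt in to exactly one action type."""
        if action not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {action}")
        self.enabled_actions.add(action)


def decide(policy: TenantPolicy, action: str) -> str:
    """Return 'execute' if the tenant opted in, otherwise 'recommend'."""
    if action not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {action}")
    return "execute" if action in policy.enabled_actions else "recommend"
```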
The platform underneath: it heals itself to the same standard.
The infrastructure your app runs on — the proxies in the request path, the
databases, the deploy engine — follows the same diagnose-first discipline.
Because it's our infrastructure, we act on it without waiting for you.
Least destructive first:
Cancel the offending work — kill the runaway query,
cancel the stuck task, drop the bad connection. The cheapest fix, and it
resolves most incidents on its own.
Drain, don't drop — stop sending new work to a degraded
instance instead of killing it mid-request.
Replace, then retire — scale up a healthy replacement
first, shift traffic, then drain the degraded one. No gap in capacity.
Restart as last resort — only after the diagnosis is
captured, never before. The evidence survives the fix.
Our own databases never get a generic drain-and-restart: we kill specific
queries, fail over to a verified-ready replica, and make one change at a
time. The destructive actions — a failover, a restart — stay behind a
human approval gate. Permanently.
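The ladder above might be modeled like this: capture the diagnosis first, try actions from least to most destructive, and stop at the restart unless a human approves. The callback names are assumptions, and the sketch gates only the restart, not failovers.

```python
from enum import IntEnum


class Action(IntEnum):
    CANCEL_WORK = 1
    DRAIN = 2
    REPLACE_THEN_RETIRE = 3
    RESTART = 4


def remediate(try_action, capture_diagnosis, human_approves):
    """Walk the ladder in order; return (actions attempted, whether one resolved it)."""
    capture_diagnosis()  # evidence is written down before anything is touched
    attempted = []
    for action in Action:  # IntEnum iterates in definition order
        if action is Action.RESTART and not human_approves(action):
            break  # destructive step stays behind the approval gate
        attempted.append(action)
        if try_action(action):
            return attempted, True
    return attempted, False
```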
Included on every plan.
Agent Surface
Same APIs, same data, same access as human operators. Point your coding
agent at our MCP server and it drives the platform from your IDE.
Two MCP servers. External for your developer agent (Claude Code, Cursor,
and the rest), internal for our ops team. ~35 tools at launch.
Deploy, rollback, restart, restore via MCP tool call.
Query traces, logs, metrics by tenant + endpoint + window.
Tenant API keys with 3 scopes: admin, deploy, read-only.
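A scope check for tool calls could be as small as the sketch below. The three scope names come from the list above; the tool names and the assignment of tools to scopes are purely illustrative assumptions.

```python
# Hypothetical tool names and scope assignments, for illustration only.
READ_TOOLS = {"query_traces", "query_logs", "query_metrics"}
DEPLOY_TOOLS = READ_TOOLS | {"deploy", "rollback"}
ADMIN_TOOLS = DEPLOY_TOOLS | {"restart", "restore"}

SCOPE_TOOLS = {
    "read-only": READ_TOOLS,
    "deploy": DEPLOY_TOOLS,
    "admin": ADMIN_TOOLS,
}


def is_allowed(scope: str, tool: str) -> bool:
    """Unknown scopes get no tools at all."""
    return tool in SCOPE_TOOLS.get(scope, set())
```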
Included on every plan.
Deploys
Push to GitHub. Or SSH in and rsync. We accept both.
Code is built into an immutable image, deployed blue/green, and gated by
health + canary metrics before traffic shifts. Failed deploys auto-revert
with full forensics preserved.
Webhook-triggered builds for git-backed tenants.
The synsmarts deploy CLI command for workspace-backed tenants.
True zero-downtime deploys — not a best-effort rolling
restart. The new build comes up beside the live one and takes
zero production traffic until it clears its health and canary checks. The
cutover is atomic: requests in flight finish on the old slot while new
requests land on the new one. Because each slot runs its own object cache,
the new slot is already warm — no cold start, no thundering-herd cache
stampede the moment you go live. Sessions ride a shared session store
across the switch, so a customer mid-checkout never gets logged out or
loses their cart. And if the new build fails its gates, traffic never
moved in the first place — the deploy auto-reverts and your customers
never saw it. No maintenance window, no "be right back" page, no dropped
requests.
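The gate-then-cutover sequence can be sketched in a few lines. The Router type and the two check callbacks are assumptions, and a single attribute assignment stands in for the atomic traffic switch.

```python
from dataclasses import dataclass


@dataclass
class Router:
    live_slot: str = "blue"


def deploy(router: Router, health_ok, canary_ok) -> bool:
    """Promote the idle slot only if it passes both gates; otherwise change nothing."""
    candidate = "green" if router.live_slot == "blue" else "blue"
    if not (health_ok(candidate) and canary_ok(candidate)):
        return False  # traffic never moved, the old slot is still live
    router.live_slot = candidate  # stand-in for the atomic cutover
    return True
```

Because the failure path returns before the router is touched, a failed deploy needs no traffic rollback in this model.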
A real staging environment, not a toy. Spin up a second
instance at the smallest tier and you get a true staging environment:
the full production stack, real AWS, real diagnostics — not a stripped
sandbox that behaves differently from prod. Deploy the same git SHA to
both instances to promote a verified build, and run the diagnostic report
against staging before it ever reaches your customers.
Multi-model code review on your deploys. Prism is the
review gate we run on our own platform changes: every change goes in
front of a panel of independent frontier AI models that have to reach
consensus before it ships. One reviewer has blind spots — a panel that
must agree catches what any single model misses. Opt your deploys in and
the same gate reviews your code before it reaches the build.
Migration safety: we run it before you do. Most hosts
read your migration files and guess whether they're safe. But schema
migrations and data patches execute arbitrary code — file content alone
can't tell you what a Magento Data Patch will do to your data. So we
don't guess. When a deploy touches the schema, we clone your live
database, run the migration against the throwaway clone, and classify
what actually happened before a single byte of production data is
touched.
Code-only — the majority of deploys. No schema change,
no classification overhead, straight to blue/green.
Additive — new columns, tables, or indexes. Safe to
apply while your current code keeps serving traffic. Zero downtime.
Breaking — drops, renames, new NOT-NULL columns.
Gated behind your explicit approval, a maintenance window, and a fresh
backup taken immediately before the migration runs.
Can't classify — if the dry-run can't prove a
migration is safe, we treat it as breaking. Fail closed, never fail
open.
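The four outcomes above could be derived by diffing the clone's schema before and after the dry run. This is a simplified sketch: the schema shape (table to column to a spec with "type" and "nullable") is an assumption, indexes are not modeled, and treating a data-only patch as breaking is one possible reading of "fail closed".

```python
def classify(has_migration: bool, before: dict, after) -> str:
    """Classify a deploy from schema snapshots taken on a throwaway clone."""
    if not has_migration:
        return "code-only"
    if after is None:
        return "breaking"  # dry run failed or could not be inspected: fail closed
    if before == after:
        return "breaking"  # data-only patch: a schema diff cannot prove it safe
    for table, cols in before.items():
        if table not in after:
            return "breaking"  # dropped (or renamed) table
        for name, spec in cols.items():
            if after[table].get(name) != spec:
                return "breaking"  # dropped, renamed, or altered column
    for table, cols in after.items():
        if table not in before:
            continue  # brand-new table: additive
        for name, spec in cols.items():
            if name not in before[table] and not spec["nullable"]:
                return "breaking"  # new NOT NULL column on an existing table
    return "additive"
```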
Included on every plan.
Speed
Fast is the default, not an add-on. Every tenant serves from the edge,
with the PHP and cache layers tuned before you deploy a line.
The same diagnostic engine that names a slow query also keeps the fast
path fast: static and cacheable responses are served from Cloudflare's
global edge, dynamic responses come off a pre-tuned PHP stack, and the
object cache survives every deploy so a release never cold-starts your
site.
Cloudflare edge on every tenant. Every request enters
through Cloudflare's global network — cached assets serve from the edge
nearest your customer, not a round trip to origin. WAF, CDN, and DDoS
protection ride the same path. Included on every plan, not a premium
CDN upsell.
PHP tuned before you arrive. OPcache and FastCGI Cache
are wired into the base image by default, per PHP version. No plugin to
install, no config to hand-tune — the fast path is the path you get.
Object cache that survives deploys. Per-slot external
Redis backs your application object cache. Blue and green slots hold
independent cache instances, so a deploy never flushes your live cache
and traffic never lands on a cold site.
Media served from the edge. Uploads land in a per-tenant S3
bucket and serve through the CDN, so images and downloads don't compete
with your application for PHP workers.
Scales with the spike. Flash sale or launch day,
resources burst up to meet demand and settle back when it passes —
burst to 3× your base reservation, the first 10% of the month free.
Included on every plan.
Backups + Restore
Backups are table stakes, not a premium tier.
Hourly XtraBackup. Continuous binlog shipping at 1-minute intervals.
Hourly EBS snapshots. Tamper-proof cross-region copies. Same retention
for every tenant.
30-day XtraBackup retention.
14-day binlog retention — PITR to any timestamp in the window.
7-day EBS snapshot retention as tertiary safety net.
Self-service restore: pick a timestamp, watch the workflow provision
a fresh primary, swap endpoints, keep the old for a rollback window.
Quarterly cross-region restore drills, full fleet coverage.
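Picking a restore point inside the 14-day window amounts to choosing the latest base backup at or before the target and replaying binlogs from there. The sketch below illustrates that arithmetic; the function name and the representation of backups as a list of timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone

BINLOG_RETENTION = timedelta(days=14)  # PITR window from the list above


def plan_restore(target: datetime, now: datetime, backups: list):
    """Return (base backup time, span of binlog to replay) for a target timestamp."""
    if target > now or now - target > BINLOG_RETENTION:
        raise ValueError("target is outside the 14-day PITR window")
    base = max((b for b in backups if b <= target), default=None)
    if base is None:
        raise ValueError("no base backup at or before the target")
    return base, target - base
```

With hourly backups, the replay span in this model is always under an hour.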
Included on every plan.
Security + Compliance
PCI DSS SAQ D for Service Providers. GDPR is treated as a co-equal requirement.
CDE boundary published, audited annually, and validated through quarterly
ASV scans and continuous file-integrity monitoring at the container layer.
Per-tenant KMS keys, multi-region.
File-integrity monitoring and runtime security on every node, every
container.
Cardholder-data discovery scanner across MySQL, S3, and the telemetry
store — pattern matching for any payment data outside the expected
flow.
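Pattern matching for card numbers typically pairs a digit-run regex with the Luhn checksum to cut false positives. The sketch below shows that general technique; it is not the scanner itself, and the function names and regex are assumptions.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```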