Tech Stack — Morgan (Final)
Status: Final — deployment architecture updated 2026-04-08 (Casey sign-off)
1. Finalized Stack
| Layer | Decision | Change from Previous Version |
|---|---|---|
| Frontend | Next.js 15 (App Router) + React 19 + TypeScript | No change |
| UI | shadcn/ui + Tailwind CSS | No change |
| Backend | Node.js + Hono + tRPC | No change |
| ORM | Drizzle ORM | No change |
| Database | PostgreSQL via Supabase | No change |
| Scoring / domain logic | Pure TypeScript domain package (monorepo) | No change — complexity is justified, see Section 2 |
| File parsing | xlsx (SheetJS) for .xls/.xlsx import | No change — Excel import is v1 scope per requirements |
| Cache / Pub-Sub | Removed — Upstash Redis dropped entirely for v1 | CHANGED — see Section 2 |
| Auth | Supabase Auth (JWT + RLS) | No change |
| Real-time | Supabase Realtime (primary) + SSE fallback | No change |
| Hosting | keystone shared platform on kst1.wagen.io (Hetzner VPS, Docker compose, Caddy ingress) | CHANGED twice — first off Vercel onto a single VPS (Section 5), then off the single VPS onto keystone (Phase 3 migration; see keystone repo) |
| Monorepo | Turborepo | No change |
| CI/CD | GitLab CI via keystone shared templates (buildx Docker build, deploy step rolls compose stack on kst1) | CHANGED twice — GitHub Actions + Vercel removed (Section 5), then SSH-rsync replaced by the keystone deploy contract |
2. Simplicity Challenge — Layer-by-Layer Review
This section documents the challenge applied to each layer after reading the full requirements. Every layer was evaluated against the actual scale: one organizer, a handful of signage screens, players browsing on phones during a Swiss club tournament.
[DECISION] Drop Upstash Redis entirely for v1
What I evaluated: The previous version retained Redis for rate limiting on the file upload and scraper ingestion endpoints.
Why it is being removed: At this scale and user profile, there is nothing to rate-limit. The file upload endpoint is auth-gated — only the authenticated organizer can reach it, and they will call it once or twice per tournament. The scraper ingestion endpoint does not exist in v1 (Stage 3 is explicitly future scope per requirements Section 3). There are no public write surfaces. The unauthenticated endpoints are the read-only signage screens, which are not candidates for rate limiting.
Adding a paid managed Redis service to rate-limit an endpoint that one person calls twice per event is over-engineering. It adds a billing dependency, a secrets management concern, and operational surface area for zero measurable benefit.
What brings it back: If Stage 3 scraping ships and requires polling pressure management, or if the app ever exposes a public write surface, Redis can be added then. The architecture does not need to change — it is an additive concern.
[DECISION] Keep Next.js 15 App Router — justified, not over-engineered
What I evaluated: Whether a Vite + React SPA with a standalone Express/Hono server would be simpler.
Why App Router stays: SCR-001 (Schedule Screen) and SCR-002 (Draw/Bracket) are displayed on venue TVs all day. They are public, unauthenticated, and benefit from server-rendered initial HTML — fast first paint without a client-side data fetch waterfall. App Router server components deliver this with no additional SSR infrastructure. The alternative (Vite SPA + Node server) would require a separate deployment and separate SSR configuration to achieve the same result.
The authenticated organizer screens (SCR-003 through SCR-005) are client components with interactive drag-and-drop and real-time updates. The same App Router project handles both without a split deployment. This is the right fit, not a novelty choice.
What would be over-engineered: Using Next.js with a separate Express backend deployed independently. The tRPC router runs inside Next.js API routes — one deployed application, not two.
[DECISION] Keep tRPC — end-to-end types are earned, not aspirational
What I evaluated: Whether plain REST endpoints or Next.js Server Actions would be simpler and sufficient.
Why tRPC stays: The scoring domain produces deeply typed structures: MatchResult with SetScore[] and embedded TiebreakScore, 7 DrawFormat variants (seven full scoring rule sets F1-F7; F8 is the degenerate standalone match tiebreak and is seeded as an 8th row in draw_formats but does not count as a scoring rule set variant) with different validation rule sets, bracket winner_slot enums. With plain REST, maintaining TypeScript types across the API boundary on a 5-person team requires a shared type package or OpenAPI code generation — both add their own maintenance burden. tRPC gives that type safety for free as a natural consequence of defining procedures in TypeScript.
Server Actions are a valid alternative for simple mutations but they do not provide the explicit router + procedure model that makes the API surface auditable and independently testable. For a domain with this many business rule variants, auditability matters.
Trade-off acknowledged: tRPC adds a learning curve for developers unfamiliar with it. This is a hiring concern, not an architecture defect — the Backend Developer and Frontend Developer skill profiles require tRPC experience.
[DECISION] Keep Hono as the HTTP adapter — it is thin by design
What I evaluated: Whether Hono adds unnecessary complexity over plain Next.js API routes.
Why it stays: Hono is a thin HTTP framework sitting in front of the tRPC router. It adds the file upload endpoint (SheetJS parsing, multipart form data) and the SSE fallback endpoint — both of which are awkward to implement cleanly inside Next.js API route handlers. Hono's request/response handling is more ergonomic for these cases. It is not adding a second server; it runs inside the same Next.js server process.
[DECISION] Keep pure packages/scoring domain package — complexity is justified by the domain
What I evaluated: Whether the scoring logic could live inline in API routes or as plain utility functions rather than an isolated package.
Why it stays, and why this is the one place complexity is earned: The scoring engine must handle:
- 7 DrawFormat configurations with distinct validation rules (F1-F7; see tRPC decision above for the F8 caveat) covering games per set, no-ad, final set rule, tiebreak targets, win-by-two variants
- Round robin standings with a 6-step tiebreak sort requiring H2H sub-calculations among tied players only
- Three special outcome types (WALKOVER, DEFAULT, RETIREMENT) each with distinct set and game counting rules
- Super-tiebreak points that count as a set but not as games in GW/GL
- Bracket advancement: traversing next_match_id chains and filling winner_slot atomically
These rules are precise, enumerable, and testable. They must not be scattered across API route handlers. Isolating them in a zero-dependency TypeScript package with high unit test coverage is the only defensible choice. The complexity is in the domain, not in the architecture decision.
This package is also imported by the frontend for client-side standings preview. Sharing it via the monorepo avoids duplication without a deployment concern.
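As an illustration of why these rules suit an isolated, testable package, here is a minimal sketch of one rule from the list above: a super-tiebreak counts as a set but contributes nothing to games won or lost. The type shapes, field names, and the `tallySets` function are hypothetical and are not taken from the requirements; the real `SetScore` and `MatchResult` definitions may differ.

```typescript
// Hypothetical shapes: field names are illustrative, not the project's schema.
type SetScore =
  | { kind: "regular"; p1Games: number; p2Games: number }
  | { kind: "superTiebreak"; p1Points: number; p2Points: number };

interface Tally {
  setsWon: [number, number]; // [player 1, player 2]
  gamesWon: [number, number]; // [player 1, player 2]
}

// Tally sets and games for both players of a completed match.
function tallySets(sets: SetScore[]): Tally {
  const tally: Tally = { setsWon: [0, 0], gamesWon: [0, 0] };
  for (const set of sets) {
    if (set.kind === "superTiebreak") {
      // Counts as a set for the winner, but adds nothing to games won/lost.
      tally.setsWon[set.p1Points > set.p2Points ? 0 : 1] += 1;
      continue;
    }
    tally.setsWon[set.p1Games > set.p2Games ? 0 : 1] += 1;
    tally.gamesWon[0] += set.p1Games;
    tally.gamesWon[1] += set.p2Games;
  }
  return tally;
}

// Example: 6-4, 3-6, then a super-tiebreak won 10-8 by player 1.
const exampleTally = tallySets([
  { kind: "regular", p1Games: 6, p2Games: 4 },
  { kind: "regular", p1Games: 3, p2Games: 6 },
  { kind: "superTiebreak", p1Points: 10, p2Points: 8 },
]);
```

Because the function is pure and has no dependencies, the same code can run in the backend at write time and in the browser for the standings preview.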
[DECISION] Keep Supabase (Postgres + Auth + Realtime) as a single managed service
What I evaluated: Whether splitting Auth, Realtime, and Postgres into separate services would give more flexibility.
Why it stays: The data model is relational through and through. Bracket topology is FK chains on the Match row. Registrations link Players to Tournaments and Draws. JSONB stores MatchResult inline on the Match row. Supabase gives managed Postgres, Auth with JWT, Row Level Security, and Realtime change notifications in one service and one billing relationship. Separating these would be strictly more complex for identical capability at this scale.
The real-time audience is small: one organizer, three or four signage screens, a few dozen players browsing concurrently at peak. Supabase Realtime handles this comfortably at its base tier.
[DECISION] Keep Turborepo monorepo — appropriate for 4 packages, not over-engineered
What I evaluated: Whether a flat single-app repository would be simpler.
Why the monorepo stays: The packages/scoring isolation is load-bearing — it must be imported by both the backend (for write-time validation) and the frontend (for client-side preview). A flat repo would either duplicate this logic or introduce awkward relative imports that break tooling. Four packages (one app, three shared packages) is the natural minimum for this codebase structure. Turborepo's affected-package filtering and task pipeline orchestration provide real CI efficiency at this structure — only changed packages are rebuilt and tested on each commit.
[RESOLVED] Concurrent tournament support is a non-issue at this scale
The previous version flagged this as an open question. The requirements make the operating assumption clear: one organizer manages one tournament at a time. The data model already handles multiple tournaments via tournament_id scoping on every entity. No special infrastructure, connection pooling strategy, or multi-tenancy layer is needed. Close this.
[RESOLVED] Bracket topology is FK columns on Match row (Option A)
Confirmed in requirements Section 7: next_match_id (UUID FK → Match, null for the final) and winner_slot (UPPER | LOWER enum) on the Match row. Double-elimination is explicitly out of scope. No bracket_edges table needed.
3. Architecture Overview
Browser / Signage Screen
|
| HTTPS / WebSocket
v
Caddy on kst1.wagen.io (keystone ingress — reverse proxy + automatic TLS)
|
Next.js 15 App Router — standalone build, packaged into a Docker image by buildx, run as a compose service on kst1
├── Server Components (SCR-001, SCR-002 — public signage, SSR initial paint)
└── Client Components (SCR-003–005 — organizer, auth-gated)
|
| tRPC over HTTP (end-to-end type safety, no schema drift)
v
Hono API (Next.js route handler — persistent process, no serverless timeout)
├── tRPC router
├── File upload endpoint (Excel import, SheetJS, multipart)
├── SSE endpoint (real-time fallback for proxied WebSocket environments)
└── packages/scoring (pure domain logic, imported here for write-time validation)
|
|─── PostgreSQL (Supabase — Frankfurt eu-central-1, free tier)
| Row-Level Security enforces organizer vs. public access
| Realtime publications on: matches, draws, registrations
| JSONB column on Match for MatchResult
|
└─── Supabase Auth (JWT)
Monorepo packages:
- apps/web — Next.js application
- packages/scoring — pure TS domain logic (DrawFormat validation, RR standings with 6-step tiebreak, bracket advancement)
- packages/db — Drizzle schema + migrations + typed query helpers
- packages/trpc — tRPC router definitions shared between app and API
Removed from architecture:
- Upstash Redis — no justification at v1 scale. Re-evaluate if Stage 3 scraping ships.
4. Key Technical Risks
[RISK] Round robin standings performance at render time
The 6-step tiebreak sort requires iterating all H2H match pairs among tied players. For a round robin group of 8 players (28 matches) this is trivial in isolation. With multiple concurrent draws and live updates triggering re-calculations, standings must be computed server-side on the result-save mutation and pushed as a derived payload — not recalculated on every client on every Realtime event.
Mitigation: standings computation runs inside the tRPC mutation that saves a result. The computed standings are then pushed via Supabase Realtime as a derived payload, not as a raw match row change for the client to re-derive.
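The head-to-head sub-calculation mentioned above can be sketched as follows. This is only an illustration of restricting the count to matches played among the tied players; the `PlayedMatch` shape and both function names are hypothetical, and the six tiebreak steps themselves are defined in the requirements and are not reproduced here.

```typescript
// Hypothetical minimal match shape, for illustration only.
interface PlayedMatch {
  winnerId: string;
  loserId: string;
}

// Count head-to-head wins, considering only matches in which BOTH
// players belong to the tied group.
function headToHeadWins(
  matches: PlayedMatch[],
  tiedPlayerIds: string[],
): Map<string, number> {
  const tied = new Set(tiedPlayerIds);
  const wins = new Map<string, number>();
  for (const id of tiedPlayerIds) wins.set(id, 0);
  for (const m of matches) {
    if (tied.has(m.winnerId) && tied.has(m.loserId)) {
      wins.set(m.winnerId, (wins.get(m.winnerId) ?? 0) + 1);
    }
  }
  return wins;
}

// A full round robin of n players has n * (n - 1) / 2 matches.
function roundRobinMatchCount(players: number): number {
  return (players * (players - 1)) / 2;
}

const h2h = headToHeadWins(
  [
    { winnerId: "A", loserId: "B" },
    { winnerId: "B", loserId: "C" },
    { winnerId: "C", loserId: "A" },
    { winnerId: "A", loserId: "D" }, // D is outside the tied group: ignored
  ],
  ["A", "B", "C"],
);
```

For a group of 8, `roundRobinMatchCount(8)` gives the 28 matches quoted above, which is why the cost concern is about repeated recomputation on every client rather than the size of a single pass.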
[RISK] Excel date serial conversion
SwissTennis .xls exports store date fields (columns 1 and 7: registered_at, date_of_birth) as Excel date serials. SheetJS returns raw numbers unless cellDates: true is passed. A one-line omission produces silently wrong dates with no parse error.
Mitigation: write a dedicated date converter in a packages/import utility and cover it with unit tests using known SwissTennis export samples with confirmed expected outputs. This package also handles the Excel date serial epoch differences (the 1900 and 1904 date systems).
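A converter of this kind might look like the sketch below. It assumes serials are interpreted as UTC calendar days and that the caller knows which Excel date system the workbook uses; the function name `excelSerialToDate` and its parameters are hypothetical. It is valid only for serials after Excel's fictitious 1900-02-29 (serial 60), which covers any realistic registration date or date of birth.

```typescript
const MS_PER_DAY = 86400000;
// Serial 25569 corresponds to 1970-01-01 in Excel's 1900 date system.
const UNIX_EPOCH_SERIAL_1900 = 25569;
// Serials in the 1904 date system are 1462 days lower for the same date.
const OFFSET_1904_DAYS = 1462;

// Convert an Excel date serial to a JavaScript Date (UTC).
// Assumption: serial > 60, so the 1900 leap-year quirk does not apply.
function excelSerialToDate(serial: number, date1904 = false): Date {
  if (!Number.isFinite(serial)) {
    throw new RangeError(`Not a valid Excel serial: ${serial}`);
  }
  const serial1900 = date1904 ? serial + OFFSET_1904_DAYS : serial;
  const ms = Math.round((serial1900 - UNIX_EPOCH_SERIAL_1900) * MS_PER_DAY);
  return new Date(ms);
}

// Serial 45000 in the 1900 system is 15 March 2023.
const sampleDate = excelSerialToDate(45000).toISOString();
```

Unit tests against confirmed export samples remain the real safeguard; the sketch only shows the arithmetic those tests would pin down.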
[RISK] Bracket advancement on result save is a multi-step write
When a result is entered on a non-final elimination match:
1. Save MatchResult on current match
2. Resolve winner from result
3. Read next_match_id and winner_slot from current match
4. Write player1_id or player2_id on the next match row
All four steps must be atomic in a single Postgres transaction. A partial write corrupts the bracket.
Mitigation: enforce this in a single tRPC mutation. The client never issues multi-step bracket writes. The transaction boundary is in the tRPC procedure, not the caller.
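The four steps above can be sketched in memory as an all-or-nothing update. This is not the production code path, which would run inside a Postgres transaction; it only illustrates the shape of the logic. The `MatchRow` interface, the `saveResultAndAdvance` function, the `winnerId` stand-in for the stored MatchResult, and the mapping of UPPER to player1_id and LOWER to player2_id are all assumptions made for the example.

```typescript
type WinnerSlot = "UPPER" | "LOWER";

// Hypothetical in-memory row; the real Match row has more columns.
interface MatchRow {
  id: string;
  player1Id: string | null;
  player2Id: string | null;
  nextMatchId: string | null; // null for the final
  winnerSlot: WinnerSlot | null;
  winnerId: string | null; // stand-in for the stored MatchResult
}

// Returns a NEW map with the result saved and the winner advanced.
// If anything is invalid it throws, and the input map is left untouched.
function saveResultAndAdvance(
  matches: ReadonlyMap<string, MatchRow>,
  matchId: string,
  winnerId: string,
): Map<string, MatchRow> {
  const current = matches.get(matchId);
  if (!current) throw new Error(`Unknown match ${matchId}`);
  if (winnerId !== current.player1Id && winnerId !== current.player2Id) {
    throw new Error("Winner is not a participant of this match");
  }
  const next = new Map(matches); // work on a copy
  next.set(matchId, { ...current, winnerId }); // steps 1 and 2
  if (current.nextMatchId !== null) {
    // step 3: read next_match_id and winner_slot
    const target = next.get(current.nextMatchId);
    if (!target || current.winnerSlot === null) {
      throw new Error("Broken bracket link");
    }
    // step 4. Assumption: UPPER fills player1_id, LOWER fills player2_id.
    next.set(
      target.id,
      current.winnerSlot === "UPPER"
        ? { ...target, player1Id: winnerId }
        : { ...target, player2Id: winnerId },
    );
  }
  return next;
}

const bracket = new Map<string, MatchRow>([
  ["semi1", { id: "semi1", player1Id: "alice", player2Id: "bob", nextMatchId: "final", winnerSlot: "UPPER", winnerId: null }],
  ["final", { id: "final", player1Id: null, player2Id: null, nextMatchId: null, winnerSlot: null, winnerId: null }],
]);
const advanced = saveResultAndAdvance(bracket, "semi1", "alice");
```

Returning a new map instead of mutating the input mirrors the transactional guarantee: either every step is visible or none is.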
[RISK] Drag-and-drop on mobile
SCR-005 (Court Assignment) requires drag-and-drop across court columns. The HTML5 Drag and Drop API does not fire on touch devices. The organizer is expected to use this on a tablet while at the venue.
Mitigation: use @dnd-kit/core, which uses pointer events and handles touch. The Frontend Developer must prototype SCR-005 drag-and-drop on a real mobile device before other SCR-005 work begins. Desktop browser touch emulation is not acceptable for this sign-off.
[RISK] Supabase Realtime subscription management on long-lived signage screens
Signage screens run unattended all day. Supabase Realtime channels not explicitly unsubscribed accumulate. Network interruptions at a venue are likely.
Mitigation: signage pages subscribe to the minimum channel scope (filtered by tournament_id), implement explicit cleanup on component unmount, and reconnect with exponential backoff on network interruption. This is an implementation discipline requirement, not a stack risk — but it must be in the Frontend Developer's brief from day one.
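The reconnect-with-exponential-backoff requirement could be sketched as below. It is library-agnostic and deliberately does not call any Supabase client API: `connect` is a placeholder for whatever re-subscribes the channel. The function names, the default base and cap values, and the choice of "equal jitter" (half fixed, half random) are all assumptions for illustration.

```typescript
// Delay before reconnect attempt number `attempt` (0-based).
// The delay doubles per attempt, is capped at maxMs, and half of it is
// randomised so several screens do not reconnect in lockstep.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** Math.max(0, attempt));
  return ceiling / 2 + random() * (ceiling / 2);
}

// Retry `connect` until it succeeds or the attempts run out.
// `connect` is a placeholder for the code that re-subscribes the channel.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return true;
    } catch {
      // Wait before the next attempt, but not after the last one.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  return false;
}
```

An unattended signage screen would likely pass a very large `maxAttempts`, since giving up leaves a stale display with nobody there to reload it.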
5. Deployment Architecture Decisions
Historical (2026-04, Casey-era). The decisions in this section captured the original CSD-only Hetzner VPS + PM2 deployment. They are kept here as a record of the design context that produced next.config.ts settings still in use (e.g. output: 'standalone'). The actual deployment substrate has since moved to the keystone shared platform (Hetzner kst1.wagen.io, Docker compose, Caddy, self-hosted Postgres + gotrue + realtime + cron). For current deployment guidance, see the keystone runbooks at gitlab.com/wagen-public/keystone-public/-/tree/main/docs/runbooks/ (notably 10-app-onboarding.md and 12-platform-deploy-target.md). The Casey-era deployment runbooks were retired in #166.
The following decisions were proposed by Casey (DevOps) and signed off by Morgan on 2026-04-08.
[DECISION] Hosting: Hetzner CX22 VPS + Caddy + PM2 — Vercel removed
Vercel is no longer the deployment target. The Next.js application runs on a Hetzner CX22 VPS (€3.79/month, 2 vCPU, 4 GB RAM) behind Caddy as a reverse proxy with automatic TLS via Let's Encrypt. PM2 manages the Node.js process lifecycle.
Why Vercel was removed: The PO's existing infrastructure does not include Vercel in a way that avoids cost at this scale, and GitLab (not GitHub) is the code host. Namecheap shared hosting is a hard blocker for Next.js — not a configuration problem. The Hetzner VPS is the cheapest viable option at ~€4/month total new spend.
Accepted trade-offs:
- No global CDN / Vercel Edge Network. Latency is higher for non-EU users. Acceptable — this is a Swiss club app.
- No automatic per-PR preview deployments. Confirmed acceptable by PO. Developers use local Supabase CLI for pre-merge review.
- Single point of failure. Hetzner's uptime SLA is strong; PM2 restarts on crash; a VPS reboot takes under 60 seconds. Acceptable for v1.
[DECISION] output: 'standalone' in next.config.ts — signed off
Next.js standalone build mode is required for VPS deployment. It bundles the minimal Node.js server and required node_modules into .next/standalone/. PM2 runs .next/standalone/server.js directly. Build runs in GitLab CI; only the compiled standalone output is rsynced to the VPS — the VPS never runs next build.
Implementation requirements for the dev team:
- The public/ directory and .next/static/ must be copied alongside .next/standalone/ in the rsync deploy step — they are not bundled automatically.
- Internal workspace packages (packages/scoring, packages/db, packages/trpc) must be reachable from the standalone bundle. Use transpilePackages in next.config.ts for any non-compiled internal packages. Verify on a clean CI run before the first tournament.
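A minimal sketch of the configuration these requirements imply is shown below. The package names in `transpilePackages` are hypothetical placeholders (the actual workspace package names are not stated in this document), and a real next.config.ts would normally also annotate the object with the `NextConfig` type from Next.js.

```typescript
// next.config.ts (sketch). Package names are hypothetical placeholders
// for the internal workspace packages under packages/.
const nextConfig = {
  // Emit the minimal self-contained server into .next/standalone/.
  output: "standalone" as const,
  // Compile internal workspace packages that ship uncompiled TypeScript.
  transpilePackages: ["@csd/scoring", "@csd/db", "@csd/trpc"],
};

export default nextConfig;
```

Neither setting copies public/ or .next/static/; those still have to be handled by the deploy step as described above.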
[DECISION] Hono runs as a Next.js route handler in the persistent process — no restructuring needed
Hono runs as a Next.js route handler at apps/web/app/api/[...]/route.ts. On Vercel this handler ran inside a short-lived serverless function invocation. On Hetzner it runs inside the long-lived PM2-managed Node.js process. The Hono integration code does not change. No custom server adapter is required.
The practical benefit: Vercel's 10-second serverless function timeout is eliminated. The Excel import and SheetJS parsing endpoints run with no enforced timeout ceiling — the previous risk flagged in the DevOps skill profile is now closed.
[DECISION] Turborepo remote cache skipped for v1 — GitLab CI local cache sufficient
Turborepo remote cache (previously Vercel-hosted) is not configured for v1. GitLab CI caches node_modules and .turbo directories keyed against package-lock.json. This is local-runner cache only but provides meaningful speed on repeated runs.
The justification for the Turborepo monorepo is unaffected — affected-package filtering and task pipeline orchestration are the core value, not the remote cache. Remote cache can be added later (self-hosted ducktors/turborepo-remote-cache on the existing VPS) if CI build times become a pain point.
[DECISION] CI/CD: GitLab CI + SSH deploy to VPS — GitHub Actions removed
Pipeline stages: install → check (typecheck + lint, parallel) → test (vitest) → build (Docker image via buildx, per ADR-0010 in keystone) → deploy (keystone shared template SSHes into kst1 and rolls the per-env compose stack). Secrets stored as masked variables in GitLab CI/CD settings. Service-role-key style secrets must never appear in build logs — verify masked variable behavior before first production deploy.
6. Open Questions
[OPEN QUESTION] PDF import (Stage 2)
Stage 2 includes PDF import from SwissTennis. PDF parsing is significantly harder than Excel — SwissTennis PDFs are likely generated from a fixed template but the exact layout is unknown until a sample is obtained. This is deferred to Stage 2.
The Backend Developer must be hired with awareness of PDF parsing libraries (pdf-parse, pdfjs-dist). The exact library choice depends on whether the SwissTennis PDF is text-layer or rasterised. A sample PDF must be obtained from the PO before committing to a library.
7. Skill Profiles — Agents to Hire
The following profiles define the minimum acceptable skill set for each remaining agent. Cleo (HR) should use these as hiring criteria.
Backend Developer
Stack: Node.js, Hono, tRPC, Drizzle ORM, PostgreSQL, Supabase, TypeScript
Must-have skills:
- TypeScript at the type-level: generic types, discriminated unions, branded types. The scoring domain uses all of these.
- tRPC: defining routers, procedures, middleware, context. Must understand the type inference chain end-to-end.
- Drizzle ORM: schema definition, migrations, relational queries, transactions. Must be comfortable writing raw SQL when Drizzle's query builder falls short.
- PostgreSQL: indexes, JSONB columns (MatchResult is stored as JSONB), row-level security policies, Supabase Realtime publications.
- Supabase Auth: JWT validation, RLS policy design. Must understand that RLS is the security boundary, not just a convenience feature.
- File parsing: SheetJS for .xls/.xlsx (26-column SwissTennis export, Section 6 of requirements). Must handle Excel date serials correctly — cellDates: true is not enough; see [RISK] above.
- Domain modeling: can implement the RR standings 6-step tiebreak, all DrawFormat validation rules, and bracket advancement in a pure, testable TypeScript package without reaching for framework helpers.
- Unit testing: Vitest or Jest. The scoring package requires high coverage because the rules are exact and enumerated in the requirements.
Nice-to-have: Experience with PDF parsing libraries. Awareness of Stage 3 web scraping patterns (Playwright or Puppeteer for headless browser automation against SwissTennis portal).
Red flag: A backend developer who reaches for an ORM's findMany for every query and avoids SQL. The standings and bracket queries require SQL-level thinking.
Frontend Developer
Stack: Next.js 15 (App Router), React 19, TypeScript, tRPC client, shadcn/ui, Tailwind CSS, Supabase Realtime client
Must-have skills:
- Next.js App Router: server components vs. client components, server actions, route groups, streaming. Must know when NOT to use a server component — the organizer screens are interactive and must be client components.
- React 19: hooks, concurrent features, Suspense boundaries. The real-time signage screens require careful Suspense and transition use to avoid stale renders during Realtime updates.
- tRPC client: useQuery, useMutation, optimistic updates. SCR-005 (Court Assignment) requires optimistic UI on drag-drop to feel instantaneous even before the server confirms.
- Supabase Realtime client: channel subscriptions, filter syntax, reconnect handling. Will implement the live-updating signage screens (SCR-001, SCR-002).
- Drag and drop: @dnd-kit/core with touch/pointer events. Must prototype on real mobile hardware before any SCR-005 work is considered complete.
- Responsive layout: Tailwind CSS responsive prefixes, CSS Grid for multi-column court layout (SCR-001, SCR-005). Both screens have dynamic column counts driven by the number of courts — this is not a fixed-column layout problem.
- shadcn/ui: knows the component library internals well enough to extend components without copy-pasting the entire source.
- Accessibility: signage screens display publicly; contrast ratios and font sizing for large-screen readability are first-class concerns.
Nice-to-have: Experience with tournament bracket visualisation (tree layouts in SVG or CSS Grid). The single-elimination bracket sub-view (SCR-002A) requires a non-trivial visual layout.
Red flag: A frontend developer who has only worked with the pages-router Next.js and treats App Router as a folder rename. The server/client component boundary is a genuine mental model shift that affects every architectural decision in the rendering layer.
UX/UI Designer
Stack awareness required: Tailwind CSS, shadcn/ui component constraints, mobile browser viewport limitations
Must-have skills:
- Information-dense table design: the round robin standings grid, the draw bracket, and the contact sheet are all dense data views. The designer must be comfortable with data-heavy interfaces — this is not a marketing site.
- Signage and large-screen design: SCR-001 and SCR-002 are displayed on TVs or large monitors at arm's length. Font sizing, contrast ratios, and layout must be designed for glanceability at distance, not for a 14-inch laptop.
- Mobile-first responsive design: SCR-005 (Court Assignment) requires drag-and-drop on a tablet held by an organizer walking between courts. This is a primary use case, not a nice-to-have.
- Design system adherence: must work within shadcn/ui component constraints. Custom components must extend the system, not bypass it.
- Domain consultation: must read the requirements document and consult Rafael before wireframing the draw and bracket screens. Tennis scoring conventions and bracket topology are not intuitive without domain knowledge.
- Figma or equivalent: deliverables must be in a format the Frontend Developer can implement directly.
Nice-to-have: Experience designing for real-time data that changes while the user is looking at it (live dashboards, sports results). Must understand that layout shift on data update is a UX defect, not a cosmetic issue.
Red flag: A designer who produces beautiful static mockups but has never considered what a screen looks like when data arrives asynchronously and content reflows.
QA Engineer
Stack awareness required: TypeScript, Vitest/Jest, Playwright, Supabase local dev, tRPC
Must-have skills:
- Unit testing in TypeScript with Vitest or Jest. The packages/scoring package requires near-complete coverage. Must be able to write tests for the 6-step tiebreak algorithm, all 7 DrawFormat validation rule sets (F1-F7; F8 caveat per Section 2), and all three special outcomes (WO, DEFAULT, RETIREMENT) with their distinct counting rules.
- End-to-end testing with Playwright: can write tests that drive the full organizer flow — create tournament, import player list, assign matches to courts, enter a result, verify the live update appears on the signage screen within the 2-second SLA.
- Real-time testing: can write E2E tests that open two browser contexts simultaneously (organizer context + signage context) and assert that a result entered in one context appears in the other within AC-007's 2-second target.
- Database state management: can seed and reset a Supabase local database between test runs using the Supabase CLI. Must establish a repeatable test database state.
- Mobile testing: must test SCR-005 drag-and-drop on an actual mobile device or a reliable hardware emulator. Desktop browser touch emulation is not acceptable for this sign-off.
- Boundary-value testing: the scoring rules have many edges — tiebreak loser score storage convention, retirement partial set inclusion, super-tiebreak not counting in GW/GL, WO/DEFAULT empty sets array. QA must derive test cases directly from the requirements spec, not from the happy path.
Nice-to-have: Experience writing contract tests for tRPC procedures. Familiarity with visual regression testing (Chromatic or Percy) for the bracket view.
Red flag: A QA engineer who only does manual testing or treats Playwright as a click-recorder. The real-time assertion requirement demands programmable, assertion-driven test logic with explicit timing.
DevOps Engineer
Historical (2026-04 hire). Casey was hired against the pre-keystone single-VPS + PM2 stack described below. From Phase 3 onward CSD runs on the keystone shared platform (Docker compose on kst1.wagen.io); platform-engineering work for that substrate sits in the keystone repo (Atlas) rather than in CSD. This profile is preserved as the historical hiring record.
Stack awareness required (historical, pre-keystone): Linux VPS administration, Caddy, PM2, GitLab CI, Supabase, Turborepo
Must-have skills:
- Linux server administration (Ubuntu 24.04): user management, SSH hardening (key-only auth, password disabled), firewall configuration, systemd service setup. The VPS is the only application server — it must be set up securely before any deployment happens.
- Caddy: reverse proxy configuration, automatic TLS via Let's Encrypt, HTTP → HTTPS redirect, proxying to the PM2-managed Node.js process. Must know how to reload Caddy config without downtime.
- Process management for the Next.js standalone server (originally PM2 in 2026-04; superseded by Docker compose under keystone). Must understand process supervision, crash restart, log rotation, and reboot persistence — the specific tool changed when CSD moved onto keystone.
- GitLab CI: pipeline configuration in .gitlab-ci.yml. Original Casey pipeline shape was install → check (typecheck + lint) → test (vitest) → build (next build standalone) → deploy (rsync + SSH process-manager restart); under keystone the build step is a buildx Docker build and the deploy step calls a keystone shared template that rolls a compose stack on kst1. Must understand Turborepo's --filter and affected-packages logic to avoid full rebuilds on unrelated package changes.
- Next.js standalone build deployment: must know that public/ and .next/static/ must be copied alongside .next/standalone/ in the rsync step. Must verify that internal workspace packages are accessible from the standalone bundle before first production deploy.
- Supabase CLI: local dev setup, migration workflows, schema diff. Must establish a supabase db push flow that runs reliably in CI without manual intervention.
- Secrets management: all secrets stored as masked variables in GitLab CI/CD settings. The Supabase service role key must never appear in build logs. Must verify masked variable behavior before first production deploy. VPS .env.local written by the deploy job, not committed to the repository.
- Supabase anti-pause cron: configure the VPS crontab to ping the Supabase REST endpoint every 5 days to prevent free-tier auto-pause. This must be in place before the first production tournament.
- Monitoring: error tracking (Sentry or equivalent) and uptime monitoring for signage screens. These run unattended all day at a venue — a silent crash is unacceptable.
Nice-to-have: Experience setting up a self-hosted turborepo-remote-cache server (for a future upgrade if CI build times become a pain point). Experience running Playwright in CI with parallelisation and artifact upload on test failure.
Red flag: A DevOps engineer who has only worked with fully managed PaaS platforms (Vercel, Railway, Render) and has never administered a Linux VPS directly. The security posture and process management of a bare VPS requires hands-on Linux experience — push-and-it-works platforms do not build that skill.