# Shield v1.2.0

**Release date:** 2026-06-13
**Code name:** "deep-discovery"

This release turns the deep discovery pipeline from a one-off script into a
first-class background job. The operator no longer needs to run a manual
search: creating a case (or clicking "Run deep discovery" on a case page)
queues a scan. A background worker picks the scan up and leaves
`production_pending` takedowns ready for the admin to dispatch.

## What ships

### 1. `case_discovery_scans` table + worker
- New SQLite table tracks every scan: `queries`, `candidates`, `validated`,
  `false_positives`, `no_contact`, `queued_pp`, `progress` (human-readable
  current step), `started_at`, `updated_at`, `finished_at`.
- `jobs/caseDiscoveryWorker.js` polls for `status='queued'` rows every
  4 seconds, atomically claims one, and calls `services/deepDiscovery.runCaseScan`.
- Worker is started automatically by `app.js` alongside the existing
  `takedownWorker` and `autoEscalate` cron.
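
The claim-then-run loop described above can be sketched as follows. This is a minimal illustration and not the contents of `jobs/caseDiscoveryWorker.js`: an in-memory array stands in for the `case_discovery_scans` table, `runScan` stands in for `runCaseScan`, and handling one scan at a time is an assumption. In SQLite the same check-and-set is usually done with a single conditional `UPDATE ... WHERE status='queued'` whose affected-row count tells the worker whether it won the claim.

```javascript
// In-memory stand-in for the case_discovery_scans table (assumption).
// Finds the oldest queued row and flips it to 'running' in one synchronous
// step, so no other callback can observe the row between check and set.
function claimNextScan(scans) {
  const row = scans.find((s) => s.status === 'queued');
  if (!row) return null;
  row.status = 'running';
  row.started_at = new Date().toISOString();
  return row;
}

// Polls every `intervalMs` (4 seconds in the release notes) and hands each
// claimed row to `runScan`, passing the existing row id so that no
// duplicate scan row is created.
function startWorker(scans, runScan, intervalMs = 4000) {
  let busy = false; // assumption: one scan at a time per worker
  const timer = setInterval(async () => {
    if (busy) return;
    const row = claimNextScan(scans);
    if (!row) return;
    busy = true;
    try {
      await runScan(row.case_id, { scanId: row.id });
    } catch (err) {
      console.error(`scan ${row.id} failed:`, err.message);
    } finally {
      row.finished_at = new Date().toISOString();
      busy = false;
    }
  }, intervalMs);
  return () => clearInterval(timer); // call to stop the worker
}
```

Because the claim changes `status` before any asynchronous work starts, a second poll tick cannot pick up the same row.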

### 2. `services/deepDiscovery.js` (already present in v1.1, now wired to the job)
- Query matrix: victim name (3 spelling variants) × ~80 NCII keywords
  (English + Sinhala) = ~240 Google queries per case.
- For each unique hit:
  - HTTP probe (timeout 10s, max 200KB) — must be 2xx + non-empty HTML.
  - Negative-tell filter (404, login wall, age gate, Cloudflare challenge,
    parked domain, "enable JavaScript" pages).
  - Co-occurrence check: victim name within 140 chars of an NCII term in
    the visible body. This drops tag-cloud false positives that come from
    "related searches" sidebars.
  - Image probe: the first 2 `<img>` URLs (relative or absolute) must
    return 2xx, so the takedown letter only references real, working
    images.
- For each validated site:
  - `services/contactDiscovery.discover` (security.txt, sitemaps, footer,
    hosting detection) → 5-7 step escalation chain.
  - `services/rdap.lookup` → registrar + abuse email (RDAP responses, RFC 9083).
  - Pick the right contact: platform map (Pornhub, xHamster, OnlyFans,
    GitHub, Cloudflare, etc.) → security.txt → WHOIS → hosting abuse
    address.
  - Persist a `findings` row (tier 1) + a `takedowns` row with
    `status='production_pending'` (or `'chain_only'` if no contact email
    is reachable — chain is still recorded for the operator to act on).
  - Write an audit JSON to `data/evidence/case_<id>/scan_<id>_<ts>.json`.
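
Two of the validation steps above are easy to show in isolation: the 140-character co-occurrence check and the resolution of relative `<img>` URLs against the page URL. The sketch below is illustrative only. The function names are hypothetical, the window is applied on both sides of the name, and the regex-based `<img>` extraction is a simplification that may differ from what `services/deepDiscovery.js` actually does.

```javascript
// True when `name` appears with at least one of `terms` inside a window of
// `windowChars` characters on either side of the name (case-insensitive).
function coOccurs(text, name, terms, windowChars = 140) {
  const hay = text.toLowerCase();
  const needle = name.toLowerCase();
  if (!needle) return false;
  const lowered = terms.map((t) => t.toLowerCase());
  let from = 0;
  for (;;) {
    const at = hay.indexOf(needle, from);
    if (at === -1) return false;
    const lo = Math.max(0, at - windowChars);
    const hi = Math.min(hay.length, at + needle.length + windowChars);
    const slice = hay.slice(lo, hi);
    if (lowered.some((t) => slice.includes(t))) return true;
    from = at + 1; // try the next occurrence of the name
  }
}

// Returns up to `limit` absolute image URLs, resolving relative `src`
// values against the page URL with the WHATWG URL constructor.
function firstImageUrls(html, pageUrl, limit = 2) {
  const urls = [];
  const re = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  let m;
  while (urls.length < limit && (m = re.exec(html)) !== null) {
    try {
      urls.push(new URL(m[1], pageUrl).href);
    } catch {
      // skip src values that cannot be parsed as a URL
    }
  }
  return urls;
}
```

A page that lists the name in a sidebar hundreds of characters away from any keyword fails `coOccurs`, which is the tag-cloud case the filter is meant to drop.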

### 3. API endpoints
- `POST /cases/:id/discover` — queues a scan, returns 202 with
  `scan_id` and a `poll_url`.
- `GET /cases/:id/discover` — lists recent scans for the case.
- `GET /cases/:id/discover/:scan_id` — returns full scan state (status,
  progress, counters) for polling.
- `POST /cases/:id/discover/:scan_id/cancel` — cancels a queued or
  running scan.
- `POST /cases` — auto-queues a deep discovery scan on case creation by
  default. Can be suppressed with `auto_discover: false` in the body.
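
A client can follow a scan to completion by polling the state endpoint. The helper below is a sketch under stated assumptions: the JSON body is assumed to carry a `status` field, any status other than `queued` or `running` is treated as terminal, and the `fetchImpl`, `headers`, and `maxPolls` options are hypothetical conveniences. Authentication is left to the caller.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls GET /cases/:id/discover/:scan_id until the scan leaves the
// 'queued' and 'running' states, then resolves with the last scan state.
async function pollScan(baseUrl, caseId, scanId, opts = {}) {
  const {
    intervalMs = 4000,   // same cadence the case page uses
    maxPolls = 900,      // hypothetical safety cap
    fetchImpl = fetch,   // global fetch, available in Node 18+
    headers = {},        // e.g. a session cookie supplied by the caller
  } = opts;
  const url = `${baseUrl}/cases/${caseId}/discover/${scanId}`;
  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchImpl(url, { headers });
    if (!res.ok) throw new Error(`poll failed: HTTP ${res.status}`);
    const scan = await res.json();
    if (scan.status !== 'queued' && scan.status !== 'running') return scan;
    await sleep(intervalMs);
  }
  throw new Error('scan did not finish within the polling budget');
}
```

Typical use is to call `POST /cases/:id/discover`, read `scan_id` from the 202 response, and pass it to `pollScan`.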

### 4. Admin UI
- New "Production-pending takedowns" panel on `/dashboard/` (superadmin)
  with: scan_id link, target URL, contact email, draft method, queued
  timestamp, "Send now" button, and a "Send all" CTA.
- New "Deep discovery scan" panel on `/cases/?id=N` with: per-scan
  progress bar, counters (queries / candidates / validated / FPs / queued
  / no-contact), "Run deep discovery" button, "Cancel" button for running
  scans. Polls every 4 seconds while a scan is active.

### 5. Test mode (zero-quota pipeline exercise)
- New `config/discoveryTestMode.js` + `NCII_DEEP_DISCOVERY_TEST_MODE=1`
  env var. When enabled, `runCaseScan` skips real search proxy calls and
  feeds a fixed fixture (`scripts/fixtures/discovery-candidates.json`)
  through the full validation + RDAP + production_pending pipeline. A
  small local HTTP server is started on a random port to serve crafted
  HTML for the fixture's "leak site" URLs.
- The platform contacts map is extended (only in test mode) to map
  `127.0.0.1` to a fake `abuse@test-platform.example` so chain-only
  sites can be promoted all the way to `production_pending`.
- This proves the pipeline (queue → worker → validate → discover →
  RDAP → persist → admin "Send all" → audit JSON) without burning any
  search proxy quota. That mattered during development, when the test
  key hit its monthly limit.
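
The local fixture server idea can be sketched with Node's built-in `http` module. This is not the code in `config/discoveryTestMode.js`. The `startFixtureServer` name and the `pages` map (URL path to HTML string) are assumptions made for illustration. Passing port `0` to `listen` asks the operating system for a free port, which is how a random port is obtained.

```javascript
const http = require('node:http');

// Serves crafted HTML for fixture URLs from 127.0.0.1 on an OS-assigned port.
// `pages` maps a request path (for example '/leak-1') to an HTML string.
function startFixtureServer(pages) {
  const server = http.createServer((req, res) => {
    const html = Object.prototype.hasOwnProperty.call(pages, req.url)
      ? pages[req.url]
      : undefined;
    if (html === undefined) {
      res.writeHead(404, { 'Content-Type': 'text/plain', Connection: 'close' });
      res.end('not found');
      return;
    }
    res.writeHead(200, {
      'Content-Type': 'text/html; charset=utf-8',
      Connection: 'close', // avoid idle keep-alive sockets delaying shutdown
    });
    res.end(html);
  });

  return new Promise((resolve, reject) => {
    server.once('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      resolve({
        baseUrl: `http://127.0.0.1:${port}`,
        close: () => new Promise((done) => {
          server.close(done);
          // closeAllConnections exists on Node 18.2+; guard for older runtimes.
          if (typeof server.closeAllConnections === 'function') {
            server.closeAllConnections();
          }
        }),
      });
    });
  });
}
```

Fixture candidate URLs can then be rewritten to start with the returned `baseUrl` before they enter the validation steps.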

### 6. Schema migrations
- `init()` in `config/db.js` now runs idempotent migrations on every
  startup:
  - Adds `case_discovery_scans.updated_at` if missing.
  - Rebuilds `takedowns` if its `CHECK` constraint doesn't yet accept
    `draft_method='chain-only'` and `status='chain_only'`. Uses
    `CREATE TABLE takedowns_new` + copy + drop + rename (SQLite can't
    ALTER a CHECK, but the rebuild preserves all legacy rows).
- `status` and `draft_method` in `takedowns` now accept the new values.
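
Both migrations follow a "check, then change" pattern that makes them safe to run on every startup. The sketch below shows that pattern only. The `db` object is a hypothetical synchronous wrapper with `all(sql)` returning rows and `exec(sql)` running a statement, the function names are invented, and the real `takedowns` column list is not reproduced here, so the new table definition is passed in by the caller.

```javascript
// Adds a column only when PRAGMA table_info does not already list it.
function addColumnIfMissing(db, table, column, ddlType) {
  const cols = db.all(`PRAGMA table_info(${table})`).map((c) => c.name);
  if (cols.includes(column)) return false; // already migrated
  db.exec(`ALTER TABLE ${table} ADD COLUMN ${column} ${ddlType}`);
  return true;
}

// Rebuilds `table` when its stored CREATE statement lacks `marker`
// (for example the string 'chain_only'). `newDdl` must create
// `${table}_new` with the same columns, in the same order, and the
// widened CHECK constraint.
function rebuildIfCheckOutdated(db, table, newDdl, marker) {
  const row = db.all(
    `SELECT sql FROM sqlite_master WHERE type='table' AND name='${table}'`
  )[0];
  if (!row || row.sql.includes(marker)) return false; // nothing to do
  db.exec('BEGIN');
  try {
    db.exec(newDdl);
    db.exec(`INSERT INTO ${table}_new SELECT * FROM ${table}`); // copy legacy rows
    db.exec(`DROP TABLE ${table}`);
    db.exec(`ALTER TABLE ${table}_new RENAME TO ${table}`);
    db.exec('COMMIT');
  } catch (err) {
    db.exec('ROLLBACK'); // leave the old table untouched on failure
    throw err;
  }
  return true;
}
```

Running either function a second time is a no-op, because the first run leaves the schema in the state the check looks for.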

### 7. Bug fixes since v1.1.0
- **Auth race**: login handler now calls `req.session.save()` in the
  no-2FA path so the `Set-Cookie: ncii.sid` header is reliably included
  in the response. Previously express-session could race the response
  and the cookie was sometimes dropped (more visible on HTTPS + curl).
- **Worker race**: `runCaseScan` now accepts a `scanId` option so the
  worker can pass the pre-existing scan row id; previously it created
  a duplicate row, producing 2 scans per kickoff.
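
The auth fix amounts to persisting the session before the response is sent. A minimal sketch of that ordering is shown below. `req.session.save(callback)` is part of express-session, while the `completeLogin` name, the response bodies, and the `SESSION_SAVE_FAILED` error code are hypothetical and not taken from `routes/auth.js`.

```javascript
// Sets the user on the session, waits for the store to persist it, and
// only then sends the response.
function completeLogin(req, res, userId) {
  req.session.userId = userId;
  req.session.save((err) => {
    if (err) {
      // Hypothetical error shape; the real handler may respond differently.
      res.status(500).json({ error: 'SESSION_SAVE_FAILED' });
      return;
    }
    res.json({ ok: true });
  });
}
```

Responding inside the `save` callback means the reply can never be sent before the session write has completed.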

### 8. Security tightening
- **Mandatory 2FA for superadmin**: new `requireSuperadmin2FA`
  middleware gates every `/admin/*` route (except the bootstrap
  `/admin/2fa/start` and `/admin/2fa/verify`). The admin must
  enroll TOTP on first login; the gate checks `user_2fa.enabled` +
  `req.session.twofa_verified_at` (1h rolling window, refreshed on
  every admin call). On any 2FA failure the response is 401 with
  `{error: 'TWO_FA_REQUIRED'}`. The `req.session.twofa_verified_at`
  flag is set by `/2fa/verify` and by `handleVerify` (the same handler
  used during the first-time enrollment). Disable for dev with
  `NCII_REQUIRE_SUPERADMIN_2FA=0` in `.env`.
- **API-key prefix validation**: `models/apiKey.create()` now validates
  the key shape per service before persisting. Catches the common
  mistakes of pasting a Stripe key into the transactional email slot, or truncating
  a search proxy hex key.
  - `resend`     → `^re_[A-Za-z0-9]{8,}$`
  - `scrape_do`  → 20+ hex chars
  - `vision-AI provider`    → `^sk-(cp-)?[A-Za-z0-9-]{8,}$`
  - `stripe`     → `^(sk|pk|rk)_(test_|live_)?[A-Za-z0-9]{8,}$`
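
Both checks in this section can be sketched in a few lines. The code below is an illustration and not the shipped middleware or model. It assumes `twofa_verified_at` is stored as epoch milliseconds, takes a hypothetical `isTwoFaEnabled(userId)` lookup in place of the `user_2fa.enabled` query, expresses "20+ hex chars" as a case-insensitive regex, and lets services without a listed pattern pass unvalidated.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Express-style gate: requires enrolled 2FA and a verification newer than
// one hour, and refreshes the timestamp on every successful admin call.
function requireSuperadmin2FA(isTwoFaEnabled, now = Date.now) {
  return function gate(req, res, next) {
    if (process.env.NCII_REQUIRE_SUPERADMIN_2FA === '0') return next(); // dev toggle
    const session = req.session || {};
    const verifiedAt = session.twofa_verified_at;
    const fresh = typeof verifiedAt === 'number' && now() - verifiedAt < ONE_HOUR_MS;
    if (!isTwoFaEnabled(session.userId) || !fresh) {
      return res.status(401).json({ error: 'TWO_FA_REQUIRED' });
    }
    req.session.twofa_verified_at = now(); // rolling window
    return next();
  };
}

// Key shapes as listed in the release notes.
const KEY_PATTERNS = {
  resend: /^re_[A-Za-z0-9]{8,}$/,
  scrape_do: /^[0-9a-fA-F]{20,}$/, // "20+ hex chars"; case handling is an assumption
  'vision-AI provider': /^sk-(cp-)?[A-Za-z0-9-]{8,}$/,
  stripe: /^(sk|pk|rk)_(test_|live_)?[A-Za-z0-9]{8,}$/,
};

// Returns true when the key matches the pattern for its service.
// Services without a pattern are accepted (assumption).
function isValidKeyShape(service, key) {
  const pattern = KEY_PATTERNS[service];
  return pattern ? pattern.test(String(key).trim()) : true;
}
```

For example, a Stripe secret key pasted into the `resend` slot fails the `^re_` prefix check and is rejected before it is persisted.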

### 9. cPanel / Passenger deployment
- New "Deploying to cPanel / Phusion Passenger" section in `README.md`
  with a full one-time setup, the bootstrap 2FA-enrollment curl
  commands, schema-migration notes, the production-toggles table,
  the backup policy, and a pre-go-live hardening checklist.

## Operating it

### Local dev
```sh
# normal (live search proxy)
node app.js

# test mode (no quota, full pipeline exercise)
NCII_DEEP_DISCOVERY_TEST_MODE=1 node app.js
```

### On a fresh case
1. `POST /cases` with `{victim_name, reference_url, context}` — a
   discovery scan is auto-queued.
2. Poll `GET /cases/:id/discover/:scan_id` (or just open the case page
   in the UI; it polls every 4s).
3. When the scan finishes, each validated site has a row in the
   `takedowns` table with `status='production_pending'` (or
   `'chain_only'` if no contact email was reachable).
4. Operator reviews the chain on each takedown row, then opens
   `/dashboard/` (superadmin), sees the "Production-pending takedowns"
   panel, and clicks "Send all".

### When search proxy is exhausted
- `case_scan_scrape_fail:<case_id>` alert is raised (severity=`error`,
  category=`quota_exceeded`).
- The scan is aborted early (no point hammering a dead endpoint).
- Operator adds a fresh search proxy key in `/admin/#keys`, then re-runs
  the scan on the case.
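
The early-abort behavior can be sketched as a query loop that stops at the first quota error. The alert key, severity, and category come from the notes above. Everything else is an assumption made for illustration: the `runQueries` name, the injected `search` and `raiseAlert` functions, and the `QUOTA_EXCEEDED` error code used to recognise an exhausted key.

```javascript
// Runs queries in order; on the first quota error it raises one alert and
// stops, returning whatever hits were collected so far.
async function runQueries(caseId, queries, search, raiseAlert) {
  const hits = [];
  for (const q of queries) {
    try {
      hits.push(...(await search(q)));
    } catch (err) {
      if (err.code === 'QUOTA_EXCEEDED') { // hypothetical error code
        raiseAlert({
          key: `case_scan_scrape_fail:${caseId}`,
          severity: 'error',
          category: 'quota_exceeded',
        });
        return { hits, aborted: true }; // skip the remaining queries
      }
      throw err; // unrelated failures are not swallowed
    }
  }
  return { hits, aborted: false };
}
```

Returning the partial `hits` lets the caller decide whether to keep what was found before the key ran out.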

## Files changed
- `app.js` — added `caseDiscoveryWorker.start()`.
- `config/db.js` — added `case_discovery_scans` table + idempotent
  migrations for `updated_at` and `takedowns` CHECK widening.
- `config/discoveryTestMode.js` — new (test-mode shim + local HTTP
  server).
- `config/plans.js` — `planGate('case')` counts each scan as a
  separate "find site" so plans with caps can't be bypassed.
- `jobs/caseDiscoveryWorker.js` — new.
- `middleware/auth.js` — no change (uses `requireLogin`).
- `models/case.js`, `models/takedown.js` — no change.
- `public/views/admin-dashboard.html` — new "Production-pending"
  panel.
- `public/views/case.html` — new "Deep discovery scan" panel + per-row
  poll.
- `public/js/admin.js` — `loadProductionPending()`, "Send all" handler.
- `routes/admin.js` — already had `GET /admin/production-pending`,
  `POST /admin/production-pending/:id/send`, `POST .../send-all`.
- `routes/auth.js` — added `req.session.save()` after setting
  `userId` in the no-2FA path.
- `routes/cases.js` — added `POST /:id/discover`, `GET /:id/discover`,
  `GET /:id/discover/:scan_id`, `POST /:id/discover/:scan_id/cancel`,
  and auto-queue on case creation.
- `scripts/fixtures/discovery-candidates.json` — new.
- `scripts/poll-scan.js` — new (small helper for local CLI polling).
- `services/deepDiscovery.js` — accept `{scanId}` option; bail early
  on quota; use `chain-only` for sites with no contact email; thread
  the test header through `fetchWithTimeout`; resolve relative `<img>`
  URLs against the page URL.
- `services/platformContacts.js` — add `127.0.0.1 → abuse@test-platform.example`
  when test mode is on.
- `services/scrape.js` — no change (keyRotator already handles the
  401/monthly-limit case).

## Compatibility
- Node 18+ (tested on 24.12.0).
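- Existing API endpoints keep their request and response shapes, with
  two behavior changes to be aware of: `POST /cases` now auto-queues a
  scan unless the body sets `auto_discover: false`, and `/admin/*`
  routes require 2FA for superadmin unless
  `NCII_REQUIRE_SUPERADMIN_2FA=0` is set.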
- The `takedowns.status` CHECK was widened — any old code that
  compared `status` against a fixed enum should now also accept
  `chain_only` and `production_pending`.
