Ongoing

Korean Public Procurement Analytics

Builder · 2026 · 7 min read

A two-tier practitioner platform for the 나라장터 public procurement ecosystem — free layer: 82 law-verified concept cards on Astro Starlight; paid layer: FastAPI workflow engine with 나라장터 + 지방재정 API integration, 24,338 contracts analyzed across 6 live data runs, Railway-deployed demo with 5 client profiles.

Overview

A knowledge product and workflow platform built for Korean public procurement practitioners preparing for the 공공조달관리사 (Public Procurement Manager) national credential. The free layer is a static reference site (Astro Starlight) with 82 concept cards covering the 14 core procurement regulations; each card is verified against live statute text through the law.go.kr API. The paid layer is a FastAPI practitioner engine that queries the 나라장터 (G2B) and 지방재정365 (LOFIN) APIs, applies statutory compliance rules (낙찰하한율, the minimum award rate; 적격심사, qualification-review scoring; and 수의계약, negotiated-contract thresholds), and identifies anomaly patterns in 24,338 local government contracts, surfacing procurement decisions that deviate from legal guardrails.

Problem

Two problems run in parallel.

For the credential market: the 공공조달관리사 exam expects practitioners to apply 14 statutory instruments to realistic tender scenarios. No organized, searchable, law-linked reference exists. Exam prep relies on paper study guides that go stale as regulations change annually.

For the analytics market: Korean local government procurement data is published through 지방재정365 (LOFIN) as monthly parquet files with no standard anomaly detection layer. A city comptroller or procurement auditor who wants to know whether last quarter's 수의계약 awards follow the legal threshold criteria has no off-the-shelf tool. Manual review across 24,338 contracts is not viable.

Constraints

  • G2B API (나라장터) and LOFIN API rate limits require aggressive caching — brute-force re-fetching against 나라장터's 40M+ contract records is not viable on free-tier credentials
  • Statutory rules change annually through 행정안전부 고시 (Ministry of the Interior and Safety notices): the 낙찰하한율 (minimum award rate) and 표준노무비 (standard labor cost) rates embedded in the compliance engine must be updated at the start of each fiscal year
  • 82 public concept cards pass the public-sync quality gate (title, law_id, verified_at, jurisdiction fields required); remaining 22 are private or unresolved — MISMATCH status from law.go.kr verification blocks public sync
  • 지방재정365 data covers local government (지자체) contracts only; 나라장터 central government data is separately gated and requires higher API tier for bulk download
  • Outreach to 지자체 procurement offices requires named contacts before any email — cold emails without a named recipient are blocked by hard rule
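
The caching constraint in the first bullet can be sketched as a small on-disk cache keyed by endpoint and query parameters. This is a minimal sketch under stated assumptions, not the project's implementation: `cached_fetch`, the cache directory layout, and the default TTL are hypothetical, and `fetch` stands in for whatever function actually calls the G2B or LOFIN API.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable


def cached_fetch(
    cache_dir: Path,
    endpoint: str,
    params: dict,
    fetch: Callable[[str, dict], dict],
    ttl_seconds: float = 30 * 24 * 3600,  # assumed TTL: roughly one monthly extract
) -> dict:
    """Return a fresh cached response if one exists, otherwise call fetch()."""
    # Hypothetical cache key: endpoint plus sorted params, so that
    # parameter order does not create duplicate cache entries.
    key_src = json.dumps([endpoint, sorted(params.items())], ensure_ascii=False)
    key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"

    if path.exists() and time.time() - path.stat().st_mtime < ttl_seconds:
        return json.loads(path.read_text(encoding="utf-8"))

    payload = fetch(endpoint, params)  # the only place API quota is spent
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
    return payload
```

Keeping the network call behind a single injected function also makes the cache testable without credentials.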

Approach

Free layer first: extract 99 concept cards from a 15-chapter exam-prep book via the Anthropic Files API (claude-sonnet-4-6 as OCR and semantic extractor), populate law_id fields against the MST code registry (28 instruments, Tier 1–3), verify each article citation against live law.go.kr text, run the public-sync quality gate, and deploy to Astro Starlight on Vercel.

Paid layer: a DuckDB/Parquet pipeline querying the G2B and LOFIN APIs. Compliance functions compute the 낙찰하한율 floor, 적격심사 score bracket, and 수의계약 threshold per contract record. Anomaly detection applies isolation forest to price outliers, local outlier factor to vendor concentration, and a separate check for contract-splitting patterns.

Output: a FastAPI + Jinja2 demo with 5 client profiles, deployed to Railway.

Key Decisions

Static site (Astro Starlight) for free layer, not a SaaS subscription

Reasoning:

The credential market has strong SEO intent (practitioners searching specific article citations) and low willingness to pay for the reference function alone. A free, law-linked, searchable reference site builds the inbound audience that converts to paid workflow engine subscribers. Dynamic SaaS for the reference function would require auth, hosting costs, and a payment wall that blocks the discovery channel. Starlight's static output deploys to Vercel at zero cost and serves the full card index via CDN.

Alternatives considered:
  • Paywall the concept cards as a standalone product — converts poorly; practitioners already have PDF study guides; the moat is the law linkage, not the content itself
  • Notion or GitBook for the reference layer — acceptable for internal use; not deployable as a branded product with custom domain and analytics

Live law.go.kr verification for every concept card, not manual review

Reasoning:

Korean procurement regulations update annually via 행정안전부 고시. Manual review of 99 cards after each regulatory change is a recurring maintenance burden. The law.go.kr Open API provides article-level text retrieval — the verify_card_accuracy.py tool calls it for each card's law_id citation and compares the claimed rule with the live legal text. Cards that drift to MISMATCH are flagged for remediation before the next public-sync. This automation is the moat: competitors' PDF guides and static web pages go stale silently.
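
The MATCH/MISMATCH comparison described above might look roughly like the following. This is a sketch under stated assumptions: the `Card` fields, the `quoted_text` excerpt, and the normalized-substring comparison are hypothetical, and the live article text is passed in as a string rather than fetched, since the actual law.go.kr call and the real logic in verify_card_accuracy.py are not shown in this write-up.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Card:
    title: str
    law_id: str
    quoted_text: str  # hypothetical field: the excerpt the card claims to cite


def normalize(text: str) -> str:
    """Fold Unicode forms and drop all whitespace so layout differences are ignored."""
    return re.sub(r"\s+", "", unicodedata.normalize("NFKC", text))


def verify(card: Card, live_article_text: str) -> str:
    """Return "MATCH" if the card's excerpt still appears in the live article text."""
    claimed = normalize(card.quoted_text)
    if claimed and claimed in normalize(live_article_text):
        return "MATCH"
    # An empty excerpt or any wording change counts as drift.
    return "MISMATCH"
```

A strict containment test like this errs toward false MISMATCH flags, which suits a gate whose job is to stop stale legal text from being published.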

Alternatives considered:
  • Manual annual review — adequate for 20 cards; does not scale to 100+ cards across 14 statutes with sub-article granularity
  • LLM-only verification without authoritative source — introduces hallucination risk on legal citations; practitioners would correctly distrust it

DuckDB in-memory query against Parquet, not SQLite or PostgreSQL, for the analytics layer

Reasoning:

LOFIN monthly parquet files arrive as columnar data best suited for window aggregations (vendor concentration over rolling 12 months, 수의계약 cumulative thresholds by NAICS-equivalent category). DuckDB runs these joins and aggregations in memory on a vectorized columnar engine, which is significantly faster than SQLite's row-oriented scans. The Parquet files remain the durable store, so there is no database migration risk. The Korean forensic finance toolkit uses the same DuckDB pattern, which keeps the code consistent across both projects.

Alternatives considered:
  • PostgreSQL — correct choice if multi-user write concurrency is needed; over-engineered for a single-user analytics workbench querying read-only monthly extracts
  • pandas only — readable but 10-50x slower than DuckDB for the rolling window aggregations over 24,338 contracts

Tech Stack

  • Python ≥3.11, uv
  • FastAPI + Jinja2 (practitioner web app)
  • Astro Starlight (free layer static site)
  • DuckDB, pandas, PyArrow, Parquet
  • 나라장터 G2B API (공공데이터포털 G2B_API_KEY)
  • 지방재정365 LOFIN API (LOFIN_API_KEY)
  • law.go.kr Open API (LAW_OC) — concept card verification
  • Anthropic SDK (claude-sonnet-4-6) — book OCR ingest
  • scikit-learn (isolation forest, LOF, gradient boosting price model)
  • Railway (demo deployment), Vercel (Astro static site)
  • pytest (618 tests)

Result & Impact

  • 618
    Tests passing
  • 24,338
    Local government contracts analyzed
  • 6
    Live data runs
  • 82
    Public concept cards (law-verified)
  • 28 (14 core + 14 supporting)
    Statutory instruments indexed
  • 15 (full exam-prep book)
    OCR chapters ingested

The first open-access, law-linked Korean procurement reference with automated regulatory drift detection. Each concept card is verified against live law.go.kr article text — not a static PDF. The paid layer applies statutory compliance rules (낙찰하한율, 적격심사, 수의계약 thresholds) to actual 지자체 contract data, surfacing deviations a human auditor would need a week to identify manually.

Learnings

  • Law-linking is the moat, not content. Procurement exam prep content is commoditized (PDF study guides exist). Automatically verifiable, article-level citations that survive regulatory updates are not. The verify_card_accuracy.py MATCH/MISMATCH feedback loop makes the product defensible in a way a static guide cannot be.
  • Statutory rule encoding is brittle at the edges. The 낙찰하한율 rate table has seven category × estimate × issuer combinations, each with sub-percentage precision. Encoding these correctly required reading the original 행정안전부 고시 (Ministry of the Interior and Safety notice) text, not a secondary source. A single wrong digit in the rate table invalidates an entire compliance bracket.
  • Demo realism determines outreach conversion. The Railway demo, with real BRNs (business registration numbers) and live 나라장터 API field names (MAS 2단계경쟁, 영업일 countdown, 복수예비가격 15-dot spread), turned a cold email into a practitioner meeting. A generic 'analytics dashboard' would not have.
  • Named-contact outreach is a hard constraint, not a soft preference. Every 지자체 outreach campaign stalls at the same point: emails without a named recipient go to general inboxes and die. The pipeline is call-first (get a named contact) → PDF + data package → email. Reversing this order wastes effort.

Architecture

Free Layer — Astro Starlight

82 concept cards covering the 14 core procurement regulations (국가계약법, 지방계약법, 조달사업법, 예산회계법, and 10 supporting instruments). Each card is sourced from a 15-chapter exam-prep book ingested via Anthropic Files API + Claude OCR, then verified against live law.go.kr article text. Cards that drift to MISMATCH status after a regulatory update are automatically flagged by verify_card_accuracy.py and blocked from public sync until remediated. The Astro Starlight site is deployed to Vercel as a static build.
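
The public-sync gate can be pictured as a pure function over card metadata. The required fields (title, law_id, verified_at, jurisdiction) and the MISMATCH block come from the constraints listed earlier; everything else here is an assumption for illustration, including the dict-based card shape, the `status` field name, and the function names.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("title", "law_id", "verified_at", "jurisdiction")


def public_sync_errors(card: Dict) -> List[str]:
    """Return the reasons a card fails the gate; an empty list means publishable."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not card.get(name)]
    # "status" is an assumed field name for the law.go.kr verification result.
    if card.get("status") == "MISMATCH":
        errors.append("verification status is MISMATCH")
    return errors


def split_for_sync(cards: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition cards into (publishable, blocked), preserving input order."""
    publishable, blocked = [], []
    for card in cards:
        (blocked if public_sync_errors(card) else publishable).append(card)
    return publishable, blocked
```

Returning the list of reasons, rather than a bare boolean, gives the remediation step something concrete to act on for each blocked card.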

Paid Layer — FastAPI Practitioner Engine

DuckDB in-memory analytics over LOFIN monthly parquet files. Three compliance functions run per contract record:

  • 낙찰하한율_floor(category, estimate, issuer, sme) — computes the statutory minimum award rate given the contract category, estimated cost, issuing agency type, and SME classification
  • 적격심사_bracket(ct, issuer, estimate) — returns the applicable 적격심사 scoring bracket specification under the relevant schedule
  • 수의계약 threshold checks — flags awards above statutory limits for the applicable exception category
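
One way to structure a floor-rate lookup like the first function above is a rule table versioned by fiscal year, with rates held as `Decimal` so sub-percentage precision survives. This is a hedged sketch, not the engine's code: `FloorRule`, `floor_rate`, the category and issuer labels, the way `sme` enters the key, and above all the rates themselves are placeholders, not statutory values.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FloorRule:
    category: str                 # contract category label (illustrative)
    issuer: str                   # issuing agency type (illustrative)
    sme: bool                     # SME classification
    max_estimate: Optional[int]   # exclusive upper bound of the band, None = open-ended
    rate: Decimal                 # floor as a fraction, kept exact with Decimal


# PLACEHOLDER rates only. Real values must be taken from the official notice
# for the fiscal year in question; these numbers are deliberately meaningless.
RULES_BY_FISCAL_YEAR: Dict[int, List[FloorRule]] = {
    2026: [
        FloorRule("construction", "local", False, 1_000_000, Decimal("0.11111")),
        FloorRule("construction", "local", False, None, Decimal("0.22222")),
    ],
}


def floor_rate(rules: List[FloorRule], category: str, estimate: int,
               issuer: str, sme: bool) -> Decimal:
    """Return the rate of the first rule whose key and estimate band match.

    Rules for the same key are expected in ascending band order.
    """
    for rule in rules:
        if (rule.category, rule.issuer, rule.sme) != (category, issuer, sme):
            continue
        if rule.max_estimate is None or estimate < rule.max_estimate:
            return rule.rate
    # Fail loudly: a silent default would hide a gap in the encoded table.
    raise LookupError(f"no floor rule for {category!r}, {issuer!r}, sme={sme}")
```

Keying the table by fiscal year means the annual update becomes a new entry rather than an in-place edit, so earlier contracts can still be checked against the rates that applied to them.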

Anomaly detection adds isolation forest (price outliers), local outlier factor (vendor concentration), and contract-splitting pattern detection (cumulative awards approaching 수의계약 thresholds from below).
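
The contract-splitting pattern can be illustrated with a sliding-window check in plain Python. This is a minimal sketch of the idea, not the engine's detector: the `Award` fields, the grouping by agency and vendor, the 30-day default window, and the rule "every award is under the threshold but the windowed sum reaches it" are all assumptions, and the threshold is supplied by the caller rather than hard-coded.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Award:
    agency: str
    vendor: str
    award_date: date
    amount: int


def splitting_candidates(awards: List[Award], threshold: int,
                         window_days: int = 30) -> List[Tuple]:
    """Flag (agency, vendor) pairs whose individually sub-threshold awards
    add up to the threshold or more inside a sliding time window."""
    by_pair = defaultdict(list)
    for a in awards:
        if a.amount < threshold:  # awards at or above the threshold are a separate check
            by_pair[(a.agency, a.vendor)].append(a)

    window = timedelta(days=window_days)
    flagged = []
    for pair, items in by_pair.items():
        items.sort(key=lambda a: a.award_date)
        start, running = 0, 0
        for end, award in enumerate(items):
            running += award.amount
            # Drop awards that have fallen out of the window.
            while award.award_date - items[start].award_date > window:
                running -= items[start].amount
                start += 1
            if end > start and running >= threshold:
                flagged.append((pair, items[start].award_date, award.award_date, running))
                break  # one flag per pair is enough for review
    return flagged
```

The output is a list of candidates for human review, not a finding: legitimate repeat purchases produce the same shape.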

Demo

Railway-deployed FastAPI + Jinja2 app with 5 client profiles (4 real BRNs + 당진건설 demo). All compliance tabs are functional with demo_mode=True. The auth layer (/login, /register, /clients/add) is built but becomes active only when demo_mode=False.