Korean Public Procurement Analytics
A two-tier practitioner platform for the 나라장터 public procurement ecosystem — free layer: 82 law-verified concept cards on Astro Starlight; paid layer: FastAPI workflow engine with 나라장터 + 지방재정 API integration, 24,338 contracts analyzed across 6 live data runs, Railway-deployed demo with 5 client profiles.
Overview
A knowledge product and workflow platform built for Korean public procurement practitioners preparing for the 공공조달관리사 (Public Procurement Manager) national credential. The free layer is a static reference site (Astro Starlight) with 82 concept cards covering the 14 core procurement regulations — each card law-verified via live law.go.kr API cross-check. The paid layer is a FastAPI practitioner engine that queries 나라장터 (G2B) and 지방재정365 (LOFIN) APIs, applies statutory compliance rules (낙찰하한율, 적격심사 scoring, 수의계약 thresholds), and identifies anomaly patterns in 24,338 local government contracts — surfacing procurement decisions that deviate from legal guardrails.
Problem
Two problems run in parallel. For the credential market: the 공공조달관리사 exam expects practitioners to apply 14 statutory instruments to realistic tender scenarios. No organized, searchable, law-linked reference exists. Exam prep relies on paper study guides that go stale as regulations change annually. For the analytics market: Korean local government procurement data is published through 지방재정365 (LOFIN) as monthly parquet files with no standard anomaly detection layer. A city comptroller or procurement auditor who wants to know whether last quarter's 수의계약 awards follow the legal threshold criteria has no off-the-shelf tool. Manual review across 24,338 contracts is not viable.
Constraints
- G2B API (나라장터) and LOFIN API rate limits require aggressive caching — brute-force re-fetching against 나라장터's 40M+ contract records is not viable on free-tier credentials
- Statutory rules change annually (행정안전부 고시): 낙찰하한율 rates and 표준노무비 rates embedded in the compliance engine must be updated at the start of each fiscal year
- 82 concept cards pass the public-sync quality gate (title, law_id, verified_at, and jurisdiction fields are required); the remaining 22 are private or unresolved (a MISMATCH status from law.go.kr verification blocks public sync)
- 지방재정365 data covers local government (지자체) contracts only; 나라장터 central government data is separately gated and requires higher API tier for bulk download
- Outreach to 지자체 procurement offices requires a named contact before any email is sent; cold emails without a named recipient are blocked by a hard rule
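The rate-limit constraint above implies a cache in front of every G2B and LOFIN call. The following is a minimal sketch assuming an in-memory store with a time-to-live policy. `ResponseCache`, its TTL behaviour, and the injected `fetch` callable are hypothetical; the project's actual API client and cache layer are not described here.

```python
import hashlib
import json
import time
from typing import Any, Callable


class ResponseCache:
    """Hypothetical in-memory TTL cache keyed by endpoint + normalized query params."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}
        self.misses = 0

    @staticmethod
    def key(endpoint: str, params: dict[str, Any]) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} share one cache entry.
        raw = endpoint + "?" + json.dumps(params, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_fetch(
        self,
        endpoint: str,
        params: dict[str, Any],
        fetch: Callable[[str, dict[str, Any]], Any],
    ) -> Any:
        k = self.key(endpoint, params)
        now = self.clock()
        hit = self._store.get(k)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: no API call
        self.misses += 1  # only misses spend API quota
        value = fetch(endpoint, params)
        self._store[k] = (now, value)
        return value
```

Because parameter order is normalized into the key, logically identical requests reuse one entry, and only genuine misses reach the rate-limited API.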
Approach
Free layer first: extract 99 concept cards from a 15-chapter exam-prep book via the Anthropic Files API (claude-sonnet-4-6 as OCR and semantic extractor), populate law_id fields against the MST code registry (28 instruments, Tier 1–3), run live law.go.kr verification for each article citation, run the public-sync quality gate, and deploy to Astro Starlight on Vercel. Paid layer: a DuckDB/Parquet pipeline queries the G2B and LOFIN APIs; compliance functions compute the 낙찰하한율 floor, 적격심사 score bracket, and 수의계약 threshold per contract record; anomaly detection applies isolation forest and local outlier factor to price and vendor-concentration outliers, plus a contract-splitting pattern check. Output: a FastAPI + Jinja2 demo with 5 client profiles, deployed to Railway.
Key Decisions
Static site (Astro Starlight) for free layer, not a SaaS subscription
The credential market has strong SEO intent (practitioners searching specific article citations) and low willingness to pay for the reference function alone. A free, law-linked, searchable reference site builds the inbound audience that converts to paid workflow engine subscribers. Dynamic SaaS for the reference function would require auth, hosting costs, and a payment wall that blocks the discovery channel. Starlight's static output deploys to Vercel at zero cost and serves the full card index via CDN.
- Paywall the concept cards as a standalone product — converts poorly; practitioners already have PDF study guides; the moat is the law linkage, not the content itself
- Notion or GitBook for the reference layer — acceptable for internal use; not deployable as a branded product with custom domain and analytics
Live law.go.kr verification for every concept card, not manual review
Korean procurement regulations update annually via 행정안전부 고시. Manual review of 99 cards after each regulatory change is a recurring maintenance burden. The law.go.kr Open API provides article-level text retrieval — the verify_card_accuracy.py tool calls it for each card's law_id citation and compares the claimed rule with the live legal text. Cards that drift to MISMATCH are flagged for remediation before the next public-sync. This automation is the moat: competitors' PDF guides and static web pages go stale silently.
- Manual annual review — adequate for 20 cards; does not scale to 100+ cards across 14 statutes with sub-article granularity
- LLM-only verification without authoritative source — introduces hallucination risk on legal citations; practitioners would correctly distrust it
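The MATCH/MISMATCH comparison can be sketched as follows. This is a deliberately crude heuristic, not the actual logic of verify_card_accuracy.py. It assumes each card carries a `claim` string and counts the card as matching only if every numeric figure in the claim (rates, amounts) also appears in the live article text. `fetch_article` stands in for a law.go.kr Open API lookup by law_id; its name and signature are hypothetical.

```python
import re
from typing import Callable


def _normalize(text: str) -> str:
    # Collapse whitespace and drop thousands separators so "2,000만원" matches "2000만원".
    return re.sub(r"\s+", " ", text).replace(",", "").strip()


def _figures(text: str) -> set[str]:
    # Numeric tokens such as "87.745" or "2000": a rough proxy for "the rule".
    return set(re.findall(r"\d+(?:\.\d+)?", _normalize(text)))


def verify_card(card: dict[str, str], fetch_article: Callable[[str], str]) -> str:
    """Return "MATCH" if every figure the card claims appears in the live article text."""
    article = fetch_article(card["law_id"])
    missing = _figures(card["claim"]) - _figures(article)
    return "MATCH" if not missing else "MISMATCH"
```

A real verifier would need article-level alignment and more than numeric matching, but even this check catches the most common drift: a threshold or rate changed by an annual 고시.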
DuckDB in-memory query against Parquet, not SQLite or PostgreSQL, for the analytics layer
LOFIN monthly parquet files arrive as columnar data best suited for window aggregations (vendor concentration over rolling 12 months, 수의계약 cumulative thresholds by NAICS-equivalent category). DuckDB runs these in-memory joins and aggregations significantly faster than SQLite row scans, and the Parquet files are the durable store — no database migration risk. The same DuckDB pattern is used in the Korean forensic finance toolkit, making the code patterns consistent across both projects.
- PostgreSQL — correct choice if multi-user write concurrency is needed; over-engineered for a single-user analytics workbench querying read-only monthly extracts
- pandas only — readable but 10-50x slower than DuckDB for the rolling window aggregations over 24,338 contracts
Tech Stack
- Python ≥3.11, uv
- FastAPI + Jinja2 (practitioner web app)
- Astro Starlight (free layer static site)
- DuckDB, pandas, PyArrow, Parquet
- 나라장터 G2B API (공공데이터포털 G2B_API_KEY)
- 지방재정365 LOFIN API (LOFIN_API_KEY)
- law.go.kr Open API (LAW_OC) — concept card verification
- Anthropic SDK (claude-sonnet-4-6) — book OCR ingest
- scikit-learn (isolation forest, LOF, gradient boosting price model)
- Railway (demo deployment), Vercel (Astro static site)
- pytest (618 tests)
Result & Impact
- 618 tests passing
- 24,338 local government contracts analyzed
- 6 live data runs
- 82 public concept cards (law-verified)
- 28 statutory instruments indexed (14 core + 14 supporting)
- 15 OCR chapters ingested (full exam-prep book)
The first open-access, law-linked Korean procurement reference with automated regulatory drift detection. Each concept card is verified against live law.go.kr article text — not a static PDF. The paid layer applies statutory compliance rules (낙찰하한율, 적격심사, 수의계약 thresholds) to actual 지자체 contract data, surfacing deviations a human auditor would need a week to identify manually.
Learnings
- Law-linking is the moat, not content. Procurement exam prep content is commoditized (PDF study guides exist). Automatically verifiable, article-level citations that survive regulatory updates are not. The verify_card_accuracy.py MATCH/MISMATCH feedback loop makes the product defensible in a way a static guide cannot be.
- Statutory rule encoding is brittle at the edges. The 낙찰하한율 rate table has seven category × estimate × issuer combinations, each with sub-percentage precision. Encoding these correctly required reading the original 행정안전부 고시 text, not a secondary source. A single wrong digit in a rate table invalidates an entire compliance bracket.
- Demo realism determines outreach conversion. The Railway demo, with 5 client profiles (4 of them real BRNs) and live 나라장터 API field names (MAS 2단계경쟁, 영업일 countdown, 복수예비가격 15-dot spread), turned a cold email into a practitioner meeting. A generic 'analytics dashboard' would not have.
- Named-contact outreach is a hard constraint, not a soft preference. Every 지자체 outreach campaign stalls at the same point: emails without a named recipient go to general inboxes and die. The pipeline is call-first (get a named contact) → PDF + data package → email. Reversing this order wastes effort.
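The annual 고시 cycle noted under Constraints and the rate-table brittleness above both suggest keying every rule table by fiscal year and failing closed when the current year has not been loaded. A minimal sketch, assuming the Korean fiscal year equals the calendar year; `RuleSet`, its fields, and every number below are hypothetical placeholders, not real 행정안전부 figures.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class RuleSet:
    fiscal_year: int
    source: str                      # which 고시 the figures were copied from
    floor_rates: dict[str, Decimal]  # 낙찰하한율 by (hypothetical) bracket key
    labor_cost_rate: Decimal         # 표준노무비 placeholder


# PLACEHOLDER values only: NOT the real 고시 figures.
RULESETS: dict[int, RuleSet] = {
    2024: RuleSet(2024, "placeholder", {"construction_small": Decimal("0.88")}, Decimal("1.00")),
    2025: RuleSet(2025, "placeholder", {"construction_small": Decimal("0.89")}, Decimal("1.02")),
}


def ruleset_for(contract_date: date) -> RuleSet:
    """Pick the rule set for the contract's fiscal year (assumed to be the calendar year)."""
    fy = contract_date.year
    try:
        return RULESETS[fy]
    except KeyError:
        # Fail closed: a missing year means the annual update has not been encoded yet.
        raise LookupError(f"No compliance rule set loaded for FY{fy}") from None
```

Raising instead of silently falling back to last year's table turns a missed annual update into a visible error rather than a quietly wrong compliance bracket.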
Architecture
Free Layer — Astro Starlight
82 concept cards covering the 14 core procurement regulations (국가계약법, 지방계약법, 조달사업법, 예산회계법, and 10 others). Each card is sourced from a 15-chapter exam-prep book ingested via the Anthropic Files API + Claude OCR, then verified against live law.go.kr article text. Cards that drift to MISMATCH status after a regulatory update are automatically flagged by verify_card_accuracy.py and blocked from public sync until remediated. The Astro Starlight site is deployed to Vercel as a static build.
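The public-sync gate reduces to a per-card check. In this sketch the four required fields come from the Constraints section; the `verification_status` and `visibility` keys are hypothetical names for where a card might record its law.go.kr result and its private status.

```python
from typing import Any

# Required fields as listed in the Constraints section.
REQUIRED_FIELDS = ("title", "law_id", "verified_at", "jurisdiction")


def gate_failures(card: dict[str, Any]) -> list[str]:
    """Return the reasons a card is blocked from public sync (empty list = passes)."""
    reasons = [f"missing {f}" for f in REQUIRED_FIELDS if not card.get(f)]
    if card.get("verification_status") == "MISMATCH":  # hypothetical key
        reasons.append("law.go.kr verification MISMATCH")
    if card.get("visibility") == "private":  # hypothetical key
        reasons.append("private card")
    return reasons


def public_cards(cards: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Only cards with zero gate failures are synced to the public site.
    return [c for c in cards if not gate_failures(c)]
```

Returning reasons rather than a bare boolean makes the remediation queue self-explanatory.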
Paid Layer — FastAPI Practitioner Engine
DuckDB in-memory analytics over LOFIN monthly parquet files. Three compliance functions run per contract record:
- 낙찰하한율_floor(category, estimate, issuer, sme): computes the statutory minimum award rate given the contract category, estimated cost, issuing agency type, and SME classification
- 적격심사_bracket(ct, issuer, estimate): returns the applicable 적격심사 scoring bracket specification under the relevant schedule
- 수의계약 threshold checks: flag awards above statutory limits for the applicable exception category
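A minimal sketch of how a floor-rate lookup and a 수의계약 ceiling check might be encoded. `award_floor_rate` mirrors the documented parameters but is not the project's 낙찰하한율_floor. Every number in the tables is a placeholder, not a 행정안전부 고시 figure. The bracket keys, the strict `<` boundary, applying the rate to the 예정가격 (reserve price), and rounding up to the next won are all assumptions. `Decimal` keeps sub-percentage rates exact instead of storing them as binary floats.

```python
from decimal import ROUND_CEILING, Decimal

# PLACEHOLDER brackets: illustrative numbers only, NOT the 고시 rates.
# Key: (category, issuer, sme) -> ascending (estimate upper bound in KRW, floor rate).
FLOOR_TABLE: dict[tuple[str, str, bool], list[tuple[int, Decimal]]] = {
    ("construction", "local", True): [(1_000_000_000, Decimal("0.88000")), (10_000_000_000, Decimal("0.86000"))],
    ("construction", "local", False): [(1_000_000_000, Decimal("0.87000")), (10_000_000_000, Decimal("0.85000"))],
}

# PLACEHOLDER 수의계약 ceilings by exception category (KRW), not statutory values.
SOLE_SOURCE_LIMITS: dict[str, int] = {"general": 20_000_000}


def award_floor_rate(category: str, estimate: int, issuer: str, sme: bool) -> Decimal:
    """Return the floor rate for the first bracket whose upper bound exceeds the estimate."""
    for upper, rate in FLOOR_TABLE[(category, issuer, sme)]:
        if estimate < upper:  # assumed boundary convention
            return rate
    raise LookupError("estimate above the highest bracket: no floor rule encoded")


def min_award_price(reserve_price: int, rate: Decimal) -> int:
    # Exact decimal product, rounded up to the next whole won (assumed rounding rule).
    return int((Decimal(reserve_price) * rate).to_integral_value(rounding=ROUND_CEILING))


def exceeds_sole_source_limit(amount: int, exception_category: str) -> bool:
    return amount > SOLE_SOURCE_LIMITS[exception_category]
```

Keeping rates as `Decimal` strings copied from the source text is one way to guard against the one-wrong-digit failure described under Learnings.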
Anomaly detection adds isolation forest (price outliers), local outlier factor (vendor concentration), and contract-splitting pattern detection (cumulative awards approaching 수의계약 thresholds from below).
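The contract-splitting check can be sketched without any model. Group sole-source awards by agency and vendor, slide a window over award dates, and flag pairs whose individually compliant awards sum past the threshold inside the window. This is a rule-based illustration only and does not cover the isolation forest or LOF steps. The `Award` fields, the 365-day approximation of a rolling 12 months, and the two-award minimum are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Award:
    agency: str
    vendor: str
    awarded_on: date
    amount: int  # KRW


def splitting_suspects(awards: list[Award], threshold: int, window_days: int = 365) -> set[tuple[str, str]]:
    """Flag (agency, vendor) pairs whose sub-threshold awards sum past the threshold within a rolling window."""
    by_pair: dict[tuple[str, str], list[Award]] = defaultdict(list)
    for a in awards:
        if a.amount <= threshold:  # keep only awards that individually look compliant
            by_pair[(a.agency, a.vendor)].append(a)

    flagged: set[tuple[str, str]] = set()
    span = timedelta(days=window_days)
    for pair, items in by_pair.items():
        items.sort(key=lambda a: a.awarded_on)
        start, running = 0, 0
        for end, a in enumerate(items):
            running += a.amount
            # Drop awards from the left until the window spans less than window_days.
            while a.awarded_on - items[start].awarded_on >= span:
                running -= items[start].amount
                start += 1
            # At least two awards in the window, and together they cross the threshold.
            if end - start >= 1 and running > threshold:
                flagged.add(pair)
                break
    return flagged
```

The two-pointer window keeps the scan linear per vendor after sorting, which matters little at 24,338 contracts but keeps the logic easy to port into a SQL window query.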
Demo
Railway-deployed FastAPI + Jinja2 app with 5 client profiles (4 real BRNs plus a 당진건설 demo profile). All compliance tabs are functional with demo_mode=True. The auth layer (/login, /register, /clients/add) is built but is activated only when demo_mode=False.