krff-shell — Natural Language Finance Shell
11-tool MCP server exposing the Korean forensic finance toolkit as natural-language queries via Claude Desktop — DuckDB query layer, parameterized SQL (injection-safe), per-company self-contained HTML reports (Plotly), optional Claude narrative synthesis. FastAPI REST endpoints mirror all 11 MCP tools. 317 tests.
Overview
An MCP (Model Context Protocol) server and CLI that wraps the forensic-accounting-toolkit's analytics as 11 tools callable from Claude Desktop or any MCP-compatible client. Tools span the full analytical stack: company registry lookups, trading calendar computations, Beneish M-Score screening, CB/BW dilution assessment, anomaly score retrieval, and JFIA literature search. All tools execute parameterized SQL through a shared DuckDB connection layer (`krff/db.py`) — no string interpolation in query construction. Per-company HTML reports embed interactive Plotly charts as self-contained files with no external CDN dependency. FastAPI REST endpoints mirror all 11 MCP tools for non-MCP integrations.
Problem
The forensic-accounting-toolkit's 12 component libraries each have their own Python API. Using them together requires knowing which library to call, which parquet files to load, how to join results across data sources, and how to format output for human review. A practitioner answering 'which KOSDAQ companies had both a high M-Score and a CB repricing event in the same fiscal year?' must orchestrate at least three libraries, two parquet schemas, and undocumented join logic. krff-shell collapses this into a single natural-language query: the MCP layer handles routing, data loading, and result formatting, while the practitioner describes what they want.
Constraints
- Local-only deployment — krff-shell runs against local parquet files produced by the toolkit's data pipeline; no hosted service, no shared database; each user runs their own instance
- MCP protocol requires all tool inputs and outputs to be JSON-serializable — Plotly figure objects must be serialized and returned as HTML strings, not as native objects
- SQL injection risk in DuckDB: parameterized queries using `?` placeholders work in WHERE clauses but not for table names or column identifiers — dynamic table selection required a different injection mitigation strategy
- FastAPI REST endpoints must maintain schema parity with MCP tool signatures — drift between the two interfaces is not caught by the type checker and required dedicated parity tests
- Claude narrative synthesis is optional and gated behind an API key check — the shell must be fully functional without an active Anthropic API key
Approach
Built `krff/db.py` as a singleton DuckDB connection manager with typed parameterized query helpers. 11 tools implemented as standard Python functions; MCP registration and FastAPI route definition both reference the same function — no code duplication between interfaces. Per-company HTML reports use Plotly with `include_plotlyjs=True` (not CDN) so reports are viewable offline and do not depend on external asset availability. JFIA search (Tool 11) loads `jfia_catalog.json` at startup and runs keyword + abstract full-text matching, returning structured citation records. The test suite (317 tests) covers all 11 tool response schemas, all REST endpoint input/output contracts, SQL injection attempts on every parameterized field, and report HTML structure validation.
Key Decisions
Parameterized SQL for all DuckDB queries rather than f-string interpolation
The initial implementation used f-strings to construct DuckDB queries — fast to write, readable, and common in data analysis scripts. During the test suite build, a SQL injection test case against the company code parameter produced a query that executed successfully against an unintended table. DuckDB's in-memory model means injection doesn't reach a networked database server, but it can expose parquet file contents or corrupt an in-memory state that the session reuses. Parameterized queries using `?` placeholders in WHERE clauses, combined with an allowlist approach for table name selection, eliminated the injection surface.
- f-string interpolation throughout — readable but injectable; acceptable for single-user local scripts, not for a tool callable by an LLM agent
- ORM (SQLAlchemy) over DuckDB — parameterization handled automatically; adds a dependency and a translation layer between DuckDB's columnar semantics and the ORM's row-oriented model
Self-contained HTML reports (Plotly `include_plotlyjs=True`) rather than CDN-dependent or hosted dashboard
Per-company reports are shared with practitioners and saved to local directories. A CDN-dependent report that embeds `<script src='https://cdn.plot.ly/...'>` becomes non-functional when viewed offline or in an air-gapped environment. A hosted dashboard requires a running server and a session. Self-contained HTML files (single file, no external dependencies) can be emailed, archived, or viewed in a browser with no infrastructure. The file size increase (~3 MB per report vs. ~50 KB with CDN) is acceptable for artifacts intended for human review rather than bulk storage.
- CDN-dependent Plotly (`include_plotlyjs='cdn'`) — smaller file, same visual output; breaks offline and in environments where CDN is blocked
- Hosted FastAPI dashboard with shared state — better for real-time updates; requires a running server and introduces session management complexity
FastAPI REST layer mirroring all 11 MCP tools
MCP is the correct interface for Claude Desktop. But not all downstream integrations are MCP-compatible — programmatic scripts, CI pipelines, and non-Claude LLM clients use HTTP APIs. Building the FastAPI layer using the same function signatures as the MCP tools (no separate implementation) costs minimal additional development but preserves integration flexibility. The parity test suite ensures both interfaces return identical outputs for identical inputs.
- MCP-only, no REST layer — simpler codebase; removes the programmatic integration option before there is confirmed demand for it
- REST-only with no MCP — loses the Claude Desktop integration that makes natural-language queries possible
Tech Stack
- Python ≥3.11, uv
- MCP SDK (Anthropic)
- FastAPI + uvicorn
- DuckDB
- Plotly
- pandas, PyArrow
- Anthropic SDK (claude-sonnet-4-6) — optional narrative synthesis
- jfia_catalog.json (Tool 11 data source)
- pytest (317 tests)
Result & Impact
- 11MCP tools
- 317Tests
- 11 (mirrors MCP tools)REST endpoints
- Self-contained HTML (Plotly)Report format
The conversational interface for the forensic-accounting-toolkit. Questions that previously required orchestrating multiple libraries and parquet schemas are answered as a single Claude Desktop query. The parameterized SQL layer and 317-test coverage make it the most hardened component in the toolkit from a code-quality standpoint. The self-contained HTML report format has been used in practitioner outreach — a single file attachment that opens in any browser with no dependencies.
Learnings
- SQL injection testing should be part of the initial test suite, not a later hardening pass. The DuckDB injection surface was only discovered when building tests for the demo — if tests had been written first, the f-string interpolation pattern would have been caught before it appeared in the main implementation.
- MCP tool design is an API design problem. The first iteration of several tools returned raw DataFrames serialized to JSON — useful for machine consumption, not for the Claude narrative layer or for human review. The second iteration returns structured objects with a summary string, a findings list, and a data block — a format that the LLM can narrate and a human can skim.
- Parity between MCP tools and REST endpoints is a maintenance constraint, not a one-time setup. The dedicated parity test suite (checking identical inputs produce identical outputs across both interfaces) is what catches drift when one interface's response schema is updated without updating the other.
- Optional Claude narrative synthesis is the right default. Practitioners using the tool for rapid screening do not need narrative for every result — they need the data fast. Narrative is valuable for the final output (the per-company HTML report) and for explaining anomalies to a non-technical stakeholder. Gating it behind an API key check makes the dependency explicit.
Architecture
krff-shell is organized in three layers:
Query layer — krff/db.py
A singleton DuckDB connection manager. All 11 tools route their data access through this module. Parameterized helpers ensure that no tool constructs a query by string interpolation — all user-supplied values pass through ? placeholders in WHERE clauses. Table name selection uses an allowlist pattern (not parameterization, which DuckDB does not support for identifiers).
Tool layer — 11 MCP tools + 11 FastAPI routes
Each tool is a plain Python function. MCP registration and FastAPI route definition both reference the same function — there is no separate implementation for the two interfaces. Tool categories:
- Company data — registry lookup, financial statement retrieval, trading calendar
- Forensic screening — Beneish M-Score, CB/BW dilution assessment, anomaly scores
- Report generation — per-company self-contained HTML report
- Literature — JFIA article search (Tool 11, loads
jfia_catalog.jsonat startup)
Report layer — krff/reports.py
Generates per-company HTML files with interactive Plotly charts. include_plotlyjs=True embeds the Plotly JavaScript bundle in each file — reports are self-contained, offline-capable, and email-safe.
SQL Injection Mitigation
DuckDB’s parameterized query support covers WHERE clause values but not table names or column identifiers. The initial implementation used f-strings for table name selection — a pattern discovered during test construction to be exploitable within the in-memory session.
The fix: an allowlist for all table name inputs. If a caller passes a table name not in the allowlist, the query returns an empty result rather than executing against an unintended target. SQL injection tests for all 11 parameterized fields are in the test suite and run on every commit.
Dual Interface
| MCP | REST | |
|---|---|---|
| Client | Claude Desktop, MCP-compatible | HTTP, programmatic scripts |
| Input format | JSON tool call | HTTP request body |
| Output format | MCP tool response | JSON response |
| Schema source | Same Python function | Same Python function |
| Parity test | Yes — 317 tests include interface parity checks |