Dev.to US tech 2026-06-27 02:28

GoでゼロリークのPostgres MCPゲートウェイを構築する

原題: Building a Zero-Leak Postgres MCP Gateway in Go

分析結果

カテゴリ: AI
重要度: 65
トレンドスコア: 27
要約: この記事では、Go言語を使用してゼロリークのPostgres MCPゲートウェイを構築する方法について説明します。具体的には、Postgresデータベースとの接続を最適化し、データの漏洩を防ぐための設計原則や実装手法を紹介します。これにより、効率的で安全なデータ処理が可能となり、システムの信頼性が向上します。
キーワード: schema column err compounds query metadata region database

The promise of agentic AI workflows introduces a critical architectural paradox: to make an LLM deeply useful, you must grant it structural awareness of your data layer. Traditional integration patterns force a losing trade-off. Either you hand an external orchestrator direct database access (risking catastrophic data egress), or you must serialize and persist your entire proprietary database schema onto third-party infrastructure. This exposure of internal domain definitions outside the secure perimeter represents a massive intellectual property leak, stalling production AI adoption in highly competitive or regulated sectors. For instance, a localized real estate consultancy managing proprietary compound metrics and high-value transactional ledgers cannot afford to expose its structural competitive edge to a shared cloud context just to run an analytical prompt. To bridge this gap, backend teams must shift toward an architectural pattern where the data plane isolates schema definitions and executes only the commands explicitly defined by the MCP server, delivering pre-approved aggregations without ever leaking raw data layouts upstream. This article demonstrates how to build a zero-leak database proxy in Go using the Model Context Protocol (MCP) over a secure stdio transport layer. By decoupling the LLM from direct database access, you will implement a live gateway that executes two core tasks: Dynamic Schema Reflection to auto-generate tool manifests programmatically, and Analytical Egress Hardening to ensure the external AI agent never touches a raw database row. The project follows a standard go folder layout - cmd/ for the entrypoint, pkg/db for the Postgres connection and logic. This isn’t a framework requirement, just a convention that keeps schema reflection, query execution and MCP transport cleanly separated. You can flatten this into a single file for prototyping. The MCP zero-leak architecture Three things make this gateway zero-leak; Schema Visibility : What it’s allowed to see from the data source ( EXPOSED_TABLES ) Aggregation: What it’s allowed to compute Tool Registration to the MCP server : How these become callable by the LLM. The Schema Visibility Schema visibility step utilizes Postgres’ information_schema.columns table to actually fetch column metadata from the database - instead of having to hardcode or dump it out of our database every time the LLM needs to know about what schema structure is available in our data layer. In pkg/db/postgres.go we create an InspectExposedSchema function that returns a slice of type ColumnMetadata which can eventually be passed into the LLM context window. package db import ( "database/sql" ) // ColumnMetadata defines a single column in our postgres database type ColumnMetadata struct { TableName string ColumnName string DataType string } // InspectExposedSchema reads structural layout data dynamically from the system catalog. func InspectExposedSchema ( db * sql . DB , exposedTables [] string ) ([] ColumnMetadata , error ) { query := ` SELECT table_name, column_name, data_type FROM information_schema.columns WHERE table_schema = 'public' AND table_name = ANY($1) ORDER BY table_name, column_name;` rows , err := db . Query ( query , exposedTables ) if err != nil { return nil , err } defer rows . Close () var metadata [] ColumnMetadata for rows . Next () { var col ColumnMetadata if err := rows . Scan ( & col . TableName , & col . ColumnName , & col . DataType ); err != nil { return nil , err } metadata = append ( metadata , col ) } if err := rows . Err (); err != nil { return nil , err } return metadata , nil } In the project we’ve setup a .env file with the following variable: EXPOSED_TABLES = compounds , sales_ledger This variable is read and passed into the InspectExposedSchema function to fetch only those tables that we’ve explicitly whitelisted for visibility. It’s worth dwelling on why this is a deny-by-default allowlist rather than an exposed-by-default filter. A more generic approach would remove the filter entirely, but in a regulated FinTech or real estate platform, that's not a hypothetical risk. Staging tables, audit logs, or a users table with national ID numbers would become visible to the orchestrator the moment they're created, with zero code change and zero review. The allowlist isn't extra friction, it's the only thing standing between "the LLM sees what we intended" and "the LLM sees whatever the last migration happened to leave lying around.” Important note on the choice of environment variables in this article: In this article we’re only highlighting a single filter level (table level). But a more production-ready design would include a deeper deny list on a more granular level for columns such as surrogate keys, create/delete/update timestamps or vector fields if you’re using PGVector. Aggregation Raw query access is the obvious approach — and the wrong one. Here's why the gateway pre-defines every computation the LLM is allowed to run. For this project, we are taking on one business case where the user of the LLM needs an aggregate of the total number of units sold ( units_sold ), total revenue made ( revenue_egp ) and total cancelled orders ( cancelled_orders ) for a specific region In the schema provided in the repository, we have 2 entities compounds and sales_ledger . sales_ledger column compound_id is a foreign key that references compounds.id . In many popular MCP implementations, the LLM would generally create the aggregation query and send it as plain-text for execution. This poses massive security risk - aside from DELETE or DROP statements which are naive assumptions given a read-only access. The real risk is an exhaustive SELECT query. There is no telling what the LLM might decide is the best path. For the majority of cases it might send the correct query for the business need directly. -- Find aggregate of units sold, revenue, cancelled orders -- relative to a select region SELECT compounds . region , sum ( units_sold ) AS TOTAL_UNITS_SOLD , sum ( revenue_egp ) AS TOTAL_REVENUE , sum ( cancelled_orders ) AS TOTAL_CANCELLED FROM sales_ledger JOIN compounds ON sales_ledger . compound_id = compounds . id WHERE compounds . region = ANY ( $ 1 ) GROUP BY compounds . region But if an attacker were to hijack a session or acquire access to the server running the LLM, there is no stopping them from instructing or injecting a prefix to the context window that instructs the LLM to pull raw data to the server and process it instead of aggregate it. A more secure gateway only allows the LLM to know what it must know - without any possibility of further hijacking. In pkg/db/queries.go we initialize a Queries struct and constructor for it which accepts a *sql.DB connection: type Queries struct { db * sql . DB } func NewQueries ( db * sql . DB ) * Queries { return & Queries { db : db , } } Then we create the result struct for the first type of aggregation which consists of all the fields that represent a single record out from the above query. type RegionalMetricsResult struct { Region string `json:"region"` UnitsSold int `json:"unitsSold"` TotalRevenue float64 `json:"totalRevenue"` CancelledOrders int `json:"cancelledOrders"` } Finally, we create FindRegionalMetrics method on Queries struct with a pointer receiver: func ( q * Queries ) FindRegionalMetrics ( ctx context . Context , regions [] string ) ([] RegionalMetricsResult , error ) { query := `SELECT compounds.region, sum(units_sold) AS TOTAL_UNITS_SOLD, sum(revenue_egp) AS TOTAL_REVENUE, sum(cancelled_orders) AS TOTAL_CANCELLED FROM sales_ledger JOIN compounds ON sales_ledger.compound_id = compounds.id WHERE compounds.region = ANY($1) GROUP BY compounds.region` rows , err := q . db . QueryContext ( ctx , query , pq . Array ( regions )) if err != nil { return nil , err } defer rows . Close () var result [] RegionalMetricsResult for rows . Next () { var col RegionalMetricsResult if err := rows . Scan ( & col . Region , & col . UnitsSold , & col . TotalRevenue , & col . CancelledOrders ); err != nil { return nil , err } result = append ( result , col ) } if err := rows . Err (); err != nil { return nil , err } return result , nil } pq.Array is required here because Go's database/sql doesn't natively serialize a string slice to Postgres's ANY($1) array syntax - the lib/pq driver wrapper handles that translation. The FindRegionalMetrics and any similar method absolutely doesn’t have to know about who is calling it. It doesn’t care if the caller is an MCP server or a CRUD API server. It is pure business logic that constricts and abstracts flow from the underlying data store, essentially telling the LLM what it is allowed to do with the data. This is also true in case your team decides to create a more complex and dynamic aggregate implementation - The end goal remains the same: You give the LLM a sparse set of information proxies that cannot be abused even if an attacker gains access. Registering the functions as MCP Tools Now comes the part where we register these tools as discoverable and usable utilities to the LLM. For this project, we are using github.com/mark3labs/mcp-go to register MCP tools and run the MCP server. First, we define a small helper that serializes any result type to indented JSON before returning it to the MCP transport layer. Using any as the input type means this same function works for every tool response — schema metadata, regional metrics, or any future query result. func formatResult ( v any ) string { b , _ := json . MarshalIndent ( v , "" , " " ) return string ( b ) } In production, the marshal error should be handled explicitly. For this gateway, marshaling failures on known struct types are effectively impossible, but the pattern should be hardened before shipping. The library makes it easy to add a descriptor for the tools using the mcp.NewTool method. For the list_tables tool - initialize a tool name and the description: listTablesTool := mcp . NewTool ( "