MCP Generator vs. Embedded MCP: Two Paths to AI-Ready APIs
When you want AI agents to interact with your REST API through the Model Context Protocol, you have two fundamentally different options: generate a standalone MCP server from your OpenAPI spec, or embed MCP directly into your existing API server. Both work. Neither is universally better. The right choice depends on what you're building, what you already have, and how much control you need.
The Two Approaches at a Glance
| Dimension | MCP Generator 3.x (External) | Embedded MCP (Same Server) |
|---|---|---|
| Architecture | Separate process, reads OpenAPI spec, proxies HTTP calls to your API | MCP tools live inside your API server, calling internal functions directly |
| Code ownership | Generated Python code in `generated_mcp/` | You write and maintain the tool registrations yourself |
| Transport | STDIO or Streamable HTTP (SSE) | Whatever your framework supports |
| Auth model | Dedicated middleware stack (JWT/JWKS, OAuth2) | Shared with your existing API auth |
| Deployment | Two services (your API + the MCP server) | One service |
| Language | Python 3.11+ (regardless of your API's language) | Same language as your API |
1. Architecture
MCP Generator produces a standalone FastMCP 3.x server that sits between the AI agent and your API. The generator reads your OpenAPI spec (3.0.x, 3.1.x, or Swagger 2.0), discovers tags automatically, and creates modular sub-servers (one per API tag). Each tool is essentially a typed wrapper that makes an HTTP call to your real API. The generated server runs as its own process, with its own middleware stack for timing, logging, caching, and auth.
Embedded MCP means your API server registers its own functions as MCP tools. If you're using FastAPI, you might use FastMCP's mount or compose pattern to add MCP endpoints alongside your REST routes. There is no intermediary. When an agent calls a tool, it executes your application code directly.
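By contrast, an embedded tool is just a registered function. The toy registry below illustrates the shape of in-process registration and dispatch; a real implementation would use an MCP framework's decorators rather than this hand-rolled dictionary.

```python
from typing import Callable

# Toy registry standing in for an MCP framework's tool registration.
TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as an agent-callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_order(order_id: int) -> dict:
    # Calls application code directly -- no HTTP hop, no serialization.
    return {"id": order_id, "status": "open"}

def dispatch(name: str, **kwargs):
    """What the embedded MCP endpoint does when an agent invokes a tool."""
    return TOOLS[name](**kwargs)
```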
The architectural difference matters most when you think about failure modes. With the generator approach, your MCP server can crash without taking down your API (and vice versa). With embedded MCP, they share a fate.
2. Performance
This is where embedded MCP wins cleanly.
| Operation | MCP Generator | Embedded MCP |
|---|---|---|
| Tool invocation | Agent -> MCP server -> HTTP request -> your API -> response -> MCP server -> Agent | Agent -> MCP server (same process) -> internal function call -> Agent |
| Latency overhead | Full HTTP round-trip per tool call (DNS, TCP, TLS, serialization) | Near zero (in-process function call) |
| Caching | Built-in response caching middleware in the generated server | You implement it yourself, but you also have access to your app's existing cache layer |
For high-throughput scenarios or latency-sensitive agent workflows, the extra HTTP hop in the generator approach adds up. The generator does include a caching middleware layer that helps, but it cannot eliminate the fundamental cost of inter-process communication.
That said, for most practical agent interactions (where the bottleneck is the LLM, not the API call), the performance difference is negligible.
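The caching row in the table above can be made concrete: an embedded tool can lean directly on an in-process cache (here the standard library's `functools.lru_cache`; the tool and lookup names are hypothetical). The generator's caching middleware plays a similar role, but across the HTTP boundary.

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def fetch_product(product_id: int) -> dict:
    # Stand-in for a real database or service lookup.
    return {"id": product_id, "name": f"product-{product_id}"}

def product_tool(product_id: int) -> dict:
    """Embedded tool handler: repeated calls with the same id hit the cache."""
    return fetch_product(product_id)
```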
3. Developer Experience
MCP Generator shines when you want to go from "I have an OpenAPI spec" to "AI agents can use my API" with minimal effort. Three commands get you there:
```shell
generate-mcp   # reads your spec, produces the server code
register-mcp   # sets up client configuration
run-mcp        # starts the server
```
You get auto-generated tests, BM25 search over tools (useful when your API has dozens or hundreds of endpoints), OpenTelemetry tracing, Docker output, and response limiting. The modular sub-server architecture keeps large APIs organized. For a 200-endpoint API, this is a significant time saver.
Embedded MCP requires you to write each tool registration by hand. For a small API (5 to 15 endpoints), this is straightforward and gives you fine-grained control over tool descriptions, parameter schemas, and behavior. For a large API, it becomes tedious and error-prone.
The tradeoff is classic: automation vs. control.
4. Maintenance
This is where embedded MCP has a structural advantage.
With MCP Generator, your generated code and your API can drift apart. Every time you add an endpoint, change a parameter, or modify auth requirements, you need to regenerate. The generated code lives in `generated_mcp/`, and while you can customize it after generation, those customizations risk being overwritten on the next run.
With Embedded MCP, there is only one codebase. When you add a new endpoint, you add the MCP tool registration right next to it. Refactoring is straightforward because everything lives in the same repo and the same language.
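Drift in either direction is easy to catch mechanically. The hypothetical helper below (not part of MCP Generator) compares the `operationId`s in an OpenAPI spec against the set of registered tool names, the kind of check a CI step could run:

```python
def find_drift(spec: dict, registered: set[str]) -> tuple[set[str], set[str]]:
    """Return (missing_tools, stale_tools): operations with no matching
    tool, and tools with no matching operation."""
    ops = {
        op["operationId"]
        for methods in spec.get("paths", {}).values()
        for op in methods.values()
        if "operationId" in op
    }
    return ops - registered, registered - ops

# Example: one operation lacks a tool, one tool lacks an operation.
spec = {"paths": {"/orders": {"get": {"operationId": "list_orders"},
                              "post": {"operationId": "create_order"}}}}
missing, stale = find_drift(spec, {"list_orders", "old_tool"})
```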
5. Security
| Concern | MCP Generator | Embedded MCP |
|---|---|---|
| Auth implementation | Dedicated middleware (JWT/JWKS validation, OAuth2 flows) | Reuses your existing API auth |
| Attack surface | Two services to secure, but the MCP server only proxies (no direct DB access) | One service, but MCP tools have access to everything your app can reach |
| Credential handling | MCP server holds API credentials to authenticate with your backend | No inter-service credentials needed |
| Isolation | Strong process-level isolation between MCP layer and business logic | No isolation; a bug in a tool handler can affect the whole application |
The generator approach provides a natural security boundary. The MCP server can only do what your API allows.
Embedded MCP tools have direct access to your database, internal services, and business logic. Powerful, but requires discipline about what you expose.
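One concrete form that discipline can take is an explicit allowlist at the dispatch layer: internal functions exist in the same process, but agents can only reach the ones you deliberately expose. The names below are hypothetical.

```python
EXPOSED_TOOLS = {"get_order", "list_orders"}  # explicit allowlist

def guarded_dispatch(name: str, handlers: dict, **kwargs):
    """Invoke a handler only if it was deliberately exposed to agents."""
    if name not in EXPOSED_TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed to agents")
    return handlers[name](**kwargs)

handlers = {
    "get_order": lambda order_id: {"id": order_id},
    "drop_table": lambda: "disaster",  # reachable in-process, never exposed
}
```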
6. Deployment
MCP Generator: Two services: your API plus the generated MCP server. Docker output is provided, but you now deploy, monitor, and scale two processes instead of one.
Embedded MCP: One deployment artifact. Scaling, monitoring, and deployment workflows stay exactly the same.
7. When to Use Which
Choose MCP Generator when:
- You have an existing API with a solid OpenAPI spec and want MCP without modifying the server
- Your API is large (50+ endpoints)
- Your API is in a language other than Python
- You want process-level isolation
- You need built-in observability, BM25 tool search, and response limiting
- You want to expose a third-party API you don't control
Choose Embedded MCP when:
- Your API is small to medium (under 50 endpoints)
- Performance matters and you can't afford the HTTP proxy overhead
- You want tools that go beyond REST (accessing internal state, atomic multi-operation tools)
- You prefer a single codebase and deployment
- Your API is already in Python (especially FastAPI)
- You're building both the API and MCP layer from scratch
The Hybrid Option
Nothing stops you from doing both. Use MCP Generator to bootstrap quickly, then migrate high-value tools to embedded implementations as your needs mature.
MCP Generator 3.x: Key Facts
- Repo: github.com/quotentiroler/mcp-generator-3.x
- License: Apache 2.0 (generated code is yours to license however you want)
- Stars: 16
- Python: 3.11+, uses uv
- Supported specs: OpenAPI 3.0.x, 3.1.x, Swagger 2.0 (JSON and YAML)
- Unique features: Modular sub-servers, JWT/JWKS auth, OAuth2 flows, middleware stack, auto-generated tests, MCP resources, tag auto-discovery, BM25 tool search, OpenTelemetry, Docker output