MCP under the hood: the entire protocol, without leaving a single detail out

Anyone who has read the MCP tutorial in TypeScript here on the blog knows how to stand up a working server. This post is the dense companion piece: the MCP protocol taken apart under the hood, from the transport layer to the handshake, from the primitives to the notifications, from JSON-RPC to OAuth. Nothing left out. If you are going to write a server for production, integrate with a custom client, or simply understand what is happening when Claude Desktop calls a tool, this is it.

MCP (Model Context Protocol) is an open protocol from Anthropic, released in November 2024 and under active iteration. The current stable version is 2025-06-18, with revision 2025-03-26 still supported for backward compatibility. The entire specification lives at modelcontextprotocol.io. The rest of this post is what is there, translated into direct language and organized for sequential reading.

1. Architecture: host, client, server

Three roles. Host: the application that orchestrates the LLM (Claude Desktop, Cursor, Windsurf, your custom agent). Client: the component inside the host that keeps a 1:1 connection with a server. Server: the process that exposes capabilities (tools, resources, prompts). A host can have N clients, each connected to exactly one server. Servers know nothing about the rest of the host; each one sees only its own scope.

The separation matters for security. The host decides which servers to connect to, manages user consent, and routes what each LLM conversation gets to see. The server never talks directly to another server. Every cross interaction goes through the host. This prevents lateral privilege escalation and keeps the blast radius predictable.

2. The three layers: transport, protocol, application

Transport is the physical channel for messages (stdio, HTTP). Protocol is the syntax and semantics of the messages (JSON-RPC 2.0). Application is the MCP vocabulary itself (methods like tools/call, notifications like notifications/resources/updated). The layers are independent: swapping the transport does not change the payload, swapping the payload does not change the transport.

3. Official transports

Two transports standardized by the spec.

stdio: the client spawns the server as a child process. JSON-RPC goes back and forth over stdin and stdout, one message per line (delimited by \n, with no embedded newlines in the JSON). stderr is free for logs, and the server must not write anything to stdout that is not a valid MCP message. It is the default transport for local integration (Claude Desktop, IDE plugins). Minimal latency, no network, no auth (the process inherits the host's permissions).
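
The line framing is easy to get wrong when a read returns half a message. A minimal sketch of an encoder and a buffering decoder follows; the names encodeLine and LineDecoder are hypothetical, and only the framing rule (one JSON message per line) comes from the text above.

```typescript
// Hypothetical helpers; only the "one JSON message per line" rule comes from the spec.

// Serialize one message as a single line. JSON.stringify without an indent
// argument never emits raw newlines, so "\n" stays an unambiguous delimiter.
function encodeLine(message: unknown): string {
  return JSON.stringify(message) + "\n";
}

// Accumulates raw chunks read from the peer and yields complete messages.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is "" (chunk ended on a newline) or a partial line
    // that must wait for the next chunk.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}
```

In a real server the chunks would come from stdin and the encoded lines would go to stdout, with all logging kept on stderr.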

Streamable HTTP: introduced in 2025-03-26, it replaces the old HTTP+SSE. The server exposes a single endpoint that accepts POST and GET. The client sends messages via POST with Content-Type: application/json. The server responds inline (plain JSON) or upgrades the response to text/event-stream when it needs streaming. The GET on the same endpoint opens an SSE channel for server-initiated messages (notifications, reverse sampling). Sessions are identified by the Mcp-Session-Id header returned in initialize and reused in subsequent calls. It is the recommended transport for any remote server.

Implementations can have custom transports (WebSocket, named pipes, gRPC bridges), but they lose interop. The spec recommends sticking to the two official ones.

4. JSON-RPC 2.0: the envelope

All MCP traffic is JSON-RPC 2.0. Three message types.

Request: asks for something and expects a response. It has an id (string or number, unique within the session), a method, and an optional params.

{
 "jsonrpc": "2.0",
 "id": 42,
 "method": "tools/call",
 "params": { "name": "create_ticket", "arguments": { "title": "x" } }
}

Response: matches the request by the same id. It has result on success, or error on failure. Never both.

{ "jsonrpc": "2.0", "id": 42, "result": { "content": [...] } }

{ "jsonrpc": "2.0", "id": 42, "error": { "code": -32602, "message": "invalid params" } }

Notification: fire-and-forget. No id, no response expected. Used for progress, state changes, logging.

{ "jsonrpc": "2.0", "method": "notifications/progress", "params": {...} }

Standard JSON-RPC errors: -32700 parse error, -32600 invalid request, -32601 method not found, -32602 invalid params, -32603 internal error. JSON-RPC reserves the range -32000 to -32099 for implementation-defined server errors, and MCP draws its protocol-specific codes from it (e.g. -32002 resource not found).
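
To make the three shapes concrete, here is a small sketch of how a receiver might tell them apart. The type names, the classify function, and the errorResponse helper are illustrative assumptions, not part of any SDK.

```typescript
type JsonRpcId = string | number;

interface JsonRpcError { code: number; message: string; data?: unknown }
interface JsonRpcResponse { jsonrpc: "2.0"; id: JsonRpcId; result?: unknown; error?: JsonRpcError }

// The five standard JSON-RPC 2.0 error codes listed above.
const ErrorCodes = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InvalidParams: -32602,
  InternalError: -32603,
} as const;

type MessageKind = "request" | "notification" | "response" | "invalid";

function classify(raw: unknown): MessageKind {
  if (typeof raw !== "object" || raw === null) return "invalid";
  const msg = raw as Record<string, unknown>;
  if (msg["jsonrpc"] !== "2.0") return "invalid";
  const hasId = typeof msg["id"] === "string" || typeof msg["id"] === "number";
  // A method with an id is a request; a method without one is a notification.
  if (typeof msg["method"] === "string") return hasId ? "request" : "notification";
  // A response carries exactly one of result or error, never both.
  const hasResult = "result" in msg;
  const hasError = "error" in msg;
  return hasId && hasResult !== hasError ? "response" : "invalid";
}

function errorResponse(id: JsonRpcId, code: number, message: string): JsonRpcResponse {
  return { jsonrpc: "2.0", id, error: { code, message } };
}
```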

5. Lifecycle: initialize, operate, shutdown

Every MCP session follows three phases.

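Initialize: the first request the client sends. It negotiates the protocol version and exchanges capabilities. The client proposes a version (normally the latest it supports); the server echoes it if it supports that version, or answers with another version it does support, and returns its own capabilities. If the client cannot work with the server's answer, it disconnects. Otherwise the client confirms with the notifications/initialized notification. Before that confirmation, the server should not process any normal request (except ping).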

// client -> server
{
  "jsonrpc": "2.0", "id": 1, "method": "initialize",
  "params": {
    "protocolVersion": "2025-06-18",
    "capabilities": { "sampling": {}, "roots": { "listChanged": true } },
    "clientInfo": { "name": "my-host", "version": "1.0.0" }
  }
}

// server -> client
{
  "jsonrpc": "2.0", "id": 1,
  "result": {
    "protocolVersion": "2025-06-18",
    "capabilities": {
      "tools": { "listChanged": true },
      "resources": { "subscribe": true, "listChanged": true },
      "prompts": {},
      "logging": {}
    },
    "serverInfo": { "name": "mcp-tickets", "version": "0.1.0" }
  }
}

// client -> server (notification)
{ "jsonrpc": "2.0", "method": "notifications/initialized" }

Operate: normal message exchange. It lasts as long as the session is alive.

Shutdown: stdio closes on EOF on stdin; HTTP closes on session timeout or a DELETE on the endpoint with Mcp-Session-Id. There is no shutdown method in the protocol.
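
On the server side, the initialize phase boils down to two decisions: which version to answer with, and which incoming methods to accept in each phase. The sketch below is one way to express both; negotiateVersion, LifecycleGate, and the supported-version list are assumptions made for illustration.

```typescript
// Versions this hypothetical server supports, newest first.
const SUPPORTED_VERSIONS: string[] = ["2025-06-18", "2025-03-26"];

// Echo the requested version when supported; otherwise offer the newest one
// the server knows and let the client decide whether to continue.
function negotiateVersion(requested: string, supported: string[] = SUPPORTED_VERSIONS): string {
  if (supported.includes(requested)) return requested;
  const newest = supported[0];
  if (newest === undefined) throw new Error("server supports no protocol version");
  return newest;
}

type Phase = "awaiting_initialize" | "awaiting_initialized" | "operating";

// Decides whether a message from the client may be processed in the current phase.
class LifecycleGate {
  private phase: Phase = "awaiting_initialize";

  accept(method: string): boolean {
    if (this.phase === "awaiting_initialize") {
      if (method !== "initialize") return false;
      this.phase = "awaiting_initialized";
      return true;
    }
    if (this.phase === "awaiting_initialized") {
      if (method === "notifications/initialized") {
        this.phase = "operating";
        return true;
      }
      return method === "ping"; // only ping is allowed before confirmation
    }
    return method !== "initialize"; // a second initialize is rejected
  }
}
```

A rejected message would normally be answered with a JSON-RPC error if it was a request and dropped if it was a notification.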

6. Capability negotiation: who supports what

Each side announces what it offers. The server exposes combinations of tools, resources, prompts, logging, completions. The client exposes sampling, roots, elicitation (new in 2025-06-18). Subkeys enable fine-grained features: listChanged: true in tools means the server can emit notifications/tools/list_changed when the list changes. subscribe: true in resources enables resources/subscribe.

Golden rule: do not call a method without an announced capability. If the server did not declare prompts, sending prompts/list is an error.
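
A client can enforce the golden rule mechanically before anything hits the wire. The following is a sketch under stated assumptions: the prefix table and the canCall function are hypothetical, and the mapping covers only the server capabilities named above.

```typescript
type Capabilities = Record<string, Record<string, unknown> | undefined>;

// Method prefix -> capability key the server must have announced.
const REQUIRED_CAPABILITY: Record<string, string> = {
  "tools/": "tools",
  "resources/": "resources",
  "prompts/": "prompts",
  "logging/": "logging",
  "completion/": "completions",
};

function requiredCapability(method: string): string | undefined {
  for (const [prefix, capability] of Object.entries(REQUIRED_CAPABILITY)) {
    if (method.startsWith(prefix)) return capability;
  }
  return undefined; // e.g. ping and initialize need no capability
}

function canCall(method: string, serverCaps: Capabilities): boolean {
  const name = requiredCapability(method);
  if (name === undefined) return true;
  const capability = serverCaps[name];
  if (capability === undefined) return false;
  // Subscriptions additionally need the subscribe sub-flag.
  if (method === "resources/subscribe" || method === "resources/unsubscribe") {
    return capability["subscribe"] === true;
  }
  return true;
}
```

With the capabilities from the initialize example earlier, canCall("resources/subscribe", caps) is true, while a server that announced only tools would fail the check for prompts/list.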

7. Server primitives: tools, resources, prompts

Tools: functions that the LLM calls to perform an action (create a record, run a query, send a message). Each tool has a name (unique within the server, snake_case by convention), a description (the text the LLM reads to choose), and an inputSchema (JSON Schema of the input). Optional annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint), available since 2025-03-26, help the host decide the confirmation UX; they are hints, so a host should not rely on them when they come from a server it does not trust. 2025-06-18 adds an optional outputSchema.

Methods: tools/list (with pagination via cursor), tools/call. The response of tools/call has content (an array of TextContent, ImageContent, AudioContent or EmbeddedResource) and isError (a boolean indicating a logical failure of the tool, distinct from a protocol error). 2025-06-18 adds an optional structuredContent for a typed return matching the outputSchema.
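
The split between a protocol error and isError is the part people most often blur, so here is a sketch using the create_ticket example from earlier. The handler name, the CallOutcome type, and the ticket id are hypothetical; the choice to treat a malformed argument as -32602 and an empty title as a tool-level failure is one reasonable reading, not the only one.

```typescript
interface TextContent { type: "text"; text: string }
interface ToolResult {
  content: TextContent[];
  isError?: boolean;
  structuredContent?: Record<string, unknown>;
}

// What tools/list would advertise for this tool.
const createTicketTool = {
  name: "create_ticket",
  description: "Create a support ticket with the given title.",
  inputSchema: {
    type: "object",
    properties: { title: { type: "string" } },
    required: ["title"],
  },
  annotations: { readOnlyHint: false, destructiveHint: false },
};

type CallOutcome =
  | { kind: "result"; result: ToolResult }
  | { kind: "protocolError"; code: number; message: string };

function callCreateTicket(args: unknown): CallOutcome {
  const title = (args as { title?: unknown } | null)?.title;
  // Malformed arguments: answered with a JSON-RPC error, not a tool result.
  if (typeof title !== "string") {
    return { kind: "protocolError", code: -32602, message: "invalid params: title must be a string" };
  }
  // Logical failure: the call was well-formed, so the LLM gets a result it can react to.
  if (title.trim() === "") {
    return { kind: "result", result: { content: [{ type: "text", text: "Title must not be empty." }], isError: true } };
  }
  const ticket = { id: "T-1", title }; // hypothetical id; a real server would persist the ticket
  return {
    kind: "result",
    result: { content: [{ type: "text", text: JSON.stringify(ticket) }], structuredContent: ticket, isError: false },
  };
}
```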

Resources: data that the client reads to feed context. Identified by URI (file://, https://, or a custom scheme like tickets://open). They have name, description, mimeType. Methods: resources/list, resources/read, resources/templates/list (parameterized resources via RFC 6570 URI template, e.g. tickets://{id}), resources/subscribe and resources/unsubscribe. The server emits notifications/resources/updated with the URI when a subscribed resource changes.
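
When resources/read arrives for a templated resource, the server has to map the concrete URI back to template variables. The sketch below handles only the simplest RFC 6570 form ({name} matching one path segment); matchTemplate is a hypothetical helper, and a production server would want a full URI template library.

```typescript
// Matches a URI against a template containing simple {name} expressions.
// Returns the extracted variables, or null when the URI does not fit.
function matchTemplate(template: string, uri: string): Record<string, string> | null {
  const names: string[] = [];
  // split() with a capture group alternates literal text and variable names.
  const pattern = template
    .split(/\{([A-Za-z0-9_]+)\}/)
    .map((part, index) => {
      if (index % 2 === 1) {
        names.push(part);
        return "([^/]+)"; // one path segment per variable
      }
      return part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    })
    .join("");
  const match = new RegExp("^" + pattern + "$").exec(uri);
  if (match === null) return null;
  const variables: Record<string, string> = {};
  names.forEach((name, index) => {
    variables[name] = decodeURIComponent(match[index + 1] ?? "");
  });
  return variables;
}
```

For the template tickets://{id}, the URI tickets://42 yields { id: "42" }, and a URI with an extra path segment yields null.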

Prompts: parameterized templates that the client injects into the LLM. Each prompt has a name, a description, and arguments (each with a name, a description, and a required flag). Methods: prompts/list, prompts/get. prompts/get returns messages (an array of {role, content}) already interpolated, ready to send to the model.
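
How the server stores and interpolates a prompt is its own business; the protocol only fixes the shape of what prompts/get returns. As a sketch, assuming a hypothetical {{name}} placeholder syntax and a made-up summarize_ticket prompt:

```typescript
interface PromptArgument { name: string; description?: string; required?: boolean }
interface PromptMessage { role: "user" | "assistant"; content: { type: "text"; text: string } }

// The template field and its {{name}} syntax are internal to this sketch.
interface PromptDefinition {
  name: string;
  description: string;
  arguments: PromptArgument[];
  template: string;
}

const summarizeTicket: PromptDefinition = {
  name: "summarize_ticket",
  description: "Summarize a ticket in a given tone.",
  arguments: [
    { name: "id", description: "Ticket id", required: true },
    { name: "tone", description: "Writing tone", required: true },
  ],
  template: "Summarize ticket {{id}} in a {{tone}} tone.",
};

// Builds the result of prompts/get: messages that are already interpolated.
function getPrompt(def: PromptDefinition, args: Record<string, string>): { messages: PromptMessage[] } {
  for (const arg of def.arguments) {
    if (arg.required === true && args[arg.name] === undefined) {
      throw new Error(`missing required argument: ${arg.name}`);
    }
  }
  const text = def.template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => args[key] ?? "");
  return { messages: [{ role: "user", content: { type: "text", text } }] };
}
```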

8. Client primitives: sampling, roots, elicitation

The symmetry many people do not notice: the protocol is bidirectional. The server can ask things of the client.

Sampling: the server invokes the client LLM via sampling/createMessage. Useful when the tool needs LLM reasoning in the middle of execution (summarize, classify, choose) without the server having its own LLM. The client intercepts the request, shows it to the user for approval (the spec expects a human in the loop who can deny it), and returns the response. Params include messages, modelPreferences (cost/speed/quality hints), systemPrompt, includeContext (none/thisServer/allServers).

Roots: the client exposes to the server which directories/URIs it has permission to operate on. The server calls roots/list and discovers the scope. A change emits notifications/roots/list_changed. Without roots, the server does not know which file it can touch.

Elicitation (new in 2025-06-18): the server asks the user for additional structured input during execution. It sends elicitation/create with message and requestedSchema (a restricted JSON Schema: a flat object with primitive properties). The client renders a form and returns an action (accept, decline, or cancel) plus, on accept, the validated content. It replaces the anti-pattern of a tool with 30 optional parameters.
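
A sketch of the server side of an elicitation round trip follows. The priority field, the function names, and the message text are invented for the example; the action values follow my reading of the 2025-06-18 revision and are worth checking against the spec before relying on them.

```typescript
interface ElicitResult {
  action: "accept" | "decline" | "cancel";
  content?: Record<string, unknown>;
}

// Request the server sends to the client in the middle of a tool execution.
function buildElicitRequest(id: number) {
  return {
    jsonrpc: "2.0",
    id,
    method: "elicitation/create",
    params: {
      message: "Which priority should the new ticket have?",
      requestedSchema: {
        type: "object",
        properties: { priority: { type: "string", enum: ["low", "high"] } },
        required: ["priority"],
      },
    },
  };
}

// Never assume the user answered: decline and cancel are normal outcomes,
// and accepted content is still re-checked on the server.
function priorityFrom(result: ElicitResult): "low" | "high" | null {
  if (result.action !== "accept") return null;
  const priority = result.content?.["priority"];
  return priority === "low" || priority === "high" ? priority : null;
}
```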

9. Utilities: ping, progress, cancellation, logging, completion

Ping: ping with no params, empty response. Either side can call it to check liveness.

Progress: for long requests, the caller includes _meta.progressToken. The other side emits notifications/progress with {progressToken, progress, total} throughout execution.

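Cancellation: notifications/cancelled with {requestId, reason}, sent by whichever side issued the request. It is best-effort: the receiver should stop as soon as possible but is not required to, and the notification may arrive after the response has already been sent.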

Logging: the client sets the level with logging/setLevel (debug, info, notice, warning, error, critical, alert, emergency). The server emits notifications/message with {level, logger, data}.

Completion: autocomplete for a prompt argument or a resource template URI. completion/complete with ref (type and identifier) and argument (name and partial value). The server returns up to 100 suggestions.
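
Progress and cancellation usually meet in the same loop: a long-running handler reports each step and checks between steps whether it has been cancelled. A minimal synchronous sketch, with runSteps and its callbacks as hypothetical names:

```typescript
interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: { progressToken: string | number; progress: number; total?: number };
}

// Runs `total` units of work, emitting one progress notification per unit.
// `isCancelled` would be flipped by an incoming notifications/cancelled
// that names this request. Returns how many units actually completed.
function runSteps(
  progressToken: string | number,
  total: number,
  isCancelled: () => boolean,
  emit: (notification: ProgressNotification) => void,
): number {
  let done = 0;
  while (done < total) {
    if (isCancelled()) break; // best-effort: stop at the next safe point
    done += 1; // the real unit of work would happen here
    emit({
      jsonrpc: "2.0",
      method: "notifications/progress",
      params: { progressToken, progress: done, total },
    });
  }
  return done;
}
```

The token passed in is the _meta.progressToken from the original request; if the caller sent none, the handler simply skips the emit calls.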

10. Authentication: OAuth 2.1 over HTTP

stdio does not authenticate (it inherits the process permissions). HTTP needs it. The spec standardizes on OAuth 2.1 with mandatory PKCE (introduced in 2025-03-26), and 2025-06-18 pins down the roles: the MCP server acts as a Resource Server. Discovery via /.well-known/oauth-protected-resource (RFC 9728) points to the Authorization Server. The client does the Authorization Code Flow with PKCE, receives an access token, and sends it in Authorization: Bearer on every call.

Dynamic Client Registration (RFC 7591) is recommended so as not to require manual setup. Tokens must have their audience checked (RFC 8707) to prevent a passthrough attack. Refresh tokens are optional but recommended for a long session.
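
The audience check itself is a few lines once the token has been verified. The sketch below assumes the token is a JWT whose signature has already been validated elsewhere and whose claims have been decoded; acceptsToken and the example URLs are hypothetical.

```typescript
// Subset of JWT claims relevant to the check (names per RFC 7519).
interface AccessTokenClaims {
  aud?: string | string[];
  exp?: number; // expiry, seconds since the epoch
}

// Accept the token only if it was issued for this MCP server and is still valid.
function acceptsToken(claims: AccessTokenClaims, thisServer: string, nowSeconds: number): boolean {
  const audiences =
    claims.aud === undefined ? [] : Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(thisServer)) return false; // issued for someone else
  // This sketch also rejects tokens that carry no expiry at all.
  return claims.exp !== undefined && claims.exp > nowSeconds;
}
```

A token minted for https://api.example.com would fail this check on a server whose identifier is https://mcp.example.com, which is exactly the passthrough case the audience rule exists to block.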

11. Streamable HTTP in detail

The single endpoint accepts both methods.

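POST: the body is a single JSON-RPC message (2025-03-26 also allowed batches; 2025-06-18 removed them), and the client sends an Accept header listing both application/json and text/event-stream. The server responds with application/json (a single response, synchronous mode) or upgrades to text/event-stream and streams. In the second mode, each SSE event carries a JSON-RPC message in its data: field, and the server closes the stream once the response to the request has been sent. Useful for a long tool that emits progress along the way.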

GET: opens an SSE stream for server-initiated messages (notifications, sampling). The client keeps it open. The server can optionally attach an id to each event; a client that reconnects sends the last one it saw in the Last-Event-ID header so the server can replay what was missed.

DELETE: ends the session (sends Mcp-Session-Id in the header).
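
When a POST comes back as text/event-stream, the client has to unwrap the SSE framing before it sees JSON-RPC again. The parser below is a simplified sketch that works on a complete response body and understands only the data: and id: fields; parseSse and messagesFromSse are hypothetical names, and a real client would parse incrementally.

```typescript
interface SseEvent { id?: string; data: string }

// Splits an SSE body into events. Events are separated by a blank line;
// multiple data: lines within one event are joined with "\n".
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    let id: string | undefined;
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
      else if (line.startsWith("id:")) id = line.slice(3).trim();
    }
    if (data.length === 0) continue; // comments, keep-alives, trailing blank block
    const joined = data.join("\n");
    events.push(id === undefined ? { data: joined } : { id, data: joined });
  }
  return events;
}

// Each event's data field carries one JSON-RPC message.
function messagesFromSse(body: string): unknown[] {
  return parseSse(body).map((event) => JSON.parse(event.data) as unknown);
}
```

The id of the last event seen is the value a reconnecting client would send back in Last-Event-ID.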

The Mcp-Session-Id header comes back in the initialize response and must be sent on every subsequent request. Sessions are isolated: two clients on the same server have separate sessions. The server can invalidate a session at any moment by returning 404, and the client must reinitialize.
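
On the client, session handling reduces to remembering one header value and reacting to a 404. A sketch of that bookkeeping, independent of any HTTP library; HttpSession and its method names are assumptions made for illustration.

```typescript
// Tracks the Mcp-Session-Id for one client/server connection.
class HttpSession {
  private sessionId: string | null = null;

  // Headers for the next POST to the MCP endpoint.
  headers(): Record<string, string> {
    const headers: Record<string, string> = {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    };
    if (this.sessionId !== null) headers["Mcp-Session-Id"] = this.sessionId;
    return headers;
  }

  // Call with the status and the Mcp-Session-Id response header (null if absent).
  onResponse(status: number, sessionHeader: string | null): "ok" | "reinitialize" {
    if (status === 404 && this.sessionId !== null) {
      this.sessionId = null; // the server dropped the session: start over
      return "reinitialize";
    }
    if (sessionHeader !== null) this.sessionId = sessionHeader;
    return "ok";
  }
}
```

After a "reinitialize" outcome the client sends a fresh initialize request without the header and picks up whatever new session id the server returns.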

12. Security: three traps that kill

Confused deputy: the server runs with privileged credentials, and the LLM convinces it to execute an action the user never asked for. Defense: the server confirms intent via elicitation/create before a destructive action, marks such tools with destructiveHint: true, and the host asks for human confirmation before calling them.

Token passthrough: the client sends the user token to the server, the server reuses it in an upstream call. Defense: validate the aud claim. A token issued for MCP server X cannot be accepted by API Y.

Indirect prompt injection: the server exposes a resource (e.g. an email, a GitHub issue) whose content carries a malicious instruction for the LLM. The LLM reads it and follows it. Defense: sanitize external content or mark its boundary, and have the host treat tool calls triggered by read content differently from those that follow a direct instruction from the user.

13. Versioning and evolution

The protocol version is a string with a date (YYYY-MM-DD). Negotiated in initialize: the client sends a preference, the server picks among those supported. Breaking changes generate a new version. 2025-06-18 brought elicitation, structured tool output, the Resource Server model for OAuth (with mandatory RFC 8707 resource indicators), and the removal of JSON-RPC batching (which existed in 2025-03-26). 2025-11-25 is in draft.

14. The complete flow of a tool call, in order

The host initializes the client, opens the transport (spawn process or HTTP connection).
The client sends initialize, the server responds with capabilities.
The client sends notifications/initialized.
The client sends tools/list. The server returns the catalog.
The host injects the catalog into the LLM context (in the host's own proprietary format, but it generally becomes a function definition for the model).
The LLM decides to call a tool, emits a tool_call.
The host validates permission, optionally asks the user for confirmation.
The client sends tools/call with name and arguments.
The server executes. If it takes a while, it emits notifications/progress. If it needs extra input, it sends an elicitation/create request back to the client.
The server responds with content and isError.
The host injects the content back into the LLM context as an observation.
The LLM continues the loop (another tool call, or a final response).
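
The protocol half of that sequence (initialize through tools/call) fits in a toy in-memory server. This is a sketch for reading, not a real implementation: TinyServer, its single hard-coded tool, and the ticket id are all hypothetical, and there is no transport, host, or LLM involved.

```typescript
type Msg = {
  jsonrpc: "2.0";
  id?: number;
  method?: string;
  params?: Record<string, unknown>;
  result?: unknown;
  error?: { code: number; message: string };
};

class TinyServer {
  private ready = false;

  // Returns the response for a request, or null for a notification.
  handle(msg: Msg): Msg | null {
    if (msg.method === "notifications/initialized") {
      this.ready = true;
      return null;
    }
    if (msg.id === undefined) return null; // other notifications are ignored
    const id = msg.id;
    if (msg.method === "initialize") {
      return {
        jsonrpc: "2.0", id,
        result: {
          protocolVersion: "2025-06-18",
          capabilities: { tools: {} },
          serverInfo: { name: "mcp-tickets", version: "0.1.0" },
        },
      };
    }
    if (!this.ready) {
      return { jsonrpc: "2.0", id, error: { code: -32600, message: "session not initialized" } };
    }
    if (msg.method === "tools/list") {
      const tools = [{ name: "create_ticket", description: "Create a ticket.", inputSchema: { type: "object" } }];
      return { jsonrpc: "2.0", id, result: { tools } };
    }
    if (msg.method === "tools/call" && msg.params?.["name"] === "create_ticket") {
      const title = (msg.params["arguments"] as { title?: unknown } | undefined)?.title;
      if (typeof title !== "string") {
        return { jsonrpc: "2.0", id, error: { code: -32602, message: "invalid params" } };
      }
      const text = `created T-1: ${title}`; // hypothetical ticket id
      return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }], isError: false } };
    }
    return { jsonrpc: "2.0", id, error: { code: -32601, message: "method not found" } };
  }
}

// The client side of the sequence, in order.
const requests: Msg[] = [
  { jsonrpc: "2.0", id: 1, method: "initialize", params: { protocolVersion: "2025-06-18", capabilities: {} } },
  { jsonrpc: "2.0", method: "notifications/initialized" },
  { jsonrpc: "2.0", id: 2, method: "tools/list" },
  { jsonrpc: "2.0", id: 3, method: "tools/call", params: { name: "create_ticket", arguments: { title: "x" } } },
];
const server = new TinyServer();
const replies = requests.map((request) => server.handle(request));
```

Swapping the second and third entries of requests makes the tools/list call fail with the "session not initialized" error, which is the lifecycle rule from section 5 in action.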

That is the entire protocol. All additional complexity (multiplexing, retry, rate limit, cache) lives in the host or in the server, outside the spec. MCP itself is deliberately thin. That is exactly what gives it a chance to become a market standard: small enough to implement in an afternoon, structured enough to support everything a serious agent needs to do.

If you are building a server for production, the minimum checklist is: implement the lifecycle correctly (initialize + initialized before anything else), respect capability negotiation (do not emit a notification the client did not ask for), return a structured error instead of an exception, mark tools with the appropriate annotations, support pagination on large lists, and never trust LLM input without schema validation. The rest is domain detail.