LLM tool calling is getting easier, but MCP mostly standardizes the message format, not the messy parts that break real projects: auth, file transport, proxies, and permissions. That’s why a Home Assistant setup can expose
/api/mcp today yet still need a gateway, and why one Reddit user spent 3.5 months, 1,300 hours, and nearly $700 before giving up.
The demo is always the same.
A model sees a button-shaped problem, picks the right action, calls the right function, and suddenly your Shopify cart updates, your lights dim, or your grocery order goes through. You watch GPT-5 or Claude do one clean tool call and think: okay, this is it. We finally escaped bespoke integrations.
Then you try to wire it into a real stack.
OpenClaw talks to Home Assistant. Home Assistant talks to an MCP server. Your browser MCP server needs a login. Your hosted dev environment needs a token. Your agent has to hand off a file to another service. And now you’re not building “AI features” anymore. You’re doing agent ops. You’re maintaining glue.
That’s the part nobody puts in the launch video.
MCP fixed the protocol layer, so why does everything still feel fragile?
I’m not anti-MCP. I actually think the core idea is right.
The official Model Context Protocol spec gives us a common language for tools. It uses JSON-RPC 2.0 and now defines two transports: stdio and Streamable HTTP. That matters. It really does remove a bunch of one-off integration work that wasted everybody’s time in 2024.
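Concretely, “a common language” means every tool invocation has the same JSON-RPC shape. Here’s roughly what a call looks like on the wire (the tool name and arguments are made up for illustration):

```python
# A JSON-RPC 2.0 request using MCP's tools/call method.
# The tool name and arguments are illustrative, not from any real server.
tool_call = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "turn_on_light",
        "arguments": {"entity_id": "light.kitchen"},
    },
}
```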
But here’s the catch: MCP standardizes the protocol, not the operational glue.
That sounds abstract until you hit the first real deployment problem.
Stdio is simple until it isn’t
If you run MCP over stdio, you’re usually spawning a local subprocess and shuttling messages over stdin/stdout. The spec explicitly says stdio implementations should not use the HTTP authorization flow. They’re supposed to get credentials from the environment instead.
Which means you still need to solve:
- how secrets get onto the machine
- who is allowed to read those secrets
- how the subprocess starts, restarts, and dies
- what happens when one developer’s laptop works and the CI box doesn’t
- whether you actually trust the local machine boundary
That is not a protocol problem. That is an operations problem.
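To make that concrete, here’s a minimal sketch of the stdio pattern: spawn the server, pass secrets via the environment, speak newline-delimited JSON-RPC over its pipes. The server binary and the HASS_TOKEN variable are placeholders, not real names:

```python
import json
import os
import subprocess

# "my-mcp-server" and HASS_TOKEN are placeholders; the pattern is what matters:
# credentials ride in via the environment, not an HTTP auth flow.
proc = subprocess.Popen(
    ["my-mcp-server"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    env={**os.environ, "HASS_TOKEN": os.environ["HASS_TOKEN"]},
)

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "sketch-client", "version": "0.0.1"},
    },
}
proc.stdin.write((json.dumps(initialize) + "\n").encode())
proc.stdin.flush()
print(proc.stdout.readline().decode())  # the server's initialize response
```

Every bullet above is hiding in those few lines: who set HASS_TOKEN, what restarts proc when it dies, and why this works on your laptop but not in CI.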
Remote MCP is cleaner, but it moves the pain
Streamable HTTP looks more grown-up. The MCP spec supports HTTP transports and defines an OAuth-based authorization flow for them. OpenAI even made this feel wonderfully easy when it added remote MCP server support to the Responses API on May 21, 2025.
The example is so clean it’s almost rude:
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "shopify",
        "server_url": "https://pitchskin.com/api/mcp",
    }],
    input="Add the Blemish Toner Pads to my cart",
)
That’s a great API. Seriously.
But the moment you move past the snippet, you inherit a new pile of questions: is the endpoint reachable, is auth configured correctly, is origin validation tight enough, did the tool schema change, what happens when the remote server is down, and who rotates the tokens?
The protocol got simpler. The glue just changed shape.
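Which is why, in practice, the clean snippet grows a shell around it. A hedged sketch; the retry policy and helper name are mine, not OpenAI’s:

```python
import time

from openai import OpenAI

client = OpenAI()

def add_to_cart(prompt: str, attempts: int = 3):
    """Retry wrapper around the clean call above; backoff policy is an assumption."""
    for attempt in range(attempts):
        try:
            return client.responses.create(
                model="gpt-4.1",
                tools=[{
                    "type": "mcp",
                    "server_label": "shopify",
                    "server_url": "https://pitchskin.com/api/mcp",
                }],
                input=prompt,
                timeout=30,  # per-request timeout, in seconds
            )
        except Exception:
            if attempt == attempts - 1:
                raise  # endpoint down, auth broken, or schema changed: surface it
            time.sleep(2 ** attempt)  # crude exponential backoff
```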
Home Assistant is where the demo meets the wall
Home Assistant is the best example I’ve seen because it’s both exciting and brutally honest.
As of its recent docs, Home Assistant can act as both an MCP server and an MCP client. As a server, it can expose your home context and actions so Claude Desktop or another MCP client can interact with lights, switches, shopping lists, and automations. As a client, Home Assistant can connect to external MCP servers and bring in things like memory or web search.
That sounds incredible. It is incredible.
It is also exactly where the hidden tax shows up.
The standard still needs bridges
Home Assistant’s MCP Server integration exposes an endpoint at /api/mcp using Streamable HTTP. Nice. Except the docs also warn that many MCP clients still only support stdio, so you may need a gateway such as mcp-proxy.
And on the other side, Home Assistant’s MCP client docs say that if a server only supports stdio, you may need a proxy to expose it over SSE or HTTP.
Read that again.
The “standard” often still requires a protocol bridge in one direction or the other.
That’s not a knock on Home Assistant. It’s actually a credit to the docs for saying the quiet part out loud. The ecosystem is moving fast, but support is fragmented, and fragmentation is where brittle glue code breeds.
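For the curious, the core of what a gateway like mcp-proxy does is small. A rough sketch (the URL and env var names are assumptions, and real gateways also handle SSE streaming, sessions, and reconnects):

```python
import json
import os
import sys

import requests

# Placeholder endpoint and token variable; adjust for your setup.
MCP_URL = os.environ.get("MCP_URL", "http://homeassistant.local:8123/api/mcp")
HEADERS = {
    "Authorization": f"Bearer {os.environ['MCP_ACCESS_TOKEN']}",
    "Accept": "application/json, text/event-stream",
}

# Read newline-delimited JSON-RPC from a stdio client, forward each message
# to the remote Streamable HTTP endpoint, and echo the reply back on stdout.
for line in sys.stdin:
    if not line.strip():
        continue
    resp = requests.post(MCP_URL, json=json.loads(line), headers=HEADERS, timeout=30)
    resp.raise_for_status()
    sys.stdout.write(json.dumps(resp.json()) + "\n")
    sys.stdout.flush()
```

Twenty lines, and every one of them is a spot where headers, auth, or timeouts can silently diverge between clients.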
Home Assistant has scale here too. Its 2025.5 release notes reported 2,000,000 active installations worldwide. So this is not some toy edge case from a niche Raspberry Pi project. This is mainstream home automation colliding with real LLM plumbing.
“I gave the agent my token” is where the horror movie starts
While researching this, I came across a thread on r/openclaw that perfectly captured the gap between the demo and reality.
One user wrote: “I got my open claw agent responding to me and running but when I gave the agent my mcp server long lived access token ... I’m just struggling to get the agent to be able to turn my lights on and off or to write basic automations for me”.
That’s the whole story in one sentence.
The model is not the hard part anymore. GPT-5, Claude, Qwen, Llama — they can all decide “turn off the kitchen lights” is an action. The hard part is whether the action is exposed, authorized, shaped correctly, reachable, and safe.
Permissions are still weirdly blunt
Home Assistant’s auth docs are a good reminder that “just give it a token” is not a serious security model. The docs say non-owner accounts currently have the same access as the owner account, with more restrictions planned later. The MCP Server docs add entity exposure controls, which helps, but the underlying point remains: action-taking tools are risky when authorization is coarse.
That matters a lot more when the agent can do things in the real world.
Turn on a light? Fine.
Open a garage door, place a grocery order, disable an alarm, or trigger an automation that talks to Stripe, Twilio, or Shopify? Different game.
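Until permissions get finer-grained, most teams end up with a blunt default-deny gate in their own glue. A sketch, with made-up tool names:

```python
# Default-deny authorization shim. The tool names are illustrative,
# not Home Assistant's real service identifiers.
SAFE_TOOLS = {"light.turn_on", "light.turn_off"}
CONFIRM_TOOLS = {"cover.open_garage", "alarm.disarm", "order.place"}

def authorize(tool_name: str, human_confirmed: bool = False) -> bool:
    if tool_name in SAFE_TOOLS:
        return True
    if tool_name in CONFIRM_TOOLS:
        return human_confirmed  # irreversible actions need an explicit human yes
    return False  # anything unlisted is denied by default
```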
Home Assistant also notes that unused refresh tokens are automatically removed after 90 days if they haven’t been used for login, and recommends long-lived access tokens for permanent script access. That’s practical advice. It’s also a reminder that now you’re managing token lifecycles, not just prompts.
And then the file handoff problem shows up
Just when you think you’ve solved actions, artifacts appear.
A generated CSV has to move from one agent to another. A screenshot from a browser MCP server has to get attached to a bug report. A Claude Code session in a hosted environment has to pass a file to n8n, Discord, or GitHub without someone saying “uh, just paste it somewhere.”
I found another r/openclaw discussion where one user summed it up perfectly: “The options I kept running into: S3 presigned URLs — works but 15 minutes of setup for every new project ... ‘Just commit it to git’ — please no”.
Exactly.
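For the record, the “15 minutes of setup” is roughly this: a minimal boto3 sketch with placeholder bucket and key names.

```python
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured

def handoff_url(bucket: str, key: str, ttl_seconds: int = 900) -> str:
    # A time-limited GET link for the artifact; the next agent just fetches it.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=ttl_seconds,
    )

print(handoff_url("agent-artifacts", "runs/1234/report.csv"))
```

The code is trivial. The decisions around it (bucket layout, TTLs, metadata, who gets to mint these links) are what every project re-litigates.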
This is spreadsheet hell all over again. Not because MCP is bad, but because every team keeps rebuilding the same invisible columns:
- where files live
- how they expire
- who can fetch them
- what format they’re in
- how the next agent knows they exist
If you’ve ever manually installed an n8n community node inside a Docker container, you already know the feeling:
# open a shell inside the running n8n container
docker exec -it n8n sh
# create the custom nodes folder, then install the community node into it
mkdir ~/.n8n/nodes
cd ~/.n8n/nodes
npm i n8n-nodes-nodeName
Nothing here is impossible. It’s just one more tiny maintenance decision that becomes permanent.
Which setup is actually easiest to live with?
Here’s my blunt take: the best architecture is usually the one with the fewest moving connectors, not the one with the fanciest protocol diagram.
| Option | What you’re really signing up for |
|---|---|
| MCP stdio | Local subprocess over stdin/stdout; credentials from environment instead of HTTP auth; you own process management, local secrets, and machine-specific setup |
| MCP Streamable HTTP / remote MCP | HTTP GET/POST with optional SSE streaming; OAuth-style authorization supported by spec for HTTP transports; you own origin validation, token handling, auth discovery, and remote uptime |
| Direct platform-native tools | Vendor-managed API surface; auth usually centralized in one provider account; less connector glue, but more lock-in and fewer cross-platform guarantees |
This is why I think a lot of teams are making the same mistake.
They keep adding more tools when they should be reducing the surface area.
Cloudflare’s MCP guidance is refreshingly sane on this. It recommends remote MCP over Streamable HTTP plus OAuth, but it also says the quietly important part: use scoped permissions, expose fewer well-designed tools, and run evals after updates.
That is infrastructure-vendor speak for: stop shipping a junk drawer.
What happens when it “works fine for 3 months”?
This is my favorite and least favorite category of failure.
Not the dramatic crash. The calm success that hides a landmine.
In another r/openclaw post, someone said their grocery agent had run great for months and then ordered 2 kg of garlic instead of 2 heads.
That story is funny right up until the thing ordering garlic is ordering medication, replacement parts, or something expensive from Shopify.
Long periods of apparent reliability can hide brittle assumptions:
- units are underspecified
- schemas drift (see the sketch after this list)
- browser sessions expire
- proxies drop headers
- file links expire earlier than expected
- one model interprets a field differently than another
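One cheap guard for the schema-drift case: fingerprint the tool list a server advertises and fail loudly when it changes. A sketch; how you fetch the tools/list result depends on your client:

```python
import hashlib
import json

PINNED_FINGERPRINT = "<hex digest from your last known-good run>"

def fingerprint(tools: list[dict]) -> str:
    # Canonicalize the advertised schemas before hashing so key order
    # doesn't trigger false alarms.
    return hashlib.sha256(json.dumps(tools, sort_keys=True).encode()).hexdigest()

def assert_no_drift(tools: list[dict]) -> None:
    fp = fingerprint(tools)
    if fp != PINNED_FINGERPRINT:
        raise RuntimeError(f"Tool schema changed ({fp}); re-run evals first")
```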
This is also why the giant “it worked in Claude Desktop” victory lap often ages badly when you move the same flow into OpenClaw, n8n, Make, Zapier, or a hosted agent-native environment.
Different clients support different transport layers. Different auth paths exist. Different timeout behavior appears. The connector stack becomes the product whether you wanted it to or not.
So what should teams standardize first?
Not the prompts.
Not even the models.
The glue.
The teams that win with LLM tool calling are not the teams with the most MCP servers. They’re the teams that decide, once, how actions, files, auth, and observability work — and then force every new agent to use the same boring path.
My opinionated checklist
If I were setting up an agent stack today for Home Assistant, OpenClaw, Claude Code, n8n, and a couple of remote MCP services, I’d standardize these before adding one more capability:
- One preferred transport
  - Pick remote MCP over Streamable HTTP where possible.
  - Use stdio only when locality is the point, not as the accidental default.
- One auth pattern
  - Prefer scoped OAuth-style flows for remote services.
  - Treat long-lived access tokens as hazardous materials.
- One file handoff method
  - Signed object storage URLs, fixed TTLs, and explicit metadata beat ad hoc Git commits and mystery temp folders.
- One tool design rule
  - Fewer tools, clearer schemas, stronger descriptions.
  - Ten sharp tools beat a hundred vague ones every time.
- One eval harness
  - Re-run tests after every schema tweak, auth change, model swap, or proxy update.
  - If the flow touches money, devices, or customer data, test the ugly cases on purpose.
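That last item doesn’t have to be fancy. Even a table of golden cases that pins units would have caught the garlic order. A toy sketch, where parse_order stands in for whatever turns a prompt into a tool call in your stack:

```python
# Golden cases that pin the "2 heads vs 2 kg" class of bug.
GOLDEN_CASES = [
    ("order 2 heads of garlic", {"item": "garlic", "qty": 2, "unit": "head"}),
    ("order 2 kg of flour", {"item": "flour", "qty": 2, "unit": "kg"}),
]

def run_evals(parse_order) -> list[str]:
    failures = []
    for prompt, expected in GOLDEN_CASES:
        got = parse_order(prompt)
        if got != expected:
            failures.append(f"{prompt!r}: expected {expected}, got {got}")
    return failures
```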
OpenAI said that since releasing the Responses API in March 2025, hundreds of thousands of developers had used it to process trillions of tokens across its models. That tells me the appetite for tool-enabled agents is absolutely real.
But scale at the API layer does not magically remove local fragility.
I also kept thinking about one more Reddit story while writing this: a user who said they spent 3.5 months, 1,300 hours, nearly 5 billion tokens, and about $700 before pausing a fragile setup. Those numbers are extreme, sure. But the emotional arc is familiar to anyone who has tried to make six half-compatible agent components behave like one product.
MCP is not the problem. It’s the beginning of the solution.
The mistake is thinking the standard means you no longer have to own the boring stuff.
You do. You absolutely do.
And the sooner your team treats auth, file transport, permissions, proxies, and evals as first-class product surfaces, the sooner LLM tool calling stops feeling like magic and starts feeling reliable.
