Here is the release note from Ollama that made this possible: https://ollama.com/blog/claude
Technically, what I do is pretty straightforward:
- Detect which local models are available in Ollama.
- When internet access is unavailable, the client automatically switches to Ollama-backed local models instead of remote ones.
- From the user’s perspective, it is the same Claude Code flow, just backed by local inference.
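The flow above can be sketched roughly like this. It's a minimal sketch, not the actual implementation: it assumes Ollama's default endpoint (localhost:11434, with its real GET /api/tags model-listing route), and the connectivity probe and fallback policy are my own invention for illustration.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint

def local_models():
    """Return the names of models available in the local Ollama install."""
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return []  # Ollama not running

def internet_available(probe="https://api.anthropic.com", timeout=3):
    """Crude connectivity probe: any response, even an HTTP error,
    means we reached the remote server."""
    try:
        urllib.request.urlopen(probe, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # reached the server, just got a non-2xx status
    except (urllib.error.URLError, OSError):
        return False

def pick_backend(preferred="qwen3-coder:30b"):
    """Use the remote backend when online, else fall back to a local model."""
    if internet_available():
        return ("remote", None)
    models = local_models()
    if preferred in models:
        return ("ollama", preferred)
    return ("ollama", models[0]) if models else ("none", None)
```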
In practice, the best-performing model so far has been qwen3-coder:30b. I also tested glm-4.7-flash, which was released very recently, but it struggles with reliably following tool-calling instructions, so it is not usable for this workflow yet.
https://github.com/pchalasani/claude-code-tools/blob/main/do...
One tricky thing that took me a whole day to figure out: using Claude Code in this setup was causing total network failures due to telemetry pings, so I had to set this env var: CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
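For example, a small launcher can set that variable before spawning the CLI. The env var name is the real one from above; the "claude" binary name on PATH is an assumption:

```python
import os
import subprocess

def offline_env(base=None):
    """Copy the given environment (default: os.environ) and disable
    Claude Code's non-essential network traffic (telemetry pings)."""
    env = dict(os.environ if base is None else base)
    env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"
    return env

def launch_claude(args=()):
    """Run the claude CLI with telemetry disabled, so it doesn't stall
    waiting on pings when there is no internet connection."""
    return subprocess.run(["claude", *args], env=offline_env())
```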
In particular, I'd like to call Claude models (hosted in OpenAI schema by a reseller) through some proxy that presents the Anthropic format to my claude client, but it seems like nothing fully lines things up (double-translated tool names, for example).
The reseller is abacus.ai. I tried BerriAI/litellm, musistudio/claude-code-router, ziozzang/claude2openai-proxy, 1rgs/claude-code-proxy, and fuergaosi233/claude-code-proxy.
The invocation would look like this:
llsed --host 0.0.0.0 --port 8080 --map_file claude_to_openai.json --server https://openrouter.ai/api
Where the JSON has something like { tag: ..., from: ..., to: ..., params: ..., pre: ..., post: ... }
So if one call maps to two, you can make multiple calls in the pre or post hooks, or rearrange things accordingly. This sounds like the proper separation of concerns here... probably.
The pre/post hooks should probably be JSON-RPC handlers that get lazy-loaded.
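Applying one such map entry might look like the sketch below. The entry shape ({ tag, from, to, params, pre, post }) follows the comment above; everything else (the hook registry, the params remapping) is hypothetical, since llsed is still being written:

```python
def apply_entry(entry, message, hooks=None):
    """Rewrite one tool call according to a single llsed-style map entry.

    `hooks` maps hook names (the strings listed in entry["pre"] and
    entry["post"]) to callables; in the real tool these would be the
    lazy-loaded JSON-RPC handlers.
    """
    hooks = hooks or {}
    # pre hooks run first, e.g. to split one call into two
    for name in entry.get("pre", []):
        message = hooks[name](message)
    if message.get("tool") == entry["from"]:
        message = {**message, "tool": entry["to"]}
        # remap parameter names, e.g. {"input": "arguments"}
        for old, new in entry.get("params", {}).items():
            if old in message:
                message[new] = message.pop(old)
    # post hooks run last, e.g. to reshape the result
    for name in entry.get("post", []):
        message = hooks[name](message)
    return message
```

Usage with a hypothetical rename rule:

```python
entry = {"tag": "rename-bash", "from": "bash", "to": "shell",
         "params": {}, "pre": [], "post": []}
apply_entry(entry, {"tool": "bash", "cmd": "ls"})
# the tool name becomes "shell"; other fields pass through unchanged
```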
Writing that now. Let's do this: https://github.com/day50-dev/llsed
This will be a bit challenging, I'm sure, but I agree: litellm and friends do too many things and take too long when you just want something simple.
I've been pitching this suite I'm building as "GNU coreutils for the LLM era"
It's not sticking and nobody is hyped by it.
I don't know if I should keep going, or if this is my same old pattern cropping up again: things I really, really like but that are just kinda me.
The market of people who comprehend its value is small.
So I'll need to surface it better or just do something else
But I'm surprised litellm (and its wrappers) don't work for you, and I wonder if there's something wrong with your provider or model. Which model were you using?
But with Qwen3-30B-A3B I get 20 tokens/sec in Claude Code.