Local-First Chat Completion API
OpenAI-compatible interface for on-device language models with no network latency, complete privacy, and native performance.
Requires macOS 26 Tahoe or iOS 26 with Apple Intelligence enabled.
When local models are available, requests execute locally; otherwise, they gracefully fall back to cloud APIs.
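A minimal sketch of that pattern, assuming the window.localLLM.available() check shown in the examples below; cloudChatCompletion is a hypothetical stand-in for whatever cloud client you already use:

// Fallback sketch: prefer the on-device model, otherwise call the cloud.
// cloudChatCompletion is hypothetical; substitute your own client.
async function chatCompletion(request) {
  if (await window.localLLM?.available()) {
    return window.localLLM.chat.completions.create(request);
  }
  return cloudChatCompletion(request);
}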
Your data never leaves your device when running locally. Process sensitive information without external API calls.
Instant responses with on-device inference. No network delays for real-time applications and interactive experiences.
OpenAI-compatible streaming with chat.completion.chunk format. Perfect for responsive chat interfaces.
Uses models built into macOS. No need to download multi-gigabyte model files or manage local storage.
Drop-in replacement for OpenAI's API. Use the same request/response formats you already know and love.
Experience llllm in action. Test the OpenAI-compatible API and see real-time responses.
// Feature-detect the on-device model before sending any requests
if (await window.localLLM?.available()) {
  console.log('llllm ready!');
} else {
  console.log('Local LLM unavailable, using cloud');
}
// Basic chat completion: the same request/response shape as OpenAI
const result = await window.localLLM.chat.completions.create({
  messages: [
    { role: "system", content: "You are a terse assistant." },
    { role: "user", content: "Hello local world" }
  ]
});

console.log(result.choices[0].message.content);
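The result mirrors OpenAI's chat.completion object. An illustrative, abridged response (field values are placeholders; extra fields such as usage may or may not be present):

// Illustrative chat.completion shape (values are placeholders)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hi." },
      "finish_reason": "stop"
    }
  ]
}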
// Streaming: set stream: true and iterate chat.completion.chunk objects
const stream = await window.localLLM.chat.completions.create({
  messages: [
    { role: "user", content: "Write a haiku about AI" }
  ],
  stream: true
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    console.log(delta);
  }
}
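Each chunk follows OpenAI's chat.completion.chunk format. An illustrative, abridged chunk (values are placeholders):

// Illustrative chat.completion.chunk shape (values are placeholders)
{
  "object": "chat.completion.chunk",
  "choices": [
    { "index": 0, "delta": { "content": "Silent" }, "finish_reason": null }
  ]
}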
// Sampling parameters carry over unchanged from OpenAI's API
const result = await window.localLLM.chat.completions.create({
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
  temperature: 0.7,  // sampling randomness (lower = more deterministic)
  max_tokens: 500,   // cap on generated tokens
  stream: false
});
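Availability can change at runtime (Apple Intelligence disabled, unsupported OS), so wrapping calls defensively is worth considering. A sketch; the thrown error's shape is an assumption, not a documented contract:

// Defensive usage sketch; error details here are assumptions
try {
  const result = await window.localLLM.chat.completions.create({
    messages: [{ role: "user", content: "ping" }]
  });
  console.log(result.choices[0].message.content);
} catch (err) {
  console.error('Local completion failed:', err);
}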
Be among the first to test llllm on iOS. Join our exclusive TestFlight beta program.