localllm

Local-First Chat Completion API

OpenAI-compatible interface for on-device language models with zero network latency, complete privacy, and native performance.

 Download for macOS ✈ Join iOS Beta

Requires macOS 26 Tahoe, iOS 26, and Apple Intelligence

Why localllm?

🏠

Local-First Design

When a local model is available, requests run entirely on-device; otherwise they gracefully fall back to a cloud API (see the fallback sketch after this list).

🔒

Complete Privacy

Your data never leaves your device when running locally. Process sensitive information without external API calls.

⚡

Zero Latency

Instant responses with on-device inference. No network delays for real-time applications and interactive experiences.

📡

Real-time Streaming

OpenAI-compatible streaming with chat.completion.chunk format. Perfect for responsive chat interfaces.

💾

No Downloads Required

Uses models built into macOS. No need to download multi-gigabyte model files or manage local storage.

🔄

OpenAI Compatible

Drop-in replacement for OpenAI's API. Use the same request/response formats you already know and love.
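
For comparison, here is a minimal fallback sketch in the spirit of the local-first design above. Only window.localLLM comes from this project; the fetch call to OpenAI's hosted endpoint, the gpt-4o-mini model name, and the OPENAI_API_KEY constant are illustrative assumptions.

// Sketch: prefer on-device inference, otherwise fall back to a cloud API
async function complete(messages) {
  if (await window.localLLM?.available()) {
    // On-device path: no network round trip, data stays on the machine
    return window.localLLM.chat.completions.create({ messages });
  }
  // Cloud fallback (assumption): OpenAI's hosted chat completions endpoint
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${OPENAI_API_KEY}` // assumed to be supplied by your app
    },
    body: JSON.stringify({ model: 'gpt-4o-mini', messages })
  });
  return response.json();
}

Because both paths return the same chat.completion shape, callers can read result.choices[0].message.content without knowing which backend answered.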

🧪 Live Demo

Experience localllm in action. Test the OpenAI-compatible API and see real-time responses.

OpenAI-Compatible. Local-First. Ready to Use.
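
First, check whether the on-device model is available: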

if (await window.localLLM?.available()) {
  console.log('localllm ready! 🚀');
} else {
  console.log('Local LLM unavailable, using cloud');
}
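
A basic chat completion uses the same request shape as OpenAI's API:
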
const result = await window.localLLM.chat.completions.create({
  messages: [
    { role: "system", content: "You are a terse assistant." },
    { role: "user", content: "Hello local world" }
  ]
});

console.log(result.choices[0].message.content);
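
Set stream: true to receive OpenAI-style chat.completion.chunk deltas as they arrive:
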
const stream = await window.localLLM.chat.completions.create({
  messages: [
    { role: "user", content: "Write a haiku about AI" }
  ],
  stream: true
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    console.log(delta);
  }
}
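
Standard sampling parameters such as temperature and max_tokens carry over unchanged:
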
const result = await window.localLLM.chat.completions.create({
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
  temperature: 0.7,
  max_tokens: 500,
  stream: false
});
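
Every response mirrors OpenAI's chat.completion format, so existing client code that reads result.choices[0].message.content keeps working unchanged.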