Local-First Chat Completion API
OpenAI-compatible interface for on-device language models with no network latency, complete privacy, and native performance.
Requires macOS 26 Tahoe or iOS 26 with Apple Intelligence enabled.
When local models are available, requests execute locally; otherwise, they gracefully fall back to cloud APIs.
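A minimal sketch of that pattern, assuming the window.localLLM.available() check shown in the examples below; cloudChatCompletion is a hypothetical stand-in for whatever cloud client you already use:

// Fallback sketch: prefer the on-device model, otherwise call the cloud.
// cloudChatCompletion is hypothetical; substitute your own client.
async function chatCompletion(request) {
  if (await window.localLLM?.available()) {
    return window.localLLM.chat.completions.create(request);
  }
  return cloudChatCompletion(request);
}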
Your data never leaves your device when running locally. Process sensitive information without external API calls.
Instant responses with on-device inference. No network delays for real-time applications and interactive experiences.
OpenAI-compatible streaming with chat.completion.chunk format. Perfect for responsive chat interfaces.
Uses models built into macOS. No need to download multi-gigabyte model files or manage local storage.
Drop-in replacement for OpenAI's API. Use the same request/response formats you already know and love.
Experience llllm in action. Test the OpenAI-compatible API and see real-time responses.
// Feature-detect the on-device model before sending any requests
if (await window.localLLM?.available()) {
  console.log('llllm ready!');
} else {
  console.log('Local LLM unavailable, using cloud');
}
// Basic chat completion: the same request/response shape as OpenAI
const result = await window.localLLM.chat.completions.create({
  messages: [
    { role: "system", content: "You are a terse assistant." },
    { role: "user", content: "Hello local world" }
  ]
});

console.log(result.choices[0].message.content);
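The result mirrors OpenAI's chat.completion object. An illustrative, abridged response (field values are placeholders; extra fields such as usage may or may not be present):

// Illustrative chat.completion shape (values are placeholders)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hi." },
      "finish_reason": "stop"
    }
  ]
}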
// Streaming: set stream: true and iterate chat.completion.chunk objects
const stream = await window.localLLM.chat.completions.create({
  messages: [
    { role: "user", content: "Write a haiku about AI" }
  ],
  stream: true
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    console.log(delta);
  }
}
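Each chunk follows OpenAI's chat.completion.chunk format. An illustrative, abridged chunk (values are placeholders):

// Illustrative chat.completion.chunk shape (values are placeholders)
{
  "object": "chat.completion.chunk",
  "choices": [
    { "index": 0, "delta": { "content": "Silent" }, "finish_reason": null }
  ]
}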
// Sampling parameters carry over unchanged from OpenAI's API
const result = await window.localLLM.chat.completions.create({
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
  temperature: 0.7,  // sampling randomness (lower = more deterministic)
  max_tokens: 500,   // cap on generated tokens
  stream: false
});
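Availability can change at runtime (Apple Intelligence disabled, unsupported OS), so wrapping calls defensively is worth considering. A sketch; the thrown error's shape is an assumption, not a documented contract:

// Defensive usage sketch; error details here are assumptions
try {
  const result = await window.localLLM.chat.completions.create({
    messages: [{ role: "user", content: "ping" }]
  });
  console.log(result.choices[0].message.content);
} catch (err) {
  console.error('Local completion failed:', err);
}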
Be among the first to test llllm on iOS. Join our exclusive TestFlight beta program.