llama.app
curl -fsSL https://llama.app/install.sh | bash
Prefer Homebrew or Winget? See the package managers page. Rather build from source? Follow the build instructions.

AI that lives on your computer.
Open-source, private, always local.

Run frontier AI entirely on your machine. No API keys, no telemetry, no limits. Take AI back.

# 1. Serve a model
llama serve

# 2. Run Pi; it will auto-discover the server
pi
Pi

Pair it with a local coding agent.

Run llama serve, then launch Pi. It auto-discovers your local model. No config, no API keys. Files stay on your machine; requests never leave it.
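As a rough illustration of what "auto-discovery" can mean here: a client probes well-known local ports until it finds a listening server. This is a hedged sketch, not Pi's actual implementation; the port list is an assumption (8080 is llama.cpp's llama-server default).

```python
import socket

def discover_local_server(ports=(8080,), host="127.0.0.1", timeout=0.2):
    """Return the first port on `host` with a listening server, or None.

    Sketch only: the real Pi discovery logic and its probe order are
    assumptions, not documented behavior.
    """
    for port in ports:
        try:
            # A successful TCP connect means something is listening there.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            # Connection refused or timed out: nothing on this port.
            continue
    return None
```

A client built this way needs no configuration: if `llama serve` is running on its default port, the probe finds it; otherwise the client can fall back to asking the user for an endpoint.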

Optimized for any hardware.

From your laptop to a cluster, llama.cpp runs on whatever you have. Same binary, same models, same hand-tuned kernels for every GPU and CPU.

Apple Silicon · M Ultra · M Max · M Pro · RTX 5090 · RTX 4090 · RTX 3090 · H100 · B200 · A100 · T4 · MI300 · Radeon RX · DGX Spark · Jetson · Intel Arc · CPU

Run your first model