Ollama
Run open LLMs behind a simple API - llama, mistral, qwen. The local-model server (GPU recommended).
One-click deploy, from $25/mo on a Miget plan.
Ollama is the simplest way to run open large language models behind an HTTP API: pull a model (Llama, Mistral, Qwen, Gemma, Phi, and many more) and serve it with OpenAI-compatible endpoints, on infrastructure you own. OLLAMA_HOST puts the API on port 5000; there are no external dependencies.
Read the caveat first, because it is the whole story: real inference wants a GPU. A PaaS typically does not pass one through, so on CPU you are limited to small models (~1-3B parameters) and responses are slow. Ollama still runs, and small models are genuinely useful for embeddings, classification, and lightweight chat - just size expectations to CPU. This template is marked experimental for exactly that reason.
It is best used as a building block: keep it private and front it with this catalogue’s litellm for keys, rate limits, and an OpenAI-compatible gateway, or add open-webui for a chat UI. Models are large, so the model volume is the thing to size.
Upstream project: Ollama
#what you get
- Pull and serve open LLMs behind one HTTP API
- OpenAI-compatible endpoints; no external dependencies
- Pairs with litellm (gateway) and open-webui (chat UI)
- OLLAMA_HOST=0.0.0.0:5000 - clean port fit, no wrapper
- Private by default - no auth, so do not expose it raw
- MIT-licensed; GPU strongly recommended
#topology
| Service | Role | Public |
|---|---|---|
| ollama | model server / API (private, :5000) | no (front with litellm) |
| models volume | downloaded model weights (large) | no |
#miget sizing
// this stack needs
4 GiB RAM · 50 GB disk · 1 service
RAM tracks the model: a 3B needs a few GiB, 7-8B more. The model volume is large (~2 GB for 3B, ~5 GB for 7-8B, ~40 GB for 70B) - size it or models re-download on restart. Without a GPU, stick to small models.
Hobby - recommended fit
$25/mo
2 vCPU · 4 GiB · 80 GiB disk
Headroom for your own apps: 8 GiB at $49/mo
Professional - production
$85/mo
4 vCPU · 8 GiB · 50 GiB disk
Dedicated resources, production SLOs - plan details
One Miget plan is a fixed pool of compute - the whole stack (managed databases included) deploys inside it, and anything left over runs your other apps. No per-service or per-seat math.
#vs. the managed service
What the hosted equivalents charge, against the flat Miget plan this stack fits on. Prices as of June 2026, sources linked.
| Service | Plan | Monthly | What you get |
|---|---|---|---|
| Ollama on Miget ★ | 4 GiB plan | $25 | this whole stack, flat - no usage meters, and room left for your own apps |
| OpenAI API | usage-based | usage-based | per-token, per call - scales with usage |
Ollama runs open models with no per-token bill; the trade is you provide the compute (a GPU is strongly recommended).
#vs. other PaaS
Estimated monthly cost of running this exact stack (4 GiB RAM, 50 GB disk, 1 container) elsewhere, from published June 2026 rates.
| Platform | Est. monthly | Notes |
|---|---|---|
| Miget ★ | $25 flat | compose stacks first-class: one deploy, dedicated vCPU, managed Postgres/Valkey, volumes and TLS all included in the plan |
| Heroku | ~$200 | no volumes; nothing between 1 GB ($50) and 2.5 GB ($250) dynos - 2 GB containers cost far more than shown |
| Render | ~$63 | per-service instances (0.5 GB $7, 2 GB $25) - every container is its own paid service |
| DO App Platform | ~$53 | no persistent volumes - stateful containers need managed DBs/Spaces (base $5 Spaces included here) |
| Railway | ~$48 | usage-based ($10/GB RAM-mo); vCPU billed separately at $20/vCPU-mo on top |
| Fly.io | ~$31 | cheapest sticker price - but burstable shared CPUs (1/16 core; dedicated vCPUs cost ~2-3×), no compose deploys (one app per container, manual wiring), managed DBs billed extra |
Estimates assume RAM fully allocated at published on-demand rates - and sticker price isn't the whole comparison: the cheaper rows buy burstable shared CPUs, per-service wiring instead of a compose deploy, and managed databases billed separately. Heroku and DO App Platform have no persistent volumes at all - stateful stacks like this one need workarounds there.
#deploy it
On Miget
- Create a Compose Stack in app.miget.com pointing at the templates repository
- Set the stack path to
ollama -
Set the required variable:
(none), OLLAMA_HOST is preset to 0.0.0.0:5000 by the template
- Deploy. Miget layers
compose.miget.yaml(RAM, privacy, volumes, managed services) automatically
Locally first?
Every template is portable, vanilla Docker Compose - the Miget overrides are ignored locally:
git clone https://github.com/deployable-sh/stacks
cd miget-compose-templates/ollama
docker compose up -d Same files, same behavior. The template README covers connection strings and scaling notes.
#faq
Is it usable without a GPU?
For small models, yes - 1-3B models run on CPU and are useful for embeddings, classification, and light chat, just slowly. For 7B+ at interactive speed you really want a GPU, which a PaaS generally cannot provide. That is why this template is experimental.
Should I expose it publicly?
No. Ollama has no authentication, so this template keeps it private. Reach it from your other apps over the internal network, or front it with litellm (in this catalogue) to add API keys, rate limits, and an OpenAI-compatible gateway.
How big is the model storage?
Models are large and live on a volume: roughly 2 GB for a 3B model, 5 GB for 7-8B, and 40 GB for 70B. Size the volume to what you pull, or models re-download after a restart.
Ship Ollama today
One compose stack, 4 GiB of RAM, from $25/month flat, and it runs on your laptop with the same files.