LLM Infrastructure · experimental · 1 service · 4 GiB RAM · 50 GB disk

Ollama

Name: Ollama
Price: 25.00 USD

Run open LLMs (Llama, Mistral, Qwen) behind a simple API: the local-model server, GPU recommended.

One-click deploy, from $25/mo on a Miget plan.

Ollama is the simplest way to run open large language models behind an HTTP API: pull a model (Llama, Mistral, Qwen, Gemma, Phi, and many more) and serve it through Ollama's native API and its OpenAI-compatible endpoints, on infrastructure you own. The template sets OLLAMA_HOST so the API listens on port 5000 instead of Ollama's default 11434; there are no external dependencies.
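
For orientation, a minimal shell sketch of both API styles with curl: Ollama's native POST /api/chat and the OpenAI-compatible POST /v1/chat/completions. It assumes another app reaches the private service at http://ollama:5000 (a hypothetical internal address; substitute your own) and that the model has already been pulled.

```shell
#!/bin/sh
# Hypothetical internal address of the private service; override via OLLAMA_URL.
OLLAMA_URL="${OLLAMA_URL:-http://ollama:5000}"

# Escape backslashes and double quotes so text can sit inside a JSON string.
# Naive on purpose: newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Native Ollama endpoint, non-streaming: one JSON object with message.content.
ollama_chat() {
  model="$1"
  prompt=$(json_escape "$2")
  curl -s "$OLLAMA_URL/api/chat" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"$prompt\"}],\"stream\":false}"
}

# OpenAI-compatible endpoint: the same request shape OpenAI clients send.
openai_chat() {
  model="$1"
  prompt=$(json_escape "$2")
  curl -s "$OLLAMA_URL/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"$prompt\"}]}"
}
```

For example, `ollama_chat llama3.2 "Name three primary colors."` prints one JSON object whose `message.content` holds the reply. Any OpenAI SDK pointed at `$OLLAMA_URL/v1` should work the same way as `openai_chat`; a real client would build the JSON with a proper library rather than `json_escape`.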

Read the caveat first, because it is the whole story: real inference wants a GPU. A PaaS typically does not pass one through, so on CPU you are limited to small models (~1-3B parameters) and responses are slow. Ollama still runs, and small models are genuinely useful for embeddings, classification, and lightweight chat - just size expectations to CPU. This template is marked experimental for exactly that reason.

It is best used as a building block: keep it private and front it with this catalogue’s litellm for keys, rate limits, and an OpenAI-compatible gateway, or add open-webui for a chat UI. Models are large, so the model volume is the thing to size.

Upstream project: Ollama

#what you get

Pull and serve open LLMs behind one HTTP API
OpenAI-compatible endpoints; no external dependencies
Pairs with litellm (gateway) and open-webui (chat UI)
OLLAMA_HOST=0.0.0.0:5000 - clean port fit, no wrapper
Private by default - no auth, so do not expose it raw
MIT-licensed; GPU strongly recommended

#topology

Service	Role	Public
ollama	model server / API (private, :5000)	no (front with litellm)
models volume	downloaded model weights (large)	no

#miget sizing

// this stack needs

4 GiB RAM · 50 GB disk · 1 service

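RAM tracks the model: a 3B model needs a few GiB, and 7-8B models need more (Ollama's README suggests at least 8 GB of RAM for 7B models). Model weights are large (~2 GB for 3B, ~5 GB for 7-8B, ~40 GB for 70B), so keep them on a volume sized for what you pull; otherwise models re-download after a restart. Without a GPU, stick to small models.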
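
The disk figures quoted here follow a simple rule of thumb: with 4-bit quantized weights (the Q4_K_M variants that many default Ollama tags use), each parameter takes roughly 4.9 bits. A sketch of that arithmetic in shell and awk; the 4.9 bits/parameter constant is an assumption fitted to the sizes above, not a figure published by Ollama, and 8-bit or fp16 variants need proportionally more.

```shell
#!/bin/sh
# Assumed average bits per parameter for 4-bit (Q4_K_M-style) weights.
BITS_PER_PARAM=4.9

# estimate_model_gb PARAMS_IN_BILLIONS -> approximate download size in GB.
estimate_model_gb() {
  awk -v b="$1" -v bits="$BITS_PER_PARAM" 'BEGIN { printf "%.1f\n", b * bits / 8 }'
}

# fits_volume VOLUME_GB PARAMS_B [PARAMS_B ...] -> total estimate vs. volume.
fits_volume() {
  volume_gb="$1"
  shift
  awk -v v="$volume_gb" -v bits="$BITS_PER_PARAM" 'BEGIN {
    t = 0
    for (i = 1; i < ARGC; i++) t += ARGV[i] * bits / 8
    printf "%.1f GB of %s GB: %s\n", t, v, (t <= v ? "fits" : "too big")
  }' "$@"
}
```

For example, `estimate_model_gb 8` prints `4.9`, and `fits_volume 50 3.2 8` (a 3B-class plus an 8B model on the 50 GB volume) prints `6.9 GB of 50 GB: fits`.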

Hobby - recommended fit

$25/mo

2 vCPU · 4 GiB · 80 GiB disk

Headroom for your own apps: 8 GiB at $49/mo

Professional - production

$85/mo

4 vCPU · 8 GiB · 50 GiB disk

Dedicated resources and production SLOs.

One Miget plan is a fixed pool of compute - the whole stack (managed databases included) deploys inside it, and anything left over runs your other apps. No per-service or per-seat math.

#vs. the managed service

What the hosted equivalents charge, against the flat Miget plan this stack fits on. Prices as of June 2026, sources linked.

Service	Plan	Monthly	What you get
Ollama on Miget ★	4 GiB plan	$25	this whole stack, flat - no usage meters, and room left for your own apps
OpenAI API	usage-based	usage-based	per-token, per call - scales with usage

Ollama runs open models with no per-token bill; the trade is you provide the compute (a GPU is strongly recommended).

#vs. other PaaS

Estimated monthly cost of running this exact stack (4 GiB RAM, 50 GB disk, 1 container) elsewhere, from published June 2026 rates.

Platform	Est. monthly	Notes
Miget ★	$25 flat	compose stacks first-class: one deploy, dedicated vCPU, managed Postgres/Valkey, volumes and TLS all included in the plan
Heroku	~$200	no volumes; dynos come in fixed sizes (1 GB at $50, then 2.5 GB at $250 and up), so a 4 GiB container costs far more than this per-GB estimate
Render	~$63	per-service instances (0.5 GB $7, 2 GB $25) - every container is its own paid service
DO App Platform	~$53	no persistent volumes - stateful containers need managed DBs/Spaces (base $5 Spaces included here)
Railway	~$48	usage-based ($10/GB RAM-mo); vCPU billed separately at $20/vCPU-mo on top
Fly.io	~$31	cheapest sticker price - but burstable shared CPUs (1/16 core; dedicated vCPUs cost ~2-3×), no compose deploys (one app per container, manual wiring), managed DBs billed extra

Estimates assume RAM fully allocated at published on-demand rates - and sticker price isn't the whole comparison: the cheaper rows buy burstable shared CPUs, per-service wiring instead of a compose deploy, and managed databases billed separately. Heroku and DO App Platform have no persistent volumes at all - stateful stacks like this one need workarounds there.

#deploy it

On Miget

Create a Compose Stack in app.miget.com pointing at the templates repository
Set the stack path to ollama
Set the required variables: none (OLLAMA_HOST is preset to 0.0.0.0:5000 by the template)
Deploy. Miget layers compose.miget.yaml (RAM, privacy, volumes, managed services) automatically
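
Assuming the template does not preload a model, the API has nothing to serve until one is pulled. A minimal sketch using Ollama's REST API (POST /api/pull, GET /api/tags), run from one of your other apps that can reach the private service; http://ollama:5000 is a hypothetical internal address, so substitute whatever your stack uses.

```shell
#!/bin/sh
# Hypothetical internal address of the private service; override via OLLAMA_URL.
OLLAMA_URL="${OLLAMA_URL:-http://ollama:5000}"

# Download a model onto the model volume. With "stream":false the call blocks
# until the pull finishes and returns a single status object.
ollama_pull() {
  curl -s "$OLLAMA_URL/api/pull" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$1\",\"stream\":false}"
}

# List the models already stored on the volume (names, sizes, digests).
ollama_list() {
  curl -s "$OLLAMA_URL/api/tags"
}
```

For example, `ollama_pull llama3.2` fetches a ~3B model that is realistic on CPU, and `ollama_list` confirms it landed on the volume.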

Locally first?

Every template is portable, vanilla Docker Compose - the Miget overrides are ignored locally:

git clone https://github.com/deployable-sh/stacks
cd stacks/ollama
docker compose up -d
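
Unless the template preloads a model, the local stack also starts with an empty model store. A sketch for pulling a small model and trying one prompt with the Ollama CLI inside the running container; it assumes the compose service is named ollama (as in the topology table) and uses llama3.2, a ~3B model, as a CPU-friendly default you can override with MODEL.

```shell
#!/bin/sh
# Assumed CPU-sized default; override with MODEL=... before calling.
MODEL="${MODEL:-llama3.2}"

# Run from the template directory after `docker compose up -d`.
# `ollama pull` stores the weights in the container's model storage;
# `ollama run MODEL PROMPT` answers one prompt and exits.
pull_and_try() {
  docker compose exec ollama ollama pull "$MODEL" &&
    docker compose exec ollama ollama run "$MODEL" "Reply with one short sentence."
}
```

Define the function in your terminal and call `pull_and_try`; expect the first answer on CPU to take a while as the model loads.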

Same files, same behavior. The template README covers connection strings and scaling notes.

#faq

Is it usable without a GPU?

For small models, yes - 1-3B models run on CPU and are useful for embeddings, classification, and light chat, just slowly. For 7B+ at interactive speed you really want a GPU, which a PaaS generally cannot provide. That is why this template is experimental.
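
Embeddings are one of the CPU-friendly jobs. A minimal sketch against Ollama's POST /api/embed endpoint; the service address and the small all-minilm embedding model are assumptions, and the model must be pulled first (for example with POST /api/pull).

```shell
#!/bin/sh
# Hypothetical internal address and assumed small embedding model.
OLLAMA_URL="${OLLAMA_URL:-http://ollama:5000}"
EMBED_MODEL="${EMBED_MODEL:-all-minilm}"

# Escape backslashes and double quotes for a JSON string (newlines not handled).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# The response's "embeddings" field holds one vector per input.
ollama_embed() {
  curl -s "$OLLAMA_URL/api/embed" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$EMBED_MODEL\",\"input\":\"$(json_escape "$1")\"}"
}
```

For example, `ollama_embed "refund request for order 1234"` returns a vector you can store for similarity search or feed to a lightweight classifier.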

Should I expose it publicly?

No. Ollama has no authentication, so this template keeps it private. Reach it from your other apps over the internal network, or front it with litellm (in this catalogue) to add API keys, rate limits, and an OpenAI-compatible gateway.

How big is the model storage?

Models are large and live on a volume: roughly 2 GB for a 3B model, 5 GB for 7-8B, and 40 GB for 70B. Size the volume to what you pull, or models re-download after a restart.

Ship Ollama today

One compose stack, 4 GiB of RAM, from $25/month flat, and it runs on your laptop with the same files.
