LLM Infrastructure · 1 service · 1 GiB RAM · 2 GB disk

TEI Embeddings

Price: 7.00 USD

A self-hosted embeddings API on CPU - bge-small in 512 MB, OpenAI-compatible, zero per-token bills.

One-click deploy, from $7/mo on a Miget plan.

Embeddings are the quiet recurring cost of RAG: every document chunk and every query goes through the API meter, forever. Small open models change that math - bge-small at 384 dimensions handles real retrieval workloads, and Hugging Face’s TEI serves it from CPU in about half a gigabyte of RAM, with no GPU anywhere.

TEI is a Rust server with an OpenAI-compatible /v1/embeddings endpoint, so existing clients just change the base URL. It is internal-only by design (no built-in auth): apps in your project call http://tei:5000 over the private network.
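A minimal sketch of a raw call from an app container in the same project, assuming `curl` is installed there. `TEI_URL` and `tei_openai_embed` are illustrative names, not template variables. The `model` field is included only because OpenAI-style clients send it; the value is a placeholder, on the assumption that each TEI instance serves whatever MODEL_ID it was started with.

```shell
# Private-network address of the tei service (port 5000 per this template).
# TEI_URL is a hypothetical override knob for this sketch.
TEI_URL="${TEI_URL:-http://tei:5000}"

# POST an OpenAI-shaped body to TEI's /v1/embeddings route.
# The JSON body is passed through verbatim, so it must already be valid JSON.
tei_openai_embed() {
  curl -sS "$TEI_URL/v1/embeddings" \
    -H 'Content-Type: application/json' \
    -d "$1"
}

# Example, from a container in the same project:
# tei_openai_embed '{"model": "bge-small", "input": ["what is RAG?", "vector search"]}'
```

With an OpenAI SDK the equivalent change is pointing its base URL at `http://tei:5000/v1` (plus whatever dummy API key the SDK insists on, since TEI here checks none).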

This completes the catalogue’s self-contained RAG loop: embed with tei, store in qdrant or chromadb, generate via the litellm gateway, trace in langfuse or phoenix - with the embedding leg now costing exactly $7/month flat.

Upstream project: Text Embeddings Inference (Hugging Face)

#what you get

OpenAI-compatible /v1/embeddings plus native /embed and /rerank
bge-small default: 384-dim, tens of ms per text on CPU
Rust + candle: ~512 MB RSS, no Python, no GPU
Model cached on a volume - fast restarts
Swap MODEL_ID for other embedders or rerankers
Apache-2.0, internal-only posture by default
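To check the 384-dim claim against a running instance, a minimal sketch using the native /embed route, which takes `{"inputs": [...]}` and returns one vector per input. The helper names are hypothetical, and `json_escape` only handles single-line text (backslashes and double quotes), not full JSON encoding.

```shell
TEI_URL="${TEI_URL:-http://tei:5000}"

# Escape backslashes and double quotes so single-line text can sit inside a
# JSON string. Not a full JSON encoder: newlines/control chars are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Embed one text via /embed; the response is a JSON array holding one vector
# per input, e.g. [[0.012,-0.034,...]].
tei_embed() {
  payload="{\"inputs\": [\"$(json_escape "$1")\"]}"
  curl -sS "$TEI_URL/embed" \
    -H 'Content-Type: application/json' \
    -d "$payload"
}

# Count the components of a single-vector /embed response: strip brackets and
# spaces, split on commas, count non-empty fields (naive text processing).
vector_dims() {
  sed 's/[][ ]//g' | tr ',' '\n' | grep -c .
}

# Usage: tei_embed 'hello world' | vector_dims
```

With the default bge-small loaded, the usage line should print 384; a different number means a different MODEL_ID is being served.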

#topology

Service	Role	Public
tei	embeddings API (:5000)	no (by design - no built-in auth)

#miget sizing

// this stack needs

1 GiB RAM · 2 GB disk · 1 service

bge-small fits comfortably in 1 GiB; larger embedders (bge-base/large) want the next plans. One model per instance - run a second instance for a reranker.
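Since each instance serves exactly one model, it helps to confirm what a given instance actually loaded after changing MODEL_ID. A minimal sketch against TEI's /info route, assuming compact JSON output and no `jq` in the container; `tei-reranker` in the usage line is a hypothetical service name for a second instance.

```shell
TEI_URL="${TEI_URL:-http://tei:5000}"

# Print the model_id reported by /info.
# Naive extraction with grep/sed: assumes compact JSON ("model_id":"...").
tei_model_id() {
  curl -sS "$TEI_URL/info" \
    | grep -o '"model_id":"[^"]*"' \
    | sed 's/^"model_id":"//; s/"$//'
}

# Usage:
#   tei_model_id
#   TEI_URL=http://tei-reranker:5000 tei_model_id
```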

Hobby - recommended fit

$7/mo

1 vCPU · 1 GiB · 25 GiB disk

Headroom for your own apps: 2 GiB at $13/mo

Professional - production

$22/mo

1 vCPU · 2 GiB · 10 GiB disk

Dedicated resources, production SLOs

One Miget plan is a fixed pool of compute - the whole stack (managed databases included) deploys inside it, and anything left over runs your other apps. No per-service or per-seat math.

#vs. other PaaS

Estimated monthly cost of running this exact stack (1 GiB RAM, 2 GB disk, 1 container) elsewhere, from published June 2026 rates.

Platform	Est. monthly	Notes
Miget ★	$7 flat	compose stacks first-class: one deploy, dedicated vCPU, managed Postgres/Valkey, volumes and TLS all included in the plan
Heroku	~$50	no volumes; nothing between 1 GB ($50) and 2.5 GB ($250) dynos - 2 GB containers cost far more than shown
DO App Platform	~$17	no persistent volumes - stateful containers need managed DBs/Spaces (base $5 Spaces included here)
Render	~$13	per-service instances (0.5 GB $7, 2 GB $25) - every container is its own paid service
Railway	~$10	usage-based ($10/GB RAM-mo); vCPU billed separately at $20/vCPU-mo on top
Fly.io	~$6	cheapest sticker price - but burstable shared CPUs (1/16 core; dedicated vCPUs cost ~2-3×), no compose deploys (one app per container, manual wiring), managed DBs billed extra

Estimates assume RAM fully allocated at published on-demand rates - and sticker price isn't the whole comparison: the cheaper rows buy burstable shared CPUs, per-service wiring instead of a compose deploy, and managed databases billed separately. Heroku and DO App Platform have no persistent volumes at all - stateful stacks like this one need workarounds there.

#deploy it

On Miget

Create a Compose Stack in app.miget.com pointing at the templates repository
Set the stack path to tei
No required variables - deploy as-is
Deploy. Miget layers compose.miget.yaml (RAM, privacy, volumes, managed services) automatically

Locally first?

Every template is portable, vanilla Docker Compose - the Miget overrides are ignored locally:

git clone https://github.com/deployable-sh/stacks
cd stacks/tei
docker compose up -d

Same files, same behavior. The template README covers connection strings and scaling notes.

#faq

How good is bge-small compared to API embeddings?

For typical RAG retrieval it is competitive with commercial small embeddings: strong MTEB scores at 384 dimensions, which halves vector storage versus 768-dim models and quarters it versus 1536-dim ones. Test with your own corpus; trying another model is a one-variable change (MODEL_ID).
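Rough arithmetic behind the storage claim, assuming uncompressed float32 vectors (4 bytes per dimension) and ignoring index overhead, payloads, and any quantization your vector store applies; per million stored chunks:

```latex
\begin{aligned}
384 \times 4\,\text{B}  &= 1536\,\text{B/vector} &&\Rightarrow\; 10^6 \text{ vectors} \approx 1.54\,\text{GB} \\
768 \times 4\,\text{B}  &= 3072\,\text{B/vector} &&\Rightarrow\; 10^6 \text{ vectors} \approx 3.07\,\text{GB} \\
1536 \times 4\,\text{B} &= 6144\,\text{B/vector} &&\Rightarrow\; 10^6 \text{ vectors} \approx 6.14\,\text{GB}
\end{aligned}
```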

What does this actually save?

API embeddings meter every chunk at index time and every query forever, and re-indexing a corpus repeats the whole bill. Self-hosted, the marginal cost of embedding is zero - re-index freely, embed logs, embed everything.

Why is it internal-only?

TEI ships no authentication, so it must not face the internet. In-project apps reach tei:5000 over the private network - the same posture as the qdrant and chromadb templates it pairs with.

Can it rerank too?

Yes - TEI serves reranker models on /rerank. Run a second instance with a reranker MODEL_ID (e.g. bge-reranker variants) and call it after retrieval for a meaningful quality lift.
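A minimal sketch of the rerank step, assuming a second instance started with a reranker MODEL_ID and reachable at the hypothetical service name `tei-reranker`. TEI's /rerank takes a query plus candidate texts and returns a JSON list of objects carrying each candidate's `index` and `score`; `RERANK_URL`, `json_escape` and `tei_rerank` are illustrative names.

```shell
# Hypothetical address of a second TEI instance serving a reranker model.
RERANK_URL="${RERANK_URL:-http://tei-reranker:5000}"

# Escape backslashes and double quotes for single-line text inside JSON strings.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Score candidate passages (e.g. top hits from qdrant/chromadb) against a query.
# First argument is the query; every further argument is one candidate text.
tei_rerank() {
  query="$(json_escape "$1")"
  shift
  texts=""
  for t in "$@"; do
    # Join escaped, quoted candidates with ", " into a JSON array body.
    texts="${texts:+$texts, }\"$(json_escape "$t")\""
  done
  curl -sS "$RERANK_URL/rerank" \
    -H 'Content-Type: application/json' \
    -d "{\"query\": \"$query\", \"texts\": [$texts]}"
}

# Usage: tei_rerank 'what is RAG?' 'first retrieved passage' 'second retrieved passage'
```

The returned `index` values point back into the candidate list, so the app keeps its own copy of the retrieved passages and reorders them by score before building the prompt.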

Ship TEI Embeddings today

One compose stack, 1 GiB of RAM, from $7/month flat, and it runs on your laptop with the same files.
