Melis unifies OpenAI, Anthropic, Google Vertex, OCI GenAI and Ollama behind a single OpenAI-compatible contract. Stateless, sub-2ms overhead, under 32Mi RSS — built for production LLMOps.

One contract, every major provider
Features
Move load balancing, circuit breaking, token compression and rate limiting out of your application code and into a high-performance infrastructure layer.
Exposes POST /v1/chat/completions. Melis transpiles payloads on the fly to each upstream provider's schema.
Non-blocking async Rust core. Internal processing under 2ms with a memory footprint below 32Mi RSS.
Native support for openai, anthropic, google_vertex_ai, oci_genai and ollama with configurable traffic weights.
Tokenizes inputs locally and trims repetitive metadata before requests are sent to the cloud, protecting your token budget.
Distributed token-bucket rate limiting and circuit breaking with exponential backoff, orchestrated via Redis.
Hot reloading of routes.yaml, a Prometheus /metrics endpoint, OpenTelemetry tracing and Kubernetes-compliant probes.
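The payload translation mentioned in the feature list can be pictured with a small sketch. This is not Melis's actual code (the core is written in Rust); it only illustrates, in Python, how an OpenAI-style chat request might be mapped to Anthropic's Messages format, where system prompts move to a top-level `system` field and `max_tokens` is mandatory. The helper name `to_anthropic_payload`, the placeholder model name and the default of 1024 tokens are assumptions made for this example.

```python
from typing import Any


def to_anthropic_payload(
    openai_payload: dict[str, Any], default_max_tokens: int = 1024
) -> dict[str, Any]:
    """Map an OpenAI-style chat request to an Anthropic Messages-style body.

    Hypothetical helper for illustration; not part of Melis.
    """
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []
    for msg in openai_payload["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field, not a message.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    body: dict[str, Any] = {
        "model": openai_payload["model"],
        "messages": messages,
        # max_tokens is required by Anthropic but optional in OpenAI requests.
        "max_tokens": openai_payload.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    # Pass through parameters that share a name in both schemas.
    for key in ("temperature", "top_p", "stream"):
        if key in openai_payload:
            body[key] = openai_payload[key]
    return body


request = {
    "model": "example-model",  # placeholder, not a real model name
    "temperature": 0.2,
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
}
translated = to_anthropic_payload(request)
```

A real gateway would also have to translate the response and streaming events back into the OpenAI shape; the sketch covers only the request direction.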
Architecture
Melis instances keep no state of their own, so they scale horizontally inside Kubernetes. Volatile cluster state, blocklists and token-bucket counters live in an external high-speed Redis layer.
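To make the idea of externalized token-bucket counters concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Melis's implementation: a plain dict stands in for Redis, the class name `TokenBucket` and its parameters are hypothetical, and the clock is injected so the behavior is deterministic. Two limiter objects share one store, mimicking two gateway pods that read and write the same counters.

```python
import time
from typing import Callable, MutableMapping, Tuple


class TokenBucket:
    """Token bucket whose counters live in an external mapping.

    The mapping is a dict here, standing in for a shared store such as Redis.
    """

    def __init__(
        self,
        store: MutableMapping[str, Tuple[float, float]],
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.store = store
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock

    def allow(self, client: str, cost: float = 1.0) -> bool:
        now = self.clock()
        # A client seen for the first time starts with a full bucket.
        tokens, last = self.store.get(client, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self.store[client] = (tokens, now)
        return allowed


shared_store: dict = {}  # stand-in for the external Redis layer
fake_now = [0.0]         # injected clock so the example is deterministic


def clock() -> float:
    return fake_now[0]


pod_a = TokenBucket(shared_store, capacity=2, refill_per_sec=1, clock=clock)
pod_b = TokenBucket(shared_store, capacity=2, refill_per_sec=1, clock=clock)

# Both pods draw from the same bucket for "tenant-a".
results = [pod_a.allow("tenant-a"), pod_b.allow("tenant-a"), pod_a.allow("tenant-a")]
fake_now[0] = 1.0  # one second later, one token has been refilled
after_refill = pod_b.allow("tenant-a")
```

With a real shared store the read-modify-write in `allow` would need to be atomic (for example a server-side script), and the timestamp would have to come from a clock all instances agree on; the dict-based sketch sidesteps both concerns.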
[ App Python (FastAPI) ] ──┐
                           ├──► [ Melis AI Gateway Pod ] ──► [ OpenAI / Claude / Gemini ]
[ App Java (Quarkus) ]   ──┘               │
                                           ▼
                              [ Redis ] ◄──┴──► [ Prometheus / OTel ]

Declarative routing
Move from a costly OpenAI setup to a local Ollama Llama 3 model by editing a single YAML file. Melis intercepts, translates and streams responses natively.
routes:
  - path: "/v1/chat/completions"
    method: "POST"
    provider: "ollama"        # Swapped from "openai" instantly
    model: "llama3.2"         # Overrides the payload target model
    token_optimization:
      strategy: "adaptive_trimming"
      compress_above_tokens: 4096

from openai import OpenAI

client = OpenAI(base_url="http://melis:9090/v1", api_key="sk-anything")
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from Melis!"}],
)
print(resp.choices[0].message.content)

Deploy
Run as a standalone Docker container, or deploy natively to Kubernetes with ConfigMaps, Secrets and Horizontal Pod Autoscaling.
docker run -d \
  --name melis-gateway \
  -p 9090:9090 \
  -v "$(pwd)/config.yaml:/app/config.yaml:ro" \
  -v "$(pwd)/routes.yaml:/app/routes.yaml:ro" \
  -e MELIS_SERVER_PORT=9090 \
  melis-gateway:latest

kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/hpa.yaml

Observability
Scrape /metrics from Prometheus, ship traces with OpenTelemetry and build Grafana dashboards your SRE team will actually trust.
# HELP melis_request_duration_seconds Gateway overhead per request
# TYPE melis_request_duration_seconds histogram
melis_request_duration_seconds_bucket{provider="openai",le="0.002"} 18421
melis_tokens_total{provider="anthropic",direction="input"} 1284912
melis_tokens_total{provider="anthropic",direction="output"} 421038
melis_circuit_breaker_state{provider="google_vertex_ai"} 0
melis_ratelimit_drops_total{client="tenant-a"} 12

Melis is, and will always be, 100% open source. No paid tier, no vendor lock-in, no "open core" surprises. Fork it, deploy it, contribute back.
Drop Melis in front of any OpenAI SDK and gain routing, resiliency, observability and cost control — without rewriting a line of application code.