Enterprise AI Orchestration Platform
The Problem
A platform team needed to host, route, and stream multiple LLM workloads behind a single Kubernetes ingress without each application reimplementing model selection, streaming, and tool integration on its own. The goal was an internal orchestration layer that other teams could build on.
The Solution
A Go-based microservices stack deployed on Kubernetes via Helm. A model registry routes inference requests across Ollama-hosted models. An MCP server registry exposes tool integrations to any client that speaks the Model Context Protocol. A streaming chat service holds long-lived connections open through the nginx ingress so browser clients receive token-by-token responses without proxy buffering. A web frontend ties the pieces together for internal users. Everything is containerized end-to-end, port-forwarded for local development, and backed by documented troubleshooting playbooks. The sketches below illustrate the routing, registry, and streaming pieces.
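To make the routing concrete, here is a minimal sketch of how a model registry might map model names to Ollama backends and reverse-proxy requests to them. The backend URLs, the /v1/generate route, and the model query parameter are illustrative assumptions, not the platform's actual API.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Hypothetical registry: model name -> Ollama backend base URL.
// The service names and namespace are assumptions for illustration.
var backends = map[string]string{
	"llama3":  "http://ollama-llama3.models.svc.cluster.local:11434",
	"mistral": "http://ollama-mistral.models.svc.cluster.local:11434",
}

func routeModel(w http.ResponseWriter, r *http.Request) {
	model := r.URL.Query().Get("model")
	target, ok := backends[model]
	if !ok {
		http.Error(w, "unknown model: "+model, http.StatusNotFound)
		return
	}
	u, err := url.Parse(target)
	if err != nil {
		http.Error(w, "bad backend URL", http.StatusInternalServerError)
		return
	}
	// Reverse-proxy the request to the chosen Ollama instance.
	httputil.NewSingleHostReverseProxy(u).ServeHTTP(w, r)
}

func main() {
	http.HandleFunc("/v1/generate", routeModel)
	http.ListenAndServe(":8080", nil)
}
```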
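The MCP server registry can be pictured the same way: a small service that tracks which tool servers exist and where they listen, so any MCP client can discover them. This sketch assumes an in-memory map and a /mcp/servers listing route; the real registry's schema and discovery flow aren't shown here.

```go
package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

// Hypothetical MCP server record; field names are assumptions.
type MCPServer struct {
	Name     string `json:"name"`
	Endpoint string `json:"endpoint"` // where the MCP server listens
}

// Registry is a concurrency-safe in-memory store of MCP servers.
type Registry struct {
	mu      sync.RWMutex
	servers map[string]MCPServer
}

func (reg *Registry) Register(s MCPServer) {
	reg.mu.Lock()
	defer reg.mu.Unlock()
	reg.servers[s.Name] = s
}

// List returns every registered server as JSON for client discovery.
func (reg *Registry) List(w http.ResponseWriter, r *http.Request) {
	reg.mu.RLock()
	defer reg.mu.RUnlock()
	out := make([]MCPServer, 0, len(reg.servers))
	for _, s := range reg.servers {
		out = append(out, s)
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(out)
}

func main() {
	reg := &Registry{servers: map[string]MCPServer{}}
	reg.Register(MCPServer{Name: "search", Endpoint: "http://mcp-search:7000"})
	http.HandleFunc("/mcp/servers", reg.List)
	http.ListenAndServe(":8080", nil)
}
```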
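Streaming through nginx is mostly a matter of flushing each token and telling the proxy not to buffer. One common approach, sketched below with Server-Sent Events, is to set the X-Accel-Buffering: no response header, which nginx honors per-response; setting the ingress-nginx annotation nginx.ingress.kubernetes.io/proxy-buffering: "off" achieves the same at the ingress level. The token loop here is a stand-in for the real model stream.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// streamChat pushes tokens to the browser as Server-Sent Events.
func streamChat(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("X-Accel-Buffering", "no") // disable nginx proxy buffering

	// Stand-in for a real model stream.
	for _, tok := range []string{"Hello", ", ", "world", "!"} {
		fmt.Fprintf(w, "data: %s\n\n", tok) // one SSE frame per token
		flusher.Flush()                     // push bytes to the client immediately
		time.Sleep(50 * time.Millisecond)   // simulate generation latency
	}
}

func main() {
	http.HandleFunc("/chat/stream", streamChat)
	http.ListenAndServe(":8080", nil)
}
```

A browser can consume this with a standard EventSource, receiving each token as soon as the handler flushes it.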
Tech Stack
Go · Kubernetes · Helm · Ollama · Model Context Protocol (MCP) · nginx ingress
Results