
Enterprise AI Orchestration Platform

The Problem

A platform team needed a way to host, route, and stream multiple LLM workloads behind a single Kubernetes ingress — without each application reimplementing model selection, streaming, and tool integration on its own. They needed an internal orchestration layer that other teams could build on.

The Solution

A Go-based microservices stack deployed on Kubernetes via Helm. A model registry routes inference requests across Ollama-hosted models. An MCP server registry exposes tool integrations to any client that speaks the Model Context Protocol. A streaming chat service handles long-lived connections through the nginx ingress so browser clients can consume token-by-token responses without buffering. A web frontend ties the pieces together for internal users. Everything is containerized end-to-end, with port-forwarding for local development and documented troubleshooting playbooks.

Tech Stack

Go · Kubernetes · Helm · Ollama · MCP · nginx ingress · Streaming APIs · Docker
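For token-by-token delivery to survive the ingress, the nginx side needs buffering disabled and generous timeouts on the streaming route. A sketch using the community ingress-nginx controller's annotations; the resource and service names are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: chat-stream                  # illustrative name
  annotations:
    # Disable response buffering so flushed tokens reach the browser
    # immediately instead of arriving in one buffered chunk.
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    # Keep long-lived streaming connections open past the default 60s.
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /chat
            pathType: Prefix
            backend:
              service:
                name: chat-service   # illustrative name
                port:
                  number: 8080
```

In a Helm chart, the annotations and timeouts would typically be templated per-route values rather than hard-coded.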

Results

Multi-service Go backend running on Kubernetes
Helm charts for repeatable deployments
Streaming chat through nginx ingress with token-level delivery
Model Context Protocol registry for tool integrations
Ollama-backed model routing with hot-swappable models
Internal-platform footprint serving multiple downstream apps
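The hot-swappable routing above can be sketched as a mutex-guarded registry mapping model names to Ollama backend URLs, so an operator can repoint a model at runtime without restarting the router. Type and method names here are illustrative, not the platform's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// ModelRegistry maps model names to backend endpoints. An RWMutex lets
// many request handlers resolve concurrently while a swap takes the
// write lock briefly.
type ModelRegistry struct {
	mu       sync.RWMutex
	backends map[string]string // model name -> Ollama base URL
}

func NewModelRegistry() *ModelRegistry {
	return &ModelRegistry{backends: make(map[string]string)}
}

// Register adds or replaces (hot-swaps) the backend for a model.
func (r *ModelRegistry) Register(model, baseURL string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.backends[model] = baseURL
}

// Resolve returns the backend URL for a model, or an error if unknown.
func (r *ModelRegistry) Resolve(model string) (string, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	url, ok := r.backends[model]
	if !ok {
		return "", fmt.Errorf("no backend registered for model %q", model)
	}
	return url, nil
}

func main() {
	reg := NewModelRegistry()
	reg.Register("llama3", "http://ollama-0.models.svc:11434")
	url, _ := reg.Resolve("llama3")
	fmt.Println(url)
}
```

A request handler would call `Resolve` per request, so a `Register` call swapping the URL takes effect on the very next inference request.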

Have a project like this?

Tell me about your project →