Running the RAG API Server
Our RAG API leverages the vector database we built earlier and processes user queries using ChromaDB, LangChain, and OpenAI. It exposes a simple REST API endpoint that takes a query and returns a response enriched with Kubernetes documentation.
Quick Start
To run the server:
docker run -e OPENAI_API_KEY="<your-openai-key>" \
--volume /tmp/testcontainer:/tmp/testcontainer \
-p 8000:8000 \
--name rag_server_api rag_server_demo
Test it using curl:
curl -X 'POST' \
'http://localhost:8000/completion' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"query": "What is Kubernetes?"
}' | jq
Sample response:
{
"rag_response": "Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services..."
}
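For context, here's a minimal sketch of what the server behind /completion could look like, assuming the vector database is a Chroma collection persisted under /tmp/testcontainer (hence the volume mount above). The module names, model choice, and paths are illustrative, not the exact implementation:

# rag_server.py - illustrative sketch of the /completion endpoint
from fastapi import FastAPI
from pydantic import BaseModel
from langchain.chains import RetrievalQA
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

app = FastAPI()

# Load the vector database built in the previous step (path is an assumption)
vectordb = Chroma(
    persist_directory="/tmp/testcontainer",
    embedding_function=OpenAIEmbeddings(),
)

# Retrieval-augmented chain: fetch relevant doc chunks, then ask the LLM
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)

class Query(BaseModel):
    query: str

@app.post("/completion")
def completion(q: Query):
    result = qa.invoke({"query": q.query})
    return {"rag_response": result["result"]}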
Expanding Knowledge Sources
Our RAG system is flexible and can easily incorporate additional knowledge sources, such as:
- Customer-specific documentation
- Code repositories
- Logs and monitoring data
- Kubernetes audit and event logs
This extensibility makes RAG an ideal solution for advanced Kubernetes troubleshooting beyond generic LLM responses.
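As a sketch of how that could work, the snippet below embeds an extra document into the same Chroma collection the API reads from; the file path and chunking parameters are assumptions:

# ingest_extra_source.py - hedged sketch for adding a new knowledge source
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

vectordb = Chroma(
    persist_directory="/tmp/testcontainer",  # same store the API serves from (assumption)
    embedding_function=OpenAIEmbeddings(),
)

# Load a hypothetical customer runbook, chunk it, and index the chunks
docs = TextLoader("runbooks/payments-oncall.md").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
vectordb.add_documents(chunks)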
Enhancing K8sGPT with RAG
K8sGPT is an AI-powered Kubernetes troubleshooting tool that analyzes clusters and provides insights using Go-based analyzers. These analyzers detect common Kubernetes failure patterns and deliver structured diagnostic data.
By integrating our RAG API, K8sGPT can:
- Retrieve structured knowledge from Kubernetes documentation.
- Provide richer context when diagnosing issues.
- Minimize hallucinations by grounding responses in official documentation.
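Conceptually, the integration is a thin HTTP client: the custom provider forwards each analyzer finding to the RAG endpoint we just tested. Here's a Python sketch of that contract (the actual provider in the fork is written in Go):

# Conceptual sketch of the custom provider's request/response contract
import requests

def explain_with_rag(analyzer_error: str) -> str:
    # POST the analyzer finding to the RAG API (same contract as the curl test)
    resp = requests.post(
        "http://localhost:8000/completion",
        json={"query": analyzer_error},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["rag_response"]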
Setting Up K8sGPT with RAG
First, create a local Kubernetes cluster using Kind:
kind create cluster
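Kind prints a hint when the cluster is ready; you can confirm kubectl is pointed at it with:

kubectl cluster-info --context kind-kind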
Next, deploy a broken pod to test troubleshooting capabilities:
apiVersion: v1
kind: Pod
metadata:
  name: broken-pod
  namespace: default
spec:
  containers:
  - name: broken-pod
    image: nginx:1.a.b.c # Invalid image tag
    livenessProbe:
      httpGet:
        path: /
        port: 81
      initialDelaySeconds: 3
      periodSeconds: 3
Apply the configuration:
kubectl apply -f broken-pod.yml
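Because the tag is invalid, the image pull will fail and the pod should settle into ImagePullBackOff within a few seconds:

kubectl get pod broken-pod
NAME         READY   STATUS             RESTARTS   AGE
broken-pod   0/1     ImagePullBackOff   0          30s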
Clone and build the K8sGPT project with RAG integration:
git clone https://github.com/elieser1101/k8sgpt.git
cd k8sgpt
git checkout CustomRagAIClient
make build
The CustomRagAIClient branch of this fork contains a custom AI provider that lets K8sGPT communicate with the RAG API, which is why you need to build the application from source. Feel free to explore the code to understand what happens under the hood.
Using K8sGPT’s Analyze and Explain Features
NOTE: Make sure you use the freshly built binary, which should be at ./bin/k8sgpt.
Run the analyze command:
./bin/k8sgpt analyze --filter Pod
Sample output:
AI Provider: AI not used; --explain not set
0: Pod default/broken-pod
- Error: Back-off pulling image "nginx:1.a.b.c"
The output shows the image-pull error. You can also use the explain feature, which sends the analyze results to an LLM provider (OpenAI, Llama, etc.) and explains the problem in natural language:
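./bin/k8sgpt analyze --filter Pod --explain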
K8sGPT supports custom AI backends, allowing us to connect it to our RAG API.
Final Architecture
Here’s the architecture of the complete system:
Running K8sGPT with the Custom RAG Backend
Add the custom RAG backend and run an analysis:
./bin/k8sgpt auth add --backend customrag
./bin/k8sgpt analyze --filter Pod --explain --backend customrag --no-cache
Example output:
0: Pod default/broken-pod
- Error: Back-off pulling image "nginx:1.a.b.c"
- Solution:
1. Check the image tag for typos.
2. Use a valid version (e.g., "nginx:1.16.1").
3. Run `kubectl set image deployment/nginx nginx=<correct_image_name>`.
How It Works
Here’s what happens under the hood:
- K8sGPT analyzes the cluster.
- As part of `explain`, it sends an HTTP request to the RAG API.
- The RAG API retrieves relevant context/documentation from the vector database via LangChain.
- OpenAI processes the prompt, combining the analyze results and relevant documentation.
- K8sGPT displays accurate, contextualized troubleshooting insights.
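To make steps 3 and 4 concrete, here's a hedged sketch of the grounding logic with illustrative names (the real prompt template lives inside the RAG API):

# grounding_sketch.py - how retrieved docs could be combined into the prompt
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectordb = Chroma(
    persist_directory="/tmp/testcontainer",
    embedding_function=OpenAIEmbeddings(),
)

# The analyzer finding k8sgpt sends over HTTP
error_text = 'Back-off pulling image "nginx:1.a.b.c"'

# Step 3: retrieve relevant documentation chunks from the vector DB
docs = vectordb.as_retriever(search_kwargs={"k": 4}).invoke(error_text)
context = "\n\n".join(d.page_content for d in docs)

# Step 4: ground the LLM prompt in the retrieved documentation
prompt = (
    "You are a Kubernetes troubleshooting assistant.\n"
    f"Documentation context:\n{context}\n\n"
    f"Cluster finding: {error_text}\n"
    "Explain the problem and propose a fix based on the context above."
)
print(ChatOpenAI(model="gpt-4o-mini").invoke(prompt).content)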
Next Steps & Future Improvements
This system is already a powerful Kubernetes troubleshooting tool, but here’s how it can be improved:
- Deploy K8sGPT as a Kubernetes Operator to provide continuous monitoring and proactive alerts.
- Self-host an open-source LLM to reduce API costs and improve data privacy.
- Measure accuracy to minimize hallucinations and validate recommendations.
- Enable auto-remediation by integrating K8sGPT with Kubernetes controllers for self-healing clusters.
- Adopt the Model Context Protocol to standardize LLM context-sharing across tools.
- Package the solution as a SaaS product for small businesses with limited access to DevOps and SRE teams.
Conclusion
We’ve built a fully functional RAG system that enhances Kubernetes troubleshooting by combining:
✅ Kubernetes documentation embeddings
✅ A REST API powered by LangChain
✅ K8sGPT’s diagnostic capabilities
This combination makes Kubernetes issue resolution faster, more accurate, and grounded in official knowledge. With further development, this can evolve into a production-ready tool for SREs and platform engineers.