From Idea to Online: How to Deploy Your First LLM App and Set Up a Full CI/CD Pipeline

Posted on: Apr 29, 2025 08:07 PM

Why I’m Writing This

Every other week a friend pings me: “Hey Jesse, I have this cool AI idea—can you help me make it real?” They have a notebook demo, maybe a Hugging Face Space, but no clue how to turn it into something you can actually curl from a phone or slap behind a paywall.
So I pulled a tiny project out of my lunch‑box: a FastAPI endpoint that calls Replicate’s Flux Image model and spits back a picture from your prompt. Nothing fancy, but it’s a perfect skeleton to show how pros (and wannabe pros) get code from laptop to the big bad internet—without babysitting servers 24 × 7.
My mission today: walk you through the architecture, the tools, and the mind‑set. No wall of code here—when you need the full source, go check my repo: https://github.com/JesseQin123/fastapi-cicd.

1. What the App Actually Does

One-sentence version: client sends prompt → FastAPI receives it → FastAPI calls Replicate (Flux Image) → Replicate returns an image → FastAPI pipes it back.
The endpoint is /generate-image and accepts a JSON payload with a prompt field. That’s it.
 
FIGURE 1 – Prompt‑to‑Image flow: a skinny arrow diagram showing client → FastAPI → Replicate → Image
If you want to play locally, check out my GitHub repo: https://github.com/JesseQin123/fastapi-cicd.git

```bash
# Clone and run the stub
git clone https://github.com/JesseQin123/fastapi-cicd.git
cd fastapi-cicd
pip install -r requirements.txt
python app/main.py  # spins up on localhost:8000
```
Again, full instructions live in the repo README.
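If you'd rather poke the endpoint from Python than from curl, here is a minimal stdlib-only client sketch. It assumes the app is listening on localhost:8000 and that the JSON body is exactly `{"prompt": ...}` as described above; the helper name `build_generate_request` is hypothetical, not something from the repo.

```python
import json
import urllib.request


def build_generate_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /generate-image request whose JSON body is {"prompt": prompt}."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate-image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Needs the app running locally; image generation can be slow, hence the generous timeout.
    req = build_generate_request("a corgi astronaut, watercolor")
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(resp.status, resp.read()[:200])
```

This is the same request as `curl -X POST localhost:8000/generate-image -H "Content-Type: application/json" -d '{"prompt": "..."}'`, just easier to reuse in scripts and tests.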

2. Why Bother With All This Pipeline Stuff?

You might ask, “I can already curl localhost:8000, bro. Why bring Docker, GitHub Actions, Kubernetes—sounds like overkill!”
Because:
  1. Repeatability – “Works on my machine” is cute until your co‑founder pulls and nothing works.
  2. Safety – A mistyped git push --force shouldn’t nuke prod. CI/CD gives you guardrails.
  3. Scale – The first Hacker News post that hits /generate-image ten times a second will melt a hobby box. K8s lets you spin up extra pods while you sleep.
  4. Rollback – If the new model pumps out cursed images, Argo CD can roll you back faster than you can type “sorry folks”.
That’s why serious teams automate—even for a tiny demo like this, learning the pattern pays off later.

3. Bird’s‑Eye Architecture

FIGURE 2 – Full pipeline: Git push → GitHub Actions → Docker image → Container Registry → Argo CD → Kubernetes Deployment/Service → FastAPI pods → Replicate API
 
Here’s the play‑by‑play:
  1. Code lives on GitHub. When I push to main, a GitHub Actions workflow triggers.
  2. GitHub Actions (CI) builds a Docker image, runs unit tests (yeah, write some!), and pushes the image to Docker Hub.
  3. Argo CD (CD) is installed inside the cluster. It watches my Git repo’s k8s/ folder (where deployment.yaml and service.yaml sit). When it sees the image tag bump, it syncs the cluster.
  4. Kubernetes rolls out the updated Deployment: new pods start, and old ones are drained once the new ones are ready. A LoadBalancer Service keeps a stable public IP in front of whichever pods are running.
  5. Requests hit FastAPI pods, which in turn call out to Replicate. Replicate hosts the Flux model, so I don’t touch GPUs at all.
Boom—one push, pipeline does the rest.

4. Tech Stack at a Glance

| Layer | Tool | Why I Picked It |
| --- | --- | --- |
| Web Framework | FastAPI | Fast, async, Pythonic, automatic docs. |
| Model Hosting | Replicate (Flux Image) | No GPU bill, simple REST. |
| Containers | Docker | Ubiquitous, dev/prod parity. |
| CI | GitHub Actions | Free minutes, easy secrets. |
| Orchestration | Kubernetes | Autoscaling, self‑healing. |
| CD | Argo CD | GitOps, click‑to‑rollback. |
| Cloud | AWS/GCP/Azure (pick one) | Managed K8s saves grey hair. |
 

5. Step‑by‑Step Walk‑Through

5.1 FastAPI + Replicate on Your Laptop

  • Create a virtualenv, pip install stuff.
  • main.py defines a POST /generate-image endpoint. The handler calls replicate.run() with your prompt.
  • Keep your Replicate API token in .env (the replicate client reads it from the REPLICATE_API_TOKEN environment variable). Add .env to .gitignore; this is non‑negotiable, because leaking the token even once can get your account abused, run up charges, and expose your project to serious risks.
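Here is a minimal sketch of the handler's core logic, kept free of FastAPI and the replicate package so it runs on its own. Several things are assumptions, not necessarily what the repo does: the model slug `black-forest-labs/flux-schnell`, the function names, and the `{"image": ...}` response shape. In the real app you would pass `replicate.run` as `run_model` (it is called as `replicate.run("owner/model", input={...})`) and load .env first, for example with python-dotenv's `load_dotenv()`.

```python
import os
from typing import Any, Callable, Dict

# Assumption: the Flux variant the app calls; swap in the slug your code actually uses.
FLUX_MODEL = "black-forest-labs/flux-schnell"


def require_token() -> str:
    """Fail fast at startup if the Replicate token is missing.

    The replicate client reads REPLICATE_API_TOKEN from the environment,
    so .env must be loaded before this runs.
    """
    token = os.environ.get("REPLICATE_API_TOKEN", "")
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set; did you load .env?")
    return token


def generate_image(payload: Dict[str, Any], run_model: Callable[..., Any]) -> Dict[str, str]:
    """Core of POST /generate-image: validate the body, call the model, return one image reference.

    In the app, run_model would be replicate.run; tests can pass a fake.
    """
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("payload must contain a non-empty 'prompt' string")
    output = run_model(FLUX_MODEL, input={"prompt": prompt})
    # Flux models on Replicate usually return a list of outputs; keep the first.
    first = output[0] if isinstance(output, (list, tuple)) else output
    # Assumption: newer replicate clients wrap files in objects exposing .url;
    # plain URL strings fall through unchanged.
    return {"image": str(getattr(first, "url", first))}
```

Keeping the model call injectable means your CI unit tests never spend Replicate credits. In the route itself, parse the JSON body, call `generate_image(body, replicate.run)`, and turn the `ValueError` into a 400 with FastAPI's `HTTPException`.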
Fire up uvicorn (for example, uvicorn app.main:app --reload) and test with curl or Postman. You should get back either a base64‑encoded image or a URL pointing to the generated image.
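Because the response can be either a URL or base64 data, a client has to tell the two apart. A minimal sketch, assuming the value arrives as a plain string and may carry a `data:image/...;base64,` prefix; `decode_image_result` is a hypothetical helper, not part of the repo.

```python
import base64
import binascii
from typing import Tuple, Union


def decode_image_result(value: str) -> Tuple[str, Union[str, bytes]]:
    """Classify a /generate-image result as ("url", str) or ("bytes", bytes)."""
    if value.startswith(("http://", "https://")):
        return "url", value
    # Drop an optional data-URI prefix such as "data:image/png;base64,".
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    try:
        # validate=True rejects stray non-base64 characters instead of silently skipping them.
        return "bytes", base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("result is neither an http(s) URL nor valid base64") from exc


if __name__ == "__main__":
    kind, data = decode_image_result("https://example.com/out.webp")
    print(kind, data)
```

With `"bytes"` you can write the image straight to disk (`Path("out.png").write_bytes(data)`); with `"url"` you open or download it.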

5.2 Containerize With Docker

Why Docker? Because Python dependency hell is real. The Dockerfile is short:
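```dockerfile
FROM python:3.10-slim
WORKDIR /app

# Install dependencies first so this layer stays cached until requirements.txt changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the package to /app/app so "app.main:app" resolves from WORKDIR /app
COPY ./app ./app

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```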
Build it: docker build -t jesseqin/fastapi-flux:latest .
Run it locally: docker run -p 8000:8000 ...—same endpoint, now inside a container.
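A container that has "started" isn't necessarily serving yet, so before pointing curl at it (or as a smoke-test step in CI), it helps to wait until uvicorn actually answers. A minimal stdlib sketch; it polls FastAPI's built-in /docs page, which FastAPI serves by default, and `wait_for_http` is a hypothetical helper name.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 0.5,
                  request_timeout: float = 2.0) -> bool:
    """Poll url until it returns a 2xx response; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or a non-2xx HTTPError: not ready yet.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # /docs is served by FastAPI out of the box, so it is a cheap "is it up?" URL.
    ok = wait_for_http("http://localhost:8000/docs")
    raise SystemExit(0 if ok else 1)
```

The non-zero exit code is what makes it usable as a CI step: the job fails if the container never comes up.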

5.3 Continuous Integration With GitHub Actions

When I push, .github/workflows/ci.yml does:
  1. Check out the code.
  2. Set up Python.
  3. Build and test.
  4. Log in to Docker Hub (using a secret).
  5. docker push the new image.
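For Argo CD to notice a new build, each push needs a tag that actually changes. Re-pushing :latest leaves the manifests in Git untouched, so there is no diff to sync. A common convention is to tag with the short commit SHA, which GitHub Actions exposes as the GITHUB_SHA environment variable. The sketch below assumes that convention; the repo's workflow may tag differently, and `image_ref` is a hypothetical helper.

```python
import os
from typing import Optional


def image_ref(repository: str, sha: Optional[str] = None, length: int = 7) -> str:
    """Return "<repository>:<short sha>", defaulting to GitHub Actions' GITHUB_SHA."""
    sha = sha or os.environ.get("GITHUB_SHA", "")
    if len(sha) < length:
        raise ValueError("need a commit SHA: pass one or run inside GitHub Actions")
    return f"{repository}:{sha[:length]}"


if __name__ == "__main__":
    # e.g. IMAGE=$(python image_ref.py), then: docker build -t "$IMAGE" . && docker push "$IMAGE"
    print(image_ref("jesseqin/fastapi-flux"))
```

A SHA tag also makes rollbacks unambiguous, because every image maps back to exactly one commit.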
 
FIGURE 3 – Screenshot of a green check on a GitHub Actions run
The workflow file is only ~40 lines—go grab it in the repo. Modify registry credentials and you’re set.

5.4 Deploying on Kubernetes

I keep two YAMLs under k8s/:
  • deployment.yaml – defines 2 replicas, container image, port 8000.
  • service.yaml – type LoadBalancer, exposes port 8000.
If you’re on EKS/GKE/AKS you get an external IP; on minikube you can use minikube tunnel.
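Those two files boil down to a handful of fields. Here is a sketch that builds them as Python dicts and prints JSON, which kubectl accepts just like YAML. The names (`fastapi-flux`, a Secret called `replicate-token`) are hypothetical, the repo's actual YAML may differ, and the `env` block assumes the token is stored in a Kubernetes Secret, as section 6 recommends.

```python
import json
from typing import Any, Dict

# Hypothetical names; match them to your own manifests.
APP = "fastapi-flux"
IMAGE = "jesseqin/fastapi-flux:latest"


def deployment(name: str, image: str, replicas: int = 2, port: int = 8000) -> Dict[str, Any]:
    """Deployment running `replicas` pods of `image`, listening on `port`."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},  # must match the pod template labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Assumption: token stored in Secret "replicate-token" under key "token".
                        "env": [{
                            "name": "REPLICATE_API_TOKEN",
                            "valueFrom": {"secretKeyRef": {"name": "replicate-token", "key": "token"}},
                        }],
                    }],
                },
            },
        },
    }


def service(name: str, port: int = 8000) -> Dict[str, Any]:
    """LoadBalancer Service forwarding `port` to pods labelled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    # Pipe into kubectl: python manifests.py | kubectl apply -f -
    bundle = {"apiVersion": "v1", "kind": "List", "items": [deployment(APP, IMAGE), service(APP)]}
    print(json.dumps(bundle, indent=2))
```

The Service finds pods purely through the `app` label selector, which is why the Deployment's selector, its pod labels, and the Service selector all have to agree. You can create the Secret once with `kubectl create secret generic replicate-token --from-literal=token=<your token>`.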

5.5 Letting Argo CD Do the Boring Stuff

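Manual kubectl apply is okay once, terrible forever. Argo CD makes Kubernetes truly GitOps:
  • Install Argo CD (Helm chart or the official manifest).
  • Point it at the Git repo and folder.
  • Argo CD diffs the repo against the live cluster (it polls every three minutes by default, or reacts immediately via a Git webhook); with automated sync enabled, it applies whatever is out of sync.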
  • The UI shows you Healthy / Synced status and one‑click rollback.
 
FIGURE 4 – Argo CD dashboard with the app showing green
This closes the loop: git push → pod updated. Congrats, you joined the cool kids.
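One piece the loop needs: Argo CD only syncs what is in Git, so the new image tag must land in k8s/deployment.yaml. A minimal sketch, assuming a CI step rewrites the `image:` line and commits the change back (the repo may wire this differently; Argo CD Image Updater is another option). `bump_image_tag` and the script usage are hypothetical.

```python
import re
import sys
from pathlib import Path


def bump_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Point every `image: <repository>:<tag>` line at new_tag, leaving other lines alone."""
    pattern = re.compile(
        r"^([ \t]*(?:-[ \t]*)?image:[ \t]*[\"']?)" + re.escape(repository) + r":[^\s\"']+",
        re.MULTILINE,
    )
    # A function replacement avoids surprises if the repository name contains backslashes.
    updated, count = pattern.subn(lambda m: f"{m.group(1)}{repository}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"no 'image: {repository}:<tag>' line found")
    return updated


if __name__ == "__main__":
    # Hypothetical CI usage: python bump_tag.py k8s/deployment.yaml jesseqin/fastapi-flux abc1234
    path, repo, tag = Path(sys.argv[1]), sys.argv[2], sys.argv[3]
    path.write_text(bump_image_tag(path.read_text(), repo, tag))
```

After this runs, the workflow commits and pushes the edited file; Argo CD sees the diff and rolls out the new pods.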
 

6. Security & Best Practices

  1. Secrets management – GitHub Secrets for CI; Kubernetes Secrets or AWS Secrets Manager for runtime.
  2. Dependency and image scanning – Dependabot for dependency updates, Trivy for scanning the built image.
  3. Health probes – Add liveness and readiness probes to the Deployment.
  4. Monitoring – Prometheus + Grafana, or whatever lets you sleep at night.
  5. Separate environments – dev / staging / prod namespaces. Trust me, you’ll thank me later.
FIGURE 5 – GitHub → Settings → Secrets page
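For the health-probe item, here is a sketch of what the two probes could look like on the container spec, again as Python dicts that serialize to manifest JSON. It assumes a hypothetical cheap GET route such as /healthz that does not call Replicate; otherwise a Replicate outage would make Kubernetes restart every pod. The timing values are illustrative, not tuned.

```python
from typing import Any, Dict


def http_probe(path: str, port: int = 8000, period_seconds: int = 10) -> Dict[str, Any]:
    """An httpGet probe that hits `path` on the container port."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": 5,
        "periodSeconds": period_seconds,
    }


def with_probes(container: Dict[str, Any], path: str = "/healthz") -> Dict[str, Any]:
    """Return a copy of a container spec with readiness and liveness probes added."""
    updated = dict(container)
    # Readiness gates traffic, so rollouts only shift requests to pods that answer.
    updated["readinessProbe"] = http_probe(path, period_seconds=5)
    # Liveness restarts a wedged pod; checking less often avoids restarts on brief hiccups.
    updated["livenessProbe"] = http_probe(path, period_seconds=15)
    return updated
```

The readiness probe is also what makes the rolling update in section 3 safe: old pods are only removed once new ones report ready.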
 

7. Where to Go From Here

You now own a neat little pipeline that turns one push into a live, scalable image‑generation service.
Next steps you can try:
  • Swap in a custom fine‑tuned model.
  • Add authentication (JWT or API key) to the endpoint.
  • Cache generated images in S3 or Cloudflare R2.
  • Build a tiny React front‑end and point it to the LoadBalancer IP.
 
Thanks for reading. Now stop scrolling, go deploy something, and send me a screenshot when your pod turns green.