From Idea to Online: How to Deploy Your First LLM App and Set Up a Full CI/CD Pipeline

Why I’m Writing This

Every other week a friend pings me: “Hey Jesse, I have this cool AI idea—can you help me make it real?” They have a notebook demo, maybe a Hugging Face Space, but no clue how to turn it into something you can actually curl from a phone or slap behind a paywall.

So I pulled a tiny project out of my lunch‑box: a FastAPI endpoint that calls Replicate’s Flux Image model and spits back a picture from your prompt. Nothing fancy, but it’s a perfect skeleton to show how pros (and wannabe pros) get code from laptop to the big bad internet—without babysitting servers 24 × 7.

My mission today: walk you through the architecture, the tools, and the mind‑set. No wall of code here—when you need the full source, go check my repo: https://github.com/JesseQin123/fastapi-cicd.

1. What the App Actually Does

One sentence version: Client sends prompt → FastAPI receives → FastAPI calls Replicate (Flux Image) → Replicate returns an image → FastAPI pipes it back.

The endpoint is /generate-image and accepts a JSON payload with a prompt field. That’s it.
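If you would rather poke the endpoint from Python than from curl, here is a minimal sketch using only the standard library. The function names are mine, not from the repo, and I am assuming the app is listening on localhost:8000 as described below. It returns the raw response bytes because the exact response shape depends on how the handler is written.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for /generate-image with a JSON body."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate-image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(
    prompt: str, base_url: str = "http://localhost:8000", timeout: float = 120.0
) -> bytes:
    """Send the request and return the raw response body."""
    request = build_generate_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the server running, generate_image("a watercolor fox") gives you whatever the endpoint sends back. The generous timeout is deliberate, since image generation can take a while.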

FIGURE 1 – Prompt‑to‑Image flow: a skinny arrow diagram showing client → FastAPI → Replicate → Image

If you want to play locally, check my GitHub repo: https://github.com/JesseQin123/fastapi-cicd.git


# Clone and run the stub
git clone https://github.com/JesseQin123/fastapi-cicd.git
cd fastapi-cicd
pip install -r requirements.txt
python app/main.py  # spins up on localhost:8000

Again, full instructions live in the repo README.

2. Why Bother With All This Pipeline Stuff?

You might ask, “I can already curl localhost:8000, bro. Why bring Docker, GitHub Actions, Kubernetes—sounds like overkill!”

Because:

Repeatability – “Works on my machine” is cute until your co‑founder pulls and nothing works.

Safety – A mis‑typed git push --force shouldn’t nuke prod. CI/CD gives you guardrails.

Scale – The first Hacker News post that sends ten requests a second to /generate-image will melt a hobby box. K8s lets you spin up extra pods while you sleep.

Rollback – If the new model pumps out cursed images, Argo CD can roll you back faster than you can type “sorry folks”.

That’s why serious teams automate—even for a tiny demo like this, learning the pattern pays off later.

3. Birds‑Eye Architecture

FIGURE 2 – Full pipeline: Git push → GitHub Actions → Docker image → Container Registry → Argo CD → Kubernetes Deployment/Service → FastAPI pods → Replicate API

Here’s the play‑by‑play:

Code lives on GitHub. When I push to main, a GitHub Actions workflow triggers.

GitHub Actions (CI) builds a Docker image, runs unit tests (yeah, write some!), and pushes the image to Docker Hub.

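Argo CD (CD) is installed inside the cluster. It watches my Git repo’s k8s/ folder (where deployment.yaml and service.yaml sit). When a commit changes the image tag in those manifests, it syncs the cluster to match. This only works if every build gets its own tag; a fixed tag like latest never changes the manifest, so Argo CD has nothing to react to.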

Kubernetes rolls out the updated Deployment: old pods drain, new ones start. A LoadBalancer Service gives clients one stable external address while the pods behind it come and go.

Requests hit FastAPI pods, which in turn call out to Replicate. Replicate hosts the Flux model, so I don’t touch GPUs at all.

Boom—one push, pipeline does the rest.

4. Tech Stack at a Glance

Layer	Tool	Why I Picked It
Web Framework	FastAPI	Fast, async, Pythonic, automatic docs.
Model Hosting	Replicate (Flux Image)	No GPU bill, simple REST.
Containers	Docker	Ubiquitous, Dev→Prod parity.
CI	GitHub Actions	Free minutes, easy secrets.
Orchestration	Kubernetes	Autoscaling, self‑healing.
CD	Argo CD	Git‑Ops, click‑to‑rollback.
Cloud	AWS/GCP/Azure (pick one)	Managed K8s saves grey hair.

5. Step‑by‑Step Walk‑Through

5.1 FastAPI + Replicate on Your Laptop

Create a virtualenv and pip install -r requirements.txt.

main.py defines a POST /generate-image endpoint. The handler calls replicate.run() with your prompt.

Keep your Replicate API token in .env and add .env to .gitignore. This is non-negotiable: a token leaked even once can get your account abused and cost you real money.

Fire up uvicorn and test with curl or Postman. You should get back either a base64-encoded image or a URL pointing to the generated image.
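To make those steps concrete, here is a rough sketch of what such a main.py could look like. It is not the repo's actual code. The model name (one public Flux variant on Replicate), the image_url response key, and every helper name are my assumptions. I wrapped the app in a factory so that the Replicate call can be replaced with a fake in tests.

```python
import os
from typing import Any, Callable, Mapping, Optional

# Assumption: the article does not say which Flux variant is used.
MODEL = "black-forest-labs/flux-schnell"


def require_token(env: Mapping[str, str] = os.environ) -> str:
    """Fail fast with a clear message if the Replicate token is missing."""
    token = env.get("REPLICATE_API_TOKEN", "")
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set")
    return token


def first_image_url(output: Any) -> str:
    """Reduce a model output (one item or a list of items) to a single string."""
    if isinstance(output, (list, tuple)):
        if not output:
            raise ValueError("model returned no images")
        output = output[0]
    return str(output)


def create_app(run_model: Optional[Callable[[str], Any]] = None):
    """Build the FastAPI app; pass run_model to swap in a fake for tests."""
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    class GenerateRequest(BaseModel):
        prompt: str

    def call_replicate(prompt: str) -> Any:
        import replicate  # the client reads REPLICATE_API_TOKEN from the environment

        require_token()
        return replicate.run(MODEL, input={"prompt": prompt})

    runner = run_model or call_replicate
    app = FastAPI()

    @app.post("/generate-image")
    def generate_image(request: GenerateRequest) -> dict:
        try:
            output = runner(request.prompt)
        except Exception as exc:
            # Surface upstream failures as a gateway error instead of a bare 500.
            raise HTTPException(status_code=502, detail=str(exc)) from exc
        return {"image_url": first_image_url(output)}

    return app
```

In main.py you would finish with app = create_app(), so that uvicorn app.main:app still finds the app object.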

5.2 Containerize With Docker

Why Docker? Because Python dependency hell is real. The Dockerfile is short:


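FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer stays cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the package to /app/app so "app.main:app" is importable from /app
COPY ./app ./app
EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]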

Build it: docker build -t jesseqin/fastapi-flux:latest .

Run it locally: docker run -p 8000:8000 ...—same endpoint, now inside a container.

5.3 Continuous Integration With GitHub Actions

When I push, .github/workflows/ci.yml runs these steps:

Checkout code.

Set up Python.

Build & test.

docker push the new image.

FIGURE 3 – Screenshot of a green check on GitHub Actions run

The workflow file is only ~40 lines, so go grab it in the repo. Swap in your own registry credentials (stored as GitHub Secrets, never in the file itself) and you’re set.
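The "Build & test" step only earns its keep if there is something to test. Here is a minimal sketch of the kind of unit test I mean, using the standard library's unittest. The validate_prompt and generate helpers are hypothetical stand-ins, not code from the repo. The model call is passed in as a function, so CI never touches Replicate or needs a token.

```python
import unittest


def validate_prompt(payload: dict, max_length: int = 500) -> str:
    """Return the cleaned prompt or raise ValueError (hypothetical helper)."""
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    prompt = prompt.strip()
    if len(prompt) > max_length:
        raise ValueError(f"prompt exceeds {max_length} characters")
    return prompt


def generate(payload: dict, run_model) -> dict:
    """Validate the payload, call the model, and shape the response."""
    prompt = validate_prompt(payload)
    return {"prompt": prompt, "image": run_model(prompt)}


class GenerateTests(unittest.TestCase):
    def test_calls_model_with_clean_prompt(self):
        seen = []

        def fake_model(prompt):
            seen.append(prompt)
            return "https://example.invalid/img.png"

        result = generate({"prompt": "  a red bike "}, fake_model)
        self.assertEqual(seen, ["a red bike"])
        self.assertEqual(result["image"], "https://example.invalid/img.png")

    def test_rejects_missing_prompt(self):
        with self.assertRaises(ValueError):
            generate({}, lambda prompt: "unused")

    def test_rejects_overlong_prompt(self):
        with self.assertRaises(ValueError):
            generate({"prompt": "x" * 501}, lambda prompt: "unused")
```

Run it locally with python -m unittest, and have the workflow run the same command so a red test blocks the image push.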

5.4 Deploying on Kubernetes

I keep two YAMLs under k8s/:

deployment.yaml – defines 2 replicas, container image, port 8000.

service.yaml – type LoadBalancer, exposes port 8000.

If you’re on EKS/GKE/AKS you get an external IP; on minikube you can use minikube tunnel.
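The real manifests in the repo are YAML, but since everything else here is Python, this sketch rebuilds the same two objects as dicts so you can see what the files have to contain. The fastapi-flux resource name, the app label, and the api container name are placeholders I picked. The replica count, port, Service type, and image name come from the descriptions above.

```python
import json

# Hypothetical label; the Deployment selector, the pod template, and the
# Service selector must all agree on it.
APP_LABELS = {"app": "fastapi-flux"}


def deployment(image: str, replicas: int = 2, port: int = 8000) -> dict:
    """Deployment: N identical pods running the API container."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "fastapi-flux", "labels": APP_LABELS},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": APP_LABELS},
            "template": {
                "metadata": {"labels": APP_LABELS},
                "spec": {
                    "containers": [
                        {
                            "name": "api",
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def service(port: int = 8000) -> dict:
    """Service: one external address that forwards to the matching pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "fastapi-flux"},
        "spec": {
            "type": "LoadBalancer",
            "selector": APP_LABELS,
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def render(*manifests: dict) -> str:
    """Serialize the manifests as one JSON List object, which kubectl accepts."""
    bundle = {"apiVersion": "v1", "kind": "List", "items": list(manifests)}
    return json.dumps(bundle, indent=2)
```

Printing render(deployment("jesseqin/fastapi-flux:latest"), service()) gives you something you can pipe into kubectl apply -f - to try it out. The detail people trip on most is the labels: if the Service selector does not match the pod labels, you get an external IP with nothing behind it.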

5.5 Letting Argo CD Do the Boring Stuff

Manual kubectl apply is okay once, terrible forever. Argo CD makes Kubernetes truly Git‑Ops:

You install Argo CD in the cluster (Helm chart or plain manifests).

Point it at the Git repo & folder.

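Argo CD regularly compares the manifests in Git with what is actually running in the cluster; if they are out of sync and auto-sync is enabled, it applies the changes.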

The UI shows you Healthy / Synced status and one‑click rollback.

FIGURE 4 – Argo CD dashboard with app green

This closes the loop: git push → pod updated. Congrats, you joined the cool kids.
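One detail that loop depends on: Argo CD reacts to changes in Git, so something has to write the freshly built image tag into k8s/deployment.yaml. Below is a minimal sketch of that step, assuming you tag images with a short commit SHA and that the manifest has exactly one image: line. Both function names are hypothetical, and the repo may handle this differently.

```python
import re

# Matches a YAML line such as "      image: repo/name:tag" or "    - image: repo/name:tag".
IMAGE_LINE = re.compile(
    r"^([ \t]*(?:-[ \t]+)?image:[ \t]*)(\S+)[ \t]*$", re.MULTILINE
)


def image_ref(repository: str, commit_sha: str) -> str:
    """Build an immutable image reference from the short commit SHA."""
    return f"{repository}:{commit_sha[:7]}"


def bump_image(manifest_text: str, new_ref: str) -> str:
    """Replace the image reference in a manifest, leaving everything else intact."""
    updated, count = IMAGE_LINE.subn(
        lambda match: match.group(1) + new_ref, manifest_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one image: line, found {count}")
    return updated
```

In CI you would read the manifest, call bump_image with image_ref("jesseqin/fastapi-flux", sha), write the file back, and commit it. Refusing to touch a manifest with zero or several image lines is intentional: a silent wrong edit in a deploy file is worse than a failed build.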

6. Security & Best Practices

Secrets management – GitHub Secrets for CI, Kubernetes Secrets or AWS Secrets Manager for runtime.

Dependency and image scanning – Enable Dependabot to keep dependencies patched and run Trivy to scan the built image for known vulnerabilities.

Health probes – Add liveness & readiness in the Deployment.

Monitoring – Grafana + Prometheus or whatever makes you sleep at night.

Separate environments – dev / staging / prod namespaces. Trust me, you’ll thank me later.

FIGURE 5 – GitHub → Settings → Secrets page
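For the health probes item, here is a small sketch of what gets added to the container spec, written as Python dicts that map one-to-one onto the YAML fields. The /healthz path is an assumption: your app needs a cheap GET route there that does not call Replicate. The timing values are starting points, not recommendations.

```python
def http_probe(
    path: str, port: int = 8000, initial_delay: int = 5, period: int = 10
) -> dict:
    """Build an HTTP GET probe in the shape Kubernetes expects."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
    }


def with_probes(container: dict, path: str = "/healthz") -> dict:
    """Return a copy of a container spec with liveness and readiness probes added."""
    return {
        **container,
        # Liveness: restart the container if it stops answering.
        "livenessProbe": http_probe(path),
        # Readiness: keep traffic away until the app can serve requests.
        "readinessProbe": http_probe(path, initial_delay=2, period=5),
    }
```

The readiness probe is what makes rolling updates safe: new pods only receive traffic once they answer, so a broken image never takes requests.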

7. Where to Go From Here

You now own a neat little pipeline that turns one push into a live, scalable image‑generation service.

Next steps you can try:

Swap in a custom fine‑tuned model.

Add authentication (JWT or API key) to the endpoint.

Cache generated images in S3 or Cloudflare R2.

Build a tiny React front‑end and point it to the LoadBalancer IP.
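If you pick the API key and caching ideas, the core logic is smaller than it sounds. Here is a sketch with hypothetical helper names, assuming a single shared key and a cache keyed only by the prompt text. Keying by prompt alone means repeat prompts always return the first image generated, which may or may not be what you want.

```python
import hashlib
import hmac
from typing import Optional


def is_authorized(presented_key: Optional[str], expected_key: Optional[str]) -> bool:
    """Compare API keys in constant time; missing or empty keys never pass."""
    if not presented_key or not expected_key:
        return False
    return hmac.compare_digest(
        presented_key.encode("utf-8"), expected_key.encode("utf-8")
    )


def cache_key(prompt: str, prefix: str = "images/") -> str:
    """Derive a stable object key from the prompt, ignoring extra whitespace."""
    normalized = " ".join(prompt.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}{digest}.png"
```

In the handler you would reject the request when is_authorized fails, then look up cache_key(prompt) in your bucket before paying for a new generation. hmac.compare_digest avoids leaking how many leading characters of the key matched through response timing.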

Thanks for reading. Now stop scrolling, go deploy something, and send me a screenshot when your pod turns green.