Why I’m Writing This
Every other week a friend pings me: “Hey Jesse, I have this cool AI idea—can you help me make it real?” They have a notebook demo, maybe a Hugging Face Space, but no clue how to turn it into something you can actually curl from a phone or slap behind a paywall.
So I pulled a tiny project out of my lunch‑box: a FastAPI endpoint that calls Replicate’s Flux Image model and spits back a picture from your prompt. Nothing fancy, but it’s a perfect skeleton to show how pros (and wannabe pros) get code from laptop to the big bad internet—without babysitting servers 24 × 7.
My mission today: walk you through the architecture, the tools, and the mind‑set. No wall of code here—when you need the full source, go check my repo: https://github.com/JesseQin123/fastapi-cicd.
1. What the App Actually Does
One sentence version: Client sends prompt → FastAPI receives → FastAPI calls Replicate (Flux Image) → Replicate returns an image → FastAPI pipes it back.
The endpoint is /generate-image and accepts a JSON payload with a prompt field. That’s it.
FIGURE 1 – Prompt-to-Image flow: a skinny arrow diagram showing client → FastAPI → Replicate → Image
If you want to play locally, grab the GitHub repo: https://github.com/JesseQin123/fastapi-cicd.git
    # Clone and run the stub
    git clone https://github.com/JesseQin123/fastapi-cicd.git
    cd fastapi-cicd
    pip install -r requirements.txt
    python app/main.py  # spins up on localhost:8000
Again, full instructions live in the repo README.
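If you would rather poke the endpoint from Python than from curl, here is a minimal sketch using only the standard library. It assumes the app is running on localhost:8000 and that the payload is exactly {"prompt": "..."} as described above; the helper names are mine, not from the repo.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) the POST request for /generate-image."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate-image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_image(prompt: str, base_url: str = "http://localhost:8000") -> bytes:
    """Send the request and return the raw response body."""
    request = build_generate_request(prompt, base_url)
    # Image generation can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()
```

Calling request_image("a corgi on a skateboard") with the server up should return whatever the endpoint sends back; what that body looks like depends on the handler in the repo.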
2. Why Bother With All This Pipeline Stuff?
You might ask, “I can already curl localhost:8000, bro. Why bring in Docker, GitHub Actions, Kubernetes? Sounds like overkill!”
Because:
- Repeatability – “Works on my machine” is cute until your co‑founder pulls and nothing works.
- Safety – A mistyped git push --force shouldn’t nuke prod. CI/CD gives you guardrails.
- Scale – The first Hacker News post that hits /generate-image ten times a second will melt a hobby box. K8s lets you spin extra pods while you sleep.
- Rollback – If the new model pumps out cursed images, Argo CD can roll you back faster than you can type “sorry folks”.
That’s why serious teams automate—even for a tiny demo like this, learning the pattern pays off later.
3. Bird’s-Eye Architecture
FIGURE 2 – Full pipeline: Git push → GitHub Actions → Docker image → Container Registry → Argo CD → Kubernetes Deployment/Service → FastAPI pods → Replicate API
Here’s the play‑by‑play:
- Code lives on GitHub. When I push to main, a GitHub Actions workflow triggers.
- GitHub Actions (CI) builds a Docker image, runs unit tests (yeah, write some!), and pushes the image to Docker Hub.
- Argo CD (CD) is installed inside the cluster. It watches my Git repo’s k8s/ folder (where deployment.yaml and service.yaml sit). When it sees the image tag bump, it syncs the cluster.
- Kubernetes updates the Deployment: old pods drain, new ones start. A LoadBalancer Service keeps one stable public address in front of them.
- Requests hit FastAPI pods, which in turn call out to Replicate. Replicate hosts the Flux model, so I don’t touch GPUs at all.
Boom—one push, pipeline does the rest.
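The whole play-by-play hinges on step 3: Argo CD only reacts when the image tag inside k8s/deployment.yaml changes in Git. How the repo performs that bump is not shown here, so treat the following as one possible sketch of a helper a CI job could run before committing the manifest. The function name is hypothetical and it assumes an unquoted image: line.

```python
import re


def bump_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Return the manifest text with every `image: <image>[:old]` set to new_tag."""
    # Match the image name, an optional existing tag, then whitespace or end of
    # text so that "fastapi-flux" does not also match "fastapi-flux-dev".
    pattern = re.compile(rf"(image:\s*{re.escape(image)})(?::[\w.\-]+)?(?=\s|$)")
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"no 'image: {image}' line found in manifest")
    return updated
```

A CI step could read k8s/deployment.yaml, call bump_image_tag(text, "jesseqin/fastapi-flux", short_sha), write the file back, and commit. Tagging with the commit SHA instead of latest is my assumption; it gives Argo CD a real diff to sync on.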
4. Tech Stack at a Glance
Layer | Tool | Why I Picked It |
Web Framework | FastAPI | Fast, async, Pythonic, automatic docs. |
Model Hosting | Replicate (Flux Image) | No GPU bill, simple REST. |
Containers | Docker | Ubiquitous, Dev→Prod parity. |
CI | GitHub Actions | Free minutes, easy secrets. |
Orchestration | Kubernetes | Autoscaling, self‑healing. |
CD | Argo CD | Git‑Ops, click‑to‑rollback. |
Cloud | AWS/GCP/Azure (pick one) | Managed K8s saves grey hair. |
5. Step‑by‑Step Walk‑Through
5.1 FastAPI + Replicate on Your Laptop
- Create a virtualenv, pip install the requirements.
- main.py defines a POST /generate-image endpoint. The handler calls replicate.run() with your prompt.
- Keep your Replicate API token in .env. Add .env to .gitignore. This is non-negotiable: leaking your API token even once can get your account abused, cost you money, and expose your project to serious risks.
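To make the handler bullet concrete, here is a rough sketch of what such a main.py could look like. It is not the repo's code. The model slug is a guess (the post only says "Flux Image"), the response shape is my choice, and I split the model call into a plain function so it can be tested with a fake in place of replicate.run.

```python
import os
from typing import Any, Callable

# Assumption: the exact Flux model used by the repo is not stated in this post.
MODEL_REF = "black-forest-labs/flux-schnell"


def generate_image(prompt: str, run_model: Callable[..., Any]) -> dict:
    """Call the model and normalise its output into a JSON-friendly dict.

    run_model is injected (replicate.run in production) so this logic can be
    exercised without network access.
    """
    output = run_model(MODEL_REF, input={"prompt": prompt})
    # Some models return one item, others a list; str() is assumed to yield
    # the image URL for both plain strings and Replicate file objects.
    items = output if isinstance(output, (list, tuple)) else [output]
    return {"images": [str(item) for item in items]}


def create_app():
    """App factory; third-party imports stay here so the helper above imports cleanly."""
    import replicate
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    # The replicate client reads this variable; fail early if .env was not loaded.
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("REPLICATE_API_TOKEN is not set")

    class GenerateRequest(BaseModel):
        prompt: str

    app = FastAPI()

    @app.post("/generate-image")
    def generate(request: GenerateRequest) -> dict:
        try:
            return generate_image(request.prompt, replicate.run)
        except Exception as exc:
            # Surface upstream failures as a gateway error instead of a bare 500.
            raise HTTPException(status_code=502, detail=str(exc)) from exc

    return app
```

With a factory like this, uvicorn needs the --factory flag, for example uvicorn app.main:create_app --factory. A module-level app object, which is what the Dockerfile's app.main:app implies, works just as well.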
Fire up uvicorn and test with curl or Postman. You should get back a base64 string or a URL for the generated image.
5.2 Containerize With Docker
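Why Docker? Because Python dependency hell is real. The Dockerfile is short:
    FROM python:3.10-slim
    WORKDIR /app
    # Install dependencies first so this layer stays cached between code changes
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    # Copy the package to /app/app so that "app.main:app" is importable from /app
    COPY ./app ./app
    EXPOSE 8000
    CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]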
Build it:
    docker build -t jesseqin/fastapi-flux:latest .
Run it locally:
    docker run -p 8000:8000 ...
Same endpoint, now inside a container.
5.3 Continuous Integration With GitHub Actions
When I push, .github/workflows/ci.yml does:
- Check out the code.
- Set up Python.
- Build & test.
- Log in to Docker Hub (using a secret).
- docker push the new image.
FIGURE 3 – Screenshot of a green check on GitHub Actions run
The workflow file is only ~40 lines—go grab it in the repo. Modify registry credentials and you’re set.
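The "Build & test" step only earns its keep if there is something to test. As a hedged illustration (the helper, its length limit, and the test names are all hypothetical, not taken from the repo), here is the kind of small, network-free unit test that CI can run on every push:

```python
import unittest

MAX_PROMPT_CHARS = 500  # hypothetical limit, chosen only for this example


def clean_prompt(raw: str) -> str:
    """Trim the prompt and reject values not worth sending to the model."""
    prompt = raw.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt is too long")
    return prompt


class CleanPromptTests(unittest.TestCase):
    def test_strips_surrounding_whitespace(self):
        self.assertEqual(clean_prompt("  a cat \n"), "a cat")

    def test_rejects_blank_prompt(self):
        with self.assertRaises(ValueError):
            clean_prompt("   ")

    def test_rejects_oversized_prompt(self):
        with self.assertRaises(ValueError):
            clean_prompt("x" * (MAX_PROMPT_CHARS + 1))
```

Running python -m unittest in the workflow would pick these up. The point is less the helper itself than keeping tests free of real Replicate calls, so CI stays fast and costs nothing.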
5.4 Deploying on Kubernetes
I keep two YAMLs under k8s/:
- deployment.yaml – defines 2 replicas, container image, port 8000.
- service.yaml – type LoadBalancer, exposes port 8000.
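If YAML feels opaque, here is roughly what those two files describe, written as Python dicts. The resource names and the app label are made up for illustration; the replica count, port, and Service type come from the bullets above. kubectl also accepts JSON manifests, so dumping these with json.dumps gives something you could apply.

```python
import json

APP_NAME = "fastapi-flux"      # hypothetical resource name
APP_LABELS = {"app": APP_NAME}  # hypothetical label tying pods to the Service


def build_deployment(image: str, replicas: int = 2, port: int = 8000) -> dict:
    """Deployment: N replicas of one container listening on `port`."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP_NAME, "labels": dict(APP_LABELS)},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": dict(APP_LABELS)},
            "template": {
                "metadata": {"labels": dict(APP_LABELS)},
                "spec": {
                    "containers": [
                        {
                            "name": "api",
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def build_service(port: int = 8000) -> dict:
    """Service: a LoadBalancer forwarding `port` to the pods with APP_LABELS."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": APP_NAME},
        "spec": {
            "type": "LoadBalancer",
            "selector": dict(APP_LABELS),
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def to_json(manifest: dict) -> str:
    """Serialise a manifest for `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

The detail that trips people up most is the label wiring: the Service selector, the Deployment selector, and the pod template labels all have to agree, or traffic never reaches the pods.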
If you’re on EKS/GKE/AKS you get an external IP; on minikube you can use minikube tunnel.
5.5 Letting Argo CD Do the Boring Stuff
Manual kubectl apply is okay once, terrible forever. Argo CD makes Kubernetes truly GitOps:
- You install Argo CD (Helm chart or manifest).
- Point it at the Git repo & folder.
- On every commit, Argo CD diffs the repo against the cluster; if they are out of sync (and automated sync is enabled), it applies the changes.
- The UI shows you Healthy / Synced status and one-click rollback.
FIGURE 4 – Argo CD dashboard with app green
This closes the loop: git push → pod updated. Congrats, you joined the cool kids.
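If "GitOps" still sounds like magic, the core idea fits in a few lines. This is a toy model of the compare-and-apply loop, not how Argo CD is implemented: state is just a dict mapping a resource name to its image, and the function names are mine.

```python
from typing import Dict, Optional, Tuple

State = Dict[str, str]
Changes = Dict[str, Tuple[Optional[str], str]]


def diff_state(desired: State, live: State) -> Changes:
    """Return {resource: (live_value, desired_value)} for everything that drifted."""
    return {
        name: (live.get(name), want)
        for name, want in desired.items()
        if live.get(name) != want
    }


def sync_status(desired: State, live: State) -> str:
    """Report the same two labels the dashboard shows."""
    return "OutOfSync" if diff_state(desired, live) else "Synced"


def reconcile(desired: State, live: State) -> Tuple[State, Changes]:
    """Apply what Git declares; resources Git does not mention are left alone."""
    changes = diff_state(desired, live)
    new_live = {**live, **desired}
    return new_live, changes
```

Feed it desired = {"deployment/fastapi-flux": "jesseqin/fastapi-flux:abc1234"} and a live state still on :latest, and reconcile returns the updated state plus the one change it made. Rollback is the same loop pointed at an older commit.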
6. Security & Best Practices
- Secrets management – GitHub Secrets for CI, Kubernetes Secrets or AWS Secrets Manager for runtime.
- Image scanning – Scan images with Trivy, and let Dependabot flag outdated dependencies.
- Health probes – Add liveness & readiness probes to the Deployment.
- Monitoring – Grafana + Prometheus or whatever makes you sleep at night.
- Separate environments – dev / staging / prod namespaces. Trust me, you’ll thank me later.
FIGURE 5 – GitHub → Settings → Secrets page
7. Where to Go From Here
You now own a neat little pipeline that turns one push into a live, scalable image‑generation service.
Next steps you can try:
- Swap in a custom fine‑tuned model.
- Add authentication (JWT or API key) to the endpoint.
- Cache generated images in S3 or Cloudflare R2.
- Build a tiny React front‑end and point it to the LoadBalancer IP.
Thanks for reading. Now stop scrolling, go deploy something, and send me a screenshot when your pod turns green.