Project: GETTR TechOps SLA & Security Overhaul
Role: Director of TechOps
Problem: Rapid platform growth led to instability, fragmented triage, and increasing security exposure
Solution:
Designed incident protocols, severity triage, and RCA post-mortem loops
Implemented PagerDuty, a public status page, and a multi-cloud architecture
Integrated Imperva WAF and DDoS protection
Partnered with the cybersecurity team on red teaming, audits, and policy enforcement
Built a full observability stack (SLIs/SLOs) for platform health
Impact:
Achieved 99.99% uptime
Reduced incident response time by 80%
Maintained clear public communications during upstream outages (e.g., Shopify Merch Store incident in Nov 2023)
Built a culture of proactive defense, transparency, and systems thinking at scale
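To put the 99.99% uptime figure above in concrete terms, the short sketch below converts an availability target into the downtime it allows. This is illustrative arithmetic only: the function name and the 30-day and 365-day windows are assumptions chosen for the example, not details of GETTR's actual SLO definitions.

```python
# Illustrative arithmetic only. The 99.99% target comes from the summary
# above; the window lengths and the function name are assumptions.

def allowed_downtime_minutes(availability_pct: float, window_days: float) -> float:
    """Return the minutes of downtime an availability target permits in a window."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_pct / 100)


for label, days in (("30-day month", 30), ("365-day year", 365)):
    print(f"{label}: {allowed_downtime_minutes(99.99, days):.2f} min")
# 30-day month: 4.32 min
# 365-day year: 52.56 min
```

At this target, a single incident lasting more than about four minutes would use up a full month's allowance, which is why fast detection and response matter as much as prevention.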
When I became Director of TechOps at GETTR, we were no longer a scrappy startup with a few servers and a single product.
We had grown into a full-stack social platform with livestreaming, short-form video, direct messaging, public feeds, real-time comments, notifications, and a moderation engine. Our user base spanned multiple continents, our traffic surged with every political event, and downtime wasn't just inconvenient anymore.
It was unacceptable.
We were evolving from a fast-moving tech team into a critical infrastructure provider for real-time expression. My mandate was clear: