Open to Opportunities

M S Farhan
SRE & DevOps Engineer

Building resilient, self-healing cloud infrastructure at scale — from Kubernetes clusters to observability pipelines — so your teams can ship faster with confidence.

5+ Years Experience
45% Incident Time Cut
900+ Servers Automated
23% GKE Cost Reduced

Reliability is a feature

I'm an SRE & DevOps Engineer with 5+ years turning complex cloud infrastructure into dependable, cost-efficient systems. Currently an Associate Consultant at GlobalLogic with Exabeam as the client (I was employed directly by Exabeam until Jul 2025), I've driven platform modernization across CI/CD, observability, and Kubernetes at enterprise scale.

I specialize in reducing toil through infrastructure-as-code, building proactive observability stacks, and enabling engineering teams to deploy with confidence. My background spans cloud-native robotics infrastructure, RBAC compliance, and disaster recovery automation.

☁️

Google Cloud Professional Cloud Architect

Issued by Google Cloud · Credential verified

✓ Valid through Aug 2026
🎓

Master of Engineering

Cloud Computing
Manipal Academy of Higher Education
Jul 2018 – Jun 2020
🎓

Bachelor of Engineering

Computer Science
Vinayaka Mission University
Jul 2013 – May 2017

Skills & Tooling

The full stack of reliability engineering, from infra provisioning to incident response.

☁️

Cloud Platforms

GCP · AWS · GKE · EC2 · IAM · Cloud Architect

Kubernetes & Containers

Kubernetes · Helm · Kustomize · Docker · ArgoCD · Velero · VPA / HPA
🔁

CI/CD & GitOps

GitHub Actions · Jenkins · ArgoCD · GitOps · Release Automation
📊

Observability

Prometheus · Grafana · Alloy · Loki · CloudWatch · Custom Metrics
🏗️

Infrastructure as Code

Terraform · Ansible · PerfectScale · GAR · Server Automation
💻

Programming & Scripting

Python · Bash · REST APIs · CrowdStrike API · WebSockets

Work Experience

5+ years scaling cloud platforms across enterprise security, robotics, and AI/SIEM products.

Associate Consultant
GlobalLogic · Client: Exabeam
Jul 2025 – Present
📍 Bengaluru, India  ·  Employed by Exabeam until Jul 2025; continued same role post-merger
  • Migrated CI/CD from Jenkins → GitHub Actions & ArgoCD; upgraded logging from Promtail to Alloy; shifted container registry from GCR to GAR.
  • Built proactive observability with Prometheus & Grafana, cutting incident response time by 45%.
  • Automated provisioning, scaling & decommissioning of 900+ servers using Terraform & Ansible.
  • Reduced GKE compute costs by 23% via Vertical Pod Autoscaler (VPA) tuning & workload rightsizing using PerfectScale.
  • Designed automated release workflows integrating infra provisioning and application deployment.
Senior Site Reliability Engineer
Exabeam
Mar 2022 – Jul 2025
📍 Remote
  • Managed Kubernetes & GCP least-privilege access using Entitle — RBAC controls + audit logging for compliance.
  • Developed Python scripts to push custom application metrics to Prometheus and automate security tasks via CrowdStrike API.
  • Built and enhanced GitHub Actions pipelines for PR builds, image push, and Terraform deploys.
  • Implemented Velero-based disaster recovery — automated cluster backups and failover testing for critical services.
  • Troubleshot Kubernetes issues (PVC failures, CrashLoopBackOff, ImagePullBackOff, Ingress, node misconfigs) and performed node upgrade maintenance.
Lead DevOps Engineer
Bharati Robotic Systems
Jul 2021 – Feb 2022
📍 Pune, India
  • Engineered real-time robot-to-device communication via WebRTC, improving control responsiveness.
  • Scaled AWS infra (EC2, RoboMaker, IoT Core) with autoscaling & CI/CD → enhanced performance & uptime.
  • Built interactive monitoring dashboards (WebSockets + CloudWatch) for proactive fleet maintenance.
  • Led a 10-member team in robotics infrastructure scaling and operational excellence.
DevOps Engineer
Bharati Robotic Systems
Jul 2020 – Jul 2021
📍 Pune, India
  • Migrated robotics compute to AWS and integrated observability via CloudWatch & Prometheus.
  • Conducted IoT/cloud security audits and enforced compliance for robotics devices.
  • Built CI/CD pipelines for robotic apps using AWS RoboMaker, Lambda & SNS, making deployments 40% faster.
  • Implemented SLAM and path-planning updates for autonomous navigation.

Blog & Insights

Thoughts on SRE culture, Kubernetes operations, observability, and cloud cost engineering.

Kubernetes · Coming Soon

Debugging Kubernetes in Production: A Field Guide

From CrashLoopBackOff to PVC failures — real-world patterns I've used to diagnose and resolve cluster issues without war-rooming for hours.

📊
Observability · Coming Soon

How We Cut Incident Response Time by 45% with Prometheus & Grafana

A deep-dive into building proactive alerting, custom metrics via Python, and dashboards that surface the right signal at the right time.

💰
FinOps · Coming Soon

Rightsizing GKE Workloads: VPA Tuning & PerfectScale in Practice

How I reduced GKE compute costs by 23% using Vertical Pod Autoscaler and PerfectScale — without compromising reliability or performance.

🔁
GitOps · Coming Soon

Jenkins to GitHub Actions + ArgoCD: A Migration Story

Lessons learned migrating a legacy CI/CD pipeline to a fully GitOps workflow — tooling choices, gotchas, and what I'd do differently.

🤖
Robotics Infra · Coming Soon

SRE Principles Applied to Robotics Fleet Infrastructure

What happens when you apply SLOs, error budgets, and chaos engineering to a fleet of autonomous robots on AWS RoboMaker and IoT Core.

🛡️
Security · Coming Soon

Zero-Trust Access in Kubernetes: RBAC + Entitle + Audit Logging

Building least-privilege access patterns for GCP and Kubernetes — how we used Entitle to enforce just-in-time access with full compliance audit trails.


Ready to build something reliable?

I'm open to Senior SRE, Staff SRE, DevOps, and Platform Engineering roles. Reach out via any channel below.