M S Farhan — SRE & DevOps Engineer

// about me

Reliability is a feature

I'm an SRE & DevOps Engineer with 5+ years of experience turning complex cloud infrastructure into dependable, cost-efficient systems. Currently an Associate Consultant at GlobalLogic working with Exabeam as a client (where I was employed directly until Jul 2025), I've driven platform modernization across CI/CD, observability, and Kubernetes at enterprise scale.

I specialize in reducing toil through infrastructure-as-code, building proactive observability stacks, and enabling engineering teams to deploy with confidence. My background spans cloud-native robotics infrastructure, RBAC compliance, and disaster recovery automation.

Bengaluru, India

msfarhan.official@gmail.com

linkedin.com/in/farhan-sre

☁️

Google Cloud Professional Cloud Architect

Issued by Google Cloud · Credential verified

✓ Valid through Aug 2026

🎓

Master of Engineering

Cloud Computing

Manipal Academy of Higher Education

Jul 2018 – Jun 2020

🎓

Bachelor of Engineering

Computer Science

Vinayaka Mission University

Jul 2013 – May 2017

// expertise

Skills & Tooling

The full stack of reliability engineering, from infrastructure provisioning to incident response.

☁️

Cloud Platforms

GCP · AWS · GKE · EC2 · IAM · Cloud Architect

⎈

Kubernetes & Containers

Kubernetes · Helm · Kustomize · Docker · ArgoCD · Velero · VPA / HPA

🔁

CI/CD & GitOps

GitHub Actions · Jenkins · ArgoCD · GitOps · Release Automation

📊

Observability

Prometheus · Grafana · Alloy · Loki · CloudWatch · Custom Metrics

🏗️

Infrastructure as Code

Terraform · Ansible · PerfectScale · GAR · Server Automation

💻

Programming & Scripting

Python · Bash · REST APIs · CrowdStrike API · WebSockets

// career

Work Experience

5+ years scaling cloud platforms across enterprise security, robotics, and AI/SIEM products.

Associate Consultant

GlobalLogic · Client: Exabeam

Jul 2025 – Present

📍 Bengaluru, India · Employed by Exabeam until Jul 2025; continued in the same role post-merger

Migrated CI/CD from Jenkins → GitHub Actions & ArgoCD; upgraded logging from Promtail to Alloy; shifted container registry from GCR to GAR.
Built proactive observability with Prometheus & Grafana, cutting incident response time by 45%.
Automated provisioning, scaling & decommissioning of 900+ servers using Terraform & Ansible.
Reduced GKE compute costs by 23% via Vertical Pod Autoscaler (VPA) tuning & workload rightsizing using PerfectScale.
Designed automated release workflows integrating infra provisioning and application deployment.

Senior Site Reliability Engineer

Exabeam

Mar 2022 – Jul 2025

📍 Remote

Managed least-privilege access to Kubernetes & GCP using Entitle, with RBAC controls and audit logging for compliance.
Developed Python scripts to push custom application metrics to Prometheus and automate security tasks via CrowdStrike API.
Built and enhanced GitHub Actions pipelines for PR builds, image push, and Terraform deploys.
Implemented Velero-based disaster recovery — automated cluster backups and failover testing for critical services.
Troubleshot Kubernetes issues (PVC failures, CrashLoopBackOff, ImagePullBackOff, Ingress, node misconfigs) and performed node upgrade maintenance.

Lead DevOps Engineer

Bharati Robotic Systems

Jul 2021 – Feb 2022

📍 Pune, India

Engineered real-time robot-to-device communication via WebRTC, improving control responsiveness.
Scaled AWS infra (EC2, RoboMaker, IoT Core) with autoscaling & CI/CD to improve performance and uptime.
Built interactive monitoring dashboards (WebSockets + CloudWatch) for proactive fleet maintenance.
Led a 10-member team in robotics infrastructure scaling and operational excellence.

DevOps Engineer

Bharati Robotic Systems

Jul 2020 – Jul 2021

📍 Pune, India

Migrated robotics compute to AWS and integrated observability via CloudWatch & Prometheus.
Conducted IoT/cloud security audits and enforced compliance for robotics devices.
Built CI/CD pipelines for robotic apps using AWS RoboMaker, Lambda & SNS, making deployments 40% faster.
Implemented SLAM and path-planning updates for autonomous navigation.

// writing

Blog & Insights

Thoughts on SRE culture, Kubernetes operations, observability, and cloud cost engineering.

⎈

Kubernetes · Coming Soon

Debugging Kubernetes in Production: A Field Guide

From CrashLoopBackOff to PVC failures — real-world patterns I've used to diagnose and resolve cluster issues without war-rooming for hours.

📊

Observability · Coming Soon

How We Cut Incident Response Time by 45% with Prometheus & Grafana

A deep-dive into building proactive alerting, custom metrics via Python, and dashboards that surface the right signal at the right time.

💰

FinOps · Coming Soon

Rightsizing GKE Workloads: VPA Tuning & PerfectScale in Practice

How I reduced GKE compute costs by 23% using Vertical Pod Autoscaler and PerfectScale — without compromising reliability or performance.

🔁

GitOps · Coming Soon

Jenkins to GitHub Actions + ArgoCD: A Migration Story

Lessons learned migrating a legacy CI/CD pipeline to a fully GitOps workflow — tooling choices, gotchas, and what I'd do differently.

🤖

Robotics Infra · Coming Soon

SRE Principles Applied to Robotics Fleet Infrastructure

What happens when you apply SLOs, error budgets, and chaos engineering to a fleet of autonomous robots on AWS RoboMaker and IoT Core.

🛡️

Security · Coming Soon

Zero-Trust Access in Kubernetes: RBAC + Entitle + Audit Logging

Building least-privilege access patterns for GCP and Kubernetes — how we used Entitle to enforce just-in-time access with full compliance audit trails.

Ready to build something reliable?
