Senior Site Reliability Engineer

Chennai - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Who we are

Arcadia is the technology company empowering energy innovators and consumers to fight the climate crisis. Our software and APIs are revolutionizing an industry held back by outdated systems and institutions by creating unprecedented access to the data and clean energy needed to make a decarbonized energy grid possible.

In 2014, Arcadia set out on its mission to break the fossil fuel monopoly, and since then we have been knocking down the institutional barriers to unlock decarbonization. To date, we have connected hundreds of thousands of consumers and small businesses with high-quality clean energy options. Fast forward to today, and now we're thinking even bigger. We have launched Arcadia Platform, an industry-defining SaaS platform that empowers developers and energy innovators to deliver their own custom, personalized energy experiences, accelerating the transformation of the industry from an analog energy system into a digitized information network.

Tackling one of the world's biggest challenges requires out-of-the-box thinking & diverse perspectives. We're building a team of individuals from different backgrounds, industries & educational experiences. If you share our passion for ushering in the era of the clean electron, we look forward to learning what you would uniquely bring to Arcadia! Visit .

HQ: Greenwood Village, Colorado

What we're looking for:

We are seeking an experienced Senior Site Reliability Engineer (L3) to join our SRE/Platform Engineering team in India. This role will focus on building, scaling, and maintaining our AWS- and Kubernetes-based platform, ensuring high reliability, cost efficiency, and secure operations across multiple environments. The successful candidate will work closely with Engineering, Security, DevOps, and Product teams to drive automation, improve infrastructure resilience, and elevate observability across mission-critical systems.

The ideal candidate is a self-starter and hands-on engineer who can dive deep into complex distributed systems, automate away manual processes, and proactively identify reliability gaps. They should have a proven track record of managing production-grade AWS infrastructure, Kubernetes clusters, CI/CD pipelines, and cloud security. They will collaborate daily with US-based engineering teams and cross-functional partners to ensure our platform remains scalable, secure, and cost-optimized as we continue to grow.

What you'll do:

Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
Lead all aspects of Kubernetes operations, including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch, ensuring actionable alerting, dashboards, and metrics alignment with SLO/SLIs
Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
Manage database operations across MySQL and PostgreSQL, including backups, performance tuning, replication, and operational runbooks
Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems
Collaborate daily with US-based teams for incident reviews, migrations, roadmap work, and platform enhancements
Contribute to development and adoption of AI-enabled tooling (e.g. automation, debugging assistants, MCP, RAG pipelines; good to have, not mandatory)
Document runbooks, architecture diagrams, SOPs, troubleshooting guides, and operational best practices
Participate in on-call rotations (if required) and drive post-incident analysis and long-term fixes

What will help you succeed:

Must-haves:

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
8 to 10 years of experience in SRE/DevOps/Cloud Engineering with deep hands-on exposure to AWS and Kubernetes
Strong hands-on experience with:

Terraform & Infrastructure as Code
AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
Jenkins, Groovy, GitHub Actions, ArgoCD, FluxCD
Kubernetes troubleshooting and operations
Prometheus/Grafana/Datadog observability stacks

Proven ability to operate in high-scale, high-uptime, multi-environment production systems
Experience building automation via Python/Bash and reducing operational toil
Strong understanding of incident management, root cause analysis, and reliability engineering principles
Experience working with globally distributed teams across multiple time zones
Excellent communication skills (must interact with US teams daily)
Ability to work independently with minimal supervision, take ownership, and drive initiatives end-to-end
A growth mindset, strong troubleshooting ability, and comfort with complex cloud-native environments

Nice to have (Good-to-haves):

Experience with n8n self-hosted workflow automation platforms
Exposure to LLMs, RAG, vector DBs, MCP concepts
Experience with cloud security/DevSecOps tools (Trivy, Inspector, OPA, Kyverno)
Hands-on experience with FinOps platforms and cloud cost governance
Certifications in related field (AWS, Kubernetes, Terraform, etc.)

Benefits

Competitive compensation and employee stock options
Hybrid/remote-first working model (India-based role with global collaboration)
Flexible leave policy
Comprehensive medical insurance (self and family members)
Annual performance cycle and quarterly recognition awards
A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation

Eliminating carbon footprints, eliminating carbon copies.

Here at Arcadia, we cultivate diversity, celebrate individuality, and believe unique perspectives are key to our collective success in creating a clean energy future. Arcadia is committed to equal employment opportunities regardless of race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, protected veteran status, or any status protected by applicable federal, state, or local law. While we are currently unable to consider candidates who will require visa sponsorship, we welcome applications from all qualified candidates eligible to work in India.

Thank you

Required Experience:

Senior IC

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

About Company

Arcadia

Renters and homeowners, connect to a local solar farm for no extra cost and get savings on your power bill. Two minutes is all you need to subscribe.
