Description
We're looking for an SRE with cloud engineering and automation skills to work within Ford on cloud-centric engagements, collaborating with the Site Reliability Engineering team to accelerate the journey to the cloud by solving difficult problems in distributed systems, Terraform, Cloud Run services, databases, and highly available services.
The specific responsibilities of an SRE managing a large distributed eCommerce application, with Adobe Experience Manager as the content management system and a service layer built on Spring Boot microservices and Google Cloud, may include:
Responsibilities
- Automate and manage a highly available and scalable cloud environment that allows development teams to deploy and run their services.
- Having in-depth knowledge of Terraform (Infrastructure as Code) and the ability to create new Terraform files or modify existing ones according to Ford formats, in order to create new monitoring dashboards, alert policies, and SLAs.
- Collaborating with engineering and architecture teams to evaluate and identify optimal cloud solutions that deliver scalability, high performance, and security.
- Extensive log monitoring and analysis for both the application and the deployment pipeline to keep the Cloud Run services up and running without issues.
- Creating SLO / SLA / SLI with GCP / Grafana / Dynatrace dashboards.
- Supporting incident escalation and troubleshooting, and conducting blameless postmortems after incident resolution.
- Ensuring efficient functioning of data storage and processing functions in accordance with company security policies and best practices in cloud security.
- Collaborating with engineering teams to identify optimization strategies and help develop self-healing capabilities.
- Experience in developing a strong observability capability.
- Regularly reviewing performance analysis of existing systems and making recommendations for improvements.
- Participating in 24x7 on-call production support rotations and handling incident response to minimize disruptions.
Qualifications
- Four-year college degree in Computer Science or equivalent.
- 5-6 years of experience with Java, J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes in the maintenance and development of multi-tier applications.
- Proven work experience in designing, deploying, and operating mid- to large-scale public cloud environments.
- Professional SRE certification is preferable.
- Public cloud: GCP is a must-have.
- Proven work experience in provisioning Infrastructure as Code (IaC) using Terraform Enterprise or the community edition.
- Experience in package, configuration, and deployment management.
- Strong knowledge of GitHub DevOps (Tekton is an advantage).
- Should be proficient in scripting and coding, including languages and frameworks such as Python and React.
- Extensive knowledge of and hands-on experience with Dynatrace, Grafana, and Prometheus micro libraries.
- Exposure to Cloud Monitoring and logging.
- Experience with automation tools should be a priority.