Site Reliability Engineer Specialist

Global Payment Holding Company

Job Location:

Pune - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Every day, Global Payments makes it possible for millions of people to move money between buyers and sellers using our payments solutions for credit, debit, prepaid, and merchant services. Our worldwide team helps over 3 million companies, more than 1,300 financial institutions, and over 600 million cardholders grow with confidence and achieve amazing results. We are driven by our passion for success, and we are proud to deliver best-in-class payment technology and software solutions. Join our dynamic team and make your mark on the payments technology landscape of tomorrow.

Summary of This Role
Manage DevOps tools such as Jenkins, Git, Docker, Kubernetes, and Terraform. Use these skills to support, build, and maintain Kubernetes clusters on-prem in OCP and in AWS. Responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Site reliability engineers create a bridge between development and operations by applying a software engineering mindset to system administration topics. They split their time between operations/on-call duties and developing systems and software that help increase site reliability and performance.

What Part Will You Play
- Participate in architecture and R&D discussions for new technology or processes to increase the performance and reliability of our systems.
- Chaos engineering: you're expected to think laterally about how our systems might fail in theory, design tests to demonstrate how they behave in practice, and then formulate and implement remediation plans as appropriate.
- Pushing our systems to their limits and then coming up with designs for how to get them to the next performance tier.
- Using practices from DevOps and GitOps to improve automation and processes and make self-service possible.
- Safeguarding reliability: ensuring that our services are highly available, resilient against disasters, self-monitoring, and self-healing.
- Running game days to test assumptions about reliability and learn what will break before it matters to customers.
- Reviewing designs with an eye toward increasing the holistic stability of our platform and identifying potential risks.
- Building systems to proactively monitor the health, performance, and security of our production and non-production virtualized infrastructure.
- Improving our monitoring and alerting systems to make sure engineers get paged when it matters (and don't get paged when it doesn't).
- Troubleshooting systems and network issues alongside our Technical Operations Team.
- Mentoring other engineers in reliability-related skills.
- Evolving our SDLC practices and tooling to account for Site Reliability considerations and best practices.
- Developing runbooks and improving documentation.
What Are We Looking For in This Role
Minimum Qualifications
- BS in Computer Science, Information Technology, Business / Management Information Systems, or related field
- Typically have 6 years of experience with programming in one or more programming languages and 4 years of experience working with Unix/Linux systems internals and administration (e.g. filesystems, inodes, system calls) or networking (e.g. TCP/IP, routing, network topologies and hardware, SDN).
What Are Our Desired Skills and Capabilities

Basic familiarity with containerization tools like Docker.
Deep understanding of Kubernetes concepts, architecture, and best practices.
Familiarity with OpenShift Container Platform, its features, and how it extends Kubernetes.
Basic understanding of version control systems such as Git.
Basic knowledge of CI/CD concepts and tools (e.g. Jenkins, GitLab CI).
Basic understanding of Infrastructure as Code principles.
Basic knowledge of Linux operating systems.
Understanding of basic networking concepts and protocols.
Awareness of fundamental security practices and principles.
Basic understanding of securing applications and infrastructure.
Analytical skills to troubleshoot and resolve basic technical issues.
Ability to identify and escalate complex issues to senior team members.
Eagerness to learn new technologies and continuously improve technical skills.
Active participation in training sessions, workshops, and relevant certifications.

Preferred

Experience with cloud platforms (e.g. AWS, Azure, GCP) and their services.
Proficiency in scripting languages (e.g. Python, Bash, Groovy) and experience with automation tools (e.g. Ansible, Terraform, Salt).
Basic knowledge of monitoring and logging tools (e.g. Prometheus, Grafana).
Exposure to Kafka, NATS, and Vault.

Global Payments Inc. is an equal opportunity employer. Global Payments provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex (including pregnancy), national origin, ancestry, age, marital status, sexual orientation, gender identity or expression, disability, veteran status, genetic information, or any other basis protected by law. If you wish to request reasonable accommodations related to applying for employment or provide feedback about the accessibility of this website, please contact .
