Sr. Site Reliability Engineer

Job Location

Bangalore - India

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

The Opportunity

We are looking for a talented, curious, and energetic Sr. Site Reliability Engineer who embraces solving complex challenges on a global scale. As a Visa Sr. SRE, you will be responsible for supporting various digital projects and ensuring the reliability and stability of our production transaction processing systems. These systems are the backbone of our business and process millions of transactions daily.

 

The Work itself

The Product Reliability Engineering (PRE) group prides itself on keeping Visa's applications and systems up and running to meet the 24x7 needs of the business.

  • Provide support for critical applications, ensuring their stability and reliability by performing proactive maintenance activities and responding to alerts.

  • Engage in automation activities to improve operational efficiency and reduce manual intervention.

  • Support applications and infrastructure built on modern technologies such as Kubernetes, containers, Kafka, Grafana, Prometheus, and Elasticsearch.

  • Perform root cause analysis and remediation for incidents impacting application stability and performance.

  • Monitor application performance (e.g. memory usage, logging, latency) and take corrective actions as needed.

  • Write and maintain scripts for monitoring system activity, including application smoke test activities during pre- and post-production implementations.

  • Support application deployments and code releases in test and production environments using industry-standard deployment tools (e.g. Chef, Jenkins).

  • Respond to and resolve client-escalated issues related to applications (e.g. increased latency, transactional issues, features not working as expected).

  • Implement and maintain performance monitoring dashboards using industry-standard tools (e.g. Splunk, ThousandEyes, Keynote, Runscope, Ghost Inspector, Evolven, Graphite).

  • Participate in the on-call rotation to provide 24/7 support for production environments.

  • Document incident resolutions, troubleshooting steps, and best practices to improve team knowledge and onboarding.

  • Collaborate with development, infrastructure, and product teams to resolve complex issues.

  • Support disaster recovery and business continuity exercises as required.

  • Assist in managing and executing change, incident, and problem management processes.

  • Provide regular status updates and incident reports to management and stakeholders.

  • Maintain and update runbooks and standard operating procedures for application support.

  • Participate in knowledge transfer sessions and contribute to team training initiatives.

  • Assist with user access management, including provisioning and de-provisioning access as per company policies.

  • Support certificate renewals, patch management, and vulnerability remediation activities as required.

 

Essential Functions

The Skills You Bring

  • Collaboration and Teamwork: The candidate should possess strong interpersonal skills and be adept at both written and verbal communication. The candidate should have a strong inclination toward teamwork and be able to collaborate effectively with a globally dispersed virtual team.

  • Learning Capacity: A fast learner who readily picks up new technologies and tools and can share this knowledge with others.

  • Adaptability and Innovation: Comfortable pushing boundaries and exploring beyond traditional solutions. Embraces challenges, new technologies, and innovation.

  • Decision-making: The candidate should be able to prioritize, multitask, and deliver quality work on time. The candidate can plan effectively, make informed decisions on execution timelines, and maintain focus in stressful situations.

  • Personal and Professional Growth: The candidate is highly self-motivated and possesses a strong sense of ownership. The candidate should have a keen interest in learning new technologies and business concepts to facilitate personal and organizational growth.

  • Professional Ethics: The candidate must have strong business ethics, self-discipline, and trustworthiness, particularly when handling highly sensitive and confidential data in a live production environment.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.


Qualifications :

Basic Qualifications
2 years of relevant work experience and a Bachelor's degree, OR 5 years of relevant work experience

Preferred Qualifications
Bachelor's or Master's degree in Computer Science or a related field from an accredited university.
Experience
3-5 years of hands-on experience in a Site Reliability Engineering (SRE) or DevOps role.
Proven experience supporting applications on public cloud platforms (AWS, GCP, Azure) and working with hybrid cloud/on-premises models.
Strong experience with the full software development lifecycle (SDLC) and Agile methodologies.
Experience participating in release management and on-call support for both cloud and on-premises technologies.
Technical Skills
Proficiency with Linux/UNIX operating systems.
Strong hands-on expertise with containers and orchestration tools, especially Kubernetes and Docker.
Working knowledge of database technologies such as MySQL, MSSQL, or Oracle.
Advanced scripting skills in UNIX shell, PowerShell, Python, or similar languages.
Proficient in troubleshooting applications across middleware stacks (Tomcat, Apache, Kafka, MQ) and streaming services (Flink, Spark).
Experience implementing and managing DevOps pipelines using Jenkins, Ansible, Docker, and Kubernetes.
Familiarity with application development and troubleshooting in Go and Rust and with resolving system integration issues.
Practical experience designing and implementing CI/CD processes for seamless deployments.
Solid understanding of monitoring and observability tools such as Prometheus, Splunk, and Grafana.
Hands-on experience creating deployments, services, and ingress flows for applications in Kubernetes clusters.
Soft Skills
Excellent problem-solving and troubleshooting abilities, with strong attention to detail.
Effective prioritization, coordination, and multitasking skills in a fast-paced environment.
Strong collaboration skills; able to work as part of a cross-functional team.
Excellent written and verbal communication skills.
Fast learner with the ability to quickly adopt new technologies and industry trends.
Adherence to the company's Work From Office policy.


Additional Information :

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.


Remote Work :

No


Employment Type :

Full-time
