Site Reliability Engineer

Employer Active

1 Vacancy

Job Location

Lahore - Pakistan

Monthly Salary

Not Disclosed

Job Description

Requirements:

  • 5 years of experience in an SRE, DevOps, or infrastructure engineering role.
  • Strong experience with AWS or GCP, including services like EC2, Lambda, S3, RDS, and GKE (for GCP).
  • Experience with automation tools like Terraform.
  • Proficient in at least one scripting language (Python, Bash, Go, etc.).
  • Solid understanding of Linux systems, networking, and cloud-based architectures.
  • Experience working with container orchestration platforms like Kubernetes.
  • Proficient with CI/CD pipelines, preferably with cloud-native tools.
  • Ability to troubleshoot complex distributed systems and provide solutions in high-pressure environments.
  • Ability to communicate effectively with both technical and non-technical stakeholders.

Nice to have:

  • Exposure to Execution Management Systems (EMS) / Portfolio Management Systems (PMS).
  • Experience with client-impact triage, working cross-functionally with account managers or product teams.
  • Proficiency with Datadog or similar observability platforms.
  • Knowledge of serverless architectures (e.g. AWS Lambda, GCP Cloud Functions).
  • Familiarity with RDBMS and NoSQL databases such as RDS, Cloud SQL, and DynamoDB.
  • Prior experience in fintech, trading platforms, or 24/7 financial infrastructure.
  • Strong understanding of API integrations and how infrastructure issues might manifest in client environments.
  • Excellent problem-solving and communication skills, with the ability to translate technical incidents into clear client updates.
  • Experience working with client-facing teams.

Responsibilities:

  • Ensure the reliability, availability, and performance of production systems, particularly during weekends.
  • Take ownership of monitoring, troubleshooting, and incident response during weekends and off-hours.
  • Troubleshoot and resolve critical issues in a fast-paced, high-availability environment.
  • Automate manual processes and workflows, reducing operational overhead.
  • Work closely with engineering teams to design and deploy scalable, fault-tolerant infrastructure solutions on AWS or GCP.
  • Improve observability by utilizing monitoring, logging, and alerting systems (e.g. CloudWatch, Datadog).
  • Lead post-incident reviews, contribute to the continuous improvement of system reliability, and follow up on strategic fixes.
  • Develop and update runbooks, incident response playbooks, and documentation.
  • Work closely with Engineering, Product, and Client teams to proactively identify infrastructure pain points that could affect the user experience.
  • Monitor alert channels, logs, and infrastructure load for the entire stack.
  • Set up automation for alerting.

Employment Type

Full Time
