Site Reliability Engineer

HR POD - Hiring Talent Globally

Posted on : 27-05-2025

1 Vacancy

Job Location

Lahore - Pakistan

Monthly Salary

Not Disclosed

Job Description

Requirements:

5 years of experience in an SRE, DevOps, or infrastructure engineering role.
Strong experience with AWS or GCP, including services like EC2, Lambda, S3, RDS, and GKE (for GCP).
Experience with automation tools like Terraform.
Proficient in at least one scripting language (Python, Bash, Go, etc.).
Solid understanding of Linux systems, networking, and cloud-based architectures.
Experience working with container orchestration platforms like Kubernetes.
Proficient with CI/CD pipelines, preferably with cloud-native tools.
Ability to troubleshoot complex distributed systems and provide solutions in high-pressure environments.
Ability to communicate effectively with both technical and non-technical stakeholders.

Nice to have:

Exposure to Execution Management Systems (EMS) / Portfolio Management Systems (PMS).
Experience with client-impact triage, working cross-functionally with account managers or product teams.
Proficiency with Datadog or similar observability platforms.
Knowledge of serverless architectures (e.g., AWS Lambda, GCP Cloud Functions).
Familiarity with RDBMS and NoSQL databases such as RDS, CloudSQL, and DynamoDB.
Prior experience in fintech, trading platforms, or 24/7 financial infrastructure.
Strong understanding of API integrations and how infrastructure issues might manifest in client environments.
Excellent problem-solving and communication skills, with the ability to translate technical incidents into clear client updates.
Experience working with client-facing teams.

Responsibilities:

Ensure the reliability, availability, and performance of production systems, particularly during weekends.
Take ownership of monitoring, troubleshooting, and incident response during weekends and off-hours.
Troubleshoot and resolve critical issues in a fast-paced, high-availability environment.
Automate manual processes and workflows, reducing operational overhead.
Work closely with engineering teams to design and deploy scalable, fault-tolerant infrastructure solutions on AWS or GCP.
Improve observability by utilizing monitoring, logging, and alerting systems (e.g., CloudWatch, Datadog).
Lead post-incident reviews, contribute to the continuous improvement of system reliability, and follow up on strategic fixes.
Develop and update runbooks, incident response playbooks, and documentation.
Work closely with Engineering, Product, and Client teams to proactively identify infrastructure pain points that could affect the user experience.
Monitor alert channels, logs, and infrastructure load for the entire stack.
Set up automation for alerting.

Employment Type

Full Time
