About the company: The company is a rapidly growing, private-equity-backed SaaS product company that provides cloud-based solutions.
Job Summary: As a Site Reliability Engineer (SRE), you will be responsible for building and maintaining the infrastructure, tools, and pipelines that keep our systems running smoothly. You will collaborate closely with DevOps, engineering, and product teams to design and deploy reliable, scalable, and automated systems. You will also improve the application code to fix user-facing bugs, ensuring enhanced performance and resilience.
RESPONSIBILITIES: Comfortable with a work shift aligned with U.S. time zones (7 pm to 3 am IST, onsite)
1. CI/CD Pipeline Management:
Design, implement, and maintain robust CI/CD pipelines for automated software deployment.
Collaborate with DevOps and engineering teams to integrate testing, monitoring, and security checks into pipelines.
Continuously improve deployment processes to ensure smooth and error-free production releases.
2. Monitoring and Observability:
Create and manage comprehensive logging dashboards in Datadog to monitor system health, performance, and logs.
Set up alerting mechanisms to proactively identify and respond to system issues.
Analyze and visualize key performance metrics to drive improvements.
3. Collaborate on Architectural Solutions:
Work closely with DevOps and engineering teams to design scalable, resilient, and secure infrastructure.
Ensure solutions adhere to best practices for performance, security, and maintainability.
4. Code Optimization and Bug Fixing:
Improve application code to resolve user-facing bugs and enhance system resilience.
Troubleshoot and fix issues that impact the performance or availability of production systems.
Contribute to the continuous improvement of the codebase, focusing on optimizing performance and reliability.
5. Automation and Continuous Improvement:
Automate repetitive tasks related to infrastructure management, monitoring, and troubleshooting.
Identify and propose innovative solutions to improve system efficiency and performance.
6. Custom Node.js CLI Tool Development:
Develop and automate custom Node.js CLI tools to enhance operational workflows and streamline repetitive tasks.
Implement automated solutions to optimize system processes.
Requirements
MUST HAVES:
Experience Level: 6-8 years
Comfortable with a work shift aligned with U.S. time zones (7 pm to 3 am IST)
Prior experience working in cross-functional teams
Systems architecture and design skills
Proficiency in scripting languages such as Bash, Python, or PowerShell.
Experience with CI/CD tools such as GitHub Actions or similar platforms.
Build and deployment automation experience, especially in containerized environments
Proficiency with common ops tools (ECS, Logstash, Datadog, Kibana, EKS, etc.)
Experience with AWS or Azure
Comfort maintaining live production systems
Strong communication and collaboration skills, with the ability to work effectively in a fast-paced team environment.
Benefits
Medical and life insurance
Motivating compensation
Paid Holidays
Great working environment
Rapid career development opportunities