About Parkar:
At Parkar, we stand at the intersection of innovation and technology, revolutionizing software development with our cutting-edge Low Code Application Platform. For almost a decade, our expertise has expanded to four countries, offering a full range of software development services, including product management, full-stack engineering, DevOps, test automation, and data analytics.
Our pioneering Low Code Application Platform redefines software development by integrating over 500 modular code components. It covers UI/UX, front-end and back-end engineering, and analytics, offering a streamlined, efficient path to digital transformation through standardized software development and AIOps.
Our commitment to innovation has earned the trust of over 100 clients, from large enterprises to small and medium-sized businesses. We proudly serve key sectors such as Fintech, Healthcare-Life Sciences, Retail-eCommerce, and Manufacturing, delivering tailored solutions for success and growth.
At Parkar, we don't just develop software; we build partnerships and pave the way for a future where technology empowers businesses to achieve their full potential.
For more information, visit our website:
About the Role:
We are looking for a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will focus on enhancing system reliability, automation, and performance, ensuring high availability and scalability of our applications. You will work closely with development and operations teams to improve deployment pipelines, monitoring, and incident response.
Your Role at a Glance:
- Design, develop, and maintain scalable, reliable, and secure infrastructure.
- Implement monitoring, logging, and alerting solutions using tools like Datadog (required); experience with SolarWinds, Prometheus, Grafana, ELK Stack, or Splunk is an advantage.
- Create automation scripts and playbooks for repetitive tasks.
- Assist with the deployment and configuration of the observability stack.
- Create Datadog dashboards for tracking availability, latency, and SLOs/SLAs (we are migrating from SolarWinds to Datadog).
- Improve system observability and enhance incident response through automation and root cause analysis.
- Optimize CI/CD pipelines to ensure smooth deployments and minimal downtime.
- Automate infrastructure provisioning and management using Terraform, Ansible, or Kubernetes (good to have).
- Ensure high availability and disaster recovery through load balancing, failover mechanisms, and backups.
- Collaborate with development teams to enhance application performance, reliability, and scalability.
- Manage cloud-based environments (AWS, Azure, or GCP) for efficient resource utilization.
- Strengthen security through best practices, including vulnerability assessments and patch management.
- Participate in on-call rotations to troubleshoot and resolve critical system issues.
The Expertise You'll Bring:
- 5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
- Strong knowledge of Windows and Linux/Unix systems, and of shell scripting.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP) and Jira dashboards.
- Expertise in Kubernetes, Docker, and container orchestration.
- Experience with CI/CD tools like Copado (required), Jenkins, GitHub Actions, or GitLab CI.
- Proficiency in Infrastructure as Code (IaC) tools like Terraform, Ansible, or CloudFormation.
- Solid experience with monitoring and observability tools (Datadog required; Prometheus, Grafana, ELK, Splunk, or New Relic).
- Strong knowledge of networking, security, and system architecture.
- Experience with scripting languages like Python, Bash, or Go.
- Familiarity with database performance tuning and optimization.
- Strong problem-solving skills and ability to work in a fast-paced Agile environment.
Education:
- Bachelor's degree in computer science, engineering, or a similar domain.