Overview:
The Snowflake Site Reliability Engineer (SRE) with AWS Terraform plays a crucial role in ensuring the availability, reliability, and performance of Snowflake products. This position is essential for maintaining a stable infrastructure and continuously improving the platform. The SRE collaborates closely with the engineering and operations teams to address service failures and deployment challenges, implementing effective monitoring and automation solutions.
Key Responsibilities:
- Design and develop infrastructure as code using Terraform on AWS.
- Implement solutions to monitor and operate the Snowflake platform.
- Automate routine tasks to enhance system reliability and efficiency.
- Perform capacity planning and manage the scalability of the Snowflake system.
- Collaborate with cross-functional teams to resolve operational issues and optimize system performance.
- Participate in the on-call rotation and respond to incidents to minimize downtime.
- Conduct regular reviews of system configurations and architecture to identify and address potential weaknesses.
- Ensure compliance with security and regulatory requirements.
- Contribute to the evolution and improvement of the Snowflake platform by providing insights and recommendations.
- Document processes, procedures, and configurations related to infrastructure and operations.
- Participate in the evaluation and implementation of new technologies and tools to enhance the platform.
- Collaborate with development teams to improve application performance and resilience.
- Conduct performance testing and analysis to optimize system efficiency and reliability.
- Stay updated with industry trends and best practices in cloud infrastructure and reliability engineering.
- Experience developing processes and procedures to standardize database configuration.
- Extensive experience implementing and maintaining disaster recovery and high availability.
- Ability to work on unusually complex technical problems and provide solutions that are highly innovative and ingenious.
- Ability to provide technical documentation and project plans for technical staff members.
- Excellent communication, presentation, and customer relationship skills.
- Excellent organizational and time management skills to handle multiple tasks simultaneously.
Required Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum 5 years of hands-on Snowflake administration experience, from setting up environments to successfully administering them (AWS preferred).
- Experience automating, scripting, and streamlining processes for efficiency and accuracy using Unix shell scripting, Python, etc.
- Some experience with AWS, with knowledge of S3, EC2, VPC, IAM, security, networking, etc.
- Demonstrated expertise in configuring and maintaining AWS infrastructure using Terraform.
- Proven experience in designing and building fault-tolerant, scalable, and secure cloud solutions.
- Strong proficiency in scripting and automation using Python, Shell, or similar languages.
- Experience with monitoring and logging tools such as CloudWatch, Prometheus, and Grafana.
- Familiarity with containerization and orchestration technologies like Docker and Kubernetes.
- Solid understanding of network protocols, security principles, and best practices.
- Ability to troubleshoot complex issues and perform root cause analysis effectively.
- Excellent communication skills and the ability to collaborate effectively with diverse teams.
- Certifications in AWS and/or Snowflake are a plus.
- Experience in Agile, DevOps, or Site Reliability Engineering practices is desirable.
- Ability to work in a fast-paced, dynamic environment with a focus on continuous improvement.
- Strong problem-solving skills and a proactive approach to system reliability and performance.
Interested parties can reach us at with the details below:
Total Exp:
Rel Exp:
Notice Period:
Current CTC (fixed and variable, in detail):
Expected CTC:
LinkedIn URL:
Reason For Change:
Current Location:
unix shell scripting,devops,python,s3,iam,security,ec2,agile,aws,vpc,site reliability engineering,networking,terraform,disaster recovery,kubernetes,reliability,prometheus,snowflake,high availability,docker,grafana