Job Title: Site Reliability Engineer
Location: Remote EST
Duration: 6 Months
Visa: USC and GC
Look for Education/publishing domain exp.
Job Description:
We are looking for an adventurous Senior Site Reliability Engineer who loves AWS technologies. You will be a member of an engineering team where collaboration and innovation are a key focus. As part of this team, you will design, build, deploy, and monitor software and infrastructure that delivers new features to the market. Be prepared to explore new technologies and design concepts as an integral part of your job.
What you will be doing
Partner with engineering, security, and product teams to keep our services reliable, available, fast, and cost-efficient
Build tools and automation that eliminate repetitive tasks, minimize downtime, achieve human-free operations, and provide self-service solutions to product development teams
Design, build, and operate large-scale production systems hosted within our on-prem and AWS hosting environments
Lead technology initiatives that drive scalability and reliability improvements
Advocate and implement reliable design patterns (e.g., circuit breakers, graceful degradation)
Share an on-call rotation with your team and respond to incidents; lead triage efforts and provide needed status updates
Skills and Qualifications:
7 years of industry experience
4 years of full-stack software engineering experience in one or more of the following programming languages: Java, Go, C#, or Python
3 years deploying, operating, and debugging server software on Linux. Comfortable diagnosing and resolving common system issues.
Deep experience implementing infrastructure as code with Terraform
You have designed, built, and operated highly available AWS ECS, EKS, or independent K8s clusters.
Strong knowledge of common AWS technologies like ELB, CloudFront, EC2, RDS, ElastiCache, S3, Elasticsearch, IAM, and Route 53
You have participated in a 24x7 on-call rotation with your team and responded to incidents
Proficient with APM, infrastructure, and log aggregation tooling to monitor system health and customer experience (e.g., New Relic, OpenTelemetry, CloudWatch, Sumo Logic, ELK)
A proven track record of diagnosing and fixing time-sensitive and critical production issues
Experience developing and maintaining CI/CD pipelines (e.g., Jenkins, CircleCI, Git, Gitflow, SonarQube, blue/green)
Big Pluses
Ansible, CloudFormation, Packer
Database administration skills (AWS Aurora, MySQL, Postgres, Oracle)
Have leveraged deployment strategies such as blue-green and canary
Experience building RESTful services and/or web applications
Experience automating software deployments and following a continuous delivery and deployment model
Experience with system analysis and troubleshooting in large-scale Linux environments
People who have been successful in this role:
Passionate and adept at software development and/or system engineering
Love to understand how new technologies and architectures work, educate coworkers, and channel their knowledge into improving system reliability and performance
Continuously learning about application scalability, availability, reliability, and security
Intensely curious about how complex distributed systems operate and fail at scale
Think freely and independently and are ready to share their views
Eager to learn from mistakes and socialize the lessons learned
Like to take ownership of infrastructure components and lead projects
Full Time