Build systems and infrastructure for monitoring complex, large-scale distributed systems
Identify stability and performance issues and collaborate with developers to triage critical issues in production systems
Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
Devise ways to actively monitor system throughput, capacity, and reliability
Debug complex systems and evolve a running environment without causing downtime
Engage in service capacity planning and demand forecasting, as well as software performance analysis and system tuning
Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization
Monitor and troubleshoot Elasticsearch performance issues and outages
Qualifications :
Fundamental knowledge of technologies across a broad range of disciplines, including virtualization, storage, networking, servers, and security
Bachelor's degree in computer science, or equivalent work experience as a System Administrator with programming skills
Understanding of systems and application design, including the operational trade-offs of various designs
Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack
Proficiency in scripting languages such as Python
Experience with infrastructure-as-code tools such as Terraform or CloudFormation
Strong understanding of Linux system administration and networking concepts
Demonstrable knowledge of Unix, TCP/IP, HTTP, and web application security, and experience supporting multi-tier web application architectures
Experience in analyzing logs and troubleshooting large-scale distributed systems
WOULD BE A PLUS
Experience with instrumenting and monitoring production systems using tools such as the ELK stack, Zabbix, Nagios, StatsD/Graphite, APM, etc.
Experience with AWS infrastructure (including EC2, S3, VPC, Security Groups, and RDS) and related services
Practical knowledge of Docker, Vagrant, and configuration management tools like Ansible, Chef, or Puppet
Experience with one or more general-purpose programming or scripting languages, including but not limited to Python, Bash, Perl, or Go
Additional Information :
PERSONAL PROFILE
Excellent troubleshooting and problem-solving skills
Ability to work independently and collaboratively in a fast-paced environment
Strong communication and interpersonal skills
Excellent organizational, time management, and communication skills
Remote Work :
No
Employment Type :
Full-time
At Sigma Software, we work with the client's team on the design and development of a technical solution for their tokenized domain reservation platform. We started by assigning a software architect to design the smart contracts and integrate blockchain into the s ...