Role: Lead Site Reliability Engineer with Java
Location: San Antonio, Texas
Relevant Experience: 15 Years
Job Description:
As a Lead Site Reliability Engineer (SRE), you will leverage your extensive experience in SRE practices to maintain and enhance the reliability, performance, and scalability of mission-critical systems. You will play a crucial role in ensuring the continuous availability and optimal functioning of our services.
Key Responsibilities:
- Senior-Level SRE Expertise: Apply your deep understanding of SRE principles to lead efforts in improving system reliability and operational efficiency.
- Incident Management: Provide expert-level support during incidents, ensuring swift resolution with minimal service disruption. Lead post-incident reviews to drive continuous improvement.
- Monitoring & Alerting: Design, implement, and optimize monitoring, alerting, and incident response processes. Ensure the effectiveness of these systems to proactively address potential issues.
- Automation: Drive the automation of manual processes to enhance operational efficiency, reduce human error, and increase overall system resilience.
- CI/CD Pipeline Management: Develop, maintain, and improve automated CI/CD pipelines using tools such as GitLab CI/CD and Jenkins, ensuring seamless and reliable deployment processes.
- Cross-Functional Collaboration: Work closely with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure. Foster a culture of collaboration and knowledge sharing.
- Support Across Time Zones: Provide support across all U.S. time zones, with the flexibility to work weekends, rotational shifts, and overtime as required to maintain service continuity.
Required Skills & Qualifications:
- Java Programming: Advanced proficiency in Java, with a deep understanding of contemporary software development practices.
- Kubernetes & Containerization: Extensive hands-on experience with Kubernetes, including containerization technologies like Docker and Kubernetes storage solutions such as Portworx.
- Linux/Unix Systems: Strong command of Linux/Unix operating systems and shell scripting (Bash), with a focus on system reliability and automation.
- Functional Programming: Proficiency in functional and declarative programming languages such as Haskell, OCaml, and Prolog.
- Scripting & Automation: Experience with Python or Go, particularly in the context of scripting and automation tasks.
- Virtualization: In-depth knowledge of VMware and other virtualization platforms, with a focus on optimizing virtual environments for reliability and performance.
- Streaming Technologies: Expertise with Kafka, Stream Generator, ksqlDB, cluster federation, and Spark Streams, including experience in managing and optimizing streaming data architectures.
- Service Mesh & Networking: Familiarity with Istio and Anthos Service Mesh, with the ability to manage and optimize service meshes for complex environments.
- Performance Monitoring & Debugging: Proficiency in using eBPF (extended Berkeley Packet Filter) for performance monitoring and debugging.
- Monitoring & Logging Tools: Experience with industry-standard monitoring and logging tools such as Splunk, Prometheus, Datadog, and Kiali.
- Load Balancing: Familiarity with NGINX Controller and Seesaw for effective load balancing and traffic management.
- Infrastructure-as-Code (IaC): Competence in using Terraform for managing cloud infrastructure, ensuring consistency and scalability across environments.
Additional Requirements:
- Flexibility: Willingness to work weekends and rotational shifts, and to provide 24/7 support as necessary to maintain service reliability and meet project deadlines.
- Required: Kubernetes
Regards
Manoj
Derex Technologies INC
Contact : Ext 206
Additional Information :
All your information will be kept confidential according to EEO guidelines.
Remote Work :
No
Employment Type :
Full-time