Job Title: Sr. SRE Engineer
Location: San Diego, CA (Remote); PST-zone candidates preferred
Duration: 12 Months (Contract)
Interview: Video Interview
Visa: USC/GC only (must work on W-2; strong communication required)
Job Description
Candidates must also be local to the area
The Site Reliability Engineer (SRE) will work closely with cross-functional teams, including software development, platform, and operations, to support the availability and performance of our cloud-based systems. You will take ownership of the cloud infrastructure, support automation, and implement monitoring and alerting systems to proactively manage issues.
Key Responsibilities:
Cloud Infrastructure Management:
o Design, deploy, and maintain scalable, secure, and highly available cloud infrastructure on AWS and Azure.
o Work proficiently with infrastructure-as-code tools (Terraform, AWS CDK, and CloudFormation) and scripting languages (TypeScript, PowerShell, or Go).
o Ensure cloud environments adhere to regulatory standards for healthcare data security (e.g., SOC 2 and ePHI compliance).
Observability and Monitoring:
o Implement, configure, and optimize Datadog for application and infrastructure monitoring, ensuring full-stack visibility into system performance.
o Set up alerting mechanisms for critical metrics (e.g., system health, latency, error rates) and establish runbooks for incident response.
o Develop and maintain dashboards to provide real-time insights into system performance.
Performance Optimization & Troubleshooting:
o Identify and resolve performance bottlenecks and ensure the reliability and scalability of production systems.
o Perform root cause analysis for incidents and participate in on-call rotations to manage critical system incidents.
o Drive improvements to system architecture, security, and disaster recovery strategies.
Collaboration & DevOps Enablement:
o Work closely with development teams to incorporate CI/CD pipelines and foster a culture of infrastructure as code and automation.
o Collaborate with security and compliance teams to ensure systems meet all regulatory and security requirements.
o Promote best practices for software delivery, system monitoring, and infrastructure scalability.
Security & Compliance:
o Work with the compliance and cybersecurity teams to maintain healthcare data security, ensuring that systems are SOC 2 and ePHI compliant.
o Implement security best practices within cloud environments, including encryption, IAM, and regular audits.
Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
3 years of experience as a Site Reliability Engineer managing infrastructure on AWS and/or Azure.
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).
Expertise in Terraform, CloudFormation, AWS CDK, or similar infrastructure-as-code technologies.
Proficiency in container orchestration and management (e.g., Docker, Kubernetes).
Knowledge of automation tools (e.g., Ansible, Puppet, Chef).
Familiarity with CI/CD pipeline tools such as Jenkins, GitHub Actions, or Azure DevOps.
Experience with healthcare data security and compliance (e.g., SOC 2 and ePHI requirements) is a plus.
Excellent problem-solving and troubleshooting skills.
Strong collaboration and communication skills.
Nice to Have:
Experience working in a regulated industry, particularly healthcare or medical devices.
Certifications such as AWS Certified Solutions Architect, Azure Administrator, or Certified Kubernetes Administrator (CKA).
Experience with AI/ML models for predictive maintenance and performance monitoring.
Familiarity with serverless architectures (e.g., AWS Lambda, Azure Functions).
Any Additional Information
Strong analytical and decision-making abilities
Able to build strong partnerships with business partners and project teams
Takes responsibility for delivering superior value and client service
Works well with people who have diverse abilities, experiences, and perspectives
Influences others without direct authority
Approaches opportunities and issues with an optimistic, action-oriented, and solution-based mindset
Good writing skills to document plans and processes