Sr. Staff SRE
San Diego, CA - USA
Job Summary
Why Lytx:
At Lytx, our engineering culture is built around being hungry, low-ego, and highly capable. We are pragmatic engineers who take ownership, collaborate openly, and focus on delivering measurable operational impact. Our mission is to design, operate, and continuously evolve the cloud infrastructure and operational platforms that power mission-critical SaaS and IoT services at global scale.
We are growing rapidly and expanding the use of AI across our platform and engineering operations. As our systems scale in complexity and business criticality, we are investing in next-generation observability, intelligent automation, and AIOps capabilities to enable proactive, insight-driven operations.
The Site Reliability Engineering (SRE) organization is responsible for the availability, reliability, observability, and resilience of our cloud-native environments. This includes building the operational platforms, telemetry strategy, and automation frameworks that allow engineering teams to operate confidently and efficiently.
This role sits at the center of operational intelligence for the company. As a Sr. Staff SRE, you will define the technical vision for observability and operational automation, influence architecture across the organization, and lead initiatives that reduce operational risk, improve system insight, and enable predictive, automated response at scale.
If you enjoy building foundational platforms, shaping engineering standards, and driving the evolution toward AI-enabled operations, this role provides an opportunity to have broad organizational impact.
Responsibilities / You'll get to:
Strategic Technical Leadership: Define and drive the long-term strategy for observability, operational intelligence, and reliability engineering across the organization, aligning technical direction with business growth, customer experience, and service-level objectives.
Operational Intelligence & AIOps: Lead the evolution toward intelligent operations by designing capabilities such as event correlation, anomaly detection, alert noise reduction, predictive signal detection, and automated remediation to improve MTTD, MTTR, and operational efficiency.
Observability Platform Architecture: Architect and lead the end-to-end observability platform across metrics, logs, traces, and events. Establish scalable telemetry standards, instrumentation patterns, and onboarding models that enable consistent visibility across AWS and cloud-native services.
Automation at Scale: Drive large-scale automation initiatives that reduce operational toil, including self-service infrastructure workflows, policy-as-code guardrails, reliability automation, and automated response for common failure scenarios.
Reliability & Resilience Engineering: Partner with product, platform, and data teams to embed reliability, performance, cost efficiency, and fault tolerance into system design. Lead capacity modeling, resilience planning, and architecture improvements for multi-AZ and multi-region environments.
Incident Leadership & Continuous Learning: Provide technical leadership during high-severity incidents and guide blameless postmortems that identify systemic issues and drive long-term reliability improvements.
Organizational Standards & Governance: Define and standardize SLO/SLI frameworks, error budget practices, telemetry conventions, and infrastructure patterns to ensure consistent operational excellence across teams.
Innovation & Technology Evaluation: Evaluate and introduce emerging AWS-native, cloud-native, and AI-enabled observability and automation technologies. Lead proofs-of-concept and guide organization-wide adoption.
Mentorship & Influence: Mentor Staff and Senior SREs, raising the bar for system design, operational rigor, and engineering judgment while fostering a culture of ownership, learning, and continuous improvement.
Cross-Organizational Influence: Act as a senior technical authority for reliability and observability, shaping engineering roadmaps and influencing architectural decisions across product and platform domains.
Requirements / You'll Need:
8-10 years of experience in SRE, platform engineering, or cloud infrastructure roles supporting large-scale production environments.
Demonstrated experience leading architecture, reliability strategy, or operational platforms across multiple teams or organizational domains.
Proven track record operating in 24/7 production environments, including incident leadership, postmortem practices, and proactive reliability management.
Cloud & Architecture
Deep expertise designing and operating large-scale AWS environments, including services such as VPC, EC2, EKS/ECS, RDS/DynamoDB, S3, ALB/NLB, IAM, KMS, Route 53, and multi-account architectures.
Experience designing resilient, fault-tolerant systems using multi-AZ/multi-region patterns, graceful degradation, rate limiting, and capacity management.
Observability & Operational Intelligence
Senior-level experience with observability platforms (metrics, logs, traces, events) such as New Relic, Datadog, Prometheus/Grafana, OpenTelemetry, or similar.
Experience defining telemetry standards, instrumentation strategies, centralized dashboards, and low-noise alerting practices.
Experience improving operational signal quality through correlation, noise reduction, or advanced analytics.
AIOps / Intelligent Automation (Preferred)
Experience implementing or evaluating AIOps capabilities such as anomaly detection, event correlation, predictive alerting, automated remediation, or AI-assisted incident analysis.
Familiarity with applying machine learning or AI techniques to operational data, incident trends, or reliability workflows.
Automation & Infrastructure as Code
Expert-level experience with Infrastructure-as-Code using Terraform and/or CloudFormation, including reusable modules, GitOps workflows, and policy-as-code guardrails.
Strong scripting or programming skills (Python, Go, Bash, or similar) for automation and operational tooling.
Systems & Platform Expertise
Expert understanding of Linux systems, networking (TCP/IP, DNS, TLS), and distributed system behavior.
Expert-level experience with Kubernetes and cloud-native architecture patterns.
Leadership & Impact
Demonstrated ability to influence technical direction without direct authority.
Experience mentoring senior engineers and setting organization-wide engineering standards.
Ability to operate effectively in complex, high-impact environments and drive initiatives from concept through adoption.
Benefits:
- Medical, dental, and vision insurance
- Health Savings Account
- Flexible Spending Accounts
- Telehealth
- 401(k) and 401(k) match
- Life and AD&D insurance
- Short-Term and Long-Term Disability
- FTO or PTO
- Employee Well-Being program
- 11 paid holidays plus 1 inclusive holiday per year
- Volunteer Time Off
- Employee Referral program
- Education Reimbursement Program
- Employee Recognition and Appreciation program
- Additional perk and voluntary benefit programs
Salary is based on a number of factors, including market location, and may vary depending on job-related knowledge, skills, and experience. This position is also eligible for an incentive compensation plan. The expected hiring salary for this position is:
$207,000.00 - $261,000.00
You're driven to succeed, and so are we. At Lytx, our mission is to protect a world in motion, and we do it by building technology and partnerships that help keep people safe on the road. The way we work is guided by our shared values: Deliver for the customer, Responsibility in every outcome, Innovate with purpose, Velocity with excellence, and Elevate each other.
If you're looking for meaningful work, a team that challenges and supports you, and the chance to grow your career while making a real impact, we'd love to meet you.
Together, we're helping make roadways safer and saving lives!
Lytx Inc. is proud to be an equal opportunity employer. We're committed to building a diverse and inclusive workforce and do not discriminate based on race, color, religion, sex, sexual orientation, gender identity or expression, gender, genetic information, uniformed service, national origin, age, veteran status, disability, pregnancy, or any other status protected by federal or state law. We are committed to providing reasonable accommodation for candidates with disabilities who need assistance during the hiring process. To request a reasonable accommodation, please email . Lytx conducts background checks on applicants who receive a conditional offer of employment in accordance with applicable local, state, federal, and regional laws. Qualified applicants with arrest or conviction records will be considered. Background check results may potentially result in the withdrawal of a conditional offer of employment and will be made in accordance with all applicable local, state, federal, and regional laws.
Required Experience:
Staff IC
About Company
Since 1998, Lytx has led the video telematics industry using proprietary machine vision, artificial intelligence, and big data to protect and connect thousands of fleets and millions of drivers in more than 85 countries worldwide. At Lytx, you'll be a part of something good - helping sav ...