Systems Engineer – Site Reliability Engineering
Bethesda, MD - USA
JOB SUMMARY:
The Systems Engineer - Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of Marriott customers globally. This role involves overseeing incident management, driving automation efforts, and working closely with cross-functional teams to ensure alignment between SRE strategy and business objectives. Partners closely with Product teams, Applications teams, Infrastructure, and the broader Applications and Infrastructure Delivery teams to develop key metrics and KPIs to improve application stability, availability, and performance. The ideal candidate will bring strong communication skills, collaborating with key stakeholders across the company to optimize cloud infrastructure and uphold the highest standards of operational excellence in a dynamic, fast-paced environment.
CANDIDATE PROFILE:
Required:
Undergraduate degree in an engineering or computer science discipline and/or equivalent experience/certification
5 years of hands-on experience in designing, building, and operating production-grade systems, including:
2 years of experience as a Site Reliability Engineer (SRE) building and managing highly available, mission-critical systems
Deep understanding of SRE practices such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, and Capacity Planning
Expertise in AWS services, including designing highly available multi-AZ and multi-region architectures, for example:
Compute: EC2, Auto Scaling, Lambda
Containers: EKS (mandatory), ECS (good to have)
Networking: VPC, subnets, route tables, NAT gateways, Transit Gateway
Security: IAM roles/policies, KMS, Secrets Manager
Storage and Databases: S3, EBS, EFS, RDS, DocumentDB
Proven automation and programming experience in one or more of the following languages: Python, PowerShell
Experience using modern continuous development techniques and pipelines (e.g., Agile, Kanban, Jira, CI/CD, Helm, Harness, Jenkins, Git, Artifactory, Vault)
Experience designing and implementing end-to-end observability solutions across metrics, logs, and traces using tools like Prometheus, Grafana, ELK Stack, and OpenTelemetry.
Hands-on experience with Linux administration (RHEL, Ubuntu, CentOS, Amazon Linux)
Experience troubleshooting API-related issues in distributed systems, including latency, authentication/authorization failures, rate limiting, and upstream/downstream dependency failures.
Experience with container orchestration engines such as Kubernetes (EKS, AKS, ACK)
Familiarity with service mesh technologies to enable secure and resilient service communication, including mTLS, traffic shaping, and policy enforcement.
Familiarity with Infrastructure as Code (IaC) tools like Terraform and CloudFormation.
Familiarity with configuration management and automation tools such as Ansible.
Familiarity with vulnerability management, OS hardening, patching, and security compliance of infrastructure, applications, and databases
Understanding of networking fundamentals
Preferred:
Experience driving cloud cost optimization initiatives (rightsizing, reserved instances, autoscaling strategies, cost observability)
Networking expertise, including Load Balancing, Firewalls, Security Groups, NACLs, TCP/IP, DNS, HTTP/HTTPS, SSL/TLS, etc.
CORE WORK ACTIVITIES:
Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management.
Oversee the management of high-severity incidents, driving quick resolution and post-incident analysis to identify root causes and prevent recurrence.
Drive the automation of operational processes and ensure systems can scale effectively to support growing user demand, optimizing cloud and on-prem infrastructure and resource usage.
Develop and execute the SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders.
Drive Applications Performance Management and Monitoring:
Assess application architectures to identify key monitoring points
Identify Key Performance Indicators, apply monitoring, and report on compliance.
Gather information to develop reporting metrics and KPIs
Ensure that all applications adhere to appropriate monitoring standards based on their technology/business process
Determine forums and cadence to provide regular monitoring updates
Building Successful Relationships:
Collaborates with Enterprise Application and Architecture and Infrastructure teams to continuously improve processes and procedures.
Liaises with vendors and Service Providers to select services and tools that best meet company goals
Managing Projects and Priorities:
Develops specific goals and plans to prioritize, organize, and accomplish work.
Champions leader's vision for product and service delivery.
Executes the necessary decisions to keep moving forward toward achievement of goals.
Determines priorities, schedules, plans, and necessary resources to promote completion of any projects on schedule.
Delivering on the Needs of Key Stakeholders:
Understands and meets the needs of key stakeholders.
Communicates concepts in a clear and persuasive manner that is easy to understand.
Demonstrates an understanding of business priorities.
Supports achievement of performance goals, budget goals, team goals, etc.
Providing Technical Support and Consultation:
Provides technical expertise within own and other teams.
Provides recommendations to improve the effectiveness of processes and programs.
Demonstrates advanced knowledge of job-relevant issues, products, systems, and processes.
Keeps up-to-date technically and applies new knowledge to job.
Performs other reasonable duties as required for this position.
At Marriott International, we are dedicated to being an equal opportunity employer, welcoming all and providing access to opportunity. We actively foster an environment where the unique backgrounds of our associates are valued, and our greatest strength lies in the rich blend of culture, talent, and experiences of our associates. We are committed to non-discrimination on any protected basis, including disability, veteran status, or other basis protected by applicable law.
Required Experience:
IC
About Company
At Le Méridien, we are inspired by the era of glamorous travel, celebrating each culture through the distinctly European spirit of savouring the good life. Our guests are curious and creative, cosmopolitan culture seekers that appreciate moments of connection and slowing down to savou ...