Senior AI Reliability Engineer

1 Vacancy
Job Location

Bellevue - USA

Salary

$107,300 - $193,500

Job Description

At T-Mobile, we invest in YOU! Our Total Rewards Package ensures that employees get the same big love we give our customers. All team members receive a competitive base salary and compensation package - this is Total Rewards. Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches. That's how we're UNSTOPPABLE for our employees!

Job Overview
As a Senior AI Reliability Engineer, you will play a critical role in ensuring the operational excellence, scalability, and performance of AI-powered platforms and services at T-Mobile. This role requires strong SRE fundamentals, experience in managing LLM-based services and APIs, and the ability to drive observability and reliability for GenAI systems across cloud environments.

We pride ourselves on encouraging a culture of innovation, advocating for agile methodologies, and promoting transparency in all that we do. Join us in embodying the spirit of the Un-carrier and make a tangible impact! Our team is dynamic, where no day is the same, and we are diverse and inclusive, passionate about grow

Job Responsibilities:

  • Implement observability tools, dashboards, and SLO frameworks for LLM-based services and inference pipelines.

  • Monitor and improve the health, latency, and throughput of AI infrastructure in multi-cloud (primarily Azure) and hybrid environments.

  • Manage incident detection, response, and root cause analysis (RCA) for production issues affecting AI services.

  • Support cost attribution and token usage observability using tools like Weave, Splunk, OpenSearch, and Grafana.

  • Operationalize and support AI services such as ChatGPT, Infobot, Glean, and AI Gateway in partnership with platform and architecture teams.

  • Automate deployment, monitoring, and rollback processes using CI/CD and IaC pipelines (e.g. Terraform, Azure DevOps).

  • Contribute to incident playbooks, runbooks, and knowledge base documentation for GenAI systems.

  • Partner with engineering, product, and compliance teams to enforce policy and governance on LLM usage and API integrations.

  • Contribute to ongoing AI projects like Weave auto-eval, GC Academy, or OVA SRE support as needed.

  • Work with the GitLab deployment pipeline and the SDLC deployment flow, including new feature definition, development, testing, and deployment.


Education and Work Experience:

  • Bachelor's degree in Computer Science, Engineering, or a related field (Preferred)

  • Master's/advanced degree in Computer Science, Engineering, or a related field (Preferred)

  • 4-7 years working in operations or DevOps environments (Required)

  • 4-7 years solving customer-related issues and managing customer relationships (Required)

  • 4-7 years developing software solutions using Python or similar programming languages (Required)

  • Experience supporting or integrating with APIs from LLM providers (e.g. OpenAI, Azure OpenAI, Glean).

  • Strong understanding of service reliability concepts: monitoring, alerting, SLOs, RCA, chaos testing.

  • Familiarity with container orchestration (Kubernetes), CI/CD systems, and IaC tools (Terraform, ARM, etc.).

  • Hands-on experience working in Azure (preferred), AWS, or GCP.

  • Experience with retrieval-augmented generation (RAG) for AI systems.

  • Experience with the GitLab deployment pipeline and the SDLC deployment flow, including new feature definition, development, testing, and deployment.

Preferred Qualifications

  • Exposure to GenAI/LLMOps tools like LiteLLM, Weights & Biases, LLMJ scoring, or prompt engineering telemetry.

  • Experience supporting secure API gateways and centralized LLM access platforms (e.g. Solo Gloo Gateway).

  • Familiarity with AI governance and compliance practices in enterprise settings.

  • Experience with real-time systems, auto-evaluation platforms, or AI assistant integration (e.g. Infobot, Coach Assist).


Knowledge Skills and Abilities:

  • Programming: Proficiency in programming and scripting languages such as Python and Bash. (Required)

  • Automation: Ability to automate processes and reduce manual effort. (Required)

  • Incident Management: Understanding of incident response management and operational support. (Required)

  • Experience with designing and maintaining CI/CD pipelines. (Required)

  • Ability to learn new skills and technologies quickly and adapt to changing circumstances. (Required)

  • Understanding of system reliability and resilience principles. (Required)

  • Ability to drive innovation and improve software development and deployment processes. (Preferred)

  • Experience with cloud native platforms. (Preferred)

  • At least 18 years of age
  • Legally authorized to work in the United States

Travel:
Travel Required (Yes/No): No

DOT Regulated:
DOT Regulated Position (Yes/No): No
Safety Sensitive Position (Yes/No): No

Base Pay Range: $107,300 - $193,500

Corporate Bonus Target: 15%

The pay range above is the general base pay range for a successful candidate in the role. The successful candidate's actual pay will be based on various factors, such as work location, qualifications, and experience, so the actual starting pay will vary within this range.

At T-Mobile, employees in regular, non-temporary roles are eligible for an annual bonus or periodic sales incentive or bonus, based on their role. Most Corporate employees are eligible for a year-end bonus based on company and/or individual performance, which is set at a percentage of the employee's eligible earnings in the prior year. Certain positions in Customer Care are eligible for monthly bonuses based on individual and/or team performance. To find the pay range for this role based on hiring location

At T-Mobile, our benefits exemplify the spirit of One Team, Together! A big part of how we care for one another is working to ensure our benefits evolve to meet the needs of our team members. Full and part-time employees have access to the same benefits when eligible. We cover all of the bases, offering medical, dental, and vision insurance, a flexible spending account, 401(k), employee stock grants, employee stock purchase plan, paid time off and up to 12 paid holidays - which total about 4 weeks for new full-time employees and about 2.5 weeks for new part-time employees annually - paid parental and family leave, family building benefits, back-up care, enhanced family support, childcare subsidy, tuition assistance, college coaching, short- and long-term disability, voluntary AD&D coverage, voluntary accident coverage, voluntary life insurance, voluntary disability insurance, and voluntary long-term care insurance. We don't stop there - eligible employees can also receive mobile service & home internet discounts, pet insurance, and access to commuter and transit programs! To learn about T-Mobile's amazing benefits, check out.

Never stop growing!
As part of the T-Mobile team, you know the Un-carrier doesn't have a corporate ladder - it's more like a jungle gym of possibilities! We love helping our employees grow in their careers, because it's that shared drive to aim high that drives our business and our culture forward. By applying for this career opportunity, you're living our values while investing in your career growth, and we applaud it. You're unstoppable!

T-Mobile USA, Inc. is an Equal Opportunity Employer. All decisions concerning the employment relationship will be made without regard to age, race, ethnicity, color, religion, creed, sex, sexual orientation, gender identity or expression, national origin, religious affiliation, marital status, citizenship status, veteran status, the presence of any physical or mental disability, or any other status or characteristic protected by federal, state, or local law. Discrimination, retaliation, or harassment based upon any of these factors is wholly inconsistent with how we do business and will not be tolerated.

Talent comes in all forms at the Un-carrier. If you are an individual with a disability and need reasonable accommodation at any point in the application or interview process, please let us know by emailing or calling 1-. Please note that this contact channel is not a means to apply for or inquire about a position, and we are unable to respond to non-accommodation related requests.



Required Experience:

Senior IC

Employment Type

Full-Time
