We're looking for a smart, motivated, and results-oriented Software Development Engineer II who is passionate about building AI-driven automation systems that transform operational complexity into seamless customer experiences. You'll be instrumental in developing the intelligent systems, automation frameworks, and operational tooling that enable autonomous management of hundreds of thousands of live events and linear stations.
This role sits at the intersection of AI/ML, distributed systems, and operational excellence. You'll build systems that learn from operational patterns, make intelligent decisions about playback quality, and automatically resolve issues before customers notice. Your work will directly impact whether millions of viewers experience perfect playback.
Key job responsibilities
As an SDE II on our team, you will:
- Develop AI-powered operational intelligence systems that analyze telemetry data, detect anomalies, and make autonomous decisions about event health and intervention strategies
- Build automated incident response frameworks that reduce mean time to recovery through intelligent root cause analysis and automated remediation
- Create scalable automation pipelines that handle event lifecycle management, from technical onboarding and readiness validation to live execution and post-event analysis
- Design customer-centric quality assessment systems that evaluate playback from the viewer's perspective, not just technical metrics, ensuring our interventions improve rather than disrupt the viewing experience
- Implement predictive analytics capabilities that identify potential failures before they occur, learning from historical patterns to prevent recurring issues
- Develop operational tooling and dashboards that provide real-time visibility into system health, AI decision-making, and quality metrics across our global operations
You'll work with technologies spanning machine learning frameworks, distributed systems, real-time data processing, and cloud infrastructure. Your systems will need to operate reliably at Amazon scale, handling massive telemetry volumes, making sub-second decisions, and maintaining 24/7 availability across global time zones.
A day in the life
You start your morning analyzing overnight performance data from your automated quality system. During a live sports event, it detected micro-stuttering but intelligently chose not to intervene, avoiding worse disruption for viewers. You investigate the root cause, discovering patterns in CDN behavior during high-concurrency scenarios.
You design and implement ML-enhanced detection logic, validate it against production data, and document everything thoroughly. Code reviews, architecture discussions, and collaboration with data scientists fill your afternoon as you continuously improve the system's predictive capabilities.
By evening, your automation is autonomously managing live events for millions of viewers: detecting, analyzing, and resolving issues before operations teams even notice.
You're building intelligent systems at the intersection of AI, distributed systems, and live video, where impact is immediate and every improvement prevents real incidents at massive scale.
About the team
The Prime Video Playback Operations team ensures flawless viewing experiences for millions of customers watching live sports, breaking news, concerts, and linear programming worldwide. We operate 24/7 across global time zones, managing hundreds of thousands of events annually today, a number projected to grow by a triple-digit percentage year over year.
We're in the midst of a transformative journey to build an AI-powered autonomous operations system. We expect that all operational decisions are diagnosed and mitigated independently. This isn't incremental improvement; it's reimagining how operations scale.
- 3 years of non-internship professional software development experience
- 2 years of non-internship design or architecture (design patterns, reliability, and scaling) of new and existing systems experience
- 3 years of programming with at least one software programming language experience
- 3 years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent
- Experience in machine learning, data mining, information retrieval, statistics, or natural language processing
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit
for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.