Amazon Music is awash in data! To help make sense of it all, the DISCO (Data Insights Science & Optimization) team:
(i) enables the Consumer Product Tech org to make data-driven decisions that improve customer retention, engagement, and experience on Amazon Music. We build and maintain automated self-service data solutions and data science models, and deep dive into difficult questions to provide actionable insights. We also enable measurement, personalization, and experimentation by operating key data programs ranging from attribution pipelines and northstar weblabs metrics to causal frameworks.
(ii) delivers exceptional Analytics & Science infrastructure for DISCO teams, fostering a data-driven approach to insights and decision making. As platform builders, we are committed to constructing flexible, reliable, and scalable solutions to empower our customers.
(iii) accelerates and facilitates content analytics and provides independence to generate valuable insights in a fast, agile, and accurate way. This domain provides analytical support for the following topics within Amazon Music: Programming / Label Relations / PR / Stations / Live Sports / Originals / Case & CAM.
The DISCO team enables repeatable, easy, in-depth analysis of music customer behaviors. We reduce the cost in time and effort of analysis, data set building, model building, and user segmentation. Our goal is to empower all teams at Amazon Music to make data-driven decisions and effectively measure their results by providing high-quality, high-availability data and democratized data access through self-service tools.
If you love the challenges that come with big data, then this role is for you. We collect billions of events a day, manage petabyte-scale data on Redshift and S3, and develop data pipelines using Spark/Scala, EMR, SQL-based ETL, Airflow, and Java services.
We are looking for a talented, enthusiastic, and detail-oriented Data Engineer who knows how to take on big data challenges in an agile way. Duties include big data design and analysis, data modeling, and the development, deployment, and operation of big data pipelines. You'll help build Amazon Music's most important data pipelines and data sets, and expand self-service data knowledge and capabilities through an Amazon Music data university.
The DISCO team develops data specifically for a set of key business domains, like personalization and marketing, and provides and protects a robust self-service core data experience for all internal customers. We deal in AWS technologies like Redshift, S3, EMR, EC2, DynamoDB, Kinesis Firehose, and Lambda. Your team will manage the data exchange store (Data Lake) and the EMR/Spark processing layer, using Airflow as the orchestrator. You'll build our data university and partner with Product, Marketing, BI, and ML teams to build new behavioural events, pipelines, datasets, models, and reporting to support their initiatives. You'll also continue to develop big data pipelines.
Key job responsibilities
You have a deep understanding of data, analytical techniques, and how to connect insights to the business, and you have practical experience insisting on the highest standards in operating ETL and big data pipelines. With our Amazon Music Unlimited and Prime Music services, and our top music provider spot on the Alexa platform, providing high-quality, high-availability data to our internal customers is critical to our customer experiences.
Assist the DISCO team with management of our existing environment, which consists of Redshift and SQL-based pipelines. The activities around these systems are well defined via standard operating procedures (SOPs) and typically involve approving data access requests and subscribing or adding new data to the environment.
Manage SQL data pipelines (creating new pipelines or updating existing ones).
Perform maintenance tasks on the Redshift cluster.
Assist the team with the management of our next-generation AWS infrastructure. Tasks include infrastructure monitoring via CloudWatch alarms, infrastructure maintenance through code changes or enhancements, and troubleshooting and root cause analysis of infrastructure issues that arise. In some cases, you may also be asked to submit code changes to address those issues.
About the team
Amazon Music is an immersive audio entertainment service that deepens connections between fans, artists, and creators. From personalized music playlists to exclusive podcasts, concert livestreams to artist merch, we are innovating at some of the most exciting intersections of music and culture. We offer experiences that serve all listeners with our different tiers of service: Prime members get access to all music in shuffle mode, and top ad-free podcasts, included with their membership; customers can upgrade to Music Unlimited for unlimited on-demand access to 100 million songs, including millions in HD, Ultra HD, and spatial audio; and anyone can listen for free by downloading the Amazon Music app or via Alexa-enabled devices. Join us for the opportunity to influence how Amazon Music engages fans, artists, and creators on a global scale.
2 years of data engineering experience
Experience with data modeling, warehousing, and building ETL pipelines
Experience with SQL
Experience with one or more scripting languages (e.g., Python, KornShell)
Experience in Unix
Experience troubleshooting data and infrastructure issues
Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
Experience with any ETL tool like Informatica, ODI, SSIS, BODI, DataStage, etc.
Knowledge of distributed systems as it pertains to data storage and computing
Experience in building or administering reporting/analytics platforms
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit
for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.