- Admin for a small Redshift cluster
- Create and manage basic Glue jobs that make structured data in S3 accessible via Athena or the Redshift cluster
- Leverage Glue (or other appropriate tooling) to develop better training data pipelines
- Handle security and admin work for the account, particularly interfacing with internal corporate tools in a compliant manner
- Improve AWS Batch setup. We use Batch for running model jobs, but I doubt our current setup is ideal.
- Work with scientists to improve training infrastructure. See the previous bullet, to a degree; we don't leverage SageMaker to the full extent we could and would be interested in improving on that front.
- Work with scientists to deploy models. Potentially. We don't know if we'll be doing our own deployments, but if we do, collaboration with scientists on setting up API endpoints for external model access would be a value-add.
Key job responsibilities
- Comfort or at least familiarity with S3, Glue, Athena, Redshift, IAM/Secrets Manager, EC2 security configs, etc.
- Some familiarity with QuickSight
- Basics of DB (Redshift) management best practices
- Comfort or at least familiarity with PySpark
  o Optimization of highly distributed Spark SQL jobs may well come up
  o Some experience running Spark jobs on distributed clusters might be helpful. We have internal tools that do this, but understanding how to leverage them better would be a value-add.
- SQL
- Python (basics)
- Data pipeline management
- Ideally comfortable with Amazon internal tooling (internal candidates only, obviously)
  o Cradle
  o DataCentral ecosystem (managing paths from team data on S3 to semi-private Redshift to Andes, for data we want to make public)
  o QuickSight
  o Has handled internal AWS stuff before
- 3 years of data engineering experience
- Experience in at least one modern scripting or programming language, such as Python, Java, Scala, or NodeJS
- Knowledge of batch and streaming data architectures like Kafka, Kinesis, Flink, Storm, Beam
- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, FireHose, Lambda, and IAM roles and permissions
- Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
- Experience with data modeling, warehousing, and building ETL pipelines
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.