This role is a member of the AI/ML Infrastructure Engineering team and will be dedicated to implementing and supporting AI/ML infrastructure solutions in cloud and on-premises environments. The role will work directly with infrastructure teams and may also engage with data scientists, machine learning engineers, application developers, and quantitative analysts, functioning both as a solutions architect, helping them implement their own AI/ML solutions, and as a professional services engineer, implementing solutions for them in cloud environments such as AWS, GCP, and Kubernetes.
This is a hands-on developer role. Candidates ideally have experience deploying and supporting their own production-ready AI/ML models in cloud environments, as well as automating the build and management of a broad range of cloud infrastructure using tools like Terraform. Candidates should be familiar with developing unit and functional tests; have experience designing and implementing CI/CD tools with infrastructure-as-code pipelines; and have knowledge of Linux systems administration, containerization, networking, security, automated configuration and state management, cross-system orchestration, configuration management, logging, metrics, monitoring, and alerting.
Principal Responsibilities:
Architect, develop, and maintain internal AI/ML infrastructure components, frameworks, and offerings
Architect, develop, and maintain AI/ML solutions for customers in cloud environments
Help customers architect, develop, and maintain their own AI/ML solutions in cloud environments
Implement CI/CD pipelines that include application tests, security tests, and gates
Implement availability, security, and performance monitoring and alerting for AI/ML solutions
Automate data resiliency and replication for AI/ML models
Manage multiple environments and promote code between them
Automate systems configuration and orchestration using tools such as Terraform, Chef, Ansible, or Salt
Automate creation of machine images and containers
Required Qualifications/Skills:
6 years of experience designing and supporting production cloud environments
Experience consulting with customers to develop AI/ML solutions
Experience developing collaboratively, including infrastructure as code, preferably in Python
Systems engineering knowledge, including an understanding of Linux, security, and networking
Cloud templating tools such as Terraform
Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch)
Experience with distributed computing tools (e.g., Ray, Dask)
Experience with model serving tools (e.g., vLLM, KFServing)
Experience building monitoring and alerting on logs and metrics
Cloud networking, including connectivity, routing, DNS, VPCs, proxies, and load balancers
Cloud security, including IAM, certificate management, and key management
Excellent written and verbal communications
Excellent troubleshooting and analytical skills
Self-starter, able to execute independently on a deadline and under pressure
Full-Time