We are seeking a highly skilled Senior Terraform Engineer with deep expertise in Azure services to join our Enterprise AI Platform team. This role is Azure-centric, with a strong emphasis on deploying Machine Learning (ML) and Generative AI (GenAI) models in scalable, secure enterprise environments.
The ideal candidate will have hands-on experience with multi-cloud architectures, Infrastructure as Code (IaC) best practices, and a strong foundation in ML workflows, enterprise AI platforms, and cloud-based ML services. You will play a key role in automating infrastructure provisioning, integrating AI/ML pipelines, and optimizing deployments for performance, cost, security, and compliance across a multi-cloud landscape.
This position requires a proactive engineer who can bridge DevOps and MLOps, leveraging Terraform to support high-impact AI initiatives. If you thrive in fast-paced environments and are passionate about building robust, automated cloud infrastructures for AI at scale, this role offers a unique opportunity to drive innovation.
Responsibilities:
Design, implement, and maintain Infrastructure as Code (IaC) solutions using Terraform to provision and manage Azure resources, including:
Azure Machine Learning (Azure ML)
Azure AI Studio
Azure Kubernetes Service (AKS)
Azure Databricks
Related services supporting ML and GenAI model deployment
Develop and enforce IaC best practices, including:
Modular Terraform design
Remote state management (Azure Storage backends)
Drift detection
Automated policy and security testing using tools such as Terragrunt and Checkov
Deploy and orchestrate ML and GenAI models on enterprise ML platforms
Enable end-to-end automation across the ML lifecycle, from model training through inference
Integrate AI/ML workflows with CI/CD pipelines (Azure DevOps, GitHub Actions)
Collaborate with data scientists, ML engineers, and cross-functional teams to design multi-cloud architectures, with Azure as the primary platform and AWS/Google Cloud Platform integrations
Support hybrid deployments, data sovereignty requirements, and disaster recovery strategies
Implement cross-cloud networking, identity federation, and resource orchestration
Optimize cloud infrastructure for AI/ML workloads, including:
Compute clusters
Storage (Azure Blob Storage, Azure Data Lake Storage [ADLS])
Networking (Virtual Networks, Private Endpoints)
Security controls (Azure RBAC, Azure Key Vault, Azure Sentinel)
Ensure infrastructure meets enterprise security, availability, and compliance standards (e.g., GDPR, SOC 2)
Implement MLOps best practices, including:
Model versioning
Monitoring
Logging
Alerting
Leverage observability tools such as Azure Monitor, Prometheus, and MLflow to ensure reliable, production-grade deployments
Troubleshoot and resolve infrastructure issues in production AI environments
Ensure high availability, scalability, and reliability of AI platforms
Conduct code reviews, mentor junior engineers, and contribute to documentation for ML/GenAI-specific IaC patterns
Stay current with emerging Azure ML services, including:
Azure OpenAI Service
Prompt Flow
Participate in on-call rotations and incident response for critical AI infrastructure
Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent professional experience)
5 years of experience as a Cloud Engineer, DevOps Engineer, or in a similar role
At least 3 years of hands-on experience with Terraform for IaC in Azure environments
Proven experience deploying ML and GenAI models using Azure ML, including:
Model training
Model registration
Managed endpoints
Inference pipelines
Strong hands-on experience with multi-cloud architectures
Azure required
AWS and/or Google Cloud Platform preferred
In-depth understanding of Terraform concepts, including:
Modules
Providers (AzureRM)
Variables and outputs
Workspaces and backends
Solid understanding of the machine learning lifecycle, including:
Data ingestion
Feature engineering
Model serving
Scaling in enterprise AI platforms (Azure ML, SageMaker, Vertex AI)
Experience with containerization and orchestration tools:
Docker
Kubernetes (AKS)
Helm
Proficiency in scripting languages such as Python, PowerShell, or Bash
Familiarity with cloud security best practices for ML environments, including:
Encryption
Access controls
Vulnerability scanning
Strong problem-solving skills and experience working in Agile teams
Preferred Qualifications:
Relevant certifications, including:
Microsoft Certified: Azure DevOps Engineer Expert
Azure AI Engineer Associate
HashiCorp Certified: Terraform Associate
Experience with additional IaC tools such as:
ARM Templates
Bicep
Pulumi (for hybrid Azure setups)
Background in MLOps tooling, including:
Kubeflow
MLflow
Azure ML Pipelines
Experience with cloud cost optimization for AI workloads using tools like Azure Cost Management
Prior experience working in regulated industries (finance, healthcare, etc.) with compliance-driven infrastructure requirements
Required Skills:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field
Minimum of 3-5 years of experience in data engineering, with at least 2 years of experience with Enterprise Knowledge Graph (EKG) technologies such as SPARQL, RDF, and Stardog
Strong skills in graph databases with Python and AML
Experience with some of the following technologies: R, machine learning, data engineering, cloud platforms, MLOps
Knowledge of SQL and NoSQL databases, data modeling, and data warehousing concepts
Experience with distributed systems and big data technologies such as Hadoop, Spark, and Kafka
Strong programming skills in Python and/or Java
Excellent problem-solving skills and attention to detail
Strong communication and collaboration skills