Cloud & AI Infrastructure Engineer | AWS, Azure, Multi-LLM Deployment, Kubernetes, Terraform, Security & Observability
Job Summary
Synechron is seeking a highly skilled Agentic AI Platform Engineer to support the deployment, management, and optimization of cloud infrastructure for multi-model AI. In this role, you will design and maintain scalable, secure, and resilient AI platform components on AWS, including Bedrock, EKS, and Redshift. Your expertise will enable efficient AI model routing, data integration, and observability critical to the organization's AI and digital transformation strategies. You will collaborate with cross-functional teams to develop robust APIs, automate deployment workflows, and ensure compliance with enterprise standards.
Software Requirements
Required: AWS Bedrock, EKS, EC2, Terraform (latest stable), Jenkins, CloudWatch, X-Ray, Splunk, New Relic, PostgreSQL, Redshift, REST, SOAP, GraphQL, OAuth2, streaming protocols
Preferred: AWS Nova, OpenAI, Anthropic, Gemini, Kubernetes, Helm, monitoring tools (Datadog, Instana), enterprise security tools (SCPs, IAM roles), multi-LLM routing platforms
Experience level: 5 years of AWS cloud platform engineering with a focus on large-scale AI infrastructure and microservice deployment
Overall Responsibilities
Design, deploy, and manage cloud infrastructure supporting multi-tenant, fault-tolerant AI platforms on AWS, including Bedrock and EKS clusters
Develop and maintain infrastructure as code (Terraform modules, CloudFormation templates) for scalable and repeatable deployments
Configure and optimize gateways such as Kong, MCP servers, and multi-LLM routing architectures
Implement comprehensive observability using CloudWatch, X-Ray, Splunk, and New Relic to track system health and SLA adherence and to detect incidents
Automate deployment workflows and CI/CD pipelines to accelerate model updates and platform releases
Support data infrastructure, including PostgreSQL and Redshift, for RAG data storage and retrieval workflows
Collaborate with AI/ML, security, and network teams to ensure compliant, high-security deployments
Conduct root cause analysis of incidents, optimize platform performance, and implement preventive measures
Document system architecture, platform APIs, and operational procedures, ensuring adherence to enterprise standards
Technical Skills (By Category)
Programming Languages:
Essential: Terraform, Python, Shell scripting, Azure CLI, Azure PowerShell
Preferred: Java, Go, or other scripting languages for automation and integration
Cloud Technologies:
AWS (Bedrock, EC2, EKS, S3, Redshift, IAM, VPC), Azure (AKS, Function Apps, Azure SQL), GCP (optional)
Frameworks and Libraries:
Kubernetes, Helm charts, Istio or other service mesh tools, monitoring SDKs (CloudWatch, Datadog, Prometheus)
Development Tools & Methodologies:
Terraform modules, Jenkins, Bitbucket/GitHub, CI/CD pipelines, Agile/Scrum, Infrastructure as Code, version control, and automation best practices
Security & Compliance:
Managing IAM roles, SCPs, encryption standards, network security policies, and compliance standards (GDPR, SOC, HIPAA, as applicable)
Experience Requirements
5 years supporting cloud infrastructure and deployment of large-scale AI/ML workloads on AWS or multi-cloud environments
Proven experience deploying and optimizing multi-model LLM platforms such as Bedrock, OpenAI, or Anthropic
Strong expertise in infrastructure automation (Terraform, CloudFormation) and container orchestration (Kubernetes, EKS)
Skilled in multi-LLM routing, SLA management, and incident response in AI systems
Experience working with data infrastructure like PostgreSQL and Redshift for RAG workflows
Industry experience in enterprise AI, fintech, or cloud-native architectures is preferred; equivalent enterprise infrastructure experience is acceptable
Day-to-Day Activities
Design, manage, and optimize cloud infrastructure supporting multi-tenant AI workloads
Automate deployment and scaling of AI platform components, pipelines, and models
Configure and monitor gateways, routing engines, and security modules for enterprise-grade availability
Implement and improve observability frameworks to proactively detect and resolve issues
Support data infrastructure and model deployment workflows, including data ingestion, storage, and retrieval
Conduct root cause analysis, incident resolution, and disaster recovery drills
Collaborate with AI/ML, security, and infrastructure teams to ensure compliance and security
Develop and maintain documentation on platform architecture, API endpoints, and operational procedures
Stay informed on emerging AI platform architectures, cloud services, and security practices
Qualifications
Bachelor's or Master's degree in Computer Science, Cloud Computing, Data Science, or a related field
5 years supporting cloud platform engineering, AI infrastructure deployment, and data workflows
Certifications in AWS, Azure, or GCP cloud solutions and security (preferred)
Proven experience managing multi-LLM deployment platforms and model routing architectures
Strong troubleshooting, performance tuning, and incident analysis skills
Excellent communication and collaboration abilities across technical teams
Professional Competencies
Critical thinking for designing scalable, fault-tolerant platforms
Leadership and team management skills supporting cross-team collaboration
Effective communication for stakeholder engagement and documentation
Adaptability to evolving cloud and AI architecture trends
Ownership of platform stability, security, and continuous improvement
Time management proficiency to coordinate multiple deployment cycles and incident responses
SYNECHRON'S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, "Same Difference", is committed to fostering an inclusive culture promoting equality, diversity, and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant's gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.
Required Experience:
IC
About Company
At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Progressive technologies and strategy ...