Description
Overview
As a Principal Software Engineer in the Artificial Intelligence group, you will play a crucial role in building and optimizing the core software infrastructure that powers AI-driven solutions. You will focus on architecting and deploying highly scalable, production-ready backend systems that support AI assistants, intelligent agents, and foundational AI services. Collaborating with machine learning engineers and cross-functional teams, you will drive best practices in software engineering, DevOps, Kubernetes-based deployments, and backend service development. Your expertise will be instrumental in accelerating AI innovation by ensuring robust, reliable, and efficient system operations.
Responsibilities
- Design and implement high-performance backend architectures that seamlessly integrate with AI-powered products. Focus on building modular, fault-tolerant, and efficient services that support large-scale AI workloads while ensuring low-latency interactions between data pipelines, inference engines, and enterprise applications.
- Develop robust model-serving APIs and containerized microservices that enable real-time AI inference and batch processing with high throughput and low latency.
- Implement end-to-end monitoring, logging, and alerting solutions to ensure AI systems operate reliably at scale.
- Improve scalability by designing distributed systems that efficiently handle AI workloads and inference pipelines.
- Own Kubernetes-based deployments by developing and maintaining Helm charts, Kubernetes operators, and cloud-native workflows to streamline AI model deployment.
- Automate infrastructure management using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Optimize CI/CD pipelines for AI applications, ensuring smooth model retraining, testing, and deployment cycles.
- Improve security and compliance by implementing best practices in access control, container security, and vulnerability management.
- Partner closely with AI/ML teams to ensure seamless model integration into production environments.
- Lead architecture discussions and provide strategic technical guidance on AI platform evolution.
- Mentor and guide engineers to enhance team skills in backend development, DevOps, and cloud technologies.
Requirements
- Strong backend development experience in Python (preferred) or Java, with expertise in building RESTful APIs, microservices, and event-driven architectures.
- Deep understanding of Kubernetes and container orchestration, with experience in deploying AI/ML workloads at scale.
- Expertise in DevOps and CI/CD pipelines, including experience with Jenkins, GitHub Actions, ArgoCD, or similar tools.
- Cloud expertise (AWS/GCP/Azure), including hands-on experience with cloud-native services for AI workloads (e.g., S3, Lambda, EKS/GKE/AKS, DynamoDB, RDS).
- Experience in performance tuning and system optimization for large-scale AI/ML workloads.
- Proven ability to collaborate with ML engineers, data scientists, data engineers, and product teams to deliver AI-powered solutions efficiently.
- Experience in technical leadership, driving architectural decisions and mentoring engineers.
- Strong problem-solving skills, with the ability to balance trade-offs between scalability, maintainability, and performance.
Preferred Experience
- Prior experience working with AI/ML pipelines, model serving frameworks, or distributed AI workloads.
- Experience in AI observability, monitoring model drift, and optimizing inference latency.
- Understanding of cybersecurity, observability, or related domains to enhance AI-driven decision-making.
Splunk, a Cisco company, is an Equal Opportunity Employer, and all qualified applicants will receive consideration for employment without regard to race, colour, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
Note:
Thank you for your interest in Splunk!
Required Experience:
Staff IC