Site Reliability Engineer II (SRE II), Full-Time, Remote (U.S. Only)
Join Balto as a Site Reliability Engineer II and play a critical role in scaling the reliability, security, and performance of our AI-powered platform!
About the Role
Balto, a pioneering tech startup delivering AI-driven tools for large sales and customer service teams, is seeking an experienced Site Reliability Engineer II to help design, build, and maintain resilient, scalable, and secure infrastructure. In this technical role, you will partner with engineering and security teams to improve platform performance, reduce operational toil, and strengthen compliance across our systems. This remote role can be performed from anywhere in the United States, but eligibility to work in the U.S. is required. In addition, occasional travel for, and participation in, full-company in-person all-hands events (up to 4 times a year) is mandatory.
Who We Are Looking For
We seek a skilled engineer with deep expertise in cloud infrastructure, automation, and observability who thrives in fast-paced startup environments. You are motivated by solving complex technical problems, building scalable systems, and implementing robust security and compliance practices. You combine hands-on technical ability with strategic thinking and enjoy collaborating across teams to deliver measurable reliability improvements.
Key Responsibilities
- Infrastructure Management: Architect, build, and scale AWS infrastructure using Infrastructure as Code (IaC) tools such as Terraform.
- CI/CD & Deployment: Design, implement, and optimize CI/CD pipelines using tools like GitHub Actions, ArgoCD, or similar to streamline deployments and improve release velocity.
- Kubernetes Operations: Manage and optimize Kubernetes-based infrastructure (Amazon EKS) to ensure scalability, reliability, and efficient resource utilization.
- Observability & Incident Response: Build and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, Datadog, Loki) to ensure high availability; participate in the on-call rotation to resolve incidents.
- Security & Compliance: Implement and maintain security controls to meet PCI DSS, HIPAA, GDPR, and SOC 2 standards and support audit readiness.
- System Architecture: Contribute to designing fault-tolerant architectures with disaster recovery and high-availability strategies, both inside and outside the cardholder data environment (CDE).
- Developer Enablement: Partner with developers to improve deployment workflows, reduce lead time for changes, and provide platform tooling support.
- Documentation & Knowledge Sharing: Create clear runbooks, technical documentation, and knowledge base articles to support team-wide learning and operational excellence.
Skills and Qualifications
Required:
- 3-5 years of experience in SRE, DevOps, or Platform Engineering roles, with at least 2 years in a senior or mid-level capacity.
- Strong hands-on experience with AWS services and IaC tools like Terraform.
- Expertise in Kubernetes operations in production environments (Amazon EKS preferred).
- Proficiency in CI/CD pipeline tools (e.g., GitHub Actions, Jenkins, ArgoCD).
- Strong knowledge of monitoring and observability tooling (Prometheus, Grafana, Datadog, CloudWatch).
- Familiarity with compliance frameworks (PCI DSS, HIPAA, GDPR, SOC 2) and cloud security best practices.
- Excellent problem-solving, troubleshooting, and incident management skills.
Preferred:
- Experience supporting developers in platform engineering or internal tooling contexts.
- Familiarity with NIST Cybersecurity Framework (CSF) implementation in SaaS/cloud environments.
- Strong networking fundamentals (TCP/IP, DNS, HTTP, TLS, firewalls).
- Experience with AWS networking services (VPC, Route 53, NAT Gateway, ALB/NLB).
- Background in cost optimization and cloud governance.
- Strong scripting/programming skills (Bash, Python, Go).
Our Culture: We're AI-Obsessed
At Balto, we don't just build AI; we live it. If you're not...
- Automating infrastructure with the latest DevOps tools.
- Experimenting with AI-powered observability or security tools.
- Following the latest drops from AWS, CNCF, and open-source SRE communities.
- Reading engineering blogs, RFCs, and architecture deep dives.
- Playing with side projects that push the boundaries of automation.
then Balto might not be the right place for you. But if that does sound like you, you'll feel right at home.
Why Balto
- Fully remote team: work from anywhere in the U.S.
- Mission-driven culture with smart, supportive, and AI-obsessed teammates
- Career growth: this role is built for someone who wants to keep leveling up
- Great benefits: healthcare, 401(k), unlimited PTO, learning stipends, and more
Assessment Process
Our hiring process includes virtual interviews and take-home exercises designed to evaluate your technical skills, problem-solving ability, and communication skills.
Ready to put your SRE skills to work at a company that breathes AI?
Apply at