Senior MLOps Platform Engineer {S}

ARKA Group, L.P.

Job Location:

Colorado Springs, CO - USA

Monthly Salary: Not Disclosed
Posted on: 11 hours ago
Vacancies: 1 Vacancy

Job Summary

ARKA Group, L.P. (ARKA) is an advanced technologies company serving the U.S. military, intelligence community, and commercial space industry, delivering next-generation solutions to support the national security space enterprise. Built on more than six decades of excellence, ARKA brings modern approaches and a culture of innovation to the challenges of today.

Join the ARKA team to learn how Beyond Begins Here. Discover your next career opportunity now!

Position Overview:

Our AI Center of Excellence builds the next generation of Agentic AI products that autonomously reason, plan, and act on behalf of our customers. To deliver these capabilities at scale, we need a platform engineering group that provides a robust, secure, and highly available MLOps foundation across both on-premise clusters and AWS. The team works closely with data scientists, product engineers, and SREs to turn experimental models into reliable services that power mission-critical applications.

In support of work/life balance, many positions are available for a flexible schedule within the pay period. Ask us about the opportunity for flex scheduling if that's of interest to you.

Why join us

  • Shape the end-to-end lifecycle of cutting-edge AI services, from model training to production inference.
  • Influence architecture decisions for a hybrid cloud environment that will serve thousands of concurrent agents.
  • Collaborate with world-class researchers and product teams while enjoying a strong engineering culture focused on automation, observability, and reliability.

      Responsibilities:

      • Design, implement, and operate a unified MLOps platform that supports both on-premise Kubernetes clusters and AWS. The platform should enable rapid onboarding of new Agentic AI services and provide consistent governance across environments.
      • Develop reusable CI/CD pipelines (GitLab CI) for model packaging, containerization, automated testing, canary releases, and rollbacks.
      • Build observability, monitoring, and alerting stacks (Prometheus, Grafana, OpenTelemetry, CloudWatch) to track inference latency, throughput, resource utilization, and data drift for real-time and batch workloads.
      • Create self-service tooling (CLI, SDKs, UI dashboards) that allows data science and product teams to register models, define inference endpoints, and manage versioning without deep DevOps involvement.
      • Architect and maintain data pipelines that feed training data, model artifacts, and inference logs into a governed data lake (S3, on-prem object store).
      • Collaborate with research and product engineers to translate experimental Agentic AI prototypes into production-grade services, ensuring reproducibility, security, and compliance.
      • Drive performance optimization for inference workloads (GPU/CPU scaling, model quantization, batching strategies).
      • Champion best practices in security (IAM, network policies, secret management), cost efficiency, and disaster recovery for the hybrid infrastructure.
      • Mentor junior engineers and contribute to internal knowledge bases, upskilling, and review processes.

      Required Qualifications:

      • BS in computer science or related engineering field
      • 5 years of experience building and operating production-grade software infrastructure, preferably in a hybrid on-prem/cloud environment
      • Deep expertise with Kubernetes (cluster provisioning, Helm, operators, custom resources) and container runtimes (Docker, OCI)
      • Hands-on experience with AWS services (EKS, SageMaker, S3, IAM, CloudWatch, Step Functions) and the ability to bridge on-prem resources with AWS via VPN/Direct Connect
      • Strong software engineering skills in Python and at least one compiled language (Go, Rust, or Java) for building platform components and SDKs
      • Proficiency with CI/CD and GitOps tooling (Argo CD, Flux, GitLab, GitHub Actions, or similar)
      • Solid understanding of distributed systems (consensus, fault tolerance, load balancing) and experience tuning high-throughput, low-latency inference pipelines
      • Experience with data engineering frameworks (Airflow, Prefect, Kafka, Spark, Flink) and building robust, versioned data pipelines
      • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry, ELK) and the ability to define meaningful SLIs/SLOs for AI services
      • Track record of collaborating with research or product teams to move prototypes to production, translating experimental code into maintainable services
      • Strong problem-solving mindset, excellent written and verbal communication, and a passion for building scalable AI platforms

      Preferred Qualifications:

      • Working knowledge of Scrum and Agile software development methodology

      Location: Remote

      This is a remote position that will primarily be supporting our Aurora, CO and King of Prussia, PA locations. Due to contract requirements, the job has to be performed from a remote location in the United States.

      What We Offer:

      • Comprehensive medical/vision/dental insurance packages
      • Company contributions to qualified HSA accounts
      • 401k retirement plan with industry-leading company contributions
      • 3 weeks of vacation accrual per year, plus time off for sick leave and unscheduled life events
      • 13 paid holidays
      • Upfront tuition assistance for approved degree programs
      • Annual bonus program based on company and employee performance
      • Company-paid life insurance, AD&D, Short-Term and Long-Term disability insurance
      • 4 weeks paid Parental Leave
      • Employee assistance program (EAP)

      EHS/Environmental Requirements:

      This job operates in a professional office environment. While performing the duties of this job, the employee is routinely required to use hands to keyboard, communicate, listen to and interpret instructions, and remain stationary for extended periods of time. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions of the job.

      Applicants are invited to apply for a reasonable accommodation to perform the essential duties of the job. To apply send a request to or contact and press 2 for Human Resources.

      ITC & Security Clearance Requirements:

      This position requires the incumbent to access export-controlled information. If you are not a U.S. Person, any offer is contingent upon the Company's ability to obtain a special license granting you access. This could take several months. You will not be able to begin employment until such license is obtained.

      Visa Restrictions:

      No visa sponsorship is available for this position.

      Pre-employment Screenings:

      Employment with any ARKA company in the U.S. is contingent upon satisfactory completion of several pre-employment requirements, including a credit check, background check, and drug screen.


      Required Experience:

      Senior IC

      Key Skills

      • APIs
      • C/C++
      • Computer Graphics
      • Go
      • React
      • Redux
      • Node.js
      • AWS
      • Library Services
      • Assembly
      • GraphQL
      • High Voltage

      About Company

      We’re a government contracting enterprise driven by innovation, mission performance, and advanced engineering.