Lead DevOps Engineer | GCP | US Citizens Only | Remote

Saransh Inc


Job Location:

Boston, NH - USA

Monthly Salary: Not Disclosed
Posted on: 5 hours ago
Vacancies: 1 Vacancy

Job Summary

Title: Lead DevOps Engineer
Location: 100% Remote

The Lead DevOps Engineer, a key member of the EIT DevOps Team, is responsible for the staging and production infrastructure of Iron Mountain's Digital Services within the federal sector. This role is pivotal in managing and optimizing staging and production deployment environments across Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure.

Core responsibilities include provisioning and maintaining secure, scalable, and robust cloud infrastructure for the InSight DXP Platform. The Lead DevOps Engineer will apply extensive knowledge of cloud services and DevOps best practices to ensure application efficiency, high availability, and performance.

Additionally, this role involves creating and maintaining FedRAMP controls and compliance documentation. The Lead DevOps Engineer will execute automation pipelines, upgrade infrastructure, troubleshoot complex issues, and contribute to the ongoing enhancement of deployment processes. Close collaboration with development, operations, and other EIT teams is crucial for delivering seamless and reliable solutions.

Core Responsibilities:

  • Cloud Infrastructure Management: Deploy, manage, and maintain cloud infrastructure across AWS, Azure, and/or GCP, ensuring compliance for government workloads.
  • Infrastructure Automation: Automate infrastructure provisioning using Infrastructure as Code (IaC) tools like Terraform, OpenTofu, or AWS CloudFormation.
  • Deployment Pipeline Streamlining: Collaborate with development teams to streamline CI/CD pipelines using tools such as GitLab and OpenTofu for efficient infrastructure and application delivery.
  • Performance Optimization: Monitor system performance, participate in capacity planning, and optimize application and infrastructure performance by tuning configurations and identifying bottlenecks.
  • Automation Development: Develop scripts and tools to automate routine operations, including patching, scaling, and monitoring.
  • Self-Healing Systems: Design and implement self-healing systems that proactively detect and resolve faults.
  • Data Integrity & Availability: Manage backup and disaster recovery strategies to ensure data integrity and availability across environments.
  • Security & Compliance: Perform regular security audits and vulnerability patching, adhering to government compliance requirements (e.g., FedRAMP, NIST).

Incident Management & Observability:

  • Real-time Incident Resolution: Respond to and resolve infrastructure incidents and outages in real time, minimizing disruption.
  • Root Cause Analysis (RCA): Conduct RCA for production issues and implement long-term corrective actions.
  • On-Call Participation: Participate in an on-call rotation, escalating and coordinating responses to high-severity issues.
  • Incident Documentation: Document incidents, responses, and postmortems to capture lessons learned.
  • Complex Problem Diagnosis: Diagnose complex infrastructure and application problems, including database performance issues, latency, and service connectivity challenges.
  • Comprehensive Logging & Telemetry: Ensure comprehensive logging and telemetry to support incident response, performance tuning, and auditing.
  • Observability Improvements: Drive observability improvements by collaborating with Engineering and Platform teams to enhance system reliability and traceability.

Application & Knowledge Management:

  • Application Incident Leadership: Lead resolution efforts for application-level incidents, ensuring coordinated response across teams.
  • Application Lifecycle Management: Oversee application lifecycle management, including version upgrades, security patches, and regional rollouts.
  • Knowledge Base Contribution: Contribute to a shared knowledge base documenting recurring issues and resolution steps.
  • Scaling Strategies: Support scaling strategies to meet regional demand, ensuring infrastructure resilience and compliance with service-level objectives (SLOs).

Qualifications:

  • Must be eligible and willing to submit for U.S. Government security clearances; active clearance is a plus.
  • Experience supporting FedRAMP Authorized platforms is highly desirable.
  • Minimum 5 years of experience leading and supporting enterprise-level applications in production environments.
  • Proven experience in cloud infrastructure provisioning and management on Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure.
  • Proficiency in scripting languages such as Python, Bash, or PowerShell for automation and systems management.
  • Strong understanding of containerization and orchestration technologies, including Docker, Kubernetes, and Helm.
  • Hands-on experience with cloud object storage services such as AWS S3, Google Cloud Storage, or Azure Blob Storage.
  • Working knowledge of database and persistence technologies, particularly MongoDB and PostgreSQL.
  • Experience supporting and integrating microservices architectures and RESTful APIs.
  • Familiarity with incident and service management systems such as ServiceNow and Jira.
  • Experience with SAST/DAST security and compliance tooling such as Prisma Cloud, CrowdStrike, XSOAR, and Burp Suite.
  • Basic understanding of identity and access management (IAM) and SSO technologies, particularly Okta, and of application integration practices.
  • Excellent troubleshooting skills, especially in complex, distributed, cloud-based environments.
  • Strong written and verbal communication skills, with the ability to clearly document procedures, incidents, and solutions.
  • Effective at producing support documentation and conducting knowledge transfer or training sessions.
  • Demonstrated ability to work independently with minimal supervision in a fast-paced, collaborative, and globally distributed team.
  • A motivated, proactive mindset with a commitment to delivering high-quality, secure, and reliable systems.
