Senior AWS Platform Engineer

Cloudious LLC


Job Location:

Warren, NJ - USA

Monthly Salary: Not Disclosed
Posted on: 3 hours ago
Vacancies: 1 Vacancy

Job Summary

Role : Senior AWS Platform Engineer

Location : Warren, NJ (Hybrid)

Salary : $125k to $130k

Experience : 8 years overall, strong hands-on AWS focus

Certifications : AWS Solutions Architect Associate (minimum), Professional preferred

Engagement Type : Cloud Platform Engineering & DevOps

About the Role

We are looking for a Senior AWS Platform Engineer with 8 years of hands-on cloud and platform engineering experience to join a high-impact team delivering and operating enterprise-grade AWS infrastructure. This is a deeply technical individual contributor role - not an architecture advisory or management position. You will design, build, and operate the foundational platform capabilities that engineering teams depend on: infrastructure automation, container platforms, CI/CD pipelines, security controls, and cloud-native APIs.

The right candidate writes production Terraform that other engineers build on, operates EKS clusters that run real workloads, and builds Python tooling that solves operational problems at scale. You are equally comfortable in the AWS console diagnosing a networking issue and in a code review ensuring Terraform modules meet reusability standards. You take pride in platforms that are not just functional but operationally excellent - well-observed, cost-governed, and secure by default.

This role is based in New Jersey with hybrid onsite expectations.

Key Responsibilities

AWS Platform Architecture & Operations

  • Design, deploy, and operate production AWS environments at scale - including compute (EC2, ECS, EKS, Lambda), storage (S3, EBS, EFS), networking (VPC, Transit Gateway, Direct Connect, Route53), and identity (IAM, AWS SSO, SCPs).
  • Architect and maintain multi-account AWS environments using AWS Organizations, Control Tower, and account vending pipelines - including OU structure, SCP design, guardrail enforcement, and account baseline automation.
  • Implement and govern cloud security controls at scale - IAM least-privilege design, permission boundaries, VPC security architecture, KMS key management, Secrets Manager, Security Hub, GuardDuty, AWS Config rules, and CloudTrail governance.
  • Own cost governance for the AWS platform - implementing tagging policies, cost allocation structures, budget alerting, rightsizing analysis using Compute Optimizer, and Savings Plans / Reserved Instance strategy. Build FinOps reporting tooling where native tooling falls short.
  • Maintain deep operational familiarity with AWS networking - including VPC design, Transit Gateway routing, NAT Gateway, PrivateLink, DNS architecture, security group management, and network troubleshooting at scale.
  • Perform platform reliability engineering - designing for HA/DR, implementing auto-scaling strategies, performing capacity planning, and ensuring production workloads meet availability and performance SLOs.

Infrastructure as Code - Terraform

  • Design, build, and maintain reusable, modular Terraform codebases that serve as the foundational IaC layer for the platform - covering networking, compute, security, identity, and data services.
  • Architect Terraform module libraries with clear interface design, versioning strategy, and documentation standards that enable other engineering teams to consume infrastructure safely and consistently.
  • Manage remote Terraform state across multi-account, multi-region environments - including S3/DynamoDB state backend configuration, state isolation strategy, and state migration procedures.
  • Implement Terraform drift detection workflows - identifying configuration drift between IaC-defined state and actual infrastructure, establishing remediation processes, and preventing drift accumulation.
  • Enforce policy-as-code using Terraform Sentinel, OPA/Conftest, or Checkov - implementing guardrails that prevent misconfigured or non-compliant infrastructure from being deployed through automated pipelines.
  • Integrate Terraform into automated delivery pipelines - including plan/apply automation, PR-based infrastructure review workflows, and environment promotion gates with automated compliance validation.
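As an illustration of the drift detection workflow described above, here is a minimal Python sketch built on the documented exit codes of `terraform plan -detailed-exitcode` (0 = no changes, 1 = error, 2 = changes present). The names `DriftStatus`, `classify_plan_exit_code`, and `detect_drift` are hypothetical and chosen for this example only; a real workflow would also need backend credentials, scheduling, and alerting.

```python
import subprocess
from enum import Enum


class DriftStatus(Enum):
    """Possible outcomes of a drift check (hypothetical naming)."""
    IN_SYNC = "in_sync"
    DRIFTED = "drifted"
    ERROR = "error"


def classify_plan_exit_code(code: int) -> DriftStatus:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status.

    Terraform documents: 0 = succeeded with an empty diff, 1 = error,
    2 = succeeded with a non-empty diff.
    """
    if code == 0:
        return DriftStatus.IN_SYNC
    if code == 2:
        return DriftStatus.DRIFTED
    return DriftStatus.ERROR


def detect_drift(workdir: str) -> DriftStatus:
    """Run a non-interactive plan in `workdir` and classify the result.

    Assumes the `terraform` binary is on PATH and the directory has
    already been initialised with `terraform init`.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is expected when drift exists
    )
    return classify_plan_exit_code(result.returncode)
```

Keeping the exit-code mapping separate from the subprocess call means the classification logic can be unit tested without Terraform installed.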

Containers & Kubernetes (EKS)

  • Design, deploy, and operate production EKS clusters - including cluster version management, node group and Fargate profile configuration, Cluster Autoscaler and Karpenter implementation, and managed add-on lifecycle management.
  • Implement workload security on EKS - including IRSA (IAM Roles for Service Accounts), OPA/Gatekeeper policy enforcement, Pod Security Standards, network policies, secrets management integration (External Secrets Operator, Secrets Store CSI), and runtime security tooling.
  • Design and implement EKS observability - including Prometheus/Grafana stack deployment, CloudWatch Container Insights, Fluent Bit log routing, distributed tracing integration, and custom dashboards for cluster and workload health.
  • Manage autoscaling at both cluster and workload level - Cluster Autoscaler / Karpenter for node-level scaling, HPA and KEDA for workload-level scaling, and VPA for resource optimisation.
  • Define and enforce Kubernetes operational standards - resource requests/limits, pod disruption budgets, topology spread constraints, liveness/readiness probes, and namespace isolation patterns - ensuring production workloads are deployed safely and reliably.
  • Own the EKS upgrade lifecycle - planning and executing cluster version upgrades with minimal workload disruption, including add-on compatibility validation and node group rotation strategies.

DevOps & CI/CD Pipeline Engineering

  • Design, build, and maintain automated delivery pipelines for infrastructure and application workloads - using GitHub Actions, GitLab CI/CD, or AWS CodePipeline/CodeBuild - with integrated security scanning, quality gates, and artifact management.
  • Integrate security scanning into CI/CD pipelines - including SAST (static analysis), container image scanning (Trivy, Grype, ECR scanning), dependency vulnerability scanning (OWASP Dependency-Check, Snyk), and IaC security scanning (Checkov, tfsec).
  • Implement artifact management - ECR lifecycle policies, Helm chart repository management (ECR OCI, Artifactory), artifact signing and provenance validation, and dependency pinning strategies.
  • Build GitOps workflows for Kubernetes deployments - including ArgoCD or Flux configuration, application set management, environment promotion automation, and drift detection between Git state and cluster state.
  • Establish release management standards - environment promotion gates, canary and blue/green deployment patterns, automated rollback triggers, and deployment frequency and lead time metrics.
  • Implement developer platform tooling - self-service infrastructure provisioning, environment creation automation, and internal developer portal integrations that reduce platform friction for engineering teams.

Python & Bash Automation

  • Build production-quality Python tooling for cloud automation, infrastructure management, and operational tasks - including AWS SDK (boto3) automation, CLI tooling, and platform utilities that are reusable, well-tested, and maintainable.
  • Write Bash automation scripts for operational tasks, CI/CD pipeline steps, and system-level configuration - with appropriate error handling, logging, and idempotency.
  • Develop Lambda functions for event-driven cloud automation - including resource lifecycle management, compliance enforcement, cost optimisation automation, and operational response workflows.
  • Implement unit and integration tests for infrastructure tooling and automation code - applying software engineering discipline to platform engineering work, not just infrastructure configuration.
  • Build operational tooling that improves platform reliability and reduces toil - including automated remediation scripts, health check utilities, and runbook automation that the operations team can execute safely.
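To give a sense of what reusable, testable tooling can look like, the sketch below checks EC2 instances against a tagging policy. It is an illustration only: the required tag set and the function names `missing_tags` and `find_untagged_instances` are hypothetical, and the EC2 client is passed in as a parameter so the logic can be unit tested with a stub instead of a live AWS account.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical tagging policy, used only for this example.
REQUIRED_TAGS: Set[str] = {"owner", "cost-center", "environment"}


def missing_tags(tags: Iterable[Dict[str, str]], required: Set[str]) -> Set[str]:
    """Return the required tag keys absent from an AWS-style tag list."""
    present = {tag["Key"] for tag in tags}
    return required - present


def find_untagged_instances(
    ec2_client: Any, required: Set[str] = REQUIRED_TAGS
) -> Dict[str, List[str]]:
    """Map instance IDs to the sorted list of required tags they lack.

    `ec2_client` is expected to behave like `boto3.client("ec2")`; only
    `get_paginator("describe_instances")` is used.
    """
    report: Dict[str, List[str]] = {}
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                # Instances with no tags at all omit the "Tags" key.
                gaps = missing_tags(instance.get("Tags", []), required)
                if gaps:
                    report[instance["InstanceId"]] = sorted(gaps)
    return report
```

In real use this would be called as `find_untagged_instances(boto3.client("ec2"))`. Injecting the client rather than creating it inside the function is one way to keep the code testable and reusable across accounts and regions.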

API Development

  • Design and build RESTful APIs for platform services - including internal developer APIs, FinOps reporting endpoints, CMDB integrations, and operational tooling interfaces - with proper authentication (AWS IAM, API keys, OAuth/JWT), authorisation, throttling, and versioning.
  • Deploy APIs using AWS API Gateway - including usage plans, API key management, Lambda integration, request/response mapping, and custom domain configuration.
  • Produce API documentation to engineering standards - OpenAPI/Swagger specifications, usage examples, error code catalogues, and integration guides that enable other teams to consume platform APIs reliably.
  • Apply API security best practices - input validation, rate limiting, authentication enforcement, and least-privilege access patterns - ensuring platform APIs do not introduce security risk to the broader environment.

Required Qualifications

Certifications

  • AWS Solutions Architect Associate - minimum requirement.
  • AWS Solutions Architect Professional or AWS DevOps Engineer Professional - strongly preferred.
  • Certified Kubernetes Administrator (CKA) - advantageous.

Experience

  • 8 years of hands-on experience in cloud platform engineering, DevOps, or infrastructure engineering.
  • Strong AWS hands-on experience across compute, storage, networking, serverless, and identity - in production environments, not lab or sandbox contexts.
  • Proven experience designing and maintaining production-grade Terraform codebases - module design, state management, drift detection, and pipeline integration.
  • Deep hands-on experience operating EKS in production - cluster operations, workload security, autoscaling, and observability.
  • Demonstrated experience building CI/CD pipelines with integrated security scanning and artifact management.
  • Strong Python and Bash scripting for cloud automation - with evidence of production-quality, reusable code rather than one-off scripts.
  • Experience designing and building RESTful APIs with appropriate security, authentication, and documentation standards.
  • Must be based in New Jersey.

Technical Depth - Non-Negotiables

The following are hard requirements. Candidates should expect a hands-on technical assessment:

Domain              Required Depth

AWS Multi-Account   OU design, SCP authoring, Control Tower, account vending
Terraform           Module design, remote state, drift detection, Sentinel/OPA policy
EKS                 Cluster ops, IRSA, Karpenter, network policies, upgrade lifecycle
CI/CD               Pipeline design, security scanning integration, GitOps (ArgoCD/Flux)
AWS Networking      VPC, TGW, PrivateLink, Route53, security groups at scale
AWS Security        IAM, KMS, Secrets Manager, Security Hub, GuardDuty, Config
Python              boto3, Lambda, reusable tooling, unit testing
API Development     REST design, API Gateway, auth, throttling, OpenAPI spec

Nice-to-Have Qualifications

  • Data & SQL: Ability to query and transform data using SQL and AWS-native data services (Athena, Redshift, Glue) - useful for FinOps reporting, CMDB analytics, and platform observability dashboards.
  • Application Development: Comfort building lightweight backend services with Go or TypeScript, writing unit and integration tests, and understanding application architecture well enough to advise on platform requirements.
  • Multi-Cloud & Tooling: Exposure to GCP and familiarity with enterprise platform tooling - observability platforms (Datadog, Grafana), developer portals (Backstage), or ITSM integrations (ServiceNow).
  • FinOps tooling: Experience with third-party FinOps platforms (Apptio Cloudability, CloudHealth, Infracost) for cost analysis and governance reporting.
  • Service mesh experience: Istio or AWS App Mesh for inter-service communication, mTLS enforcement, and traffic management in microservices environments.

The Technical Bar

This role has a high technical bar. Candidates will be evaluated through a hands-on technical assessment covering:

  • Terraform module design - writing a reusable, well-structured module with appropriate variable design, outputs, and documentation
  • EKS troubleshooting - diagnosing a simulated production issue in a Kubernetes environment
  • AWS architecture - designing a multi-account architecture for a given scenario, including security controls and network design
  • Python automation - writing a boto3 script to solve a real operational problem with appropriate error handling and testability