Platform/Site Reliability Engineer

Job Location

Austin - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Position Title: Platform Engineer/Site Reliability Engineer
Location: Remote

Roles & Responsibilities

As a Platform Engineer at Unizin, your primary responsibility is to ensure the reliability, scalability, security, and performance of our infrastructure and applications hosted on the Google Cloud Platform (GCP) and Amazon Web Services (AWS). You will leverage tools such as Kubernetes, ArgoCD, GitLab CI/CD, Python, Terraform, Pulumi, and Ansible to achieve these goals.

Key Responsibilities

Infrastructure and Configuration Management

  • Implement and maintain infrastructure as code (IaC).
  • Manage multiple Kubernetes clusters, ensuring high availability, scalability, and security.
  • Automate deployment, scaling, and management of containerized applications.

Monitoring and Alerting

  • Design and implement monitoring and reporting solutions.
  • Set up alerts and response procedures to ensure rapid response to incidents and outages.
  • Monitor and report on infrastructure costs and usage, and propose solutions for savings and optimization.

Continuous Integration and Deployment

  • Develop and maintain CI/CD pipelines.
  • Automate testing, builds, and deployments to achieve a consistent and reliable delivery process.

Performance and Reliability

  • Conduct performance testing and capacity planning to ensure systems can scale with demand.
  • Optimize system performance and resource utilization across our products and platform.

Incident Response and Post-Mortems

  • Participate in incident response activities, ensuring timely resolution of issues.
  • Conduct thorough post-mortem analyses to identify root causes and prevent recurrence.

Security

  • Identify and deploy cybersecurity measures by continuously performing vulnerability assessment and risk management.
  • Administer and enforce time-bound access controls for cloud infrastructure, ensuring least privilege and JIT (Just-In-Time) access principles.
  • Oversee remote endpoint management of workstations using centralized tooling, including patch deployment, configuration enforcement, and compliance with security standards.
  • Collaborate with the broader engineering team to implement and maintain best practices for security controls.

Cloud Platform SME

  • Act as a subject matter expert on cloud platform (GCP/AWS) services, technologies, automation, and security.
  • Provide guidance and recommendations on infrastructure design, implementation, and optimization.
  • Maintain a strong understanding of containerization, microservices architecture, and cloud-native technologies.
  • Stay updated with industry trends and best practices related to cloud platforms and DevOps methodologies.

Documentation and Knowledge Sharing

  • Create and maintain comprehensive documentation regarding systems, configurations, recurring issues, procedures, knowledge transfer material, etc.
  • Share knowledge and best practices with the broader engineering team.
  • Mentor and guide broader engineering team members on infrastructure and CI/CD processes.

Automation and Tooling

  • Identify opportunities for automation to streamline operations and improve efficiency. Encourage and build automated processes wherever possible.
  • Develop and contribute to scripts and tools using Python, Bash, or other scripting languages as needed.

On-Call Responsibilities

  • Participate in a 24/7 on-call rotation with other Platform Engineering team members.
  • Respond promptly to alerts and incidents during your on-call shift.
  • Coordinate with other team members to resolve critical issues and minimize downtime.
  • Document incidents, actions taken, and follow-up tasks for review during business hours.

Qualifications

  • At least 4 years of Platform Engineering / SRE / DevOps experience.
  • Extensive experience with Linux-based infrastructure and systems administration.
  • Strong understanding of and practical experience with containerization, microservices architecture, and cloud-native technologies.
  • Solid experience with a cloud platform, preferably GCP.
  • Expert in automated deployments and CI/CD (GitLab CI/CD, Tekton, Jenkins, etc.).
  • Proficiency in scripting and automation using Python or Bash.
  • Experience working with a SQL architecture such as PostgreSQL or MySQL.
  • Familiarity with infrastructure as code (IaC) principles and tools (Ansible, Terraform, Pulumi, etc.).
  • Proven track record of working on production software projects that scale efficiently.
  • Expertise in application scaling methodologies including horizontal and vertical scaling strategies.
  • Experience with monitoring and logging tools, preferably Stackdriver.
  • Excellent problem-solving skills and ability to troubleshoot complex issues under pressure.
  • Strong communication skills and ability to collaborate effectively across teams and with end users.
  • Demonstrated ability to articulate and represent technical viewpoints effectively, coupled with active listening skills to understand diverse perspectives.

Skills that will set you apart

  • Apache Airflow, Kafka, Beam
  • Elasticsearch
  • Learning Management System (LMS) such as Canvas by Instructure
  • Shibboleth
  • Helm / Kustomize
  • GitFlow / GitOps
  • Akuity platform products (Argo, Kargo, etc.)
  • Vault or comparable secret management tooling
  • Britive
  • NinjaOne
  • Certifications in Kubernetes, GCP, or related technologies

Conclusion

As a Platform Engineer, your role is crucial in maintaining the stability, scalability, and reliability of our cloud systems. By leveraging your expertise in Kubernetes, CI/CD, IaC, and scripting, and by serving as a cloud platform SME, you will contribute significantly to advancing our DevOps practices and ensuring seamless delivery of high-quality services to our engineering teams and our members.

Furthermore, your ability to articulate a strong technical perspective, coupled with your openness to constructive dialogue and collaboration, will drive innovative solutions and foster a culture of continuous improvement within our engineering teams.

Additional Information

Must be comfortable working with a computer daily for several hours. Must be able to communicate detailed work verbally and in writing. Must be able to context switch for concurrent tasks and interruptions when they arise.

Unizin is proud to be an equal opportunity workplace and is committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, disability, or gender identity. Accommodations will be provided to any candidate with special needs who requests them.

At this time, we are unable to sponsor applicants for work visas. Candidates must be authorized to work in the United States without current or future sponsorship.

If you are a resident of a state with designated pay transparency requirements and this role is available remotely, you may be eligible to receive additional information about the compensation and benefits for this role, which we will provide upon request. Please send an email to [email protected].


Required Experience:

Manager

Employment Type

Full-Time
