This is a full-time role in our product organization for an expert in systems design with considerable skill and expertise in large-scale software development in an Azure development environment. The role designs and implements Continuous Integration/Continuous Deployment (CI/CD) tooling using GitHub Actions, Azure DevOps, and related technologies. This includes defining and implementing: build and test pipelines for containerized architectures; infrastructure as code (IaC) for the stateful deployment of environments; Role-Based Access Control (RBAC); linting and other code quality controls; GitOps and Kubernetes pipelines; and managing SaaS deployment APIs.
Individuals in this role will assist in the design, engineering, development, planning, and administration of Azure Kubernetes Service (AKS) clusters for a set of critical business applications. This role will work closely with application engineering, security, and operations teams to engineer and build Kubernetes and Azure PaaS and IaaS solutions within an agile and modern enterprise-grade operating model. Qualified applicants will have a demonstrated capability to learn new concepts quickly and/or have robust domain expertise.
Qualifications :
Key Responsibilities:
- Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning, including setting and maintaining SLOs, SLIs, and error budgets and creating dashboards.
- Analyze, troubleshoot, and resolve operational challenges contributing to defined SLOs.
- Manage site stability, performance, and reliability, and maintain uptime for production environments.
- Develop a fully automated multi-environment observability stack based on the existing system and extend it to predict capacity needs based on usage patterns.
- Strive for automation to reduce toil and increase development velocity.
- Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
- Identify changes for the product architecture from the reliability, performance, and availability perspective with a data-driven approach.
- Analyze and address complex technical challenges and issues that arise during the software development and run lifecycle. Debug, troubleshoot, and resolve technical problems efficiently.
- Create and maintain technical documentation, including design specifications, user guides, runbooks, and best practice guidelines.
- Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
- Collaborate with software development teams on the release management process, help shape the future roadmap, and establish strong operational readiness across teams.
- Participate in Agile ceremonies such as sprint planning, stand-up meetings, and retrospectives.
- Collaborate with product managers, designers, and other engineers to ensure alignment and efficient project delivery.
- Share your expertise and mentor engineers, helping them grow and develop their skills. Foster a culture of continuous learning and improvement within the team.
- Stay up to date with the latest technologies, tools, and developments in cloud computing. Proactively learn and adapt to new technologies to drive innovation.
- Collaborate with customers to understand their needs, gather feedback, and provide technical support and guidance as needed.
- Triage incoming Web Support escalation requests, routing them to the applicable internal teams.
- Contribute to incident root cause analysis and service restoration, and serve as an incident commander during outage events.
Skills and Experience:
- Strong background as an SRE supporting a 24x7, highly available production environment for a SaaS or cloud service provider.
- Solid experience with monitoring/APM/observability tools (Datadog, Application Insights, Prometheus, Grafana, etc.).
- Strong background with Azure resources such as Key Vault, Data Factory, Azure Databricks, and Storage Accounts.
- Experience implementing observability plans around logs, metrics, and traces.
- Experience in an agile development team developing software.
- Implement and participate in exercising best practices for CI/CD.
- Experience with cloud infrastructure environments, preferably Azure, and infrastructure as code (Terraform, Bicep, ARM).
- Design, develop, and maintain infrastructure using popular IaC tools and technologies such as Terraform, Helm, and others.
- Strong experience with containerization technology and/or Kubernetes.
- Experience with release automation, system administration, and configuration management.
- Experience with programming languages (Python, Go, etc.).
- Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
- Strong interpersonal and teaming skills, with the ability to set and enforce process and influence engineers who are not direct reports.
- Strong analytical and programming skills (Python, Go, etc.).
- (Bonus) Experience with MLflow and other MLOps pipeline technology.
Practices, Principles, and Techniques
- Continuous Integration/Continuous Deployment (CI/CD)
- Instrumentation strategy and Site Reliability Engineering (SRE)
- Release Communication and Collaboration
- Security and Compliance
- Test-Driven Development (TDD), especially with respect to CI/CD and DevOps
Additional Information :
Base Salary Pay Range*: $142,500 - $198,750 USD
*The current applicable Base Salary Pay Range for this role is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills relevant to the role, internal equity, alignment with market data, or other law.
NOTE: WHILE THIS ROLE IS REMOTE, YOU MUST BE A US CITIZEN OR ABLE TO WORK WITHIN THE UNITED STATES WITHOUT SPONSORSHIP.
We are an equal opportunity employer. All applicants will be considered for employment without attention to age, race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.
Other Compensation / Benefit Overview
In addition to Base Salary, the successful candidate may be eligible to participate in the following plans / programs upon satisfying all hiring requirements:
- Medical, Dental, and Vision Coverage
- Life Insurance and Disability Programs
- Retirement Savings with Company Match
- Flexible Work Arrangements, including Remote Work
All your information will be kept confidential according to EEO guidelines.
Beware of scams
Our recruiting team may communicate with candidates via our @hitachisolutions domain email address and/or via our SmartRecruiters (Applicant Tracking System) domain email address regarding your application and interview requests.
All offers will originate from our @hitachisolutions domain email address. If you receive an offer or information from someone purporting to be an employee of Hitachi Solutions from any other domain, it may not be legitimate.
Remote Work :
Yes
Employment Type :
Full-time