SreOps

Chennai - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Role	SRE/OPS
Location	Chennai; flexibility to travel for business trips
Experience	3 to 6 years of experience
What awaits you / Job Profile	Our vision is to provide an overarching platform for AI-based quality assurance (AIQX) in the global production network to accelerate the end-to-end quality control cycle in vehicle manufacturing. We develop this platform in an international feature team, based on state-of-the-art technologies and in close cooperation with our users in the plants. We are looking for an SRE to join our BMW teams of rock-solid specialists developing and operating our AI-based quality assurance solution for BMW's plants. In this position, you will take an important role in maintaining and operating a highly complex platform, working in an international team to secure the quality of our BMW products. Our services mainly run on the Microsoft Azure cloud platform. If you are a passionate SRE, preferably with a developer background, who is willing to take responsibility for our platform, share knowledge, and give guidance within the team, and you are thrilled about the latest technology, full of energy and ambition, hands-on, and not afraid of getting your hands dirty, this is the right position for you.
What should you bring along	3 to 6 years of experience in IT operations or a similar role
	Willing and able to travel internationally (twice a year)
	Monitor and Operate IT Products: Perform regular and sporadic operational tasks to ensure optimal performance of IT services. Own and maintain the Regular OPS Tasks list, refining sporadic tasks based on input from the Operations Experts (OE) network.
	Manage IT Service Continuity: Prepare for and attend emergency exercises (EE), reviewing outcomes and deriving follow-up tasks. Communicate findings and improvements to the OE network.
	Manage Availability: Participate in Gamedays and backup/restore test sessions, practicing and executing backup and restore processes. Own the recovery and backup plan, reviewing success and identifying follow-up tasks.
	Manage Capacity: Monitor cluster capacity using prepared dashboards and coordinate with the DevOps team on any issues. Plan and execute capacity extensions as needed.
	Manage Service Configuration: Oversee service configuration management using ITSM tools.
	Manage Events: Observe dashboards and alerts, take action for root cause analysis (RCA), and create tasks for the DevOps team. Provide proactive feedback and maintain monitoring and alerting solutions.
	Manage Problems: Conduct root cause analysis and manage known issues, creating Jira defects for further assistance if required.
	Enable Changes: Create and sync changes with the team, assisting with releases and deployment plans.
	Manage Service Requests and Incidents: Observe and resolve service requests and incidents, creating Jira tasks for the DevOps team as necessary.
	Manage Knowledge: Create, use, and extend knowledge articles, ensuring availability and consistency.
	You take part in 24/7 on-call rotations in a future setup with teams around the world and can restore systems in an efficient manner.
Must-have technical skills	Strong understanding of IT service management principles and practices
	Proficiency in monitoring and management tools (e.g., dashboards, alerting systems)
	Strong analytical and problem-solving abilities, particularly in IT service management
	Experience in conducting root cause analysis (RCA) and managing known issues
	Experience in performing regular and sporadic operational tasks to ensure optimal performance of IT services
	Ability to manage IT service continuity, availability, and capacity effectively
	Experience with change management processes, including creating and syncing changes with teams
	Ability to plan and execute capacity extensions and backup/restore processes
	Any additional responsibilities assigned in the Agile Working Model (AWM) Charter
Good-to-have technical skills	Experience with IT service management frameworks (e.g., ITIL, SRE practices)
	Familiarity with cloud platforms (e.g., Azure) and their operational management
	Experience with automation tools (e.g., Ansible, Puppet, Terraform) and scripting languages (e.g., Python, Bash) to streamline operational tasks
	Understanding of DevOps methodologies and practices, including CI/CD (Continuous Integration/Continuous Deployment) processes
	Knowledge of network protocols, configurations, and troubleshooting to support IT infrastructure
	Understanding of IT security best practices and compliance requirements to ensure secure operations
	Skills in data analysis and visualization tools (e.g., Splunk, Grafana) to interpret operational metrics and trends
	Above-board work ethics

Required Experience:

Senior IC
