REQUIREMENTS:
- Total experience: 10 years.
- Strong working expertise with observability and monitoring tools: Splunk, Datadog, ELK, Prometheus, or similar.
- Proven experience in anomaly detection, alert tuning, event correlation, and custom dashboards.
- Deep understanding of alert deduplication, incident impact scoring, and automation frameworks.
- Hands-on experience with automation platforms (Rundeck, StackStorm, Jenkins, or custom scripting).
- Strong Python expertise (scripting & automation) and proficiency in Bash or other scripting languages.
- Experience in leveraging AI/ML for Ops: log analysis, chatbot incident assistance, and predictive alerts.
- Knowledge of multi-cloud platforms and tools like PolyCloud, Terraform, or CloudFormation.
- Strong experience with ITSM tools (ServiceNow, Remedy) and their integration into AIOps pipelines.
- Expertise in integrating ServiceNow via REST/SOAP APIs for incident automation, CMDB sync, and workflow orchestration.
- Working knowledge of ITIL processes and how AIOps enhances Incident, Problem, and Change Management.
- Exposure to CMDB integration, dependency graphs, and service maps for contextual alerting and automation.
- Excellent communication and collaboration skills with the ability to interact effectively with senior stakeholders.
RESPONSIBILITIES:
- Understanding the client's business use cases and technical requirements, and converting them into a technical design that elegantly meets the requirements.
- Mapping decisions to requirements and translating them for developers.
- Identifying different solutions and narrowing them down to the best option that meets the client's requirements.
- Defining guidelines and benchmarks for NFR considerations during project implementation.
- Writing and reviewing design documents that explain the overall architecture, framework, and high-level design of the application for the developers.
- Reviewing architecture and design on aspects such as extensibility, scalability, security, design patterns, user experience, and NFRs, and ensuring that all relevant best practices are followed.
- Developing and designing the overall solution for defined functional and non-functional requirements, and defining the technologies, patterns, and frameworks to materialize it.
- Understanding and relating technology integration scenarios and applying these learnings in projects.
- Resolving issues raised during code review through exhaustive, systematic root-cause analysis, and being able to justify the decisions taken.
- Carrying out POCs to make sure that the suggested designs/technologies meet the requirements.
Qualifications :
Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
Remote Work :
No
Employment Type :
Full-time