Job Title: Monitoring and Observability Architect
Location: Raritan, NJ 08869 (Onsite)
Contract
Role Overview
- We are seeking an experienced Monitoring and Observability Architect to design, implement, and optimize enterprise-wide observability solutions across cloud, on-premises, and hybrid environments.
- This role is responsible for defining monitoring strategies, improving system reliability, and enabling proactive incident detection through metrics, logs, and traces.
- The ideal candidate combines deep technical expertise with architectural vision to build scalable, secure, and resilient observability platforms that support modern DevOps and SRE practices.
Key Responsibilities
Architecture & Strategy
- Define enterprise observability architecture aligned with business and IT objectives.
- Design monitoring frameworks for applications, infrastructure, networks, and cloud-native platforms.
- Establish standards, governance, and best practices for monitoring and alerting.
Implementation & Engineering
- Architect and deploy tools such as Prometheus, Grafana, Datadog, Splunk, ELK, New Relic, Dynatrace, and AppDynamics.
- Implement distributed tracing (OpenTelemetry, Jaeger, Zipkin).
- Design centralized logging and log aggregation solutions.
- Enable APM, RUM, synthetic monitoring, and infrastructure monitoring.
Cloud & DevOps Integration
- Integrate observability into CI/CD pipelines.
- Support Kubernetes and container observability.
- Enable Infrastructure-as-Code monitoring automation (Terraform, ARM, CloudFormation).
- Collaborate with SRE and DevOps teams to enhance reliability and performance.
Reliability & Incident Management
- Define SLIs, SLOs, SLAs, and error budgets.
- Develop intelligent alerting strategies to reduce noise.
- Enable root cause analysis and performance optimization.
- Support major incident investigations.
Security & Compliance
- Ensure monitoring solutions meet security and compliance requirements.
- Implement role-based access control (RBAC) and secure data handling.
Stakeholder Collaboration
- Partner with customer, engineering, operations, security, and business teams.
- Provide technical leadership and mentorship.
- Present architecture designs to leadership and governance boards.