Hybrid: 3 days onsite per week
Location: Richmond, VA
This contractor must report to the office on a hybrid model, and we are currently considering ONLY the Richmond location. Capital One's current hybrid model requires 3 in-person days per week, with Monday through Thursday being optional in-person days and Friday being a dedicated work-from-home day.
This person is needed to create dashboards in Datadog that measure customer impact and severity for Discover applications.
Key Responsibilities:
- Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
- Instrument agents for on-premises, cloud, and hybrid environments to enable comprehensive monitoring.
- Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications.
- Configure and integrate Datadog with third-party services such as ServiceNow and other ITSM tools, including SSO enablement.
Core Responsibilities:
- Design & Implement Solutions: Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces.
- System Instrumentation: Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry.
- Data Analysis & Visualization: Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues.
- Collaboration: Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
- Automation: Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection.
Key Skills & Tools:
- Observability Tools: Proficiency in monitoring, logging, and tracing tools, including Prometheus, Grafana, the ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and cloud-native solutions like AWS CloudWatch.
- Programming Languages: Expertise in languages such as Python and Go for scripting and automation.
- Infrastructure & Cloud Platforms: Experience with cloud platforms (AWS, GCP, Azure) and container orchestration systems like Kubernetes.
- Infrastructure as Code (IaC): Familiarity with Terraform and Ansible for managing infrastructure and configurations.
- CI/CD & Automation: Experience with CI/CD pipelines and automation tools like Jenkins.
- System & Software Engineering: A strong background in both system operations and software development.
- Ability to optimize cloud agent instrumentation; cloud certifications are a plus.
- Datadog Fundamentals, APM and Distributed Tracing Fundamentals, and Datadog Demo Certification (mandatory)
- Strong understanding of observability concepts (logs, metrics, tracing)
- Expertise in security & vulnerability management in observability
- 2 years of experience in cloud-based observability solutions, specializing in monitoring, logging, and tracing across AWS, Azure, and GCP environments.