Job Title: SITE RELIABILITY ENGINEER
Location: Reston, VA
Duration: 12 Months
Visa: USC, GC, H1B, and EAD
Contract Type: W2
Description:
We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with a strong background in observability, telemetry, and monitoring to join our development team. In this role, you will be responsible for implementing and maintaining observability solutions using OpenTelemetry (OTel) and Splunk, ensuring the reliability, scalability, and performance of our systems.
Key Responsibilities:
- Design and implement observability strategies using OpenTelemetry for distributed tracing, metrics, and logging.
- Instrument microservices written in Java and Python using OTel SDKs and auto-instrumentation tools.
- Develop and maintain Splunk dashboards, alerts, and reports to provide actionable insights into system performance and reliability.
- Collaborate with development and operations teams to ensure consistent and effective telemetry across services.
- Automate monitoring and alerting pipelines to proactively detect and resolve issues.
- Participate in on-call rotations, incident response, and postmortem analysis to improve system resilience.
- Drive adoption of SRE best practices, including SLIs, SLOs, and error budgets.
- Continuously evaluate and improve observability tools and practices.
Required Qualifications:
- Certifications: Splunk Certified Developer or Admin (at least one of them)
- 3 years of experience in Splunk development (creating dashboards, visualizations, statistical reports, scheduled searches, alerts, custom applications using Python, and knowledge objects)
- Experience with both XML and Dashboard Studio development is a must
- Expert-level knowledge and understanding of the Splunk search language (SPL) and building complex queries
- Experience implementing KV stores, lookups, and data model acceleration to optimize search performance and reporting
- Knowledge of how to customize dashboards via Simple XML, advanced XML, source JavaScript, CSS, and advanced HTML
- Expert-level capabilities with regular expressions and statistical functions
- Experience with creating Splunk knowledge objects (field extractions, macros, event types, etc.)
- Strong problem-solving, logic, and analytical skills
- Prior experience as a web developer using Java, XML, JavaScript, AJAX, or other programming languages is a plus