SRE (Onsite Hartford, CT)

Vinsys Information Technology Inc

Job Location:

Hartford, CT - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Location: Hartford, CT
Position: Senior SRE Engineer (Cloud Platform)
Job Role: Lead the SRE implementation for frontend portal monitoring, reliability, and performance on Google Cloud Platform or Microsoft Azure.

Job Description:
- Design and implement comprehensive SRE monitoring for the web portal on GCP
- Set up JVM metrics collection and performance monitoring for Java applications using GCP Monitoring
- Implement logging and tracing standards across all portal components using Cloud Logging and Cloud Trace
- Configure APIGEE monitoring and API performance tracking for portal services
- Implement distributed tracing with W3C Trace Context headers and OpenTelemetry
- Create drill-down dashboards that correlate metrics, logs, and traces using GCP tools
- Integrate GCP Monitoring, Logging, and Trace with the existing Prometheus/Grafana stack
- Configure GMP (Google Managed Prometheus) for enhanced metrics collection
- Implement zero-code UI instrumentation for frontend monitoring and traceability
- Create RED (Rate, Errors, Duration) dashboards for Performance and Production environments
- Build service health dashboards with drill-down capabilities and error message analysis
- Develop and maintain SRE automation/scripts within GKE namespaces (SRE and others) for monitoring, deployment, and troubleshooting

Experience: 5 years in SRE/DevOps with proven implementation experience in JVM, APIGEE, GCP observability, the Grafana stack, GKE, OpenTelemetry, and UI instrumentation

Clear Skills Needed:
- Technical: Python, Linux, Prometheus, Grafana, Kubernetes, Docker, Loki, Tempo
- JVM Metrics: Java application monitoring, JVM performance tuning, heap analysis, garbage collection optimization for portal applications
- Logging & Tracing: Splunk, distributed tracing, log aggregation standards, correlation IDs across portal systems
- API Management: APIGEE experience, API monitoring, rate limiting, security, performance tracking for portal APIs
- Infrastructure: CI/CD pipelines, AI tools such as GitHub Copilot and Cursor
- Observability Tools & Query Languages: PromQL and InfluxQL for querying metrics (Grafana)
- Kubernetes: Strong experience with GKE, including namespace management, RBAC, and deploying/maintaining SRE tools via code (Python, Bash, YAML, Helm)

Additional Critical Skills:
- Distributed Tracing Standards: W3C Trace Context headers implementation
- Structured Logging: JSON format with specific fields (traceid)
- Performance Baseline Establishment: Ability to collect and analyze 2-4 weeks of historical data for performance baselines
- Dashboard Implementation: Drill-down capabilities, service mapping from trace data, correlation between metrics/logs/traces
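
For candidates gauging what the tracing and structured-logging items involve, the following is a minimal illustrative sketch, not the team's actual implementation. It builds and parses a W3C Trace Context `traceparent` header and emits a JSON log line carrying the `traceid` field named above. The function names, the `spanid` and `severity` fields, and the `component` argument are assumptions made for illustration; the parser handles only the four-field version `00` layout.

```python
import json
import re
import secrets

# traceparent layout (W3C Trace Context, version 00):
#   version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" trace-flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled=True):
    """Create a fresh traceparent value for a UI-initiated request."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def log_line(message, header, **fields):
    """Build one structured JSON log line correlated to the trace."""
    ctx = parse_traceparent(header) or {}
    record = {
        "severity": "INFO",               # hypothetical field name
        "message": message,
        "traceid": ctx.get("trace_id"),   # field named in the posting
        "spanid": ctx.get("span_id"),     # hypothetical field name
    }
    record.update(fields)
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    header = new_traceparent()
    print(header)
    print(log_line("portal page loaded", header, component="login"))
```

Because every log line carries the same trace id as the request's spans, a dashboard can link from a log entry to the matching trace, which is the correlation the dashboard item above describes.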

GCP-Specific Observability Skills (CRITICAL):
- Google Cloud Monitoring: GMP (Google Managed Prometheus), Cloud Monitoring dashboards, alerting policies
- Google Cloud Logging: Centralized logging, log-based metrics, log exports
- OpenTelemetry (OTEL): Instrumentation, collectors, data collection from GCP services

UI Instrumentation & Frontend Monitoring (CRITICAL):
- UI Span Management: Naming conventions for UI-initiated spans, W3C Trace Context headers for frontend
- Frontend Observability: User session tracking, component-level monitoring, UI performance metrics
- Cross-Platform Tracing: End-to-end traceability from UI to backend services

Required Skills: Prometheus, Grafana, Google Cloud Platform (GCP), Google Cloud Logging, Kubernetes, web metrics, Cloud

This is a high PRIORITY requisition. This is a PROACTIVE requisition.

Background Check : Yes

Drug Screen : No