Site Reliability Engineer

Job Location:

New York City, NY - USA

Monthly Salary: Not Disclosed
Posted on: 16 hours ago
Vacancies: 1 Vacancy

Job Summary

Job Role: Senior Site Reliability Engineer

Job Location: NYC, NY (Hybrid)

Job Duration: Long Term Contract

Overview:

  • Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements.
  • Handle technical escalations, troubleshoot complex FIX and API connectivity issues, and actively participate in on-call rotations during non-traditional hours to ensure rapid response and resolution.
  • Adhere to and administer incident and change management policies.
  • Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability.
  • Work closely with the Lithuania office to ensure smooth operation and alignment of SRE practices across time zones.
  • Coordinate incident postmortems and root cause analysis (RCA).
  • Design, implement, and maintain comprehensive monitoring, logging, and tracing solutions (observability stack) to provide deep insights into system performance and user experience.
  • Partner with product and engineering teams to define clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), managing error budgets to ensure service reliability meets business needs.

Required Qualifications:

  • 5 years in a senior SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations.
  • Knowledge of the FIX protocol and its messages, including the ability to read FIX logs.
  • Familiarity with REST APIs and a strong understanding of API integration.
  • Proficient in Python and scripting for automation and system management, with a proven track record of developing and implementing automation solutions.
  • Expertise in SQL and transactional databases, including querying and troubleshooting.
  • Strong analytical and troubleshooting skills, with a proven ability to identify and resolve technical issues through root cause analysis.
  • In-depth knowledge of core networking concepts, including TCP/IP, routing, and DNS.
  • Familiarity with maintaining and troubleshooting systems within both cloud (AWS) and co-location (colo) environments.
  • Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage.
  • Knowledge of change management processes and risk management.

Preferred Qualifications:

  • Experience in the brokerage or financial industry.
  • Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB.
  • Experience maintaining and supporting containerized systems, and familiarity with orchestration tools.
  • Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation.
  • Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow.
  • Advanced skills in managing containerized environments using Kubernetes and OpenShift.
  • Practical experience with Confluent Cloud or Redpanda for event streaming architectures.
  • Experience with API-based applications and a basic understanding of using the browser developer console for front-end debugging.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting