Job Role: Senior Site Reliability Engineer
Job Location: NYC, NY (Hybrid)
Job Duration: Long-Term Contract
Overview:
- Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements.
- Handle technical escalations, troubleshoot complex FIX and API connectivity issues, and actively participate in on-call rotations during non-traditional hours to ensure rapid response and resolution.
- Adhere to and administer incident and change management policies.
- Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability.
- Work closely with the Lithuania office to ensure smooth operation and alignment of SRE practices across time zones.
- Coordinate incident postmortems and root cause analysis (RCA).
- Design, implement, and maintain comprehensive monitoring, logging, and tracing solutions (observability stack) to provide deep insights into system performance and user experience.
- Partner with product and engineering teams to define clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), managing error budgets to ensure service reliability meets business needs.
Required Qualifications:
- 5 years in a senior SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations.
- Knowledge of the FIX protocol and its messages, including the ability to read FIX logs.
- Familiarity with REST APIs and a strong understanding of API integration.
- Proficient in Python and scripting for automation and system management, with a proven track record of developing and implementing automation solutions.
- Expertise in SQL and transactional databases, including querying and troubleshooting.
- Strong analytical and troubleshooting skills, with a proven ability to identify and resolve technical issues through root cause analysis.
- In-depth knowledge of core networking concepts, including TCP/IP, routing, and DNS.
- Familiarity with maintaining and troubleshooting systems in both cloud (AWS) and co-location (colo) environments.
- Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage.
- Knowledge of change management processes and risk management.
Preferred Qualifications:
- Experience in the brokerage or financial industry.
- Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB.
- Experience maintaining and supporting containerized systems, with familiarity with orchestration tools.
- Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation.
- Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow.
- Advanced skills in managing containerized environments using Kubernetes and OpenShift.
- Practical experience with Confluent Cloud or Redpanda for event streaming architectures.
- Experience with API-based applications and a basic understanding of using the browser developer console for front-end debugging.