Client Name: Kforce
End Client Name: Financial Services Company (Name Confidential)
Job Title: Resiliency Architect
Location: Tampa, FL
Work Type: Hybrid (3 days onsite per week)
Job Type: Contract (through end of year, extension likely)
Rate: $70/hour on W2
LinkedIn profile required
Notes:
- Interviews ongoing - urgent requirement
- End client is a major financial services provider specializing in clearing, settlement, custody, and risk management for securities transactions
- Strong opportunity for extension beyond EOY
Job Description:
Seeking a skilled Resiliency Architect to design and implement chaos engineering experiments that assess and improve the stability and fault tolerance of large-scale distributed systems. This individual will lead efforts in building a resilient architecture by working closely with cross-functional engineering teams and utilizing advanced cloud and observability tools.
What Will You Be Involved With
- Design, implement, and execute chaos experiments to test the resilience of our distributed systems.
- Develop and maintain a robust framework for chaos engineering, including tools, automation, and documentation.
- Collaborate with engineers across different teams to identify potential areas for chaos experiments.
- Analyze and interpret experiment results, providing actionable insights to improve system resilience.
- Contribute to the development of best practices and standards for chaos engineering within the organization.
- Stay up to date with industry best practices in chaos engineering, distributed systems, and cloud architectures.
What Will You Bring to the Table
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- Proficient in programming languages such as Python or Go, with the ability to write clean, maintainable, and efficient code.
- Strong understanding of TCP/IP networking principles.
- Proven experience with AWS services and architecture, including but not limited to EC2, S3, RDS, and Lambda.
- Strong understanding of distributed systems, with hands-on experience using technologies such as Cassandra and Kafka.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with continuous integration/deployment (CI/CD) practices and tools.
- Strong analytical and problem-solving skills, with a passion for reliability and performance improvement.
- Excellent communication skills with the ability to articulate technical concepts to non-technical stakeholders.