Cloud Site Reliability Engineer (SRE) - Data Management & Analytics Platform

Bloomberg


Job Location:

Princeton, NJ - USA

Monthly Salary: Not Disclosed
Posted on: 9 hours ago
Vacancies: 1 Vacancy

Job Summary

Cloud Site Reliability Engineer (SRE) - Data Management & Analytics Platform
Location: Princeton
Business Area: Engineering and CTO
Ref #

Description & Requirements

At Bloomberg, data is at the heart of everything we do. As part of the Data Management and Analytics Platform (DMAP) SRE team, you will play a critical role in driving analytics throughout the organization to improve our products, better engage with our customers, create greater efficiencies, and unlock new business opportunities through data-driven insights.

Our team is responsible for capturing and processing the who, what, when, where, and why of how clients use Bloomberg products, how our systems perform, and how employees interact with customers. We ingest and prepare massive volumes of data to power reporting, dashboards, self-service tools, and advanced analytics used across the company.

We are looking for a Cloud Site Reliability Engineer (SRE) who is passionate about building and operating highly reliable, scalable data platforms. In this role, you will focus on ensuring the availability, performance, and scalability of critical data pipelines and analytics infrastructure. You will work at the intersection of software engineering and infrastructure, applying automation, observability, and reliability best practices to support large-scale distributed systems.

You'll Be Trusted To

  • Design, build, and operate highly available, scalable, and resilient cloud infrastructure supporting large-scale data ingestion and analytics platforms

  • Define, implement, and monitor SLIs/SLOs for data systems and services; drive reliability improvements using error budgets and operational metrics

  • Improve observability across data pipelines and platforms through logging, metrics, tracing, and alerting

  • Automate infrastructure provisioning and system management using Infrastructure as Code (IaC)

  • Lead incident response efforts, perform root cause analysis (RCA), and implement post-incident improvements

  • Optimize performance, reliability, and cost efficiency of cloud-based data systems

  • Ensure data platform reliability, including batch and streaming pipelines, storage systems, and reporting infrastructure

  • Partner with data engineers, software engineers, and stakeholders to improve system reliability and operational maturity

  • Strengthen platform security through proactive monitoring, vulnerability management, and cloud security best practices

  • Continuously improve CI/CD pipelines and deployment processes for data infrastructure

You'll Need To Have

  • 5 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles

  • Strong proficiency in at least one programming or scripting language (Python and/or Go)

  • Experience supporting production systems with a focus on reliability, scalability, and observability

  • Hands-on experience operating or designing highly available distributed systems

  • A Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field, or equivalent professional experience

We'd Love To See

  • Experience supporting large-scale data platforms, data pipelines, or analytics infrastructure

  • Strong experience operating production systems in AWS at scale

  • Experience defining and managing SLIs, SLOs, and error budgets

  • Strong background in monitoring and observability tools (e.g., Prometheus, Grafana, CloudWatch, Datadog)

  • Experience leading incident management and conducting postmortems

  • Hands-on experience with Infrastructure as Code (Terraform or CloudFormation)

  • Experience building and maintaining CI/CD pipelines

  • Strong understanding of distributed systems and cloud architecture

  • Experience with containerized workloads (Docker, Kubernetes)

  • Knowledge of AWS services related to data platforms (e.g., S3, EMR, Lambda, Kinesis, Glue, Redshift)

  • Knowledge of the Databricks or Snowflake platform

  • Experience with cloud networking concepts (VPCs, routing, security groups)

  • Experience optimizing cloud costs in large-scale environments

  • AWS certification (Associate level or above)

  • A security-first mindset and familiarity with compliance and data governance best practices

  • Experience using operational metrics and data to drive continuous improvement

Our most successful engineers are collaborative, data-driven, and take strong ownership of production systems end-to-end, ensuring the reliability of the data platforms that power Bloomberg's analytics and insights.


Salary Range 00 USD Annual Benefits Bonus

The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation may vary based on factors such as geographic location, work experience, market conditions, education/training, and skill level.


We offer one of the most comprehensive and generous benefits plans available and offer a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short and long term disability benefits, 401(k) match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.


Required Experience:

IC

About Company

Bloomberg is the world's primary distributor of financial data and a top news provider of the 21st century. A global information and technology company, we use our dynamic network of data, ideas and analysis to solve difficult problems every day. Our customers around the world rely on ...
