
Reliability Engineer, AI & Data Platforms

Employer Active

Job Location

Austin - USA

Monthly Salary

Not Disclosed


Vacancy

1 Vacancy

Job Description

As part of our team, you will be responsible for developing and operating our big data platform, using open-source and other solutions to support critical applications such as analytics, reporting, and AI/ML apps. This includes optimizing performance and cost, automating operations, and identifying and resolving production errors and issues to ensure the best data platform experience.


  • 3 years of professional software engineering experience with strong programming skills in languages such as Java, Scala, Python, or Go, preferably with critical large-scale distributed systems.
  • Expertise in designing, building, and operating critical large-scale distributed systems, with a focus on low latency, fault tolerance, and high availability.
  • Proven experience with data processing ecosystems and distributed computing frameworks such as Spark or Flink, as well as MPP query engines such as Trino or StarRocks (see the sketch after this list).
  • Experience designing and developing stateless APIs (e.g., HTTP) for service-oriented architectures across multi-cloud environments.
  • Proficiency with container orchestration (e.g., Kubernetes, Helm), CI/CD pipelines (e.g., GitHub Actions, Jenkins), and infrastructure-as-code tools (e.g., Terraform, Pulumi).
  • Strong troubleshooting and performance analysis skills in complex production environments, with proficiency in Unix/Linux operating systems and command-line tools.
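
As a rough illustration of the Spark-style batch processing mentioned above, here is a minimal PySpark sketch; the input path, column names, and aggregation are illustrative assumptions, not details from this posting.

    # Minimal PySpark sketch: aggregate daily event counts from Parquet.
    # Paths, column names, and the aggregation are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("daily-event-counts")  # hypothetical job name
        .getOrCreate()
    )

    # Assumed input location and schema (event_ts, event_type, ...).
    events = spark.read.parquet("s3://example-bucket/events/")

    daily_counts = (
        events
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("event_count"))
    )

    # Write the aggregated report to an assumed output path.
    daily_counts.write.mode("overwrite").parquet("s3://example-bucket/reports/daily_counts/")

    spark.stop()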


  • Contributions to open-source projects are a plus.
  • Experience with multiple public cloud infrastructures, managing multi-tenant Kubernetes clusters at scale, and debugging Kubernetes/Spark issues.
  • Experience with workflow and data pipeline orchestration tools (e.g., Airflow, dbt); see the sketch after this list.
  • Understanding of data modeling and data warehousing concepts.
  • Familiarity with the AI/ML stack, including GPUs, MLflow, or Large Language Models (LLMs).
  • A learning attitude to continuously improve yourself, the team, and the organization.
  • Solid understanding of software engineering best practices, including the full development lifecycle, secure coding, and experience building reusable frameworks or libraries.
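
As a rough illustration of the pipeline orchestration mentioned above, here is a minimal Airflow DAG sketch (Airflow 2.4+ style); the DAG id, schedule, and task logic are illustrative assumptions, not details from this posting.

    # Minimal Airflow DAG sketch: a daily pipeline with two dependent tasks.
    # DAG id, schedule, and task bodies are illustrative assumptions.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_events(**_):
        # Placeholder: pull raw events from an assumed source system.
        print("extracting events")


    def build_report(**_):
        # Placeholder: aggregate the extracted events into a report table.
        print("building report")


    with DAG(
        dag_id="daily_events_report",     # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                # 'schedule' requires Airflow 2.4+
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_events", python_callable=extract_events)
        report = PythonOperator(task_id="build_report", python_callable=build_report)

        # Run the report only after extraction succeeds.
        extract >> report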

Employment Type

Full Time

