A fast-growing provider of AI-powered solutions is scaling its operations. With a strong customer base and increasing demand, the existing engineering team is under pressure to handle both infrastructure improvements and customer-facing support.
To meet this growth, the company is looking to add an Infrastructure Engineer to a team of two (the new hire will be the third engineer) supporting Kafka, Redis, OpenSearch, RabbitMQ, and ClickHouse for its products.
Tasks
- Manage, monitor, and optimize ClickHouse clusters in production, including schema design, query performance tuning, replication configuration, and capacity planning;
- Operate and maintain Kafka clusters, OpenSearch deployments, and other distributed systems, ensuring high availability and optimal performance;
- Deploy, configure, and manage containerized applications and stateful workloads on Kubernetes, implementing best practices for resource management and scaling;
- Implement and maintain GitOps workflows for infrastructure and application deployments, ensuring version-controlled and automated deployment processes;
- Design and implement comprehensive monitoring, logging, and alerting solutions for distributed systems, enabling proactive issue detection and rapid troubleshooting;
- Conduct performance analysis, identify bottlenecks, and implement optimizations across distributed systems to meet SLA requirements and improve system resilience;
- Create and maintain technical documentation, runbooks, and operational procedures while collaborating with development teams to ensure smooth integration and operations.
Requirements
- Strong programming skills in Python, TypeScript, or similar.
- Experience building software that uses LLM or generative AI APIs.
- Hands-on experience using AI coding assistants like GitHub Copilot, Cursor, Claude Code, or similar.
- Understanding of LLM fundamentals, for example tokenization, context limits, temperature, and prompt structure.
- Experience with RAG, agents, or function calling in production.
- Ability to design experiments and measure impact on quality, cost, and performance.
- Good communication skills and ability to work with cross-functional teams.
Nice to have
- Experience with cloud environments like AWS, GCP, or Azure.
- Background in NLP, ML, or developer tools.
Other:
Location: Serbia, Portugal, Poland
Working conditions: CET business hours