Site Reliability Engineer

Job Location: Bangalore, India
Monthly Salary: Not Disclosed
Vacancy: 1

Job Description

Position Overview

Job Title: Site Reliability Engineer (SRE)
Department: Technology
Location: Bangalore
Reporting To: Head of Infra

Tookitaki is looking for a Site Reliability Engineer (SRE) with 3-6 years of experience to help maintain and scale the infrastructure that powers our flagship products, FinCense and the AFC Ecosystem. As an SRE, you will work at the intersection of software engineering and infrastructure, ensuring high availability, performance, and scalability of our platforms.

You will collaborate with engineering, DevOps, and client success teams to operationalize deployments across on-premise, VPC, and Compliance as a Service (CaaS) environments while improving monitoring, automation, and incident response.

Position Purpose

The SRE role is responsible for ensuring the reliability and efficiency of Tookitaki's production systems and environments. This includes building monitoring systems, improving deployment pipelines, automating routine operations, and responding to production incidents. You'll help build a resilient infrastructure that supports our mission to provide AI-driven solutions that prevent financial crime.

Key Responsibilities

  1. System Monitoring & Incident Management

    • Build and maintain monitoring, alerting, and logging systems using tools like Prometheus, Grafana, and ELK.

    • Respond to incidents and outages, conduct post-mortems, and implement corrective actions.

  2. Infrastructure & Deployment Automation

    • Automate infrastructure provisioning and application deployment using Terraform, Ansible, or Helm.

    • Contribute to CI/CD pipelines (GitLab CI, Jenkins, etc.) to improve the reliability and speed of software delivery.

  3. Container & Orchestration Management

    • Manage and troubleshoot Docker containers and Kubernetes clusters, ensuring workload scaling, resource management, and health.

    • Support application updates, rollbacks, and blue-green or canary deployments.

  4. Cloud & Platform Operations

    • Operate within AWS (preferred) or GCP environments (EC2, S3, VPC, IAM).

    • Monitor system availability and resource usage across environments.

  5. Security & Reliability Enhancements

    • Implement and monitor TLS/SSL, RBAC, SSO, and secure API practices.

    • Support compliance and security audit activities by maintaining logs, access controls, and operational hygiene.

  6. Collaboration & Documentation

    • Work closely with developers, infra engineers, and support teams to ensure production readiness.

    • Maintain playbooks, runbooks, and system documentation for reliability engineering activities.

Qualifications and Skills

Education

  • Bachelor's degree in Computer Science, Engineering, or a related technical field.

Experience

  • 3-6 years in Site Reliability Engineering, DevOps, Platform Engineering, or a related role.

  • Experience with production environments and live system debugging.

Technical Skills

  • Kubernetes, Docker, Helm: experience deploying and scaling services.

  • Linux administration and command-line debugging.

  • Hands-on with AWS (preferred) or GCP cloud platforms.

  • Scripting in Bash and Python for automation and monitoring tasks.

  • Experience with monitoring and alerting tools like Prometheus, Grafana, ELK, or Datadog.

  • Familiarity with databases (e.g., MariaDB, ScyllaDB) and SQL/CQL querying.

Soft Skills

  • Strong problem-solving and debugging skills.

  • Ability to work in on-call rotations and high-pressure production environments.

  • Excellent communication and documentation abilities.

Key Competencies

  • Operational Reliability: Ensures system uptime and performance through proactive monitoring and maintenance.

  • Automation Mindset: Reduces manual effort through scripting and tooling.

  • Incident Response: Quick identification and resolution of issues to minimize downtime.

  • Cross-Functional Collaboration: Works effectively with engineering, support, and infra teams.

  • Security Awareness: Applies best practices in infrastructure and platform security.

Success Metrics

  • Maintain 99.9% uptime across production environments.

  • Reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for critical incidents.

  • Increase automation coverage and reduce manual deployment steps.

  • High internal satisfaction from developers on CI/CD and platform reliability.

  • Compliance readiness and security log availability for audits.

Benefits

  • Competitive compensation

  • Work on a globally recognized RegTech platform transforming financial crime prevention.

  • Exposure to cutting-edge AI and big data infrastructure (Spark, Kafka, ScyllaDB, Flink).

Employment Type

Full-Time
