Site Reliability Engineer

Tookitaki Holding

Job Location: Bangalore - India
Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Position Overview

Job Title: Site Reliability Engineer (SRE)
Department: Technology
Location: Bangalore
Reporting To: Head of Infra

Tookitaki is looking for a Site Reliability Engineer (SRE) with 3 to 6 years of experience to help maintain and scale the infrastructure that powers our flagship products, FinCense and the AFC Ecosystem. As an SRE, you will work at the intersection of software engineering and infrastructure, ensuring high availability, performance, and scalability of our platforms.

You will collaborate with engineering, DevOps, and client success teams to operationalize deployments across on-premise, VPC, and Compliance as a Service (CaaS) environments while improving monitoring, automation, and incident response.

Position Purpose

The SRE role is responsible for ensuring the reliability and efficiency of Tookitaki's production systems and environments. This includes building monitoring systems, improving deployment pipelines, automating routine operations, and responding to production incidents. You'll help build a resilient infrastructure that supports our mission to provide AI-driven solutions that prevent financial crime.

Key Responsibilities

  1. System Monitoring & Incident Management

    • Build and maintain monitoring, alerting, and logging systems using tools like Prometheus, Grafana, and ELK.

    • Respond to incidents and outages, conduct post-mortems, and implement corrective actions.

  2. Infrastructure & Deployment Automation

    • Automate infrastructure provisioning and application deployment using Terraform, Ansible, or Helm.

    • Contribute to CI/CD pipelines and improve the reliability and speed of software delivery (GitLab CI, Jenkins, etc.).

  3. Container & Orchestration Management

    • Manage and troubleshoot Docker containers and Kubernetes clusters, ensuring workload scaling, resource management, and health.

    • Support application updates, rollbacks, and blue-green or canary deployments.

  4. Cloud & Platform Operations

    • Operate within AWS (preferred) or GCP environments (EC2, S3, VPC, IAM).

    • Monitor system availability and resource usage across environments.

  5. Security & Reliability Enhancements

    • Implement and monitor TLS/SSL, RBAC, SSO, and secure API practices.

    • Support compliance and security audit activities by maintaining logs, access controls, and operational hygiene.

  6. Collaboration & Documentation

    • Work closely with developers, infra engineers, and support teams to ensure production readiness.

    • Maintain playbooks, runbooks, and system documentation for reliability engineering activities.

    Qualifications and Skills

    Education

    • Bachelor's degree in Computer Science, Engineering, or a related technical field.

    Experience

    • 3 to 6 years in Site Reliability Engineering, DevOps, Platform Engineering, or a related role.

    • Experience with production environments and live system debugging.

    Technical Skills

    • Experience deploying and scaling services with Kubernetes, Docker, and Helm.

    • Linux administration and command-line debugging.

    • Hands-on with AWS (preferred) or GCP cloud platforms.

    • Scripting in Bash and Python for automation and monitoring tasks.

    • Experience with monitoring and alerting tools like Prometheus, Grafana, ELK, or Datadog.

    • Familiarity with databases (e.g., MariaDB, ScyllaDB) and SQL/CQL querying.

    Soft Skills

    • Strong problem-solving and debugging skills.

    • Ability to work in on-call rotations and high-pressure production environments.

    • Excellent communication and documentation abilities.

    Key Competencies

    • Operational Reliability: Ensures system uptime and performance through proactive monitoring and maintenance.

    • Automation Mindset: Reduces manual effort through scripting and tooling.

    • Incident Response: Quick identification and resolution of issues to minimize downtime.

    • Cross-Functional Collaboration: Works effectively with engineering, support, and infra teams.

    • Security Awareness: Applies best practices in infrastructure and platform security.

    Success Metrics

    • Maintain 99.9% uptime across production environments.

    • Reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for critical incidents.

    • Increase automation coverage and reduce manual deployment steps.

    • High internal satisfaction from developers on CI/CD and platform reliability.

    • Compliance readiness and security log availability for audits.

    Benefits

    • Competitive compensation

    • Work on a globally recognized RegTech platform transforming financial crime prevention.

    • Exposure to cutting-edge AI and big data infrastructure (Spark, Kafka, ScyllaDB, Flink).

    Key Skills

    • Kubernetes
    • FMEA
    • Continuous Improvement
    • Elasticsearch
    • Go
    • Root cause Analysis
    • Maximo
    • CMMS
    • Maintenance
    • Mechanical Engineering
    • Manufacturing
    • Troubleshooting

    About Company

    Tookitaki’s FinCense platform combines AI with community‑driven intelligence to deliver real‑time AML, fraud detection, smart screening, and transaction monitoring with 90%+ accuracy—trusted by global banks, fintechs & e‑wallets.