Site Reliability Engineer

Bangalore - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Position Overview

Job Title: Site Reliability Engineer (SRE)
Department: Technology
Location: Bangalore
Reporting To: Head of Infra

Tookitaki is looking for a Site Reliability Engineer (SRE) with 3 to 6 years of experience to help maintain and scale the infrastructure that powers our flagship products, FinCense and the AFC Ecosystem. As an SRE, you will work at the intersection of software engineering and infrastructure, ensuring high availability, performance, and scalability of our platforms.

You will collaborate with engineering, DevOps, and client success teams to operationalize deployments across on-premise, VPC, and Compliance as a Service (CaaS) environments while improving monitoring, automation, and incident response.

Position Purpose

The SRE role is responsible for ensuring the reliability and efficiency of Tookitaki's production systems and environments. This includes building monitoring systems, improving deployment pipelines, automating routine operations, and responding to production incidents. You'll help build a resilient infrastructure that supports our mission to provide AI-driven solutions that prevent financial crime.

Key Responsibilities

System Monitoring & Incident Management

Build and maintain monitoring, alerting, and logging systems using tools like Prometheus, Grafana, and ELK.
Respond to incidents and outages, conduct post-mortems, and implement corrective actions.

Infrastructure & Deployment Automation

Automate infrastructure provisioning and application deployment using Terraform, Ansible, or Helm.
Contribute to CI/CD pipelines (GitLab CI, Jenkins, etc.) to improve the reliability and speed of software delivery.

Container & Orchestration Management

Manage and troubleshoot Docker containers and Kubernetes clusters, ensuring workload scaling, resource management, and health.
Support application updates, rollbacks, and blue-green or canary deployments.

Cloud & Platform Operations

Operate within AWS (preferred) or GCP environments (EC2, S3, VPC, IAM).
Monitor system availability and resource usage across environments.

Security & Reliability Enhancements

Implement and monitor TLS/SSL, RBAC, SSO, and secure API practices.
Support compliance and security audit activities by maintaining logs, access controls, and operational hygiene.

Collaboration & Documentation

Work closely with developers, infra engineers, and support teams to ensure production readiness.
Maintain playbooks, runbooks, and system documentation for reliability engineering activities.

Qualifications and Skills

Education

Bachelor's degree in Computer Science, Engineering, or a related technical field.

Experience

3 to 6 years in Site Reliability Engineering, DevOps, Platform Engineering, or a related role.
Experience with production environments and live system debugging.

Technical Skills

Kubernetes, Docker, and Helm: experience deploying and scaling services.
Linux administration and command-line debugging.
Hands-on with AWS (preferred) or GCP cloud platforms.
Scripting in Bash and Python for automation and monitoring tasks.
Experience with monitoring and alerting tools like Prometheus, Grafana, ELK, or Datadog.
Familiarity with databases (e.g., MariaDB, ScyllaDB) and SQL/CQL querying.

Soft Skills

Strong problem-solving and debugging skills.
Ability to work in on-call rotations and high-pressure production environments.
Excellent communication and documentation abilities.

Key Competencies

Operational Reliability: Ensures system uptime and performance through proactive monitoring and maintenance.
Automation Mindset: Reduces manual effort through scripting and tooling.
Incident Response: Quick identification and resolution of issues to minimize downtime.
Cross-Functional Collaboration: Works effectively with engineering, support, and infra teams.
Security Awareness: Applies best practices in infrastructure and platform security.

Success Metrics

Maintain 99.9% uptime across production environments.
Reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for critical incidents.
Increase automation coverage and reduce manual deployment steps.
High internal satisfaction from developers on CI/CD and platform reliability.
Compliance readiness and security log availability for audits.

Benefits

Competitive compensation
Work on a globally recognized RegTech platform transforming financial crime prevention.

Exposure to cutting-edge AI and big data infrastructure (Spark, Kafka, ScyllaDB, Flink).

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

About Company

Tookitaki Holding

Tookitaki’s FinCense platform combines AI with community‑driven intelligence to deliver real‑time AML, fraud detection, smart screening, and transaction monitoring with 90%+ accuracy—trusted by global banks, fintechs & e‑wallets.
