Senior Site Reliability Engineer (SRE) & Support Lead

Banyan Software

Job Location: Chennai, India

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1

Job Summary

Banyan Software provides the best permanent home for successful enterprise software companies, their employees, and customers. We are on a mission to acquire, build, and grow great enterprise software businesses all over the world that have dominant positions in niche vertical markets. In recent years, Banyan was named the #1 fastest-growing private software company in the US on the Inc. 5000 and among the top 10 fastest-growing companies by the Deloitte Technology Fast 500. Founded in 2016 with a permanent capital base set up to preserve the legacy of founders, Banyan focuses on a buy-and-hold-for-life strategy for growing software companies that serve specialized vertical markets.

Senior Site Reliability Engineer (SRE) & Support Lead (Touchstream)

Location: Chennai, India
Reports to: Head of Integrations
Role Type: Hands-on senior individual contributor with support leadership responsibilities

Company & Core Product Snapshot

Touchstream is the OTT Operations Hub: a cloud-native SaaS platform for independent, end-to-end monitoring of streaming video systems (CDNs, origin, delivery chain). We serve some of the world's largest broadcasters, telco/OTT services, and streaming platforms, monitoring tens of thousands of live streams in real time.

Touchstream now unifies its best-selling CDN Monitoring and VirtualNOC into a single platform, delivering:

  • Unified data & end-to-end visibility across the streaming workflow
  • Best-in-class incident intelligence and RCA tooling (including timestamped evidence packs)
  • Operating-model improvements via shared views, collaboration, AI MCP Servers, and rich knowledge bases
  • Business value and ROI reporting for capacity optimization and performance insights

Role Summary

As Senior SRE & Support Lead, you will own production health for Touchstream's customer-facing platform and data plane, while also leading the global technical support function as part of your SRE responsibilities. Your mission is twofold:

  1. Reliability ownership: ensure high availability, performance, and change safety across the system (UI/API and the ingest, process & query pipelines) with strong SLO discipline and continuous improvement.

  2. Support leadership: run and evolve the support operation (triage, escalation, incident response coordination, tooling) and, over time, build a strong support team in Chennai to deliver world-class customer outcomes.

This is a highly impactful role at the intersection of SRE, incident management, observability engineering, and customer-facing support.

Responsibilities:

1) Reliability Ownership (Primary)

  • Define and maintain SLOs, error budgets, and service health reporting.
  • Own availability and performance of:
    • Customer-facing system: UI/API
    • Data plane: ingest, process & query pipelines
  • Drive capacity planning for live-event spikes, load testing, and scaling strategies.
  • Prevent recurring issues through high-quality RCAs and rigorous follow-through.

2) On-Call & Incident Management (Run the Room)

  • Build and evolve the on-call operating model: severity levels, paging rules, escalation paths, comms templates.
  • Lead high-severity incidents end-to-end: triage, mitigation, rollback, "stop the bleeding" decisions, stakeholder comms.
  • Track MTTA/MTTR and implement systemic improvements over time.

3) Observability for the Observability Platform (Meta-Observability)

  • Own "who watches the watcher": monitoring and alerting for Touchstream's monitoring pipeline itself.
  • Standardize telemetry conventions (logs/metrics/traces) across services.
  • Build and maintain dashboards for:
    • ingest health (per customer / per source)
    • pipeline lag
    • query performance
    • alerting health
  • Tune alerting to reduce noise: dedupe, routing, symptom vs. cause, threshold hygiene.

4) Release Engineering & Change Safety (Bulletproof Change Management)

  • Implement guardrails: feature flags, progressive delivery/canaries, automated rollback triggers.
  • Maintain release readiness practices: migration checks, backfills, customer impact assessment, capacity impacts.
  • Drive change metrics: deploy frequency, change failure rate, recovery time from deploys.

5) Cost & Efficiency Ownership (Cloud Economics)

  • Monitor and optimize cost per GB ingested/stored/queried.
  • Enforce retention policies, tiering, sampling, and query limits without breaking customer value.
  • Make explicit capacity vs. cost tradeoffs, especially around large live events and heavy dashboards.

6) Security & Resilience Basics (Small-Team Practicality)

  • Baseline controls: access reviews, secrets management, least privilege, dependency scanning.
  • Rate limiting / abuse guardrails, audit logging, security incident response readiness.
  • Backup/restore and lightweight-but-real disaster recovery drills.

7) Support Leadership & Operations (Explicitly Part of the Role)

  • Serve as the senior escalation point for critical customer issues and high-impact outages.
  • Own the support operating model:
    • ticket triage, prioritization, SLAs, escalation paths, and shift handovers
    • runbooks, playbooks, FAQs, and knowledge base (including formats suitable for AI-assisted support / RAG)
  • Establish and monitor support KPIs (SLA compliance, backlog, customer satisfaction, MTTx) and implement process improvements.
  • Partner with Engineering/Product/Integrations to turn support learnings into reliability fixes and product improvements.
  • Over time: help build, mentor, and lead a team of support/NOC engineers in Chennai.

8) Customer-Impact Focus (Tenant Health & Trust)

  • Maintain per-tenant customer health views: SLO compliance, noisy sources, top offenders, recurring incident patterns.
  • Collaborate with Product on operator workflows: service health panels, incident summaries, status updates.

Required Qualifications & Skills

Technical / SRE Foundation

  • 8 years in SRE, production operations, technical support for SaaS, or NOC/ops roles with strong reliability ownership.
  • Strong Linux fundamentals; comfort with debugging distributed systems.
  • Strong understanding of cloud infrastructure (AWS and/or GCP) and service operations.
  • Experience with monitoring/alerting/logging stacks, incident management, and RCA practices.
  • Ability to automate operational work (Python and/or shell scripting); comfort with APIs and CLI tooling.

Streaming / OTT Domain (Nice to Have)

  • Strong understanding of video streaming and delivery concepts: HLS, DASH, CMAF, ABR, CDNs, origin, HTTP caching, DNS, SSL/TLS. Familiarity with AWS Media Services is a big plus.

Support Leadership & Customer Communication

  • Proven ability to run escalations and communicate clearly in high-pressure incidents.
  • Experience designing support workflows, SLAs, escalation paths, and operational KPIs.
  • Strong written and verbal English; confidence presenting incident status and RCAs to customers.

Working Style

  • Comfortable with flexible hours to support global customers (overlap with Europe/US time zones as needed).
  • Bias for action, continuous improvement mindset, and strong ownership.

Desired / Nice-to-Have

  • Prior experience supporting high-scale, always-on streaming events and live operations.
  • Experience with progressive delivery, canarying, feature-flag platforms, and release automation.
  • Familiarity with IT service management frameworks (e.g., ITIL).
  • Security operations exposure (secrets management, vulnerability management, audit logging).

What You'll Gain & Why Join

  • A senior, high-ownership role shaping reliability and support for a mission-critical observability platform in OTT streaming.
  • Direct impact on global broadcasters and streaming services, improving viewer experience at scale.
  • Opportunity to build the SRE/support operating model and grow the Chennai support function over time.
  • Collaboration with a globally distributed team across engineering, integrations, operations, and product.

Diversity, Equity, Inclusion & Equal Employment Opportunity at Banyan: Banyan affirms that inequality is detrimental to our Global Team's associates, our Operating Companies, and the communities we serve. As a collective, our goal is to impact lasting change through our actions. Together, we unite for equality and equity. Banyan is committed to equal employment opportunities regardless of any protected characteristic, including race, color, genetic information, creed, national origin, religion, sex, affectional or sexual orientation, gender identity or expression, lawful alien status, ancestry, age, marital status, or protected veteran status, and will not discriminate against anyone on the basis of a disability. We support an inclusive workplace where associates excel based on personal merit, qualifications, experience, ability, and job performance.

Beware of Recruitment Scams

We have been made aware of individuals fraudulently posing as members of our Talent Acquisition team and extending fake job offers. These scams may involve requests for personal information or payment for equipment.

Protect yourself by following these steps:

  • Verify that all communications from our recruiting team come from an @ email address.
  • Remember, employers will never request payment or banking information during the hiring process.
  • If you receive a suspicious message, do not respond; instead, forward it to and/or report it to the platform where you received it.

Your safety and security are important to us. Thank you for staying vigilant.


Required Experience:

Senior IC

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Banyan Software acquires, builds, and grows great enterprise software businesses. We preserve the legacy of your business for you, your team, and your customers.