Senior Reliability Engineer - Customer Data Platform

TekWissen LLC


Job Location:

Atlanta, GA - USA

Monthly Salary: Not Disclosed
Vacancies: 1 Vacancy

Job Summary

Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Senior Reliability Engineer - Customer Data Platform
Location: Atlanta GA
Duration: 6 Months
Job Type: Temporary Assignment
Work Type: Onsite
JOB SUMMARY
  • We are seeking a Senior Reliability Engineer to own production excellence for our Customer Data Platform (CDP), the authoritative source of truth for customer data across the entire US adult population.
  • An authoritative platform is only authoritative if it is available, secure, and timely. This role ensures exactly that: high availability, operational resilience, and compliance for the critical data systems that power customer experiences across every touchpoint.
  • You will lead 24x7 production support, incident management, platform governance, and security compliance, ensuring CDP remains the trusted foundation the business depends on.
  • You will act as the bridge between engineering, platform, security, and compliance teams, driving the operational discipline that keeps CDP resilient, secure, and audit-ready at all times.
Job Responsibilities:
  • KTLO (Keep the Lights On) Leadership and Production Support
  • Lead KTLO operations, including 24x7 monitoring, incident management, and on-call processes, understanding that CDP downtime directly impacts customer experiences and business decisions
  • Oversee production support for data pipelines, APIs, and platform services across the Azure and Databricks ecosystems
  • Manage job orchestration and monitoring (e.g., Control-M), ensuring SLA adherence and timely resolution, because timeliness is a core promise of the authoritative source of truth
  • Establish and enforce runbooks, SOPs, and escalation procedures tailored to CDP's criticality
  • Drive root cause analysis (RCA) and implement preventive measures to reduce recurring issues and protect data trust.
  • Reliability Engineering and Operations
  • Improve system reliability through automation, observability, proactive monitoring, and near-real-time availability targets
  • Define and track SLAs, SLIs, and SLOs for critical CDP systems, with metrics aligned to data freshness, accuracy, and availability commitments
  • Partner with engineering teams to implement resiliency patterns, failover strategies, and capacity planning for population-scale data processing
  • Identify and eliminate operational bottlenecks and manual processes that threaten CDP's reliability and timeliness
  • Compliance, Security, and Governance
  • Lead execution of compliance mandates, audits, and regulatory requirements impacting CDP systems, ensuring the platform that holds data for the entire US adult population meets the highest security standards
  • Manage and remediate security violations, vulnerabilities, and policy breaches with urgency
  • Oversee access controls, audit readiness, and governance processes in collaboration with security teams, protecting the trust that makes CDP authoritative
  • Ensure adherence to data protection and privacy standards across all customer data systems
  • Platform Maintenance and Operational Hygiene
  • Manage patching, upgrades, and vulnerability remediation across CDP platforms
  • Lead password and credential rotation processes across systems and integrations
  • Ensure operational readiness for infrastructure and platform changes with zero-downtime deployment practices
  • Coordinate with vendors and platform teams for issue resolution and maintenance activities
  • Collaboration and Leadership
  • Lead and coordinate onshore/offshore support teams, ensuring effective coverage and handoffs for 24x7 operations
  • Partner with Data Engineering, AI/ML, and Platform teams to ensure operability and supportability of all CDP systems
  • Provide operational readiness reviews for new deployments and features before they enter production
  • Mentor team members and drive a culture of accountability, ownership, and continuous improvement
Education and Work Experience:
  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 6 years of experience in production support, SRE, or platform operations roles
  • Proven experience managing 24x7 support models and distributed teams
  • Experience supporting large-scale data platforms in cloud environments (Azure preferred)
  • Experience with security compliance and audit processes for systems handling sensitive customer data
Technical Skills:
  • Strong experience with the Azure ecosystem (ADLS, Databricks, ADF, Event Hub, etc.)
  • Experience with job orchestration tools (Control-M or similar)
  • Solid understanding of data pipelines, ETL/ELT processes, and distributed systems at scale
  • Experience with monitoring and observability tools (e.g., Azure Monitor, Log Analytics, Splunk, Prometheus)
  • Familiarity with incident management tools and processes (PagerDuty, ServiceNow, etc.)
  • Experience with CI/CD pipelines and release management
  • Knowledge of security practices, access control, encryption, and compliance frameworks relevant to customer data
  • Scripting experience (Python, Shell) for automation and operational tooling
Knowledge, Skills, and Abilities:
  • Strong operational mindset with unwavering focus on stability, reliability, and uptime for a platform the entire business depends on
  • Ability to manage high-pressure production incidents and drive resolution with urgency and precision
  • Deep understanding of why platform reliability and security are foundational to CDP's authority as the source of truth
  • Strong problem-solving and root cause analysis skills
  • Excellent coordination and communication across engineering, security, and business teams
  • Ability to balance short-term fixes with long-term reliability improvements
  • Leadership skills in managing global support teams and rotations.
TekWissen Group is an equal opportunity employer supporting workforce diversity.