System Administration
Job Summary
Key Responsibilities
RHEL 7/8/9
Red Hat Ansible Automation
Veritas Volume Manager and Cluster
SAN-based replication
OS Patching
OS Infrastructure & Troubleshooting skills
System Administration & Server Management
Administer and support Linux/Unix servers across all environments (Dev, Test, Stage, Prod, DR).
Perform server provisioning and lifecycle management following approved standards.
Execute configuration changes within defined governance models.
Patching & Maintenance
Execute and support monthly patching for 300 servers.
Use Ansible Automation Platform for patch orchestration, validation, and compliance tracking.
Troubleshoot patch failures and coordinate remediation or rollback actions.
Security & Compliance
Collaborate with InfoSec and onshore teams to:
o Remediate identified vulnerabilities
o Implement CIS baselines and security controls
o Support enterprise security initiatives (e.g., CrowdStrike, CyberArk)
Execute remediation actions and validate closure.
Operational Support & Incident Management
Provide Level 3 incident support, including complex troubleshooting.
Participate in after-hours incident and change support.
Execute approved changes during maintenance windows.
Ensure daily operational tasks and system health checks are completed.
User, Performance & Storage Management
Administer user accounts, groups, and access controls.
Monitor and optimize:
o CPU
o Memory
o Disk I/O
o File systems and partitions
Support storage upgrades and capacity management.
Key Responsibilities
Platform Architecture & Engineering
Own Unix/Linux platform architecture across production and disaster recovery environments.
Define and maintain:
o Build standards
o Configuration baselines
o Security hardening patterns
o DR and replication architecture
Lead platform design decisions and roadmap enhancements.
Advanced Troubleshooting & Root Cause Analysis
Serve as final escalation point for critical or systemic incidents.
Perform in-depth root cause analysis (RCA) covering:
o Kernel issues
o Performance degradation
o Cluster failures
o Storage and replication faults
Drive permanent fixes and preventive measures.
Clustering, Storage & Replication
Design, implement, and support:
o Red Hat and Veritas clustering
o SAN-based replication solutions
o High availability and failover architectures
Validate resilience through testing and DR exercises.
Security Strategy & Compliance
Define Unix/Linux security hardening strategy aligned to CIS benchmarks.
Partner with InfoSec on vulnerability management frameworks.
Review and approve remediation approaches for complex security findings.
Automation Strategy & Tooling
Define enterprise-wide automation strategy using Ansible and scripting platforms.
Review and approve automation frameworks developed by lower tiers.
Drive efficiency and standardization across Unix/Linux operations.
Leadership, Governance & Mentorship
Provide technical leadership and mentoring for L2/L3 engineers.
Review and approve changes impacting platform stability.
Participate in governance forums, audits, and architecture reviews.
Advise stakeholders on Unix/Linux risks, capacity, and modernization initiatives.
Skills & Experience
8 to 12 years of deep Unix/Linux engineering experience.
Act as an offshore lead and take responsibility for managing offshore resources.
Expert-level knowledge of:
o Red Hat Linux internals
o Veritas Volume Manager and clustering
o SAN architectures and replication
Proven experience leading platform architecture in large enterprises.
Strong communication and leadership skills.
On-Call Requirement
Must be open to senior-level escalation support, including participation during major incidents or DR events.