Key Responsibilities:
1. Cluster Management:
Install configure and maintain opensource Hadoop clusters including HDFS YARN and related ecosystem components (e.g. Hive HBase Pig Spark).
Ensure high availability and reliability of the Hadoop cluster.
2. Performance Tuning:
Monitor cluster performance and resource usage.
Implement performance tuning and optimization strategies for Hadoop jobs and clusters.
3. Security Administration:
Implement and manage security measures including Kerberos authentication user and group permissions and data encryption.
Ensure compliance with security policies and procedures.
4. Data Management:
Manage data ingestion transformation and storage processes.
Implement data retention policies and ensure efficient storage utilization.
5. Backup and Recovery:
Develop and maintain backup and recovery processes for Hadoop data and configurations.
Ensure data integrity and availability through regular backups and disaster recovery planning.
6. System Upgrades and Patch Management:
Plan and execute Hadoop software upgrades and patches.
Test and validate new releases and updates in a controlled environment before deployment.
7. Monitoring and Troubleshooting:
Implement monitoring tools and scripts to track cluster health and performance.
Diagnose and resolve issues related to Hadoop clusters and ecosystem components.
8. Automation and Scripting:
Develop automation scripts for routine tasks using tools like Ansible Puppet or custom scripts.
Enhance operational efficiency through automation of repetitive tasks.
performance tuning,hbase,hadoop,spark,administration,yarn,kerberos,hive,monitoring,ansible,pig,hdfs,scripting,backup and recovery,security administration,data management,puppet