Essential Functions:
Design, build, and manage Big Data and Kafka infrastructure on private cloud, AWS, GCP, and Azure.
Manage and optimize Big Data and Apache Kafka clusters for high performance, reliability, and scalability.
Develop tools and processes to monitor and analyze system performance and to identify potential issues.
Collaborate with other teams to design and implement solutions that improve the reliability and efficiency of the Big Data cloud platforms.
Ensure security and compliance of the platforms within organizational guidelines.
Other responsibilities include effective root cause analysis of major production incidents and the development of learning documentation. The person will identify and implement high-availability solutions for services with a single point of failure.
The role involves planning and performing capacity expansions and upgrades in a timely manner to avoid any scaling issues and bugs. This includes automating repetitive tasks to reduce manual effort and prevent human errors.
The successful candidate will tune alerting and set up observability to proactively identify issues and performance problems. They will also work closely with Level 3 teams in reviewing new use cases and cluster hardening techniques to build robust and reliable platforms.
The role involves creating standard operating procedure documents and guidelines on effectively managing and utilizing the platforms. The person will leverage DevOps tools, disciplines (incident, problem, and change management), and standards in day-to-day operations.
The individual will ensure that the platforms can effectively meet performance and service-level agreement requirements. They will also perform security remediation, automation, and self-healing as required.
The individual will concentrate on developing automations and reports to minimize manual effort. This can be achieved through automation tools such as Shell scripting, Ansible, or Python scripting, or by using any other programming language.
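As a hedged illustration of the kind of repetitive-task automation described above (the function names, the 85% threshold, and the use of `df -P` output are our own assumptions, not part of the role), a small Python helper might parse filesystem usage and flag mounts that need remediation before an alert fires:

```python
# Illustrative sketch: flag filesystems over a usage threshold so a
# remediation job (e.g., log cleanup) can run proactively.

def parse_df_line(line: str) -> tuple[str, int]:
    """Parse one data line of POSIX `df -P` output into (mount_point, used_percent)."""
    fields = line.split()
    # `df -P` columns: filesystem, blocks, used, available, capacity, mount point
    used_pct = int(fields[4].rstrip("%"))
    return fields[5], used_pct

def over_threshold(df_output: str, threshold: int = 85) -> list[str]:
    """Return mount points whose usage meets or exceeds the threshold."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        mount, used = parse_df_line(line)
        if used >= threshold:
            flagged.append(mount)
    return flagged
```

In practice a check like this would be wired into a scheduler or an Ansible playbook and paired with an automated cleanup step, replacing a manual `df` inspection.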
Team Leadership:
Lead and mentor a team of SRE engineers, providing strategic and technical guidance and support.
Foster a culture of continuous improvement, innovation, and operational excellence.
Develop and implement professional development programs and succession planning for the team.
Technical Leadership:
Provide technical leadership and oversight to engineers.
Establish SRE best practices.
Ensure engineering and operational excellence (quality, security, performance, scalability, availability, resilience).
Collaboration & Strategy:
Collaborate with Product Office, Operations & Infrastructure, Cybersecurity, Client Support, and other Product Development teams.
Drive the coordination, organization, and execution of qualitative and quantitative decisions.
The Skills You Bring:
Energy and Experience: A growth mindset, curious and passionate about technologies, and energized by challenging projects on a global scale
Challenge the Status Quo: Comfort in pushing the boundaries, hacking beyond traditional solutions
Language Expertise: Expertise in one or more general development languages (e.g., Python, Java)
Learner: Constant drive to learn new technologies
This is a hybrid position. Expectation of days in office will be confirmed by your Hiring Manager.
Qualifications:
Basic Qualifications
o 8 years of relevant work experience and a Bachelor's degree, OR 11 years of relevant work experience
Preferred Qualifications
o 9 or more years of relevant work experience with a Bachelor's degree, or 7 or more years of relevant work experience with an Advanced Degree (e.g., Master's, MBA, JD, MD), or 3 or more years of experience with a PhD
o Experience with managing and optimizing Big Data and Kafka clusters.
o Proficient in scripting languages (Python, Bash) and SQL.
o Familiarity with big data tools (Spark, Kafka, etc.) and frameworks (HDFS, MapReduce, etc.).
o Strong knowledge of system architecture and design patterns for high-performance computing.
o Good understanding of data security and privacy concerns.
o Excellent problem-solving and troubleshooting skills.
o Observability: knowledge of observability tools such as Grafana and Splunk.
o Linux: understanding of Linux networking, CPU, memory, and storage.
o Programming Languages: Knowledge of and ability to code in Java, Python, or another widely used language.
o Communication: Excellent interpersonal skills along with superior verbal and written communication abilities.
o Demonstrated experience with AWS and GCP cloud platforms.
o Superior verbal, written, and interpersonal communication skills with both technical and non-technical audiences.
o Excellent team player with strong collaboration skills and the ability to influence cross-functional teams for results.
Additional Information:
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.
Remote Work:
No
Employment Type:
Full-time