Description
Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability.
As a Senior Lead Software Engineer - SRE at JPMorgan Chase within the AIML Data Platforms and Chief Data and Analytics Team, you will develop and deliver advanced technology products focused on data and analytics. You will tackle complex cloud data platform challenges, especially around data lakes. In this role, you will work in an agile environment, collaborating with cross-functional teams.
Job responsibilities
- Designs, implements, and maintains a managed AWS and Data platform, and provides engineering and operational support for the platform to SRE and app teams.
- Performs platform design, set-up and configuration, workspace administration, and resource monitoring, providing engineering support to data engineering teams, Data Science/ML teams, and application/integration teams.
- Leads evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented probing of architectural designs, technical credentials, and applicability for use within existing systems and information architecture.
- Drives continuous improvement in system observability, alerting, and capacity planning.
- Collaborates with engineering and data teams to optimize infrastructure and deployment processes, focusing on automation and operational excellence.
- Executes creative software solutions, design, development, and technical troubleshooting, with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems.
- Develops secure, high-quality production code, and reviews and debugs code written by others.
- Identifies opportunities to eliminate or automate remediation of recurring issues to improve overall operational stability of software applications and systems.
- Adds to team culture of diversity, opportunity, and respect.
- Implements Site Reliability Engineering (SRE) best practices to ensure reliability, scalability, and performance of data platforms.
- Develops and maintains incident response procedures, including root cause analysis and postmortem documentation.
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and applied experience.
- Strong understanding of SRE principles, including SLIs, SLOs, error budgets, and incident management.
- Experience with monitoring tools, automation frameworks, and CI/CD pipelines.
- Proficient in Python application program development with use of automated unit testing.
- Experience with Terraform development and understanding of Terraform Enterprise.
- Experience in delivering system design, application development, testing, and operational stability.
- Knowledge of Big Data distributed compute frameworks such as Spark, Glue, and MapReduce.
- Excellent troubleshooting, analytical, and communication skills.
Preferred qualifications, capabilities, and skills
- Experience with data pipelines using Spark.
- Exposure to AWS and Databricks platform administration.
- Knowledge of containerization (Docker, Kubernetes) and orchestration.
- Familiarity with distributed systems and large-scale data processing.
Required Experience:
Senior IC