AI/ML Lead Data Engineer - Automation/Image Processing
Job Summary
Join us as we embark on a journey of collaboration and innovation, where your unique skills and talents will be valued and celebrated. Together, we will create a brighter future and make a meaningful difference.
As a Lead Data Engineer at JPMorganChase within the Commercial & Investment Bank, you are an integral part of an agile team that works to enhance, build, and deliver data collection, storage, access, and analytics solutions in a secure, stable, and scalable way. As a core technical contributor, you are responsible for maintaining critical data pipelines and architectures across multiple technical areas within various business functions in support of the firm's business objectives.
Job responsibilities
- Design, build, and maintain scalable, high-performance data pipelines and infrastructure to support the ingestion, processing, and storage of large volumes of scanned document images across enterprise-wide workflows
- Architect end-to-end data solutions on AWS cloud services to enable the seamless flow of scanned images from source systems through OCR processing, model inference, and downstream data extraction and categorization pipelines
- Develop robust image preprocessing and OCR integration pipelines that handle TIF/PNG format conversion, normalization, resolution enhancement, noise reduction, and batching to prepare scanned documents for downstream computer vision and OCR models
- Build and optimize data pipelines that integrate OCR engine outputs, extracting structured text and metadata from scanned images and routing them into databases and analytics platforms for further processing
- Design and manage data storage architectures and containerized deployments using Oracle databases and AWS-native stores (S3, EFS) to efficiently catalog, index, and retrieve extracted text, classification labels, and metadata from processed document images
- Drive the adoption of containerized deployment strategies using AWS EKS (Elastic Kubernetes Service) to deploy and scale image processing microservices, OCR engines, and data pipeline components with high availability and fault tolerance
- Collaborate closely with data scientists and ML engineers to ensure that training datasets for computer vision and other models are properly curated, versioned, labeled, and accessible through well-structured data pipelines
- Evaluate and integrate emerging data technologies and tools to continuously improve pipeline throughput, reduce processing latency for high-volume document scanning workloads, and optimize cost efficiency
- Establish and enforce data quality, lineage, governance, and security frameworks to ensure the traceability and integrity of data extracted from scanned documents throughout the entire processing lifecycle
- Partner with security and compliance teams to ensure that scanned document data, extracted PII/PHI, and sensitive content are handled in accordance with regulatory requirements, encryption standards, and access controls
- Lead and mentor a team of data engineers, establishing coding standards, peer review processes, CI/CD workflows, and best practices for building production-grade image and document processing pipelines
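To make the preprocessing responsibilities above more concrete, here is a minimal sketch of the orchestration side of multi-page document splitting and batching. All names (`PageTask`, `split_document`, `make_batches`) are illustrative, not part of any real framework, and the actual TIF/PNG decoding and conversion would be done with an imaging library, which is out of scope here.

```python
# Hedged sketch: expand multi-page scanned documents into per-page work
# items, then group them into fixed-size batches for OCR submission.
from dataclasses import dataclass
from typing import Iterable, Iterator, List

@dataclass(frozen=True)
class PageTask:
    doc_id: str  # source document identifier (e.g. a TIF object key)
    page: int    # zero-based page index within the multi-page document

def split_document(doc_id: str, page_count: int) -> Iterator[PageTask]:
    """Expand one multi-page document into one task per page."""
    for page in range(page_count):
        yield PageTask(doc_id, page)

def make_batches(tasks: Iterable[PageTask],
                 batch_size: int) -> Iterator[List[PageTask]]:
    """Group page tasks into fixed-size batches; flush the final remainder."""
    batch: List[PageTask] = []
    for task in tasks:
        batch.append(task)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # last partial batch, if any
        yield batch

if __name__ == "__main__":
    tasks = list(split_document("claim-001.tif", 7))
    batches = list(make_batches(tasks, batch_size=3))
    print([len(b) for b in batches])  # a 7-page document yields batches of 3, 3, 1
```

In a production pipeline this batching stage would typically sit between an S3 ingestion trigger and the OCR service call, so batch size becomes a tuning knob for throughput versus latency.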
Required qualifications, capabilities, and skills
- Formal training or certification on Data Engineering concepts and 5 years of applied experience
- Strong proficiency in Java, Groovy, and Python for building data pipelines, image preprocessing workflows, automation scripts, and backend data services
- Hands-on experience with image file handling, particularly TIF/PNG format processing, multi-page document splitting, format conversion, and integration with OCR and computer vision pipelines
- Deep hands-on experience with AWS cloud services, including S3 (for image storage), Lambda, Step Functions, and CloudWatch, for building and monitoring scalable data workflows
- Expertise in AWS EKS (Elastic Kubernetes Service) for deploying and managing containerized image processing, OCR, and data pipeline services using Docker and Kubernetes
- Advanced knowledge of Oracle databases, including PL/SQL, performance tuning, partitioning strategies, and data modeling for storing and querying large volumes of extracted document data and classification results
- Familiarity with OCR technologies and the ability to build data pipelines that consume and structure OCR output for downstream analytics and model training
- Understanding of the data requirements for training deep learning models, including dataset preparation, annotation management, and feature store integration
- Experience with CI/CD pipelines (Jenkins) and infrastructure-as-code tools (Terraform, CloudFormation) for automated deployment and environment management
- Strong understanding of data governance, data quality frameworks, metadata management, and data cataloging, particularly in the context of document-centric and image-heavy data ecosystems
- Excellent leadership, communication, and stakeholder management skills, with the ability to drive technical decisions across cross-functional teams
- Domain expertise in the healthcare industry
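The "consume and structure OCR output" skill above can be illustrated with a small sketch. The JSON shape here (word-level entries with text, confidence, and line/word indices) is a made-up but typical layout, not the output of any specific OCR engine, and `ocr_words_to_rows` is a hypothetical helper.

```python
# Hedged sketch: flatten word-level OCR output into (doc_id, line_no,
# text, mean_confidence) rows, ready to load into a database table.
import json
from typing import Dict, List, Tuple

def ocr_words_to_rows(payload: str, doc_id: str,
                      min_conf: float = 0.5) -> List[Tuple[str, int, str, float]]:
    """Group OCR words by line, drop low-confidence words, emit one row per line."""
    words = json.loads(payload)["words"]
    lines: Dict[int, List[dict]] = {}
    for w in words:
        if w["confidence"] >= min_conf:          # filter out OCR noise
            lines.setdefault(w["line"], []).append(w)
    rows = []
    for line_no in sorted(lines):
        ws = sorted(lines[line_no], key=lambda w: w["word"])  # reading order
        text = " ".join(w["text"] for w in ws)
        mean_conf = sum(w["confidence"] for w in ws) / len(ws)
        rows.append((doc_id, line_no, text, round(mean_conf, 3)))
    return rows

if __name__ == "__main__":
    sample = json.dumps({"words": [
        {"text": "Invoice", "line": 0, "word": 0, "confidence": 0.98},
        {"text": "#1042",   "line": 0, "word": 1, "confidence": 0.91},
        {"text": "~~",      "line": 1, "word": 0, "confidence": 0.20},
        {"text": "Total",   "line": 1, "word": 1, "confidence": 0.95},
    ]})
    print(ocr_words_to_rows(sample, "doc-7"))
```

In practice the rows would land in an Oracle table (or an S3-backed analytics store) keyed by document and line, with the per-line confidence kept so downstream consumers can route low-confidence lines to review.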
Required Experience:
IC
About Company
JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands.