Job Title: Senior Data Engineer
Location: New York City or Plano, TX (3 days onsite, 2 days remote; open to relocation candidates)
Duration: 12 months; likely extensions
Notes:
Core Role Focus
- Primarily a Data Engineer role
- 80% Data Engineering
- 20% ML exposure
- ML is not the primary focus; strong Data Engineering fundamentals are mandatory.
Must Have Technical Skills
- Python (pandas DataFrames for data engineering use cases)
- PySpark / Spark
- Databricks
- AWS ecosystem
- S3
- Core AWS services (EMR, Glue, Lambda, etc.)
- Exposure to Java is a plus but not mandatory
- Pipeline design & automation
- High volume data processing
- SCD (slowly changing dimensions)
- Streaming & near-real-time data
- Kafka (must understand consumption even if not hands-on at Cap One)
- APIs
- Micro-batching (with emphasis on high-volume use cases)
- Candidates with only micro-batching experience and no exposure to large-scale, high-volume pipelines are not strong fits.
Job Description:
Overview
Seeking a strong, hands-on Data Engineer to join a fast-moving Cybersecurity organization focused on threat detection, correlation, and automated remediation. This role is heavily data engineering focused (approximately 80% Data Engineering / 20% ML exposure) and requires deep fundamentals, not surface-level experience.
This team works with large-scale, high-volume data pipelines that support near-real-time security analytics and GenAI-driven tools used by Cyber Operations teams and executive leadership.
Key Responsibilities
- Design, build, and maintain scalable data pipelines handling large volumes of structured and semi-structured data
- Develop and optimize pipelines using PySpark and Databricks
- Implement data ingestion, transformation, and automation workflows in AWS
- Work with real-time and near-real-time data sources, including Kafka and APIs
- Design pipelines supporting high-volume processing (beyond simple micro-batching)
- Apply best practices around:
- Data quality
- Performance optimization
- Pipeline reliability and scalability
- Collaborate with cybersecurity, data science, and platform teams to support:
- Threat detection use cases
- Log analysis and security telemetry
- GenAI-powered data products
- Participate in technical and behavioral interviews, including hands-on discussions and screen-sharing exercises
Required Qualifications