Sr. Data Engineer

Sparity

Job Location:

Dallas, IA - USA

Monthly Salary: Not Disclosed
Posted on: 8 days ago
Vacancies: 1 Vacancy

Job Summary

Job Title: Senior Data Engineer

Location: New York City or Plano, TX (3 days onsite, 2 days remote; open to relocation candidates)

Duration: 12 months; likely extensions

Notes:

Core Role Focus
  • Primarily a Data Engineer role
    • 80% Data Engineering
    • 20% ML exposure
  • ML is not the primary focus; strong Data Engineering fundamentals are mandatory.

Must Have Technical Skills

  • Python (pandas DataFrames for data engineering use cases)
  • PySpark / Spark
  • Databricks
  • AWS ecosystem
    • S3
    • Core AWS services (EMR, Glue, Lambda, etc.)
  • Exposure to Java is a plus but not mandatory
  • Pipeline design & automation
    • High-volume data processing
    • SCD (slowly changing dimensions)
  • Streaming & near-real-time data
    • Kafka (must understand consumption even if not hands-on at Cap One)
    • APIs
    • Micro-batching (with emphasis on high-volume use cases)
  • Candidates with only micro-batching experience and no exposure to large-scale, high-volume pipelines are not strong fits.

Job Description:

Overview

Seeking a strong, hands-on Data Engineer to join a fast-moving Cybersecurity organization focused on threat detection, correlation, and automated remediation. This role is heavily data-engineering focused (approximately 80% Data Engineering / 20% ML exposure) and requires deep fundamentals, not surface-level experience.

This team works with large-scale, high-volume data pipelines that support near-real-time security analytics and GenAI-driven tools used by Cyber Operations teams and executive leadership.

Key Responsibilities

  • Design, build, and maintain scalable data pipelines handling large volumes of structured and semi-structured data
  • Develop and optimize pipelines using PySpark and Databricks
  • Implement data ingestion, transformation, and automation workflows in AWS
  • Work with real-time and near-real-time data sources, including Kafka and APIs
  • Design pipelines supporting high-volume processing (beyond simple micro-batching)
  • Apply best practices around:
    • Data quality
    • Performance optimization
    • Pipeline reliability and scalability
  • Collaborate with cybersecurity, data science, and platform teams to support:
    • Threat detection use cases
    • Log analysis and security telemetry
    • GenAI-powered data products
  • Participate in technical and behavioral interviews, including hands-on discussions and screen-sharing exercises
