Data Engineer (Python, PySpark, Apache Spark)
Ashburn, IL - USA
Job Summary
About Infinitive
Infinitive is a data and AI consultancy that enables its clients to modernize, monetize, and operationalize their data to create lasting and substantial value. We possess deep industry and technology expertise to drive and sustain adoption of new capabilities. We match our people and personalities to our clients' culture while bringing the right mix of talent and skills to enable high return on investment.
Infinitive has been named one of the Best Small Firms to Work For by Consulting Magazine 8 times, most recently in 2025. Infinitive has also been named a Washington Post Top Workplace, a Washington Business Journal Best Places to Work, and a Virginia Business Best Places to Work.
Role Overview
We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will play a crucial role in designing, developing, and maintaining our clients' data infrastructure. Your expertise in Apache Spark, Python, PySpark, ETL processes, and CI/CD (Jenkins or GitHub Actions), along with your experience with both streaming and batch workflows, will be essential in ensuring the efficient flow and processing of data to support our clients.
Responsibilities
Data Architecture and Design: Collaborate with cross-functional teams to understand data requirements and design robust data architecture solutions. Develop data models and schema designs to optimize data storage and retrieval.
ETL Development: Implement robust ETL processes to extract, transform, and load data from various sources. Ensure data quality, integrity, and consistency throughout the ETL pipeline.
Distributed Computing & Spark Development: Utilize your expertise in Apache Spark, Python, and PySpark to develop efficient, large-scale data processing and analysis scripts. Optimize code for performance, memory management, and scalability, keeping up to date with the latest industry best practices.
Data Integration: Integrate data from different systems and sources to provide a unified view for analytical purposes. Collaborate with data scientists and analysts to implement solutions that meet their data integration needs.
Streaming and Batch Workflows: Design and implement streaming workflows using PySpark Streaming or other relevant technologies. Develop batch processing workflows for large-scale data processing and analysis.
CI/CD Implementation: Implement and maintain continuous integration and continuous deployment (CI/CD) pipelines using Jenkins or GitHub Actions. Automate testing, code deployment, and monitoring processes to ensure the reliability of data pipelines.
Qualifications
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
Proven experience as a Data Engineer or in a similar role.
Strong programming skills in Python and deep expertise in Apache Spark and PySpark for both batch and streaming data processing.
Hands-on experience developing, tuning, and troubleshooting distributed data pipelines.
Solid understanding of ETL tools, data modeling, database design, and data warehousing concepts.
Familiarity with CI/CD tools such as Jenkins or GitHub Actions.
Excellent problem-solving, analytical, communication, and collaboration skills.
Preferred Skills
Experience with Ab Initio (e.g., GDE, Co-Operating System, EME) or a strong background in enterprise ETL modernization.
Knowledge of cloud platforms such as AWS, Azure, or Google Cloud.
Experience with version control systems (e.g., Git).
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
Understanding of data security and privacy best practices.
Infinitive is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.
Required Experience:
Junior IC
About Company
Get the value out of your data with Infinitive, the leaders in data analytics and cloud development.