Data Engineer
Job Location: Dallas, TX - USA
Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1
Job Summary
Position: Data Engineer
Location: Dallas, TX / Pittsburgh, PA / Cleveland, OH (Onsite)
Term: C2C/W2 role
Duration: Long Term
Job Description:
Data Engineer with 5 years of experience in designing, developing, and maintaining scalable data pipelines that support analytics, reporting, and operational platforms. The ideal candidate will have strong expertise in Spark, PySpark, Airflow, SQL, Data Lakes, and large-scale batch processing environments.
Responsibilities
- Design and build scalable data pipelines aligned with business requirements.
- Process large datasets using batch and near real-time processing frameworks.
- Ensure data quality, consistency, governance, and reliability across systems.
- Support data integration and transformation initiatives for analytics and reporting platforms.
- Maintain metadata repositories, data dictionaries, and technical documentation.
- Participate in data architecture reviews and data model validation activities.
- Support analytics, reporting, and risk management platforms.
- Collaborate with cross-functional teams to align enterprise data solutions with business objectives.
Required Qualifications
- 5 years of experience in Data Engineering and Big Data processing.
- Strong expertise in:
  - Apache Spark (Spark Core, Spark SQL)
  - PySpark
  - Large-scale batch processing
- Experience working with structured and semi-structured data, complex transformations, and performance tuning.
- Hands-on experience with data ingestion and integration from:
  - Oracle
  - SQL Server
  - Hive
  - HDFS
  - Amazon S3
- Experience building and maintaining curated data models.
- Experience writing data to:
  - Hive Tables
  - Data Lakes (Iceberg)
  - Downstream reporting systems
- Strong SQL and data modeling skills.
- Hands-on experience with Apache Airflow:
  - DAG Development
  - Scheduling
  - Monitoring
  - Workflow Orchestration
- Proficiency in Shell Scripting:
  - Job Automation
  - File Validation
  - Dependency Management
  - Logging & Error Handling
  - Spark Job Execution
  - Data Archival & Purging
- Strong understanding of batch processing and scheduling frameworks.
- Experience migrating job schedules from:
  - CA7
  - Control-M
  - Airflow
- Experience implementing CI/CD for data pipelines.
- Experience ensuring data quality, reliability, governance, and compliance in regulated environments.
- Strong communication and documentation skills.
Preferred Skills
- Banking / Financial Services Domain
- Risk Reporting Platforms
- Data Governance
- Enterprise Data Architecture
- Near Real-Time Data Processing
Key Technologies
Apache Spark, Spark SQL, PySpark, Apache Airflow, SQL, Hive, HDFS, S3, Iceberg, Oracle, SQL Server, Shell Scripting, ETL, Data Pipelines, Data Modeling, Control-M, CA7, CI/CD
Cloud BC Labs Inc is a digital transformation organization that aims to create seamless solutions for clients to manage their business operations effectively. The company specializes in Business and Management Consulting, AI/ML, Data Analytics & Visualization, Cloud Data Warehouse Migration, Snowflake Implementation, Informatica Implementation & Upgrade, Staffing Services, and Data Management Solutions.