Description:
Data Engineer
Job Summary:
We are seeking a highly skilled and detail-oriented Data Engineer with expertise in data architecture, pipeline development, cloud platforms, and big data technologies. The ideal candidate will be responsible for designing, building, and maintaining scalable data infrastructure, ensuring efficient data flow across systems, and enabling advanced analytics and machine learning capabilities.
Key Responsibilities:
Good to Have: Palantir Foundry Experience
Experience working with Foundry Ontology to model enterprise data and define semantic relationships.
Building and maintaining Code Workbooks and Data Pipelines within Foundry for scalable ETL/ELT workflows.
Familiarity with Foundry's Object Explorer and Data Lineage tools for tracking data transformations and dependencies.
Integration of Foundry with external systems using Foundry APIs and Data Connections.
Experience with Foundry's Operational Workflows for automating data-driven decision-making.
Understanding of Foundry's security model and access controls for enterprise-grade data governance.
Design, develop, and maintain ETL/ELT pipelines for structured and unstructured data.
Build and optimize data lakes, data warehouses, and real-time streaming systems.
Collaborate with Data Scientists and Analysts to ensure data availability and quality for modeling and reporting.
Implement data governance, security, and compliance protocols.
Develop and maintain data APIs and services for internal and external consumption.
Work with cloud platforms (AWS, Azure, GCP) to deploy scalable data solutions.
Monitor and troubleshoot data workflows, ensuring high availability and performance.
Automate data validation, transformation, and integration processes.
Manage large-scale datasets using distributed computing frameworks like Spark and Hadoop.
Stay updated with emerging data engineering tools and best practices.
Technical Skills:
Programming & Frameworks:
Languages: Python, SQL, Scala, Java
Frameworks & Tools: Apache Spark, Hadoop, Airflow, Kafka, Flink, NiFi, Beam
Libraries: Pandas, PySpark, Dask, FastAPI, SQLAlchemy
Cloud & DevOps:
Platforms: AWS (Glue, Redshift, S3, EMR), Azure (Data Factory, Synapse), GCP (BigQuery, Dataflow)
DevOps Tools: Docker, Kubernetes, Jenkins, Terraform, Git, GitHub
Databases:
Relational: MySQL, PostgreSQL, SQL Server
NoSQL: MongoDB, Cassandra, DynamoDB, Redis
Data Warehousing: Snowflake, Redshift, BigQuery, Azure Synapse
Data Architecture & Processing:
ETL/ELT design and implementation
Batch and real-time data processing
Data modeling (Star and Snowflake schemas)
Data quality and lineage tools (Great Expectations, dbt, Amundsen)
Monitoring & Visualization:
Prometheus, Grafana, CloudWatch
Integration with BI tools such as Power BI and Tableau
Qualifications:
Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or a related field.
Proven experience in building and managing data pipelines and infrastructure.
Strong understanding of data architecture, distributed systems, and cloud-native technologies.
Excellent problem-solving and communication skills.
Additional Details
- Planned Resource Unit : (55)ITTRUCKS;(10)F/TC - Application Engineer - 0-3 Yrs;Data Engineer;(Z1)0-3 Years