Design, develop, and support data engineering, data modeling, and data integrations, with a primary focus on accelerating data landing and curation in a Databricks data lake house. Build and maintain reliable, well-governed pipelines that ingest data from source systems into the lake house and curate it through a layered (medallion) architecture into analytics-ready, trusted datasets. The role also carries a strong reporting and data-analysis focus - partnering with business users to build semantic data models, dashboards, and reports, and performing hands-on analysis to answer business questions. The Data Engineer will help establish the data foundation that powers data-related AI and machine learning initiatives, ensuring high-quality, well-documented, AI-ready data products.
Key Responsibilities
Build, optimize, and support pipelines that land data from source systems into the Databricks lake house and curate it through a layered (medallion) architecture into trusted, analytics-ready datasets.
Produce and maintain high-quality, well-governed, documented, AI-ready data products that serve as the foundation for AI and machine learning initiatives.
Implement data quality, governance, and monitoring controls (e.g., Unity Catalog, automated testing, alerting) across lake house pipelines.
Develop and maintain reporting and analytics solutions - semantic data models, dashboards, and reports - and perform ad-hoc querying to support business decision-making.
Gather requirements, then design and develop new data integrations or enhancements to existing code.
Partner with business users and the Business Relationship Management team on requirements gathering, testing, and supporting existing integrations, analytics, and reporting.
Create and maintain documentation and process flows for integration solutions.
Required Experience & Skills
Minimum 5 years of IT/technology experience spanning data analysis, data engineering, and/or data integration, with a focus on building and curating pipelines in a cloud data lake or lake house environment.
At least 3 years writing SQL/NoSQL queries, with specific experience in MS SQL Server, Oracle, and/or Postgres.
Hands-on experience with a modern cloud data platform / lake house (Databricks, Microsoft Fabric, Snowflake, or comparable). Databricks strongly preferred.
Demonstrated experience landing data from diverse source systems into a lake/lake house and curating it through a medallion (bronze-silver-gold) architecture into clean, conformed, analytics-ready datasets.
Strong Python skills for data engineering, including PySpark.
Working knowledge of data quality, data governance, and pipeline reliability practices - automated testing, monitoring, alerting, and orchestration of batch and incremental/streaming workloads.
Experience designing simplified data models for integrations, analytics, and reporting; comfortable performing hands-on data analysis and ad-hoc querying.
Experience extracting data from source systems via web services (SOAP, REST, Web APIs), XML, and CSV/Excel exports.
Experience building the data foundation and automation pipelines for analytics and AI/ML initiatives, and partnering with business users on LLM/GenAI use cases.
Bachelor's degree in Information Systems, IT, or a related technical discipline - or equivalent demonstrated technical proficiency.
Strong interpersonal and communication skills; fluent in English (oral and written).
Preferred / Nice-to-Have
Python, cloud data warehouse experience (e.g., Snowflake, Synapse), and Spark SQL.
Performance tuning, partitioning, and optimization.
Modern LLM architectures and GenAI frameworks - retrieval-augmented generation (RAG), embeddings and vector databases, prompt orchestration, and integrating LLMs into data products and pipelines.
Familiarity with using LLMs in automation development and with vector/embedding data.
Experience in the Oil & Gas domain.
NAVA Software is looking for a Sr. Data Engineer
Details:
Sr. Data Engineer
Location: Sugar Land, TX - 4 days/week onsite
Duration: 12 months