You will join the Contact Centre Renewal Project as a Data Engineer / Tester, building and validating data pipelines that move high-volume claims, insurance, and finance data from on-premises systems to a cloud-based Big Data platform (Data Lake). Working within an established Information Management practice, you'll own the end-to-end pipeline lifecycle, from source analysis and data modelling through Spark/Scala development, automated data-quality testing, and sustainment documentation.
Please note this is a 5-month contract position.
Deliverables:
- Analyze the source data, prepare data models and mappings, and develop Scala/Spark programs and related components within the Information Management work areas relating to claims, insurance, and finance data.
- Use data pipelines to extract data from sources in various formats (flat files, XML, relational tables, Oracle logs) and use tools such as StreamSets and Scala/Spark programs to transform and store the data in the Big Data platform (Data Lake) after data validation. Ingest data from the cloud to the Big Data platform.
- Develop Spark/Scala pipelines for data ingestion and transformation from the cloud to the Big Data platform, as per the mapping document (see the sketch after this list).
- Perform data analysis per requirements and develop data models/mappings.
- Perform data validation and build automated data-quality pipelines.
- Produce relevant documentation for sustainment.
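As a rough illustration of the ingestion-and-transformation work described above, the sketch below reads a delimited flat file from a landing zone, applies mapping-document-style transformations, filters invalid rows, and writes the result to a curated Data Lake zone as Parquet. All paths, column names, and mapping rules are hypothetical placeholders, not project specifics.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Minimal sketch of a claims ingestion pipeline. All paths, column
// names, and mapping rules below are hypothetical placeholders.
object ClaimsIngestionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("claims-ingestion-sketch")
      .getOrCreate()

    // Extract: read a pipe-delimited flat file from a landing zone.
    val raw: DataFrame = spark.read
      .option("header", "true")
      .option("delimiter", "|")
      .csv("/landing/claims/claims_extract.dat")

    // Transform: apply mapping-document-style rules
    // (rename, cast, derive) -- illustrative only.
    val mapped = raw
      .withColumnRenamed("CLM_NO", "claim_number")
      .withColumn("claim_amount", col("CLM_AMT").cast("decimal(18,2)"))
      .withColumn("load_date", current_date())

    // Validate: reject rows missing the business key.
    val valid = mapped.filter(col("claim_number").isNotNull)

    // Load: write to a curated Data Lake zone, partitioned by load date.
    valid.write
      .mode("overwrite")
      .partitionBy("load_date")
      .parquet("/datalake/curated/claims")

    spark.stop()
  }
}
```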
Requirements
- Applied knowledge of Big Data platforms, ideally with exposure to the Hadoop ecosystem (HDFS, Pig, Hive, Spark, Big SQL, NoSQL, YARN); solid experience in Scala.
- Experience in designing efficient and robust data pipelines using Spark.
- Excellent SQL development skills.
- Knowledge of data modeling and understanding of different data structures and their benefits and limitations under particular use cases.
- Hands-on experience with structured and unstructured data formats such as JSON.
- Experience with enterprise systems such as Guidewire ClaimCenter and Guidewire PolicyCenter would be an asset.
- Hands-on experience with cloud data ingestion using REST APIs (illustrated after this list).
- Knowledge of the Contact Centre business.
- 5 years of experience developing data pipelines using tools such as Spark and Scala, with a focus on data ingestion, transformation, and validation in Big Data environments (e.g., Hadoop ecosystem, Data Lakes).
- Has supported at least one Enterprise or Government entity with an enterprise data integration project, including working with structured and unstructured data (e.g., JSON, XML, relational databases) and ingesting data from cloud platforms using REST APIs.
- Has supported at least one project where they were responsible for data modeling and mapping for large-scale cloud environments. The ideal candidate will have experience within domains such as insurance, claims, or finance; familiarity with enterprise platforms like Guidewire ClaimCenter or PolicyCenter is a strong asset.
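As a hedged illustration of the cloud ingestion requirement above, the sketch below pulls a JSON payload from a REST endpoint using Java's standard HTTP client and lands it in Spark for downstream transformation. The endpoint URL, bearer token, and landing path are invented placeholders; a real pipeline would also apply an explicit schema from the mapping document rather than relying on inference.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

import org.apache.spark.sql.SparkSession

// Minimal sketch: fetch JSON from a (hypothetical) cloud REST endpoint
// and land it in Spark for transformation and validation.
object RestIngestionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rest-ingestion-sketch")
      .getOrCreate()
    import spark.implicits._

    // Call the REST API; endpoint and bearer token are placeholders.
    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder()
      .uri(URI.create("https://api.example.com/v1/claims?since=2024-01-01"))
      .header("Authorization", "Bearer <token>")
      .GET()
      .build()
    val body: String =
      client.send(request, HttpResponse.BodyHandlers.ofString()).body()

    // Parse the JSON payload into a DataFrame (schema inferred here).
    val df = spark.read.json(spark.createDataset(Seq(body)))

    // Persist to a raw Data Lake zone for later transformation.
    df.write.mode("append").json("/datalake/raw/claims_api")

    spark.stop()
  }
}
```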
Work Environment
Hybrid work environment: 2 days/week onsite in North Vancouver.
Tester Requirements
- 5+ years of experience in data quality assurance and testing, including developing and executing functional test cases, validating data pipelines, and coordinating deployments from development to production environments.
- Has supported at least one Enterprise/Government organization with Big Data platforms and tools, such as Hadoop (HDFS, Pig, Hive, Spark), Big SQL, NoSQL, and Scala, ideally within cloud-based environments.
- 3+ data analysis and modeling projects, including working with structured and unstructured databases, building automated data quality pipelines, and collaborating with data engineers and architects to ensure high data integrity.
- Experience developing and executing test cases for Big Data pipelines, with deployments across dev, test, and production environments.
- Strong SQL skills for validation, troubleshooting, and data profiling (see the sketch below).
- Applied knowledge of Big Data platforms including Hadoop (HDFS, Hive, Pig), Spark, Big SQL, NoSQL, and Scala.
- Familiarity with cloud data ingestion and integration methods.
- Experience working with structured and unstructured data formats.
- Understanding of data modeling, data structures, and use-case-driven design.
- Experience in test automation for data validation pipelines is a strong asset.
- Prior experience with Genesys Cloud testing is a plus.
- Exposure to Tableau or other BI tools is beneficial.
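To make the SQL-based validation and automated data-quality testing requirements concrete, here is a small sketch of the kind of check a tester might automate: reconciling row counts between a raw extract and its curated target, then profiling the null rate on a business key. The table names, paths, and threshold are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal data-quality validation sketch. Paths, view names, and the
// pass/fail threshold are hypothetical placeholders.
object DataQualityChecksSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dq-checks-sketch")
      .getOrCreate()

    // Register source and target datasets as temp views for SQL checks.
    spark.read.parquet("/datalake/raw/claims").createOrReplaceTempView("src_claims")
    spark.read.parquet("/datalake/curated/claims").createOrReplaceTempView("tgt_claims")

    // Check 1: row-count reconciliation between source and target.
    val counts = spark.sql(
      """SELECT
        |  (SELECT COUNT(*) FROM src_claims) AS src_cnt,
        |  (SELECT COUNT(*) FROM tgt_claims) AS tgt_cnt
        |""".stripMargin).first()
    assert(counts.getLong(0) == counts.getLong(1),
      s"Row count mismatch: src=${counts.getLong(0)} tgt=${counts.getLong(1)}")

    // Check 2: profile the null rate of the business key; fail if any nulls.
    val nullRate = spark.sql(
      "SELECT AVG(CASE WHEN claim_number IS NULL THEN 1 ELSE 0 END) FROM tgt_claims"
    ).first().getDouble(0)
    assert(nullRate == 0.0, s"Non-zero null rate on claim_number: $nullRate")

    spark.stop()
  }
}
```

Checks like these would typically run as a gated step in the deployment path from dev through test to production, so failures block promotion rather than surfacing downstream.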