This is a remote position.
We are seeking a skilled and motivated AWS Platform Engineer to join our team and contribute to the development and maintenance of scalable data solutions. The ideal candidate will have hands-on experience with EMR and Spark (PySpark), a strong understanding of data mesh principles, and a passion for building robust data pipelines and quality frameworks.
Location: Remote
Status: Contract
Responsibilities:
- Design, develop, and optimize data pipelines using EMR and Spark (PySpark)
- Implement and manage AWS Lake Formation for secure and governed data access
- Contribute to the development of data mesh solutions enabling domain-oriented data ownership and interoperability
- Write and maintain data quality checks to ensure the accuracy, completeness, and reliability of data
- Establish a standardized framework to promote consistency and ensure alignment with architectural standards
- Support light DevOps tasks; experience using Terraform for infrastructure as code is strongly preferred
Requirements:
- Proven experience with Apache Spark and AWS EMR
- Familiarity with Lake Formation and AWS data lake architecture
- Exposure to data mesh concepts and implementation
- Experience writing data validation and quality checks
- Experience defining a best-practices framework to ensure consistency and compliance with architectural standards
- Experience with Terraform or similar tools for infrastructure automation
- 5+ years of experience in data quality assurance and testing, including developing and executing functional test cases, validating data pipelines, and coordinating deployments from development to production environments
- Has supported at least one Enterprise/Government organization with Big Data platforms and tools such as Hadoop (HDFS, Pig, Hive, Spark), Big SQL, NoSQL, and Scala, ideally within cloud-based environments
- 3+ data analysis and modeling projects, including working with structured and unstructured databases, building automated data quality pipelines, and collaborating with data engineers and architects to ensure high data integrity
- Experience developing and executing test cases for Big Data pipelines, with deployments across dev, test, and production environments
- Strong SQL skills for validation, troubleshooting, and data profiling
- Applied knowledge of Big Data platforms including Hadoop (HDFS, Hive, Pig), Spark, Big SQL, NoSQL, and Scala
- Familiarity with cloud data ingestion and integration methods
- Experience working with structured and unstructured data formats
- Understanding of data modeling, data structures, and use-case-driven design
- Experience in test automation for data validation pipelines is a strong asset
- Prior experience with Genesys Cloud testing is a plus
- Exposure to Tableau or other BI tools is beneficial
Hybrid role: 2 days/week onsite in North Vancouver