Design, develop, and optimize batch and streaming data pipelines using Spark, PySpark, Databricks, and AWS Glue.
Architect and maintain data lakes and warehouses (Snowflake, Redshift) and implement dbt-based data modeling for analytics-ready datasets.
Build and manage orchestration workflows using Apache Airflow and automate deployments with CI/CD pipelines (GitHub Actions, Terraform, Jenkins).
Implement data governance, security, and compliance controls, including PII masking, RBAC, and encryption.
Collaborate with cross-functional teams (product managers, analysts, data scientists) to deliver datasets powering BI dashboards and ML models.
Optimize Spark jobs and SQL queries for performance and cost efficiency on large-scale datasets (billions of rows).
Develop containerized solutions using Docker and Kubernetes, ensuring high availability and scalability.
Monitor and troubleshoot pipelines using CloudWatch, Splunk, and Azure Monitor to maintain SLA compliance.