Senior Big Data Engineer SAP Concur

SAP


Job Location: São Leopoldo - Brazil

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

Job Summary

We help the world run better
At SAP we keep it simple: you bring your best to us and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging, but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong. What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.

Join us as a Senior Big Data Engineer supporting the SAP Concur platform, working hybrid in São Leopoldo. You will design, build, and evolve the data pipelines and infrastructure that process billions of transactions, receipts, and travel events every day, powering the analytics, machine learning, and operational reporting that millions of businesses depend on.

What You Will Build and Why It Matters

You will be a hands-on engineer and technical steward of SAP Concur's data platform, owning the full pipeline lifecycle from raw ingestion through curated, analytics-ready data products. Core areas of ownership include:

  • Scalable batch and streaming data pipelines that ingest, transform, and deliver structured and semi-structured data across the SAP Concur platform, processing petabytes of expense, travel, and invoicing data.

  • End-to-end ETL/ELT workflows using industry-standard frameworks, ensuring data is accurate, timely, and traceable from source to consumption layer.

  • Lakehouse and data warehouse architecture: designing and maintaining Bronze/Silver/Gold medallion layers, partition strategies, and table formats (Delta Lake, Apache Iceberg) that balance query performance with storage cost.

  • Real-time streaming pipelines for high-velocity event data, enabling fraud detection signals, live spend dashboards, and near-real-time notification triggers for the Concur notification service.

  • Data quality and observability frameworks: implementing automated data validation, schema drift detection, SLA monitoring, and lineage tracking so that downstream consumers can trust every dataset.

  • Pipeline infrastructure and DevOps: building and maintaining CI/CD workflows for data code, managing infrastructure-as-code (Terraform/CDK), and ensuring robust monitoring and alerting across all pipeline stages.

  • Collaborative data modelling with analytics engineers, data scientists, and product managers to ensure that canonical data models support both operational and analytical use cases.

  • Continuous optimization of existing pipelines: reducing processing latency, lowering compute and storage costs, and improving resilience and fault-tolerance across the platform.

What You Bring

Languages & Query Fundamentals

  • Python as the primary language for data pipeline development; fluent in idiomatic Python, PySpark, and scripting for automation and orchestration.

  • Advanced SQL for complex transformations, window functions, query optimization, and data modelling across both relational and analytical warehouse environments.

  • Working knowledge of Scala or Java for interacting with Apache Spark internals, JVM-based big data frameworks, or compiled pipeline components.
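
By way of illustration, the SQL fluency this role calls for includes window functions like the running total below. The example uses Python's built-in sqlite3 module so it runs anywhere; the table, columns, and data are invented for illustration only.

```python
import sqlite3

# Hypothetical expense data; table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (employee TEXT, month TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [
        ("ana", "2024-01", 120.0),
        ("ana", "2024-02", 80.0),
        ("bruno", "2024-01", 200.0),
        ("bruno", "2024-02", 50.0),
    ],
)

# Running total per employee via a window function (SQLite >= 3.25).
rows = conn.execute(
    """
    SELECT employee, month, spend,
           SUM(spend) OVER (
               PARTITION BY employee ORDER BY month
           ) AS running_total
    FROM expenses
    ORDER BY employee, month
    """
).fetchall()

for row in rows:
    print(row)
```

The same PARTITION BY / ORDER BY pattern carries over directly to Spark SQL and warehouse dialects such as Snowflake or BigQuery.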

Big Data Processing & Frameworks

  • Expertise in Apache Spark: PySpark, Spark SQL, Structured Streaming, DataFrames, adaptive query execution, and job-level performance tuning (partitioning, caching, broadcast joins, shuffle optimization).

  • Experience with distributed lakehouse platforms such as Databricks.

  • Familiarity with the broader Hadoop ecosystem (HDFS, Hive, YARN) as it applies to legacy migration and hybrid on-premises/cloud architectures.

  • Experience with real-time stream processing using Apache Kafka (producers, consumers, Kafka Streams) and complementary engines such as Apache Flink or Spark Structured Streaming for exactly-once and low-latency processing.
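
The core idea behind the streaming work above is event-time windowed aggregation. A plain-Python sketch of a tumbling (fixed-width, non-overlapping) window makes the logic visible without requiring a Spark or Kafka installation; the event stream and amounts below are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event stream of (event_time, amount) pairs. In production this
# would arrive via a Kafka topic; a list stands in here.
events = [
    (datetime(2024, 1, 1, 10, 0, 12), 30.0),
    (datetime(2024, 1, 1, 10, 0, 45), 20.0),
    (datetime(2024, 1, 1, 10, 1, 5), 75.0),
    (datetime(2024, 1, 1, 10, 2, 59), 10.0),
]

def tumbling_window_sums(events, width=timedelta(minutes=1)):
    """Sum amounts into fixed-width, non-overlapping event-time windows."""
    sums = defaultdict(float)
    for ts, amount in events:
        # Floor the timestamp to the start of its window.
        window_start = ts - (ts - datetime.min) % width
        sums[window_start] += amount
    return dict(sums)

windows = tumbling_window_sums(events)
print(windows)
```

Spark Structured Streaming expresses the same computation declaratively (a `groupBy` over a window column), adding watermarks to bound how long late-arriving events are accepted.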

Data Warehousing & Storage

  • Understanding of cloud data warehouses: Snowflake, Google BigQuery, or Amazon Redshift.

  • Solid understanding of open table formats, Delta Lake and Apache Iceberg, including ACID transactions, time travel, schema evolution, and compaction strategies.

  • Familiarity with data lake storage on AWS S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage, and the trade-offs between lake, warehouse, and lakehouse architectures.

  • Experience with NoSQL and document stores (DynamoDB) where applicable to high-throughput, low-latency operational data access patterns.
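
One practical piece of the partition strategies mentioned above is Hive-style directory layout on object storage, which lets query engines prune whole prefixes instead of scanning every file. A small sketch, with a hypothetical bucket, table, and partition columns:

```python
from datetime import date

def partition_path(base: str, table: str, dt: date, region: str) -> str:
    """Build a Hive-style partition path for S3/ADLS/GCS object keys.

    Partitioning on low-cardinality columns (date, region) lets engines
    skip entire prefixes at query time. All names here are hypothetical.
    """
    return f"{base}/{table}/ingest_date={dt.isoformat()}/region={region}/"

path = partition_path("s3://example-lake/bronze", "expenses", date(2024, 3, 1), "latam")
print(path)
# s3://example-lake/bronze/expenses/ingest_date=2024-03-01/region=latam/
```

Table formats like Delta Lake and Iceberg layer transactional metadata on top of this layout, so compaction and schema evolution do not break readers mid-query.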

Orchestration & DataOps

  • Understanding of Apache Airflow for authoring, scheduling, and monitoring DAG-based pipeline workflows; familiarity with alternatives such as Prefect or Dagster is a plus.

  • Experience implementing dbt (data build tool) for in-warehouse SQL transformations, testing, documentation, and lineage, including dbt Cloud or dbt Core with version-controlled model management.

  • Strong DevOps and DataOps practices: CI/CD pipeline design for data code using GitHub Actions or similar tools; infrastructure-as-code with Terraform or AWS CDK; containerized pipeline execution with Docker and Kubernetes.

  • Understanding of data governance concepts (data lineage, metadata management with Apache Atlas or OpenLineage, data cataloguing, and data contracts) and practical experience applying them to production pipelines.
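
The essence of DAG-based orchestration is resolving a task dependency graph into a valid execution order. This is not Airflow's API, just a stdlib sketch of the underlying idea using `graphlib`; the pipeline tasks are hypothetical.

```python
from graphlib import TopologicalSorter

# A hypothetical pipeline as a dependency graph: each key lists the tasks
# it depends on. Orchestrators like Airflow resolve this same structure to
# decide what can run, and in what order.
dag = {
    "extract_expenses": set(),
    "extract_travel": set(),
    "transform": {"extract_expenses", "extract_travel"},
    "quality_checks": {"transform"},
    "publish_gold": {"quality_checks"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)
```

Both extract tasks have no predecessors, so an orchestrator is free to run them in parallel; everything downstream of `transform` is strictly ordered.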

Cloud Platforms & Infrastructure

  • Working knowledge of at least one major cloud provider: AWS (S3, Glue, EMR, Kinesis, Lambda, Redshift, RDS), GCP (BigQuery, Dataproc, Pub/Sub, Cloud Composer), or Microsoft Azure (ADLS, Synapse Analytics, Data Factory, Event Hubs).

  • Comfort deploying and operating workloads in containerized environments (Docker, Kubernetes on EKS/GKE/AKS) and working with serverless compute for lightweight pipeline tasks.

  • Experience with cost-aware cloud architecture: query tagging, compute auto-scaling, storage tiering, and right-sizing clusters to balance performance against infrastructure spend.

  • Familiarity with observability and monitoring tooling relevant to data platforms (Grafana, CloudWatch, Datadog, or Monte Carlo) for pipeline health, data freshness SLAs, and anomaly detection.

Data Quality & Reliability

  • Experience implementing automated data quality testing frameworks such as Great Expectations or Soda, including row-level validation, schema checks, freshness assertions, and drift alerting.

  • Understanding of idempotency, exactly-once semantics, and late-arriving data patterns; designing pipelines that can be safely re-run without duplicating or corrupting data.
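
The safe-to-re-run property above usually comes down to keyed upserts: when each record lands under a natural key, replaying a batch after a failed retry overwrites rather than duplicates. A minimal sketch with an in-memory dict standing in for the target table; record and field names are hypothetical.

```python
def upsert(target: dict, batch: list[dict], key: str = "txn_id") -> dict:
    """Merge a batch into the target, keyed on a natural key.

    Re-running the same batch (e.g. after a job retry) overwrites each
    record in place, so the operation is idempotent. In production this
    is a MERGE INTO against a Delta/Iceberg table rather than a dict.
    """
    for record in batch:
        target[record[key]] = record
    return target

target: dict = {}
batch = [
    {"txn_id": "t1", "amount": 42.0},
    {"txn_id": "t2", "amount": 7.5},
]

upsert(target, batch)
upsert(target, batch)  # re-run the same batch: no duplicates
print(len(target))  # 2
```

Contrast this with a blind append, where every retry would add another copy of each transaction and silently inflate downstream aggregates.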

Collaboration & Leadership

  • Fluent English for collaborating with global, multi-regional teams across the Americas, EMEA, and APJ.

  • Ability to partner with data scientists, analytics engineers, product managers, and software engineers, translating business requirements into sound technical data models and pipeline designs.

  • Proactive communication style: comfortable raising data quality issues, SLA risks, and infrastructure concerns to stakeholders before they become production incidents.

  • Experience using AI coding assistants (Claude Code, Cursor, or similar) and AI-assisted data quality tooling to accelerate pipeline development and debugging is a plus.

Domain & Platform Knowledge

  • Familiarity with financial transaction data, expense management, ERP integrations, or travel and hospitality data domains is advantageous.

  • Experience working within SAP BTP SAP HANA or SAP Datasphere data ecosystems is a plus.

Where You Belong

  • A diverse, inclusive culture where global perspectives shape better products; SAP's workforce spans more than 160 countries.

  • A hybrid work environment in São Leopoldo that blends flexibility with meaningful in-person collaboration.

  • Cross-cultural, cross-functional teams that support shared learning and collective problem-solving.

  • Continuous learning through SAP Learning Hub, external conference support, and access to leading-edge data engineering tooling.

  • A team culture that values clean, observable, and well-tested data systems, and psychological safety to raise ideas, challenge assumptions, and propose improvements.

Bring out your best
SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, you can bring out your best.

We win with inclusion
SAP's culture of inclusion, focus on health and well-being, and flexible working models help ensure that everyone, regardless of background, feels included and can run at their best. At SAP, we believe we are made stronger by the unique capabilities and qualities that each person brings to our company, and we invest in our employees to inspire confidence and help everyone realize their full potential. We ultimately believe in unleashing all talent and creating a better world.

SAP is committed to the values of Equal Employment Opportunity and provides accessibility accommodations to applicants with physical and/or mental disabilities. If you are interested in applying for employment with SAP and are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to the Recruiting Operations Team:

For SAP employees: Only permanent roles are eligible for the
SAP Employee Referral Program according to the eligibility rules set in the SAP Referral Policy. Specific conditions may apply for roles in Vocational Training.

Qualified applicants will receive consideration for employment without regard to their age, race, religion, national origin, ethnicity, gender (including pregnancy, childbirth, et al.), sexual orientation, gender identity or expression, protected veteran status, or disability, in compliance with applicable federal, state, and local legal requirements.

Successful candidates might be required to undergo a background verification with an external vendor.

AI Usage in the Recruitment Process

For information on the responsible use of AI in our recruitment process please refer to our Guidelines for Ethical Usage of AI in the Recruiting Process.

Please note that any violation of these guidelines may result in disqualification from the hiring process.

Requisition ID: 450646 | Work Area: Software-Design and Development | Expected Travel: 0 - 10% | Career Status: Professional | Employment Type: Regular Full Time | Additional Locations: #LI-Hybrid



Required Experience:

Senior IC


About Company


SAP started in 1972 as a team of five colleagues with a desire to do something new. Together, they changed enterprise software and reinvented how business was done. Today, as a market leader in enterprise application software, we remain true to our roots.
