Overview:
We are looking for a highly skilled Data Engineer to design, build, and manage scalable data pipelines and systems that power analytics, insights, and business intelligence. This role involves hands-on ownership of our Databricks-based Data Lake platform and of real-time streaming pipelines spanning MySQL, Kafka, and Spark, ensuring reliable, secure, and performant data flow across the organization.
Key Responsibilities
Own and manage the enterprise Data Lake infrastructure on AWS and Databricks, ensuring reliability, scalability, and governance.
Design, develop, and optimize data ingestion and transformation pipelines from MySQL to Kafka (CDC pipelines) and from Kafka to Databricks using Spark Structured Streaming (the Kafka-to-Databricks leg is sketched after this list).
Design and implement MapReduce jobs to process and transform large-scale datasets efficiently across distributed systems (the map/reduce pattern is sketched after this list).
Optimize MapReduce workflows for performance, fault tolerance, and scalability in big data environments.
Build robust batch and near-real-time data pipelines capable of handling high-volume, high-velocity data efficiently.
Develop and maintain metadata-driven data processing frameworks, ensuring consistency, lineage, and traceability.
Implement and maintain strong observability and monitoring systems (logging, metrics, alerting) using Prometheus, Grafana, or equivalent tools.
Work closely with Product, Regulatory, and Security teams to ensure compliance, privacy, and data quality across the data lifecycle.
Collaborate with cross-functional teams to build end-to-end data lakehouse solutions, integrating multiple systems and data sources.
Apply best practices in code quality, CI/CD automation (Jenkins, GitHub Actions), and infrastructure as code (IaC) for deployment consistency.
Ensure system reliability and scalability through proactive monitoring, performance tuning, and fault-tolerant design.
Stay up to date with the latest technologies in data engineering, streaming, and distributed systems, and drive continuous improvements.
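To make the pipeline responsibility concrete, here is a minimal sketch of the Kafka-to-Databricks leg: a Spark Structured Streaming job that reads a CDC topic and appends changes to a Delta table. The broker address, topic name, payload schema, and S3 paths are illustrative assumptions, not details from this posting, and a real Debezium-style CDC envelope is richer than the one shown.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object CdcToDelta {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mysql-cdc-to-delta")
      .getOrCreate()

    // Assumed shape of the CDC payload; a real Debezium envelope has more fields.
    val cdcSchema = new StructType()
      .add("op", StringType)    // c = create, u = update, d = delete
      .add("ts_ms", LongType)   // change timestamp from the source database
      .add("after", StringType) // row image after the change, as JSON

    val changes = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "mysql.orders")              // placeholder topic
      .option("startingOffsets", "latest")
      .load()
      .select(from_json(col("value").cast("string"), cdcSchema).as("cdc"))
      .select("cdc.*")

    // Append raw changes to a Delta table (paths are placeholders).
    changes.writeStream
      .format("delta")
      .option("checkpointLocation", "s3://bucket/checkpoints/orders")
      .start("s3://bucket/delta/orders_changes")
      .awaitTermination()
  }
}
```

The checkpoint location is what lets the job resume after a restart without duplicating writes into the Delta sink.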
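The MapReduce responsibility can be illustrated with the classic word count. The sketch below expresses the map/reduce pattern using Spark's RDD API rather than raw Hadoop MapReduce, consistent with the Spark-centric stack above; the input and output paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("word-count").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("s3://bucket/input/*.txt") // placeholder path
      .flatMap(_.split("\\s+"))                         // map: line -> words
      .map(word => (word, 1L))                          // map: word -> (word, 1)
      .reduceByKey(_ + _)                               // reduce: sum counts per key

    counts.saveAsTextFile("s3://bucket/output/wordcount") // placeholder path
    spark.stop()
  }
}
```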
Qualifications:
Required Skills & Experience
Strong programming expertise in one or more of the following: Scala, Spark, Java, or Python.
6 to 10 years of relevant experience.
Proven experience working with Kafka (Confluent or Apache) for building event-driven or CDC-based pipelines.
Hands-on experience with distributed data processing frameworks (Apache Spark, Databricks, or Flink) for large-scale data handling.
Solid understanding of Kubernetes for deploying and managing scalable, resilient data workloads (EKS experience preferred).
Practical experience with AWS cloud services such as S3, Lambda, EMR, Glue, IAM, and CloudWatch.
Experience designing and managing data lakehouse architectures using Databricks or similar platforms.
Familiarity with metadata-driven frameworks and with the principles of data governance, lineage, and cataloging.
Experience with CI/CD pipelines (Jenkins, GitHub Actions) for data workflow deployment and automation.
Experience with monitoring and alerting frameworks such as Prometheus, Grafana, or the ELK stack (a streaming-metrics sketch follows this list).
Strong problem-solving, communication, and collaboration skills.
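As a sketch of the monitoring qualification in the context of the streaming pipelines this role owns: a Spark StreamingQueryListener that logs per-batch throughput, which an exporter or sidecar could feed into Prometheus and Grafana. The class name and log format are illustrative assumptions.

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

class PipelineMetricsListener extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit =
    println(s"query started: ${event.name} (${event.id})")

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    // If inputRowsPerSecond consistently exceeds processedRowsPerSecond,
    // the job is falling behind its Kafka source.
    println(s"batch=${p.batchId} numInputRows=${p.numInputRows} " +
      s"inputRps=${p.inputRowsPerSecond} processedRps=${p.processedRowsPerSecond}")
  }

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    println(s"query terminated: ${event.id} exception=${event.exception}")
}

// Registered once at application startup:
// spark.streams.addListener(new PipelineMetricsListener)
```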
Additional Information:
At Freshworks, we have fostered an environment that enables everyone to find their true potential, purpose, and passion, welcoming colleagues of all backgrounds, genders, sexual orientations, religions, and ethnicities. We are committed to providing equal opportunity, and we believe that diversity in the workplace creates a more vibrant, richer environment that boosts the goals of our employees, communities, and business. Fresh vision. Real impact. Come build it with us.
Remote Work:
No
Employment Type:
Full-time
Freshworks makes it fast and easy for businesses to delight their customers and employees. We do this by taking a fresh approach to building and delivering software that is affordable, quick to implement, and designed for the end user. Freshworks is headquartered in San Mateo, California.