Role: Data Engineer
Experience: 5 Years
Location: Noida/Pune
Mode: Work from Office
Role Overview:
We are looking for a skilled and experienced Data Engineer to join our growing data team. The ideal candidate will have hands-on experience with Big Data technologies such as Kafka, Spark, and Airflow, and be proficient in SQL and PL/SQL on Oracle databases. You will be responsible for building and optimizing data pipelines, integrating various data sources, and supporting analytics teams with high-quality, clean, and scalable data solutions.
Key Responsibilities:
- Data Pipeline Development: Design, develop, and maintain scalable data pipelines using tools such as Apache Kafka, Apache Spark, and Airflow to support data ingestion, transformation, and processing workflows (see the Airflow DAG sketch after this list).
- Data Integration: Integrate diverse data sources (structured and unstructured) and transform raw data into usable formats for analytics and business intelligence purposes.
- Database Management: Work with relational databases (Oracle), using SQL and PL/SQL to manage and query large datasets and ensure efficient storage and retrieval.
- ETL Processes: Build and maintain ETL (Extract, Transform, Load) processes to ensure data flows smoothly from various sources to data storage systems (see the PySpark sketch after this list).
- Real-time Data Processing: Implement real-time data streaming solutions using Kafka for event-driven architectures, enabling near real-time data ingestion and processing (see the Kafka consumer sketch after this list).
- Data Quality & Validation: Ensure data accuracy, completeness, and consistency across all stages of the data pipeline, from source to destination.
- Automation & Scheduling: Leverage Airflow for task orchestration, workflow management, and automation of data processing tasks.
- Data Visualization: Utilize tools like Apache Superset for data visualization and dashboard creation to provide insights and metrics to stakeholders.
- Optimization & Performance Tuning: Optimize data workflows and improve the performance of data pipelines to handle large-scale datasets efficiently.
- Collaboration: Work closely with data scientists, analysts, and business teams to understand their requirements and deliver actionable data solutions.
- Documentation: Maintain detailed documentation of data pipeline designs, data models, and best practices for future reference and scalability.
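To make the orchestration and scheduling responsibilities concrete, below is a minimal sketch of an Airflow DAG wiring an extract-transform-load workflow, assuming Airflow 2.4+; the DAG id, schedule, and callables are hypothetical placeholders, not a real pipeline.

```python
# Minimal Airflow DAG sketch: three Python tasks chained into a daily workflow.
# All names here (dag_id, task ids, callables) are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw records from a source system (placeholder logic).
    print("extracting raw data")


def transform(**context):
    # Clean and reshape the extracted records (placeholder logic).
    print("transforming data")


def load(**context):
    # Write the transformed records to the target store (placeholder logic).
    print("loading data")


with DAG(
    dag_id="daily_ingest_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Airflow derives the dependency graph from these operators.
    extract_task >> transform_task >> load_task
```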
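The ETL responsibility can likewise be sketched in PySpark as a simple batch job; the file paths and column names below are assumptions for illustration only.

```python
# Minimal PySpark ETL sketch: read raw CSV, clean and type the data,
# write partitioned Parquet. Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl_sketch").getOrCreate()

# Extract: read the raw source file (placeholder path).
raw = spark.read.option("header", True).csv("/data/raw/transactions.csv")

# Transform: drop rows missing key fields, cast types, derive a partition column.
clean = (
    raw.dropna(subset=["transaction_id", "amount"])
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("txn_date", F.to_date(F.col("created_at")))
)

# Load: write the curated dataset partitioned by date (placeholder path).
clean.write.mode("overwrite").partitionBy("txn_date").parquet("/data/curated/transactions")

spark.stop()
```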
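For the real-time streaming responsibility, here is a hedged consumer-loop sketch using the kafka-python client; the broker address, topic, and consumer group are hypothetical.

```python
# Kafka consumer loop sketch (kafka-python client). Broker, topic, and
# group id are illustrative assumptions, not real infrastructure.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                              # hypothetical topic
    bootstrap_servers=["localhost:9092"],  # assumed local broker
    group_id="order-enrichment",           # hypothetical consumer group
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    # Messages arrive as they are produced, enabling near real-time processing.
    event = message.value
    print(f"partition={message.partition} offset={message.offset} event={event}")
```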
Key Requirements:
- Experience: Minimum 5 years of experience as a Data Engineer with hands-on experience in building and managing data pipelines and data integration solutions.
- Big Data Technologies: Strong knowledge of and hands-on experience with Apache Kafka, Apache Spark, and Airflow.
- Database Skills: Expertise in SQL, PL/SQL, and Oracle databases for querying and managing large datasets.
- ETL: Proven experience in designing and implementing ETL processes for extracting, transforming, and loading data efficiently.
- Data Streaming: Experience with real-time data streaming and event-driven architecture using Kafka.
- Data Visualization: Familiarity with Apache Superset or similar tools for creating interactive dashboards and reports.
- Programming Languages: Proficiency in Python, Java, or Scala for data processing and automation.
- Data Modeling: Understanding of data modeling principles and techniques for designing efficient data storage solutions.
- On-Prem Platforms: Experience with on-premise platforms such as Kubernetes, OpenShift, VMware, or OpenStack is essential for managing and orchestrating data infrastructure.
- Cloud Platforms: Experience with cloud platforms like AWS is a plus.
- Agile Environment: Familiarity with Agile methodologies and working in cross-functional teams.
- Problem Solving: Strong analytical and problem-solving skills with the ability to troubleshoot data issues and optimize pipelines.
- Communication: Excellent communication skills with the ability to collaborate effectively with business stakeholders and technical teams.
Skills: OpenStack, ETL, Scala, OpenShift, Airflow, PL/SQL, Apache Superset, Apache Kafka, Apache Spark, AWS, Data Streaming, Java, Python, SQL, Kubernetes, Oracle, VMware