Overview:
TekWissen Group is a workforce management provider operating throughout India and several other countries worldwide. The client below is a leading technology company offering a range of IT solutions to businesses and organizations, enabling them to transform their digital futures.
Position: Senior Software Engineer
Location: Hyderabad
Duration: 24 Months
Job Type: Contract
Work Type: Hybrid
Shift Timings: 9:00 AM-6:00 PM
Job Description:
- Install, configure, and administer data ingestion and transformation platforms, including StreamSets, Apache Spark, Informatica, and Apache NiFi, for maximum utilization and throughput.
- Perform StreamSets Data Collector and Control Hub administration, including pipeline deployment, monitoring, and optimization across multiple environments.
- Experience in designing and implementing real-time and batch data pipelines using Apache Spark with Python (PySpark); Scala knowledge is an added advantage.
- Hands-on experience with Apache NiFi for data flow automation, including processor configuration, flow management, and cluster coordination.
- Experience in upgrading and migrating data ingestion tools and pipelines to newer versions while maintaining data integrity and minimizing downtime.
- Highly proficient in Python programming for data processing, pipeline development, and automation scripting.
- Expert-level SQL skills for complex data transformations, performance optimization, and database interactions across various RDBMS platforms.
- Experience with diverse database technologies and data sources:
  - MPP databases: Teradata, Greenplum for large-scale analytical workloads
  - OLTP databases: Oracle, SQL Server, PostgreSQL for transactional processing
  - In-memory databases: SAP HANA, MemSQL for high-performance analytics
  - Big Data platforms: Hive, HBase, and Kudu for distributed data storage and processing
  - Additional data sources, including SFDC and modern cloud data platforms
- Ability to write efficient, scalable code following best practices and coding standards for data engineering projects.
- Extensive experience with automated testing integration using CI/CD tools, preferably GitLab (Jenkins experience is beneficial).
- Implement and maintain automated deployment pipelines for data ingestion and transformation workflows.
- Experience in version control, branching strategies, and collaborative development practices for data engineering projects.
- Knowledge of infrastructure as code and containerization technologies for deployment automation.
- Perform data pipeline deployments across versioned repositories from development to production environments.
- Experience with TLS hardening and security best practices for data platforms and APIs.
- Proficient in using command-line tools and APIs for platform administration and monitoring.
- Create and maintain backup and restore strategies for data pipelines, configurations, and metadata repositories.
- Expertise in troubleshooting performance bottlenecks in data processing workflows and implementing optimization strategies.
- Experience in cluster maintenance and distributed system administration for Spark and other big data technologies.
- Production environment health monitoring, alerting, and incident response for data pipelines.
- Root cause analysis of pipeline failures with comprehensive documentation of issues and resolutions.
- Develop and enforce data engineering coding standards and best practices across development teams.
- Experience with data quality validation, testing frameworks, and automated data validation processes.
- Knowledge of data governance principles and implementation of data lineage tracking.
- Scheduling and orchestration of data workflows with proper error handling and retry mechanisms.
Mandatory Skills:
- Expertise in data ingestion and transformation tools and technologies:
  - StreamSets
  - Spark (Python preferred; Scala good to have)
  - Informatica
  - Apache NiFi
- Highly proficient in Python and SQL
- Working experience with automated test integration using CI/CD tools
  - GitLab preferred (Jenkins good to have)
- MPP DBs like Teradata and Greenplum
- OLTP DBs like Oracle, SQL Server, and Postgres
- In-memory DBs like HANA and MemSQL
- Hive, HBase, and Kudu
Experience:
- Total Experience: 5-8 years
- Relevant Experience: 5 years
TekWissen Group is an equal opportunity employer supporting workforce diversity.