Senior Lead Data Engineer Spark SQL

This job posting is outdated and the position may already be filled.

Job Location

Bengaluru - India

Monthly Salary

INR 3400000 - 4500000

Vacancy

1 Vacancy

Job Description

Company: Lifesight
Payroll: Lifesight

Responsibilities
  • Build highly scalable, available, fault-tolerant distributed data processing systems (batch and streaming) that process hundreds of terabytes of data ingested every day, alongside a petabyte-sized data warehouse and an Elasticsearch cluster.
  • Build quality data solutions and refine existing diverse datasets into simplified models that encourage self-service.
  • Build data pipelines that optimize for data quality and are resilient to poor-quality data sources.
  • Own the data mapping, business logic, transformations, and data quality.
  • Perform low-level systems debugging, performance measurement, and optimization on large production clusters.
  • Participate in architecture discussions, influence the product roadmap, and take ownership of and responsibility for new projects.
  • Maintain and support existing platforms and evolve them to newer technology stacks and architectures.

Mandatory
  • Strong data engineering profile.
  • Mandatory (Experience 1): Must have 5 years of experience in data engineering using data transformation tools such as HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
  • Mandatory (Experience 2): Must have worked on large distributed architectures using any of Kafka, Spark, Hive, or Hadoop.
  • Mandatory (Experience 3): Must have handled projects involving at least 100 GB of data.
  • Mandatory (Core Skills): Expertise in Apache Spark (RDDs, DataFrames, Spark tuning) and PySpark.
  • Mandatory (Data Handling): Familiarity with data formats such as Parquet and Avro, and with NoSQL databases.
  • Mandatory (Tech Stack): Proficiency with distributed systems and big data technologies such as HDFS, YARN, Kafka, Hive, MapReduce, Hadoop, etc.
  • Mandatory (Company): Candidates from mid-sized product companies or analytics-heavy companies.
  • Mandatory (Exclusions 1): No candidates from large companies (e.g., Walmart, McAfee, Oracle).
  • Mandatory (Exclusions 2): No candidates from IT services companies.
Preferred
  • Preferred (Education): Bachelor's degree in Computer Science, Engineering, or a related field from a Tier 1 or Tier 2 college.
Ideal Candidate
  • Proficiency in Python and PySpark.
  • Deep understanding of Apache Spark: Spark tuning, creating RDDs, and building DataFrames.
  • Experience with big data technologies such as HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
  • Experience building distributed environments using any of Kafka, Spark, Hive, Hadoop, etc.
  • Good understanding of the architecture and functioning of distributed database systems.
  • Experience working with file formats such as Parquet and Avro for large volumes of data.
  • Experience with one or more NoSQL databases.
  • Experience with AWS or GCP.
  • 5 years of professional experience as a data or software engineer.

Skills: NoSQL databases, Spark, Kafka, Airflow, Avro, PySpark, Parquet, Presto, Hive, SQL, data engineering, Apache Spark, distributed systems, MapReduce, YARN, HDFS, Hadoop

Employment Type

Full Time
