Description:
Specific Title of the Position: Data Scientist - Databricks/PySpark
Work Location: Remote/Telecommute
Work Hours: Approximately 8 hours/day, with flexible scheduling to accommodate remote work (e.g., 9am-5pm EST, not strictly enforced).
Position Background and Business Impact:
The Data Scientist - Databricks/PySpark will play a critical role in driving business growth and improving operational efficiency by developing and deploying data-driven solutions using Databricks, PySpark, and related technologies. This position will:
- Develop and deploy scalable data pipelines and machine learning models to drive business insights and decision-making.
- Improve data quality and consistency through data cleansing and feature engineering techniques.
- Collaborate with cross-functional teams to identify business problems and develop data-driven solutions.
Team Description:
The Data Scientist - Databricks/PySpark will work with a team of 8-10 data scientists and engineers whose skill sets include:
- Data engineering (Databricks, PySpark, SQL)
- Machine learning (Python, R, TensorFlow, PyTorch)
- Data analysis and visualization (Tableau, Power BI)
- Business acumen (healthcare industry knowledge, business operations)
The team culture is collaborative, dynamic, and focused on delivering high-quality results.
Top 10 Responsibilities:
- Develop and deploy scalable data pipelines using Databricks and PySpark.
- Design and implement machine learning models using Python and related libraries (e.g., scikit-learn, TensorFlow).
- Collaborate with cross-functional teams to identify business problems and develop data-driven solutions.
- Perform data cleansing and feature engineering to improve data quality and consistency.
- Develop and maintain technical documentation for data pipelines and machine learning models.
- Work with stakeholders to identify and prioritize data-driven projects.
- Develop and deploy data visualizations to communicate insights to business stakeholders.
- Stay up to date with emerging trends and technologies in data science and machine learning.
- Collaborate with data engineers to ensure data quality and consistency.
- Participate in code reviews and contribute to the development of best practices for data science and engineering.
Ideal Candidate Background:
- 5 years of experience in data science or a related field.
- Healthcare industry experience is a plus but not required.
- Strong background in data engineering (Databricks, PySpark, SQL).
- Experience with machine learning (Python, R, TensorFlow, PyTorch).
Required Skills/Attributes:
- 3 years of experience with Databricks and PySpark.
- 2 years of experience with Python and related libraries (e.g., scikit-learn, TensorFlow).
- 2 years of experience with SQL and data warehousing.
- Strong understanding of data cleansing and feature engineering techniques.
- Experience with machine learning model development and deployment.
- Excellent communication and collaboration skills.
Preferred Skills/Attributes:
- Experience with cloud-based data platforms (Azure).
- Experience with Agile development methodologies.
- Certification in data science or a related field (e.g., Certified Data Scientist).
- Professional License or Certification: Not required, but certifications such as Certified Data Scientist or Certified Analytics Professional are a plus.
Required Skills: Data Analysis
This is a high-priority requisition. This is a proactive requisition.
Background Check: No
Drug Screen: No