Data Scientist – Python & PySpark

Salt Lake, UT - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Remote Job

Must Have Technical/Functional Skills:

7-10 years of hands-on experience with Python for machine learning, especially XGBoost, scikit-learn, and NumPy/pandas.

Proficiency in PySpark for reading, transforming, and analyzing large datasets stored in Parquet.

Experience validating or reverse-engineering ML models from business logic or legacy implementations.

Exposure to Java-based ML libraries, or an understanding of how model internals map across languages.

Hands-on experience with Python meta-modelling frameworks and libraries.

Roles & Responsibilities:

Interpret data transformation logic and validate feature pipelines from existing Java implementations.

Run Python-converted models on historical datasets and validate output metrics against Java model benchmarks.

Collaborate with model validation teams to review performance consistency and explain any metric deviations.

Design unit tests and validation scenarios to support each migrated model's readiness for sign-off.

Ingest model input data from Parquet files using PySpark and pandas to reproduce training and scoring workflows.

Conduct EDA and spot-check row-level predictions where needed.

Collaborate with the customer team to understand the logic, structure, and parameters of the Java-based XGBoost models.

