Data scientist/software developer to support a proof-of-concept demonstration using natural language processing and other machine learning methods to improve the intake process.
This work is critical to demonstrating the potential of the latest technology to improve the lives of children at risk.
The Data Scientist/Developer will be responsible for supporting the development, implementation, and testing of
statistical models, integration of NLP, and refinement and testing of the prototype. The data scientist will work closely
with State stakeholders and technical team members to ensure the quality of the results and that the derived methods
are transparent, statistically sound, relevant, and documented.
Key Responsibilities
Current Processes & Technology
o Collectively engage with MDCPS and other team members to understand the current intake process and
outcomes.
o Identify how the State decides to deploy resources based on the intake information.
o Contribute to the identification of shortcomings in the intake process and opportunities to improve outcomes.
Use information from interviews, discovery sessions, and workshops to identify them.
o Identify any internal data sources used in the intake process.
Devise New Intake Approach Using New Technologies
o Based on an understanding of the current intake process and its shortcomings, devise and propose an
improved process using natural language processing and other machine learning methods to favorably impact child
outcomes while reducing resource use.
o Quantify, to the extent possible, the impact of the improved process and use of new technology.
Map Anticipated Data Source Changes
o Determine how internal data sources might change with future modifications to core IT systems used by
MDCPS.
o Adjust the proposed intake process to account for any data source changes.
Design Review(s)
o Conduct a preliminary and a final design review of an improved intake tool proof-of-concept implementation.
o Include anticipated outcomes from the use of the technology and any differences that may be evident from the
proof-of-concept implementation.
o If an LLM is intended to be used, show how the data will be protected.
o Identify the source of the data that will be used in the proof-of-concept implementation. If data from the State is
unavailable, describe an alternative approach.
Implementation of Proof-of-Concept
o Create a means of hosting data, whether the data is provided by the State, simulated, or obtained by other means.
o Construct a demonstrable prototype application that will illustrate the new technology's impact on children and
State resources.
o Build the prototype application using Python, C, Java, and/or SQL or a similar language. Use Postgres or a
similar database if needed.
o Integrate the proof-of-concept with the available data source.
o Conduct tests to validate the functionality of the application.
o Validate, to the extent possible, the impact on children and State resources from using the prototype in a fully
implemented form.
o Seek validation of the application's efficacy from key State stakeholders through one-on-one demonstrations.
Conference Room Demonstration
o Over 3-4 days, provide a conference room demonstration that shows how the prototype application can
improve child outcomes and reduce the use of State resources.
o Provide stakeholders with hands-on experience with the application.
Agile Development Process
o Participate in the Agile development process to ensure the success of the project.
Requirement Details:
Bachelor's or Master's degree in computer science, engineering, physics, or a related field.
Participation in U.S. Federal Government data science programs requiring TS/SCI clearance and delivering solutions
that combine geospatial disciplines and pattern-of-life analysis.
Proven expertise in custom-developing AI programs from the ground up, including but not limited to text
processing and the optimized selection and application of multiple LLMs.
Minimum of two (2) years of experience designing and implementing machine-learning solutions based on first
principles, including developing custom statistical methods without reliance on pre-built libraries.
Minimum academic math background to include the full calculus series, linear algebra, and statistics. Discrete
math, advanced statistics, and differential equations are a plus.
Knowledge of and competence in databases such as Postgres, MySQL, and SQL Server, as well as Python, C,
Java, React, NextJS, NodeJS, and AWS.
Experience deploying analytic models in pilot or AWS production environments.
Good communication skills with both technical and non-technical people.
Strong understanding of model validation and performance measurement.
Experience deploying advanced analytic solutions in public-sector or regulated environments.