This is a remote position.
What the engineer will actually do:
- P1 Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze/Silver tables.
- P1 Develop/maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
- P2 Add basic text embedding or LLM-based entity extraction (LangChain or open-source transformers) to enrich the document feed.
- P3 Write unit tests and lightweight data quality checks (Great Expectations) so parsing errors do not break the pipeline.
- P3 Produce concise handover docs for our future data architect.
Must have (core):
- 2-4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow, or similar).
- Solid Python (pandas, PySpark) and experience parsing Office files with libraries such as python-pptx, openpyxl, pdfplumber, or PyPDF.
- Basic SQL tuning and ability to work with structured schemas.
- Git and CI/CD familiarity.
Nice to have (bonus):
- Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
- Experience adding embeddings to tables for downstream ML or search.
- Great Expectations or similar data quality tooling.
- Familiarity with Unity Catalog or Snowflake RBAC concepts.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Proven experience as a Full Stack Engineer, with strong proficiency in both Python and React.
- Solid understanding of frontend technologies including HTML, CSS, and JavaScript, and experience with modern frontend frameworks/libraries.
- Expertise in designing and building RESTful APIs and backend services using Python frameworks like Django or Flask.
- Familiarity with database management systems, both SQL and NoSQL.
- Experience working in an Agile development environment, collaborating with cross-functional teams.
- Excellent communication skills and the ability to effectively convey technical concepts to both technical and non-technical team members.
- A portfolio of past projects showcasing your technical skills and contributions is a plus.