Data Engineer AI
Marshall County, WV - USA
Job Summary
Data Engineer - AI
Dallas, TX (preferred) | Hybrid (Bishop Arts preferred) | Full-time
Reports to the Founding AI / Engineering Team
Why this role exists
Our client is an AI-powered contract intelligence platform that validates purchased services invoices against contract terms before payment, turning contracts into enforceable controls within healthcare procure-to-pay workflows.
The platform processes massive volumes of contracts, invoices, vendor records, and transactional data. A single enterprise customer may generate over 30,000 invoice and contract-related documents monthly, all requiring ingestion, extraction, normalization, validation, monitoring, and analytics.
The company's founding engineering team is currently focused on building higher-level AI systems: semantic layers, ontology frameworks, and enterprise-scale platform architecture. This role exists to own the implementation and operationalization layer underneath that vision, building and maintaining the pipelines, reporting systems, integrations, and scalable data infrastructure that allow the platform to operate reliably at enterprise scale.
This is not a pure analytics role and not a pure research role. It is a hands-on engineering role for someone who can build production-grade data pipelines while also understanding how modern AI, ML, LLM, and knowledge graph systems operate.
If you enjoy building scalable data systems, handling messy enterprise data, operationalizing AI pipelines, and creating infrastructure that powers enterprise SaaS products, this role will feel like a strong fit.
What you'll own
Enterprise Data Pipeline Engineering
- Build, maintain, and optimize large-scale ETL/ELT pipelines for contracts, invoices, logs, traces, events, and operational data
- Support enterprise-scale ingestion and processing workflows for healthcare procurement and AP data
- Design resilient streaming and batch processing systems
- Help operationalize the platform for enterprise-grade customer workloads
- Improve pipeline reliability, observability, scalability, and monitoring
- Support distributed data processing workflows across large document and transactional datasets
Reporting & Operational Analytics
- Build internal and customer-facing reporting systems showing document processing status, validation outcomes, exceptions, and operational insights
- Create dashboards and analytics layers that provide actionable insights from invoice and contract data
- Develop ad hoc reporting capabilities for founders, GTM teams, customers, and investors
- Help identify trends, gaps, anomalies, and operational patterns across purchased services spend
- Translate raw platform data into usable operational intelligence
AI/ML Data Infrastructure
- Support AI and ML pipelines powering contract intelligence and invoice validation workflows
- Build infrastructure supporting LLM, ML, and semantic data workflows
- Work alongside engineers building ontology layers, semantic layers, and knowledge graph systems
- Help structure and operationalize datasets for AI-driven applications
- Support vector databases, semantic retrieval, and modern AI architecture workflows
- Understand how data flows through MLOps and LLMOps environments
Platform Data Foundations
- Help maintain and improve the company's core data architecture
- Support enterprise-grade logging, tracing, monitoring, and event management systems
- Build scalable data lake and storage workflows
- Improve system reliability and operational visibility as customer scale increases
- Collaborate closely with AI engineers and platform leadership on implementation and execution
What Success Looks Like (First 90 Days)
First 45 Days
- Ramp quickly on the AI platform, pipeline architecture, and customer workflows
- Understand how contracts, invoices, validation systems, and analytics pipelines interact
- Identify gaps in pipeline reliability, reporting, and data quality
- Begin contributing production-ready improvements to core pipelines and operational systems
By 90 Days
- Core reporting and analytics workflows are operational and scalable
- Enterprise pipeline reliability and monitoring improve measurably
- Data quality and processing visibility improve across customer workflows
- Internal teams can access cleaner operational reporting and analytics
- Founders and customer-facing teams can generate custom reporting more efficiently
- AI and semantic systems receive more reliable and structured downstream data
- You are independently building and maintaining production data workflows with minimal oversight
The profile that tends to win here
- You are first and foremost a strong engineer who can build and maintain production systems
- You have experience working with enterprise-scale or mid-market data environments, not only early-stage startups
- You've worked with large-scale transactional, operational, or machine-generated datasets
- You understand modern AI/ML ecosystems well enough to support them operationally
- You are comfortable dealing with ambiguity and evolving infrastructure
- You think systematically about scalability, reliability, and maintainability
- You can move comfortably between infrastructure, pipelines, analytics, and operational engineering
- You are highly analytical and naturally curious about patterns, anomalies, and data quality
- You move quickly, fail fast, and care deeply about accuracy and operational quality
Qualifications
- 4-8 years of experience in Data Engineering, Platform Engineering, or Backend/Data Infrastructure roles
- Strong experience building ETL/ELT pipelines in production environments
- Experience with distributed data processing systems
- Experience handling streaming and batch data workflows
- Strong SQL and Python skills
- Experience with modern cloud infrastructure (AWS, GCP, or Azure)
- Experience working with data lakes and large-scale operational datasets
- Experience handling logs, traces, events, and telemetry-style data
- Familiarity with ML pipelines, vector databases, or modern AI data architectures
- Understanding of MLOps and/or LLMOps concepts
- Experience building reporting systems, dashboards, and operational analytics workflows
- Comfortable working in fast-moving startup environments with evolving requirements
Strongly Preferred:
- Experience supporting AI/LLM-driven products
- Exposure to knowledge graphs, semantic layers, or ontology systems
- Experience in enterprise SaaS environments
- Experience with observability and monitoring tooling
- Familiarity with healthcare procurement, AP automation, or invoice processing systems
- Experience building customer-facing analytics systems
- Experience supporting high-volume document processing systems
Compensation & Benefits
- Competitive base salary, variable comp, and potential for future equity
- Opportunity to help build foundational infrastructure at an early-stage AI company
- High ownership and direct technical impact
- Flexible and remote-friendly environment
- Opportunity to work on cutting-edge AI enterprise data infrastructure problems