About Tekion:
Positively disrupting an industry that has not seen any innovation in over 50 years, Tekion has challenged the paradigm with the first and fastest cloud-native automotive platform that includes the revolutionary Automotive Retail Cloud (ARC) for retailers, Automotive Enterprise Cloud (AEC) for manufacturers and other large automotive enterprises, and Automotive Partner Cloud (APC) for technology and industry partners. Tekion connects the entire spectrum of the automotive retail ecosystem through one seamless platform. The transformative platform uses cutting-edge technology, big data, machine learning, and AI to seamlessly bring together OEMs, retailers/dealers, and consumers. With its highly configurable integration and greater customer engagement capabilities, Tekion is enabling the best automotive retail experiences ever. Tekion employs close to 3,000 people across North America, Asia, and Europe.
Position Summary
Build and operate the production backbone that takes models from Applied Sciences (AS) and delivers reliable, low-latency ML services across Tekion's DMS, CRM, Digital Retail, Service, and Payments products and enterprise pipelines. You will own microservices, CI/CD, observability, and runtime reliability, working hand-in-hand with Applied Sciences and Product to turn ideas into measurable dealer and consumer impact.
Why this Role Matters
- Accelerate the rollout of LLM-powered and agent-driven features across Tekion products.
- Enable agentic workflows that automate, reason, and interact on behalf of users and internal stakeholders.
- Operationalize secure, compliant, and explainable LLM and agentic services at scale.
- Convert Applied Sciences models into scalable, compliant, cost-efficient production services.
- Standardize how models are trained, validated, deployed, and monitored across Tekion products.
- Power real-time, context-aware experiences by integrating batch/stream features, graph context, and online inference.
What You'll Do
- Turn Applied Sciences prototype models (tabular, NLP/LLM, recommendation, forecasting) into fast, reliable services with well-defined API contracts.
- Integrate with the LLM Gateway/MCP, including prompt/config versioning.
- Build and orchestrate CI/CD pipelines.
- Review data science models; refactor and optimize code; containerize; deploy; version; and monitor for quality.
- Collaborate with data scientists, data engineers, product managers, and architects to design enterprise systems.
- Monitor, detect, and mitigate risks unique to LLMs and agentic systems.
- Implement prompt management: versioning, A/B testing, guardrails, and dynamic orchestration based on feedback and metrics.
- Design batch/stream pipelines (Airflow/Kubeflow, Spark/Flink, Kafka) and online features linked to our domain graph.
- Build inference microservices (REST/gRPC) with schema versioning, structured outputs, and stringent p95 latency targets.
- Manage the model/feature lifecycle: feature store strategy, model/agent registry, versioning, and lineage.
- Instrument deep observability: traces/logs/metrics, data/feature drift, model performance, safety signals, and cost tracking.
- Ensure real-time reliability: autoscaling, caching, circuit breakers, retries/fallbacks, and graceful degradation.
- Develop templates/SDKs/CLIs, sandbox datasets, and documentation that make shipping ML the default path.
Desired Skills and Experience
- 7-10 years in ML engineering/MLOps or backend/platform engineering with production ML.
- Experience with LLMs, retrieval systems, vector stores, and graph/knowledge stores.
- Strong software engineering fundamentals: Python plus one of Java/Go/Scala; API design; concurrency; testing.
- Hands-on with orchestration frameworks and libraries (LangChain, LlamaIndex, OpenAI Function Calling, AgentKit, etc.).
- Knowledge of agent architectures (reactive, planning, retrieval-augmented agents) and safe execution patterns.
- Pipelines and data: Airflow/Kubeflow or similar; Spark/Flink; Kafka/Kinesis; strong data quality practices.
- Microservices and runtime: Docker/Kubernetes, service meshes, REST/gRPC; performance and reliability engineering.
- Model ops: experiment tracking, registries, feature stores, A/B and shadow testing, drift detection.
- Observability: OpenTelemetry/Prometheus/Grafana; debugging latency tail behavior and memory/CPU hotspots.
- Cloud: AWS preferred (IAM, ECS/EKS, S3, RDS/DynamoDB, Step Functions/Lambda) with cost optimization experience.
- Security/compliance: secrets management, RBAC/ABAC, PII handling, auditability.
Preferred Mindset
- Product-oriented: You measure success by dealer and consumer outcomes, not just technical metrics.
- Reliability- and safety-first: You move fast with guardrails, rollbacks, and clear SLOs.
- Systems thinker: You design for multi-tenant scale, portability, and cost efficiency.
- Collaborative: You translate between Applied Sciences, Product, and the Data & AI Platform; you document and teach.
- Pragmatic: You automate the 80% and leave room for rapid experimentation.
Perks and Benefits
- Competitive compensation
- Generous stock options
- Medical Insurance coverage
- Work with some of the brightest minds from Silicon Valley's most dominant and successful companies
Tekion is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, status as a victim of violence or having a family member who is a victim of violence, the intersectionality of two or more protected categories, or other applicable legally protected characteristics.
For more information on our privacy practices, please refer to our Applicant Privacy Notice here.
Required Experience:
Senior IC