Staff Machine Learning Engineer, LLM Fine-Tuning (Verilog/RTL Applications)

Job Location: San Jose, CA - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Staff Machine Learning Engineer, LLM Fine-Tuning (Verilog/RTL Applications)

HIGHLIGHTS

Location: San Jose, CA (Onsite/Hybrid)

Schedule: Full Time
Position Type: Contract
Hourly: BOE


Overview:

Our client is building privacy-preserving LLM capabilities that help hardware design teams reason over Verilog/SystemVerilog and RTL artifacts: code generation, refactoring, lint explanation, constraint translation, and spec-to-RTL assistance. Our client is looking for a Staff-level engineer to technically lead a small, high-leverage team that fine-tunes and productizes LLMs for these workflows in a strict enterprise data-privacy environment.

You don't need to be a Verilog/RTL expert to start; curiosity, drive, and deep LLM craftsmanship matter most. Any HDL/EDA fluency is a strong plus.

What you'll do (Responsibilities)

  • Own the technical roadmap for Verilog/RTL-focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.

  • Lead a hands-on team of applied scientists/engineers: set direction, unblock technically, review designs/code, and raise the bar on experimentation velocity and reliability.

  • Fine-tune and customize models using state-of-the-art techniques (LoRA/QLoRA, PEFT, instruction tuning, preference optimization/RLAIF) with robust HDL-specific evals:

    • Compile/lint/simulate-based pass rates for code generation, constrained decoding to enforce syntax, and "does-it-synthesize" checks.

  • Design privacy-first ML pipelines on AWS:

    • Training/customization and hosting using Amazon Bedrock (including Anthropic models) where appropriate; SageMaker (or EKS with KServe/Triton/DJL) for bespoke training needs.

    • Artifacts in S3 with KMS CMKs; isolated VPC subnets & PrivateLink (including Bedrock VPC endpoints), IAM least-privilege, CloudTrail auditing, and Secrets Manager for credentials.

    • Enforce encryption in transit/at rest, data minimization, and no public egress for customer/RTL corpora.

  • Stand up dependable model serving: Bedrock model invocation where it fits and/or low-latency self-hosted inference (vLLM/TensorRT-LLM), autoscaling, and canary/blue-green rollouts.

  • Build an evaluation culture: automatic regression suites that run HDL compilers/simulators, measure behavioral fidelity, and detect hallucinations/constraint violations; model cards and experiment tracking (MLflow/Weights & Biases).

  • Partner deeply with hardware design, CAD/EDA, Security, and Legal to source/prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.

  • Drive productization: integrate LLMs with internal developer tools (IDEs/plugins, code review bots, CI), retrieval (RAG) over internal HDL repos/specs, and safe tool-use/function-calling.

  • Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure-by-default systems.

What you'll bring (Minimum qualifications)

  • 10 years total engineering experience, with 5 years in ML/AI or large-scale distributed systems; 3 years working directly with transformers/LLMs.

  • Proven track record shipping LLM-powered features in production and leading ambiguous, cross-functional initiatives at Staff level.

  • Deep hands-on skill with PyTorch, Hugging Face Transformers/PEFT/TRL, distributed training (DeepSpeed/FSDP), quantization-aware fine-tuning (LoRA/QLoRA), and constrained/grammar-guided decoding.

  • AWS expertise to design and defend secure enterprise deployments including:

    • Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints)

    • SageMaker (Training, Inference, Pipelines), S3, EC2/EKS/ECR, VPC/Subnets/Security Groups, IAM, KMS, PrivateLink, CloudWatch/CloudTrail, Step Functions, Batch, Secrets Manager.

  • Strong software engineering fundamentals: testing, CI/CD, observability, performance tuning; Python a must (bonus for Go/Java/C).

  • Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.

Nice to have (Preferred qualifications)

  • Familiarity with Verilog/SystemVerilog/RTL workflows: lint, synthesis, timing closure, simulation, formal, test benches, and EDA tools (Synopsys/Cadence/Mentor).

  • Experience integrating static analysis/AST-aware tokenization for code models or grammar-constrained decoding.

  • RAG at scale over code/specs (vector stores, chunking strategies); tool-use/function-calling for code transformation.

  • Inference optimization: TensorRT-LLM, KV-cache optimization, speculative decoding; throughput/latency trade-offs at batch and token levels.

  • Model governance/safety in the enterprise: model cards, red-teaming, secure eval data handling; exposure to SOC2/ISO 27001/NIST frameworks.

  • Data anonymization, DLP scanning, and code de-identification to protect IP.

What success looks like

90 days

  • Baseline an HDL-aware eval harness that compiles/simulates; establish secure AWS training & serving environments (VPC-only, KMS-backed, no public egress).

  • Ship an initial fine-tuned/customized model with measurable gains vs. base (e.g., X% compile-pass rate, Y% lint findings per K LOC generated).

180 days

  • Expand customization/training coverage (Bedrock for managed FMs including Anthropic; SageMaker/EKS for bespoke/open models).

  • Add constrained decoding and retrieval over internal design specs; productionize inference with SLOs (p95 latency, availability) and audited rollout to pilot hardware teams.

12 months

  • Demonstrably reduce review/iteration cycles for RTL tasks with clear metrics (defect reduction, time-to-lint-clean, % auto-fix suggestions accepted) and a stable MLOps path for continuous improvement.

(Security & privacy by design)

  • Customer and internal design data remain within private AWS VPCs; access via IAM roles and audited by CloudTrail; all artifacts encrypted with KMS.

  • No public internet calls for sensitive workloads; Bedrock access via VPC interface endpoints/PrivateLink with endpoint policies; SageMaker and/or EKS run in private subnets.

  • Data pipelines enforce minimization, tagging, retention windows, and reproducibility; DLP scanning and redaction are first-class steps.

  • We produce model cards, data lineage, and evaluation artifacts for every release.

Tech you'll touch

  • Modeling: PyTorch, HF Transformers/PEFT/TRL, DeepSpeed/FSDP, vLLM, TensorRT-LLM

  • AWS & MLOps: Amazon Bedrock (Anthropic and other FMs, Guardrails, Knowledge Bases, Runtime APIs), SageMaker (Training/Inference/Pipelines), MLflow/W&B, ECR, EKS/KServe/Triton, Step Functions

  • Platform/Security: S3, KMS, IAM, VPC/PrivateLink (incl. Bedrock), CloudWatch/CloudTrail, Secrets Manager

Tooling (nice to have):

  • HDL toolchains for compile/simulate/lint, vector stores (pgvector/OpenSearch), GitHub/GitLab CI


We are GTN, The Go To Network
