AI Inference Engineer

F5 Networks


Job Location: Dublin - Ireland
Monthly Salary: Not Disclosed
Posted on: 18 hours ago
Vacancies: 1 Vacancy

Job Summary

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.

Everything we do centers around people. That means we obsess over how to make the lives of our customers and their customers better. And it means we prioritize a diverse F5 community where each individual can thrive.

The AI Inference Engineer plays a critical role in the AI lifecycle by bridging the gap between high-performance model development and optimized deployment environments. This position focuses on optimizing Large Language Models (LLMs) for inference serving across diverse environments, from GPU-rich data centers to resource-constrained edge devices, with a strong emphasis on maximizing throughput, minimizing latency, and maintaining model accuracy.

This role is pivotal in advancing F5's AI capabilities, ensuring enterprise-grade reliability by leveraging hardware acceleration, designing scalable infrastructure, and monitoring system performance.

Key Responsibilities

High-Performance AI Serving

  • Build and maintain robust inference engines using tools like vLLM, TGI (Text Generation Inference), and NVIDIA Triton, ensuring high performance at scale.

  • Handle deployment optimizations to deliver low-latency AI serving solutions for multiple business applications.
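A key technique behind high-throughput serving engines such as vLLM and TGI is continuous batching, where new requests join the running batch at every decode step instead of waiting for the current batch to drain. A minimal pure-Python sketch of that scheduling idea (the `Request` fields and step logic are illustrative, not any engine's actual API):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

class ContinuousBatcher:
    """Toy scheduler: admits waiting requests into the running batch
    at every decode step, up to a fixed batch capacity."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.waiting: deque = deque()
        self.running: list = []
        self.finished: list = []

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> None:
        # Admit new requests while there is batch capacity.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        # One decode step: every running request emits one token.
        for req in self.running:
            req.generated.append(f"tok{len(req.generated)}")
        # Retire finished requests, freeing slots for the next step.
        still_running = []
        for req in self.running:
            if len(req.generated) >= req.max_new_tokens:
                self.finished.append(req)
            else:
                still_running.append(req)
        self.running = still_running

batcher = ContinuousBatcher(max_batch_size=2)
for i, n in enumerate([1, 3, 2]):
    batcher.submit(Request(rid=f"r{i}", max_new_tokens=n))
while batcher.running or batcher.waiting:
    batcher.step()
print([r.rid for r in batcher.finished])  # ['r0', 'r1', 'r2']
```

Note how the short request `r0` finishes after one step and its slot is immediately reused by `r2`, which is the throughput win over static batching.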

Hardware Acceleration and Optimization

  • Profile and optimize models for specialized hardware backends, including NVIDIA GPUs (CUDA/TensorRT), Apple Silicon (Core ML), and AI accelerators like TPUs and LPUs.

  • Collaborate with hardware teams to maximize utilization and performance across various computational environments.
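Any backend comparison starts from a repeatable latency baseline. A stdlib-only sketch of a profiling harness that discards warmup iterations and reports percentiles (the lambda workload is a stand-in for a real model call):

```python
import statistics
import time

def profile(workload, warmup: int = 3, iters: int = 20) -> dict:
    """Time repeated calls to `workload` and report latency stats in ms.
    Warmup iterations are discarded so one-time costs (JIT compilation,
    cache population, lazy initialization) don't skew the percentiles."""
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }

# Stand-in workload: sum of squares instead of a model forward pass.
stats = profile(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

Swapping the workload for calls into CUDA, TensorRT, or Core ML backends would need backend-specific synchronization before reading the clock, but the percentile bookkeeping stays the same.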

Inference Orchestration and Scalability

  • Design and implement auto-scaling architectures for online (real-time) and batch inference pipelines, leveraging Kubernetes for inference routing and orchestration.

  • Ensure software solutions are optimized for peak performance during traffic spikes, maintaining reliability and scalability.
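The Kubernetes Horizontal Pod Autoscaler scales on a proportional rule, desired = ceil(current × currentMetric / targetMetric). A small sketch of that decision applied to in-flight requests per replica (the metric choice and replica bounds are illustrative):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_r: int = 1, max_r: int = 64) -> int:
    """Proportional scaling rule used by the Kubernetes HPA:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [min_r, max_r]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_r, min(max_r, raw))

# 4 replicas each seeing ~30 in-flight requests, target 10 per replica:
print(desired_replicas(4, 30, 10))   # 12 -> scale out
# Traffic subsides to ~2 in-flight requests per replica:
print(desired_replicas(12, 2, 10))   # 3  -> scale in
```

In production the HPA also applies stabilization windows and tolerance bands to avoid flapping during spiky inference traffic; this sketch shows only the core formula.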

Performance Monitoring and Observability

  • Establish robust observability frameworks to monitor Time to First Token (TTFT), tokens per second, and memory bandwidth utilization against service-level agreements (SLAs).

  • Build and execute performance and load testing suites to identify bottlenecks and ensure consistent reliability at scale.
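TTFT and tokens-per-second can be measured directly on any streaming response. A sketch over a generic token iterator, with a simulated generator standing in for a real model stream:

```python
import time

def measure_stream(token_iter) -> dict:
    """Consume a token stream, recording Time to First Token (TTFT)
    and overall throughput in tokens/second."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        if ttft is None:
            ttft = time.perf_counter() - start
        count += 1
    elapsed = time.perf_counter() - start
    return {
        "ttft_s": ttft,
        "tokens": count,
        "tokens_per_s": count / elapsed if elapsed > 0 else float("inf"),
    }

def fake_stream(n_tokens: int = 50, delay_s: float = 0.001):
    # Simulated model stream: fixed per-token decode delay.
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"

metrics = measure_stream(fake_stream())
print(metrics)
```

Wrapping a real server's SSE or gRPC stream in the same function gives per-request samples that can be aggregated into SLA percentiles by a load-testing harness.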

Technical Requirements

Required Skills:

  • Programming Languages: Proficiency in programming languages such as Python, C, Rust, or Golang, specifically for high-performance AI workflows.

  • Inference Tools: Proven hands-on experience with tools like vLLM, TensorRT, and Ollama for inference development and optimization.

  • Infrastructure Expertise: Strong familiarity with infrastructure technologies including Docker, Kubernetes, and cloud platforms such as AWS, GCP, and Azure.

  • Hardware Optimization Expertise: Comprehensive understanding of GPU and AI hardware, including techniques for profiling and optimizing performance for accelerators like NVIDIA GPUs and TPUs.

Preferred Experience:

  • Prior experience deploying Large Language Models (LLMs) with advanced techniques like Speculative Decoding or PagedAttention.

  • Contributions to open-source inference libraries or hardware-level kernel development (e.g., CUDA or Triton kernels).

  • Background in MLOps or SRE roles focused on high-performance AI endpoints and reliability during demand surges.

  • Proficiency in designing scalable solutions for high-throughput inference environments optimized for traffic bursts.
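On speculative decoding: a cheap draft model proposes a few tokens, the target model verifies them in one pass, and the longest agreeing prefix is kept. A toy sketch with deterministic stand-in "models" (both lambdas are illustrative placeholders, not real models, and real systems verify probabilistically rather than by exact match):

```python
def speculative_step(prefix: list, draft, target, k: int = 4) -> list:
    """One round of (greedy) speculative decoding: the draft proposes
    k tokens, the target keeps the longest prefix it agrees with, then
    appends one token of its own, so every round makes progress."""
    # Draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # Target verifies: accept the longest agreeing prefix.
    accepted, ctx = [], list(prefix)
    for tok in proposal:
        if target(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    # Emit the target's own next token (recovery on disagreement).
    accepted.append(target(ctx))
    return accepted

# Toy deterministic "models": both continue an integer sequence, but the
# draft makes a mistake once the context reaches length 5.
target_model = lambda ctx: len(ctx)
draft_model = lambda ctx: len(ctx) if len(ctx) < 5 else len(ctx) + 1

out = speculative_step([0, 1, 2], draft_model, target_model, k=4)
print(out)  # [3, 4, 5]: two draft tokens accepted, then the target's fix
```

The speedup comes from the target verifying k proposals in one forward pass instead of k sequential passes; the acceptance rate of the draft determines how much of that potential is realized.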

Success Metrics (KPIs):

  • Latency Reduction: Continuously improve inference latency metrics, ensuring minimal Time to First Token (TTFT) and maximum tokens per second.

  • Cost Efficiency: Achieve lower Cost per 1K Tokens through better resource utilization and hardware optimization.

  • Scalability: Maintain system stability and reliability during traffic spikes, ensuring performance consistency across environments.

  • Throughput Maximization: Deploy models optimized for peak hardware usage and maximum process throughput.
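The Cost per 1K Tokens KPI follows directly from instance price and sustained throughput. A quick sketch (the price and throughput figures are made-up examples, and full utilization is assumed):

```python
def cost_per_1k_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD per 1,000 generated tokens for an instance running at a
    sustained throughput, assuming full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1000

# Hypothetical numbers: a $4/hour GPU instance sustaining 2,500 tok/s.
print(round(cost_per_1k_tokens(4.0, 2500), 6))  # 0.000444
```

The formula makes the two optimization levers explicit: batching and hardware tuning raise `tokens_per_second`, while right-sizing instances lowers `hourly_cost_usd`.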

Why Join F5

F5 empowers you to push boundaries in AI optimization and high-performance engineering. Joining our team means:

  • Collaborating with cutting-edge technologies and hardware solutions to support real-time AI applications.

  • Advancing your career in a fast-paced, multidisciplinary environment focused on innovation, scalability, and problem-solving.

  • Driving transformative projects that deliver real-time AI reliability to global customers while maintaining cost and efficiency standards.

  • Working on advanced MLOps solutions that seamlessly scale enterprise AI systems and shape the future of intelligent deployment.

What Success Looks Like:

As an AI Inference Engineer at F5, success is measured by your ability to:

  • Combine technical expertise and problem-solving skills to deliver low-latency, scalable, and high-performing AI prediction systems.

  • Collaborate efficiently across cross-functional teams, participating in knowledge sharing and system refinement.

  • Demonstrate initiative by driving optimizations across hardware, tools, and orchestration processes, balancing immediate solutions with long-term architectural goals.

  • Translate complex AI and inference workflows into practical solutions that align with F5's strategic objectives.


The job description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.

Please note that F5 only contacts candidates through an F5 email address (ending with @) or an automated email notification from Workday (ending with or @).

Equal Employment Opportunity

It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including but not limited to hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. Request by contacting .


Required Experience:

IC


About Company


F5 application services ensure that applications are always secure and perform the way they should—in any environment and on any device.
