AI Inference Engineer
Company: Quadric, Inc.
Location: Burlingame
Posted on: February 13, 2026
Job Description:
Quadric has created an innovative general-purpose neural processing
unit (GPNPU) architecture. Quadric's co-optimized software and
hardware are designed to run neural network (NN) inference workloads
in a wide variety of edge and endpoint devices, ranging from
battery-operated smart-sensor systems to high-performance automotive
or autonomous vehicle systems. Unlike other NPUs or neural network
accelerators in the industry today, which can accelerate only a
portion of a machine learning graph, the Quadric GPNPU executes both
NN graph code and conventional C++ DSP and control code.

Role:
The AI Inference Engineer at Quadric is the key bridge between the
world of AI/LLM models and Quadric's unique platforms. The AI
Inference Engineer will [1] port AI models to the Quadric platform;
[2] optimize model deployment for efficient inference; and [3]
profile and benchmark model performance. This senior technical role
demands deep knowledge of AI model algorithms, system architecture,
and AI toolchains and frameworks.

Responsibilities:
- Quantize, prune, and convert models for deployment
- Port models to the Quadric platform using the Quadric toolchain
- Optimize inference deployment for latency and speed
- Benchmark and profile model performance and accuracy
- Collaborate across related areas of the AI inference stack to
  support team and business priorities
- Develop tools to scale and speed up deployment
- Make improvements to the SDK and runtime
- Provide technical support and documentation to customers and the
  developer community

Requirements:
- Bachelor’s or Master’s in Computer Science and/or Electrical
  Engineering
- 5 years of experience in AI/LLM model inference and deployment
  frameworks/tools
- Experience with model quantization (PTQ, QAT) and related tools
- Experience with model accuracy measures
- Experience with model inference performance profiling
- Experience with at least one of the following frameworks:
  onnxruntime, PyTorch, vLLM, Hugging Face Transformers,
  neural-compressor, llama.cpp
- Proficiency in C/C++ and Python
- Demonstrated strength in problem solving, debugging, and
  communication

Benefits:
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401k, IRA)
- Life Insurance (Basic, Voluntary & AD&D)
- Paid Time Off (Vacation, Sick & Public Holidays)
- Family Leave (Maternity, Paternity)
- Short Term & Long Term Disability
- Training & Development
- Work From Home
- Free Food & Snacks
- Stock Option Plan
Keywords: quadric, Inc, Elk Grove, AI Inference Engineer, IT / Software / Systems, Burlingame, California