Member of Technical Staff: TPU Systems (JAX / XLA / Pallas)

Palo Alto, CA

About the Role

RadixArk is looking for a TPU Systems Engineer to build high-performance inference and training systems using JAX, XLA, and Pallas. You'll push model workloads to their limits on TPU hardware, working on SGLang-JAX and other critical infrastructure that enables efficient deployment of frontier models on Google's tensor processing units.

Requirements

3+ years of experience building production ML systems using JAX/PyTorch, XLA, or TPU-focused frameworks.

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or equivalent industry experience.

Deep understanding of XLA internals (preferred): HLO, MLIR, operator fusion, SPMD partitioning, and sharding strategies.

Strong performance-tuning instincts across compiler and runtime layers.

Experience with distributed inference systems (e.g., SGLang, vLLM) or training frameworks (e.g., Miles, Alpa, Pathways).

Proficiency in Python, with a demonstrated ability to write high-performance, production-quality code.

Experience writing custom kernels for GPUs, TPUs, or other AI accelerators. Familiarity with Pallas for kernel development is strongly preferred.

Responsibilities

Build high-performance inference and training systems using JAX/XLA/Pallas, including SGLang-JAX.

Push large-model workloads to their limits on the newest TPU hardware.

Optimize end-to-end latency and throughput for LLM serving on TPU infrastructure.

Design and implement SPMD strategies for efficient distributed inference and training.

Design and implement Pallas kernels for operations that need custom low-level control to reach peak performance.

Profile and optimize XLA compilation pipelines and HLO graph transformations.

Collaborate with kernel engineers and compiler teams to achieve performance wins across the stack.

Contribute TPU optimization guides, benchmarks, and architectural insights to open-source projects.

About RadixArk

RadixArk is an infrastructure-first company built by engineers who've shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles (our large-scale RL framework). We're on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training. Our team has optimized kernels serving billions of tokens daily, designed distributed training systems coordinating 10,000+ GPUs, and contributed to infrastructure that powers leading AI companies and research labs. We're backed by well-known infrastructure investors and partner with NVIDIA, Google, AWS, and frontier AI labs. Join us in building infrastructure that gives real leverage back to the AI community.

Compensation

We offer competitive compensation for this 1-year residency program, with health benefits and potential for conversion to a full-time role. Compensation is determined by location and prior experience. Strong residents may receive offers to join RadixArk full-time with equity after program completion.

Equal Opportunity

RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
