Living Talent

Running ML and GPU workloads on Native Kubernetes in the Cloud
Effective Resource Optimization: DRA and GPU Fractionalization

Startup (Series A, team of 40)

REMOTE first

Smart, fun, low-ego team culture

Compensation: Base Salary $300k – 360k + CAD, Equity

Key Responsibilities:

Architecture & Development: Design and build a Kubernetes-based ML/AI optimization platform.

Leadership & Collaboration: Work with C-staff, product management, engineering, and design partners.

Communication: Create detailed architecture diagrams, documents, and presentations.

User Experience Focus: Design for infrastructure admins and MLOps staff.

Open Source Community: Stay actively involved with CNCF and related projects.

Enterprise-Class Solutions: Drive & deliver solutions for enterprise-class data, ML, AI applications.

FinOps & SRE Best Practices: Apply FinOps for cloud financial management and modern SRE practices.

Qualifications:

Entrepreneurial, Startup Experience

10+ years of infrastructure-level software architecture and development.

Deep, Native Kubernetes Expertise

Modern Cloud Architecture

Linux, Virtualization platforms (hands-on)

Kubernetes-based ML/AI systems (Kubeflow, Kueue, KServe, GPU Operators, DRA, Karpenter)

Deep knowledge:

ML/AI use cases & customer stories of model development, training, inference

Hardware accelerator usage (CPU, GPU, TPU).

Proven track record of delivering complex distributed systems.

Active involvement in open-source communities, such as CNCF projects.

Strong leadership and team collaboration skills.

Excellent communication skills, both verbal and written.

Preferred Qualifications:

Knowledge of additional ML/AI frameworks and tools.

Experience in DevOps practices and tools.

Certification in Kubernetes or related technologies.

Awareness of FinOps and SRE best practices.

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
