Senior Software Engineer, CentML

Infrastructure

Salary not provided
AWS
Docker
Kubernetes
GCP
Python
Java
Go
Terraform
C++
C
Azure
Mid and Senior level
Remote in US
San Francisco Bay Area
Toronto
CentML

Machine learning training platform

Open for applications

21-100 employees

B2C, Artificial Intelligence, Machine Learning, SaaS

Company mission

To pioneer novel technology to enhance computing efficiency, making AI accessible for innovation and to benefit the global community.

Role

Who you are

  • 4+ years of experience working with containerized deployment systems (e.g., Kubernetes, OpenShift, Terraform)
  • A big plus if you have contributed to Kubernetes and have expertise in container runtime technologies like Docker Engine, containerd, or CRI-O
  • Experience with deploying and managing cloud infrastructure on AWS, GCP, or Azure
  • Past experience in building GPU clusters for large-scale ML training and inference is desirable
  • Knowledge of GPU architecture and NVIDIA GPU virtualization technologies is highly desirable
  • Strong coding skills in languages like Python, Java, Go, and/or C/C++

What the job involves

  • Join our team in a key role focused on designing, developing, and maintaining the CentML platform, which offers cost-effective infrastructure for serving and training large-scale machine learning models
  • Lay out the design of a deployment infrastructure for ML training and inference jobs over GPU clusters that span multiple cloud service providers, such as AWS, GCP, Azure, CoreWeave, and OCI
  • Lead a team of engineers in building a scalable, performant, and reliable platform that lets our customers seamlessly access and use the comprehensive suite of ML services we offer
  • Design and lead the development of the CentML platform's deployment infrastructure, which manages the hardware resources needed to deploy ML training and inference applications
  • Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads that make efficient use of the cluster's hardware resources
  • Communicate with our product teams and define new features and goals for improving the CentML platform

Our take

In an increasingly AI- and ML-driven world, demand for these technologies is skyrocketing along with their costs, leaving many companies without access to tools that could enhance their operations. CentML aims to democratize AI and ML by making them more accessible and cost-effective for all.

Backed by a team with extensive expertise in AI, ML compilers, and ML hardware, CentML possesses a deep understanding of the inefficiencies prevalent in the industry. Among the challenges it addresses is the scarcity of AI chips. By meticulously analyzing clients' AI/ML requirements, CentML advises on suitable hardware options to optimize performance and minimize costs.

With its inception in 2022, CentML has swiftly garnered attention and funding, underscoring the market's appetite for its offerings. Recent funding will enable the company to further refine its product and conduct pivotal research in the field, solidifying its position as a pioneering force in democratizing AI and ML technologies.

Kirsty

Company Specialist

Insights

Some candidates hear back within 2 weeks

Company

Funding (2 rounds)

Sep 2023: $27.3m (Seed)

Jun 2022: $3.5m (Seed)

Total funding: $30.8m

Company benefits

  • An open and inclusive culture and work environment
  • Fully stocked kitchen at the office
  • Full health and dental benefits
  • Parental Leave top-up for 6 months
  • Continuous education budget
  • Generous vacation - we're not saying unlimited, but if you need extra time to recharge, just ask

Company HQ

Yonge-Bay Corridor, Toronto, ON

Leadership

Gennady Pekhimenko

(Co-Founder & CEO)

Also an Associate Professor at the University of Toronto, a Faculty Member at Vector Institute, and a Research Co-Chair at MLCommons.

Shang Wang

(Co-Founder & CTO)

Graduate Student Researcher at Vector Institute. Previously a Senior Software Engineer at NVIDIA.

Akbar Nurlybayev

(Co-Founder & COO)

Prior work includes Director of Engineering (Data Platform) at KAR Global and Software Development Manager at TradeRev.

Anand Jayarajan

(Co-Founder)

Previously a Research Fellow at the Indian Institute of Technology, and a Software Development Engineer at Flipkart.
