Machine Learning Platform Engineer, Stability AI

$130-190k

+ Stock options

AWS
Docker
Kubernetes
TypeScript
Python
Azure
Senior level
Remote in US
Stability AI

Open source generative AI company

101-200 employees

B2C, B2B, Artificial Intelligence, Deep Tech, Machine Learning, API

Company mission

To build the foundation to activate humanity's potential.

Role

Who you are

  • 8 years of experience in cloud computing and API development, with a deep understanding of High-Performance Computing (HPC) environments, particularly on AWS
  • Strong knowledge of HPC cluster management and job scheduling with Slurm and AWS HyperPod
  • Proficiency in programming languages such as Python and TypeScript, essential for API development and integration within AWS and/or Cloudflare Workers environments
  • Demonstrated expertise in API design, implementation, and maintenance, ensuring security and performance best practices within AWS and Cloudflare
  • Knowledge of containerization technologies (e.g., Docker, Kubernetes) for deployment of APIs within AWS, Cloudflare, and HPC systems
  • Experience with automating CI/CD pipelines
  • Familiarity with authentication and authorization protocols (e.g., OAuth, JWT) to ensure secure data exchange between AWS, Cloudflare, and HPC environments
  • Strong problem-solving skills and the ability to troubleshoot complex issues related to API integrations in a hybrid cloud-HPC setup, particularly in AWS and Cloudflare environments
  • Excellent communication and collaboration skills to work effectively with diverse teams and stakeholders in AWS and Cloudflare ecosystems

What the job involves

  • We are currently looking for a skilled Sr. ML Platform Engineer with a specialized focus on cloud infrastructure, including API development that facilitates seamless integration and interaction between cloud-based services and High-Performance Computing (HPC) environments
  • The successful candidate will play a pivotal role in designing and implementing APIs that enable efficient communication and data exchange between cloud platforms and HPC systems
  • Design, develop, and maintain robust APIs that facilitate communication and data exchange between cloud-based services, particularly AWS, and HPC environments
  • Collaborate with cross-functional teams to understand the unique requirements of both cloud-based services and HPC systems, ensuring that the APIs developed meet the specific needs of these environments
  • Implement best practices for API design, including security, scalability, and performance optimization to ensure efficient interaction between cloud services and HPC clusters
  • Utilize services such as Cloudflare to enhance API performance, security, and reliability in the cloud-to-HPC communication, optimizing for speed and resilience
  • Work closely with HPC engineers to identify and address integration challenges, striving for seamless connectivity between diverse systems and cloud-based platforms
  • Drive innovation by proposing and implementing new API strategies, enhancing the efficiency and functionality of data exchange between AWS, Cloudflare Workers, and on-premises HPC environments
  • Create comprehensive documentation and provide training to internal teams on the use and integration of developed APIs, focusing on AWS and Cloudflare environments
  • Monitor API performance and address issues related to data transfer, ensuring reliability and consistent operation between AWS, Cloudflare, and HPC systems (Slurm/AWS HyperPod)
  • Collaborate with the security team to ensure that the APIs comply with industry standards and best practices for data privacy and protection, especially in AWS and Cloudflare environments
  • Participate in incident management and root cause analysis to improve system reliability
  • Build containers with REST APIs for Gen AI functionality and host them on AWS and Azure

Our take

Stability AI is an AI-driven visual art company that designs and implements an open AI tool that creates images based on text input. Through collective intelligence and augmented technology, Stability AI is helping research communities develop cutting-edge AI models for image, language, audio, video, 3D content, biotech and other scientific research.

Built in partnership with Amazon Web Services, Stable Diffusion, Stability AI's latest release, is a text-to-image model that uses the Ezra-1 UltraCluster (the world's fifth-largest supercomputer). The model quickly proved to be a success: after just one month, four of the top ten applications on Apple's App Store were powered by Stable Diffusion.

Alongside Stable Diffusion, Stability AI offers a premium imaging application, DreamStudio, as well as the externally built products Lensa, Wonder and NightCafe, which have also amassed great success, with over 40M users and counting. The company is dedicated to investing in more supercomputing power to further accelerate its product offerings and is looking to expand its workforce, expecting to grow to over 300 employees by the end of the year.

Freddie

Company Specialist

Insights

Some candidates hear back within 2 weeks

324% employee growth in 12 months

Company

Funding (last 2 of 3 rounds)

Jun 2024: $80m (Early VC)
Oct 2023: $50m (Convertible)

Total funding: $231m

Company values

  • Pragmatic and Impact-Driven: We are committed to finding practical solutions that make a positive difference.
  • Bold: We constantly push the boundaries of what's possible to create meaningful change and drive progress.
  • Collaborative: We believe that working with our partners, customers, and communities can achieve far more than we could alone.
  • Innovative: We are always exploring new ideas, experimenting with new technologies, and looking for fresh ways to tackle the world's most pressing challenges.
  • Ambitious: We never settle for the status quo. We constantly strive to do better, achieve more, and make a bigger impact in the world.
  • Transparent and Trustworthy: We are committed to being open and honest in all our interactions and building long-term partnerships with our customers, employees, and stakeholders based on mutual respect and trust.

Company HQ

Westminster, London, UK

Leadership

Emad Mostaque (CEO)

Graduated from Oxford University with a master's degree in mathematics and computer science. Founder of Symmitree, a startup aimed at reducing the cost of technology.
