Software Engineer, Anyscale

Model Training Infrastructure

$170.1k-$237k

+ Stock Options

AWS
Kubernetes
GCP
TensorFlow
PyTorch
Senior and Expert level
San Francisco Bay Area

3+ days a week in office

Anyscale

Open-source framework for AI software development

Open for applications

201-500 employees

B2B, Artificial Intelligence, Enterprise, Machine Learning, SaaS

Company mission

To remove distributed systems expertise from the critical path of realizing the business potential of AI.

Role

Who you are

  • We’re particularly interested in engineers who can help shape and execute a vision for the future of ML training infrastructure
  • We welcome both Individual Contributors and technically inclined individuals with experience managing small teams
  • 5+ years of experience building, scaling, and maintaining software systems in production environments
  • Strong fundamentals in algorithms, data structures, and system design
  • Proficiency with machine learning frameworks and libraries (e.g., PyTorch, TensorFlow, XGBoost)
  • Experience designing fault-tolerant distributed systems
  • Solid architectural skills

Desirable

  • Experience with cloud technologies (AWS, GCP, Kubernetes)
  • Hands-on experience building ML training platforms in production
  • Background in managing and maintaining open-source libraries
  • Experience leading small teams to achieve ambitious technical goals
  • Familiarity with Ray

What the job involves

  • We’re looking for passionate, motivated engineers excited to build infrastructure and tools for the next generation of machine learning applications
  • We’re hiring exceptional Software Engineers for our distributed training team, which develops and maintains widely adopted open-source machine learning libraries
  • The Distributed Training team drives the development and optimization of Ray’s distributed training libraries, focusing on features and performance enhancements for large-scale machine learning workloads
  • They are responsible for building and maintaining core libraries like Ray Train (for distributed model training) and Ray Tune (for distributed hyperparameter tuning); a brief Ray Train sketch follows this list
  • You’ll collaborate closely with the Ray Core and Ray Data teams to create impactful, end-to-end solutions, and have the exciting opportunity to work directly with Machine Learning teams around the globe, shaping products that are transforming the AI landscape
  • Develop scalable, fault-tolerant distributed machine learning libraries that power leading ML platforms
  • Create an exceptional end-to-end experience for training machine learning models
  • Solve complex architectural challenges and transform them into practical solutions
  • Contribute to and engage with the open-source community, collaborating with ML researchers, engineers, and data scientists to build new scalable machine learning abstractions
  • Share your work and expertise with a broader audience through talks, tutorials, and blog posts
  • Collaborate with a team of experts in distributed systems and machine learning
  • Work directly with end-users to iterate on and enhance the product based on their feedback
  • Partner with engineering and product managers to nurture a talented team of software engineers
  • Play a key role in building and shaping a world-class company

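To give a flavour of what these libraries look like in practice, here is a minimal sketch of a Ray Train job. It assumes a recent Ray 2.x with the Torch integration installed (for example `pip install "ray[train]" torch`); the tiny model, synthetic data, and hyperparameters are illustrative assumptions, not details from the posting.

```python
# Minimal Ray Train sketch (assumes a recent Ray 2.x; illustrative only).
import torch
import torch.nn as nn

import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each Ray Train worker runs this loop; prepare_model wraps the model
    # for data-parallel training across the workers.
    model = ray.train.torch.prepare_model(nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    loss_fn = nn.MSELoss()

    for epoch in range(config["epochs"]):
        # Toy synthetic batch; a real job would use a DataLoader
        # (via ray.train.torch.prepare_data_loader) or Ray Data.
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Report per-epoch metrics back to the trainer.
        ray.train.report({"epoch": epoch, "loss": loss.item()})


if __name__ == "__main__":
    trainer = TorchTrainer(
        train_loop_per_worker,
        train_loop_config={"lr": 1e-2, "epochs": 3},
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    )
    result = trainer.fit()
    print(result.metrics)
```

Ray Tune, mentioned above, can wrap a trainer like this one to run distributed hyperparameter sweeps over values such as the learning rate.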

Insights

Top investors

133% employee growth in 12 months

Company

Company benefits

  • We offer a wide range of health, dental, and vision coverage options for you and your family — including many that are 99% covered by Anyscale
  • Whether you’re hopping on public transit or driving in yourself, Anyscale covers a portion of your commuting costs each month
  • Lunch is served every day in our San Francisco office. Dinner, too, if you’re ever working late. And did we mention daily boba runs?
  • Give back to the causes and communities you love with paid time off specifically for volunteer work
  • Anyscalers can take advantage of 12 weeks of paid leave following the arrival of a new little one
  • Paid time off at Anyscale is flexible and unlimited. We encourage all Anyscalers to rest up and recharge when needed

Funding (last 2 of 4 rounds)

  • Aug 2022: $99m (Series C)
  • Dec 2021: $100m (Series C)

Total funding: $259.6m

Our take

Anyscale is a platform, built on its open-source Ray framework, that facilitates the development of distributed applications with high computing demands. It answers a growing requirement for higher-level tech strategies among companies incorporating AI and machine learning into their products, and a similarly broad and growing demand for easier access to cloud programming.

Anyscale launched its first commercial offering in 2021, and intends to continue developing products that lower the skill and resource threshold for cloud programming. In 2023 it released Aviary, a project to simplify open-source large language model (LLM) deployment. What is promising, and lucrative, about its proposition is that it offers a general-use distributed system, which sidesteps the cumbersome process of stitching together disparate distributed systems that novel applications previously required. It is currently finding use in a range of applications, such as supply chain, environmental restoration and retail, by organisations including Amazon and Uber.

Anyscale has raised considerable funding led by Andreessen Horowitz and Addition. This is being used to scale its team and further develop the Ray platform. As the demand for AI apps rapidly increases, Anyscale is poised to disrupt a $13 trillion market.

Kirsty

Company Specialist at Welcome to the Jungle