Senior Site Reliability Engineer, Onehouse

$150-220k

+ Equity

AWS
Kubernetes
GCP
Kafka
Go
Terraform
ELK
Jenkins
Azure
Spark
Prometheus
Senior and Expert level
San Francisco Bay Area

Office located in Sunnyvale, CA

Onehouse

Pre-built data lakehouse foundation

Open for applications

21-100 employees

B2B, Enterprise, Big data, SaaS, Data Analysis, Cloud Computing

Company mission

To aid companies of all sizes in supercharging their data engineering/data science, by automating painful data infrastructure buildout.

Role

Who you are

  • Bachelor's degree in Computer Science or related field
  • 7+ years of experience in software engineering or SRE roles, with a focus on large scale distributed systems
  • Strong coding skills in at least one programming language, such as Java, Python, or Go
  • Strong conviction in software development best practices, including version control, automated testing, and continuous integration and delivery
  • Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems
  • Experience managing Kubernetes clusters and applications at scale
  • Experience deploying applications on one or more cloud platforms such as AWS, Google Cloud Platform or Microsoft Azure
  • Experience defining and owning reliability-focused systems and processes (e.g. incident management, post-mortems)
  • Experience with compliance processes related to software development (e.g. SOC 2, FedRAMP)
  • Experience with the following tech stack:
      • Infrastructure-as-code (e.g. Terraform, CloudFormation)
      • Automation frameworks (e.g. Jenkins, CircleCI)
      • Monitoring stacks (e.g. Prometheus and ELK)
      • Cloud security management (e.g. IAM, SSO)
      • Data processing technologies like Spark

What the job involves

  • When you join Onehouse, you're joining a team of passionate professionals tackling the deeply technical challenges of building a 2-sided engineering product
  • Our engineering team serves as the bridge between the worlds of open source and enterprise: contributing directly to and growing Apache Hudi (already used at scale by global enterprises like Uber, Amazon, ByteDance, etc.) and concurrently defining a new industry category - the transactional data lake
  • The Reliability Engineering team is the glue that binds all of this together
  • You will be responsible for developing and maintaining the tools and systems that enable our engineering teams to operate our services reliably and at scale
  • You will partner closely and cross-functionally with our engineering teams to ensure our services are able to scale with our growing business
  • At Onehouse, you will own our entire live production infrastructure and operational posture to run massive data systems at scale
  • Ensure our services remain resilient by identifying opportunities for improvement and driving their implementation
  • Identify opportunities to improve our overall operational efficiency and growth by owning the modern tools in our cloud-only operation and our practices for proactive automation, monitoring and response
  • Act as a mentor to guide cross-functional teams during crisis situations and ensure timely resolution, minimizing the impact on our customers and business
  • Build and own our reliability engineering practice from the ground up, owning our entire production infrastructure and operational posture
  • Establish a culture of reliability across engineering by providing a comprehensive incident management platform that is used for instrumentation, operability, and incident response
  • Design, implement and maintain new services, tools, and monitoring to support service reliability and alerting
  • Serve as an active member of our SRE team, responding to and managing high severity incidents or any situations concerning the wellbeing and continuous operation of our mission-critical systems
  • Collaborate with your stakeholders across engineering teams to ensure continuous adoption of best practices, rollout scenarios for the space, and that services are designed with reliability in mind
  • Continuously analyze and evaluate the tradeoffs of the existing designs and make recommendations based on new technologies and industry best practices
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity management and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health through an intimate understanding of how the critical parts of our site work
  • Contribute to better incident management posture and retrospectives, driving improvements in our overall reliability and incident response time as well as on-call runbooks and post-mortem reports
  • Drive our compliance posture, ensuring that all our products and processes comply with relevant regulations and standards, especially during compliance audits

Otta's take

Xav Kearney

CTO of Otta

Managing the ballooning volume of unstructured data is becoming a tough task for enterprise companies. The traditional solution, data lakes, doesn’t offer management or transaction capabilities. This lack of oversight could lead to data violations, which are becoming more costly as regulations tighten. Onehouse is catering to the growing number of businesses opting for an alternative, the so-called ‘data lakehouse’. It's a hybrid architecture that offers the management and transaction capabilities of a warehouse with the cost-effectiveness of a data lake.

The Onehouse platform is a management plane that helps businesses set up a data lakehouse without having to invest the time and expertise in building one from scratch. With an open data format, it can be used to work with protected or sensitive data; it also allows companies to easily pull their data from Onehouse without egress fees if they decide to leave the service.

Onehouse has carved out an astute market niche for itself: businesses under increasingly close scrutiny, but with ballooning data pools, who don’t need to build out highly customized lakehouses. For the moment, this tends to be top-tier enterprises, which is how Onehouse has secured deep-pocketed clients like Walmart, Amazon, Zendesk, and Uber. As the first company to make fully managed data lakes possible, it is no surprise that it has received substantial funding. This will allow it to continue advancing the platform and grow its team to meet market demand.

Insights

Company

Funding (last 2 of 3 rounds)

Jun 2024: $35m (Series B)
Feb 2023: $25m (Series A)

Total funding: $68m

Company benefits

  • Health, dental, vision
  • Unlimited PTO
  • Paid parental leave
  • Equity
  • Flexible schedule
  • Contribute directly to open source
  • Work and grow with an experienced team

Company HQ

Sharon Heights, Menlo Park, CA

Founders

Previously a Principal Engineer at Uber, then Confluent, and subsequently served as VP of Apache Hudi at The Apache Software Foundation.
