Senior Site Reliability Engineer, Grafana Labs

Databases

€88.6-106.4k

Salary benchmarked for candidates based in Spain; may vary in other locations. Plus equity.

AWS
Kubernetes
GCP
JavaScript
Python
Linux
Go
Azure
Grafana
Loki
Senior and Expert level
Remote in EU, UK

Grafana Labs

Organization monitoring, visualization and observability platform

Job no longer available

1001+ employees

B2B, Analytics, Visualisation, SaaS, Data Analysis

Company mission

To unite data, no matter where it lives, and empower its users to analyze, take action, and make smart decisions.

Role

Who you are

  • Strong engineering background (at least 6 years) that leans towards SRE roles (at least 3 years)
  • Good communication, capable of engaging in deep technical conversations with other engineers and customers, and collaborating across organizational boundaries
  • Experience with Kubernetes on any of AWS, GCP, or Azure, and working with Helm charts
  • Experience with Site Reliability Engineering, System Design, and Distributed Computing
  • Experience with one or more programming languages (e.g. Go, Python, or JavaScript)
  • Experience with Linux operating system internals, and some knowledge of networking
  • Experience with calmly and actively participating in blame-free Incident Response, following up on actions, and writing high quality PIRs (Post Incident Reviews, a.k.a. post-mortem documents)
  • Comfortable working within an engineering team where individuals are encouraged to have a strong sense of autonomy and self-direction
  • We highly value those who are kind, intellectually curious, who default to transparency, possess a high bias towards action, and who are also kind (this is important)

What the job involves

  • We are looking for a Senior SRE to help us support our highest value Grafana Cloud customers by increasing the reliability of our Cloud databases that are based on Mimir, Loki, Tempo, and Pyroscope
  • The High SLA SRE team is a new team within the Databases department, that owns the environments (customer and product cells) for our largest customers, and acts as an overlay to existing teams that run the databases within the system
  • As an SRE within the team, you own the configuration of the software via Helm charts and Jsonnet. You are involved with the PRR for new features, shepherd releases to the environment, and ensure new releases do not degrade the SLOs or user experience for the customer (learning what is special about each of these customers and mitigating risks that a change in the software might produce). You also contribute design docs, code, PR review, and other engineering activities directly to the databases to further improve reliability for the customer and observability of the customer stack, and make recommendations to customers about their use of the system to further improve reliability
  • Like all SRE roles, there is an on-call element. Unlike other roles, this one is a shared pager where “if the Mimir team are paged for this customer, then we are also paged”. This allows you to focus your response on the experience the customer has, whilst being supported by another on-call engineer who will focus on the system
  • As a company, we hire globally (remote-only) to ensure our on-call is as healthy as possible, and aligned to 12 daylight hours per day as the default
  • Regular 1:1s with your manager and colleagues
  • Reviewing and creating SLOs, proactively investigating ways in which we can further reduce budget burn for those SLOs, which can be self-directed or as the result of learnings from incidents, and may include improvements to monitoring, automation, increasing self-healing, auto-scaling, etc
  • Improve observability of customers within the High SLA environments
  • Configuring systems to increase reliability via Helm and Jsonnet
  • Collaborating with our Engineering Leaders to help define and influence product strategy, roadmaps and technical designs
  • Participate in PR review and collaborate with other engineers on their Design Docs
  • Teach others about Site Reliability Engineering and communicate best practices to be applied early in development of new features and functionality
  • Participate in Incident Response when applicable, including investigation through to resolution, PIR, and communication with customers via Bridge calls where necessary

Our take

Observability plays a crucial role in software development, where dashboards are pivotal for monitoring the health of IT infrastructure across an organization. However, developers who have to navigate between different dashboards to find the latest information lose valuable programming time.

Grafana collaborates closely with its clients to create customizable observability platforms, striving to enhance efficiency in observability. Its offerings include cloud-based data structures, dashboards, API plugins, and collaboration tools. Currently, Grafana boasts over 21M active instances and approximately 10M users globally, with a customer base exceeding 2,000, including prominent names like Bloomberg, JP Morgan Chase, eBay, PayPal, and Sony.

While Grafana provides free plans with a simplified tool package, the Grafana Enterprise Stack offers customized observability solutions on a monthly subscription basis. The company's ongoing strategy revolves around continually improving Grafana to meet users' demands, enhancing functionality with each iteration, and fostering community engagement by enabling users to share their Grafana dashboards with the broader open-source community.

Kirsty

Company Specialist

Insights

Some candidates hear back within 2 weeks

49% employee growth in 12 months

Company

Funding (last 2 of 6 rounds)

Aug 2024: $270m, Series D

Apr 2022: $240m, Series D

Total funding: $805.2m

Company benefits

  • Vacation: Balance is key. Our team enjoys 30 days of paid vacation each year on top of national holidays, parental leave, and sick leave. We also take a breather on a number of Grafana Shutdown Days each year
  • Healthcare: We’re proud to provide health coverage or stipends for our colleagues in the US, UK, Canada, the Netherlands, Sweden, Singapore, and India
  • Retirement planning: There’s no time like the present to start saving for your future. We make employer contributions into the pension pots of our team members in the US, UK, Canada, the Netherlands, Sweden, and Germany
  • Professional development: On top of a $1,500 annual learning and development stipend, Grafanistas have thousands of on-demand courses at their fingertips to help them grow professionally. Want to attend a conference or training? Go ahead. Just pass on what you learned
  • Work location: The vast majority of our roles are fully remote, focused on hiring the best talent and allowing you to perform from the comfort of your home. If you fancy a change of scene, we’ll also reimburse you up to $175 a month for a personal co-working space
  • Choice of tech: There’s no one-size-fits-all when it comes to the tech required to do your job. Choose the laptop and accessories you need when you join us, and we’ll refresh them every three years
  • Mindfulness: When you join the team, you can sign up for a complimentary subscription to Headspace to take advantage of the benefits of mindfulness and meditation. Our wellbeing resource group also organize sessions run by fellow Grafanistas or external trainers
  • Fond Perks: Grafanistas across the world receive access to Fond, a platform that provides access to pre-negotiated discounts on a wide variety of services including entertainment, food, and fitness
  • Global Employee Assistance Program: We offer all team members a 100% confidential support service with 24/7/365 access to professionally qualified counsellors and specialists

Company values

  • Share openly and default to transparency
  • Respectfully empowered
  • OSS is in our DNA
  • We keep our commitments
  • Seek diverse perspectives
  • Don't let perfect get in the way
  • Help each other thrive

Company HQ

Financial District, New York, NY

Founders

Raj Dutt

(Co-Founder & CEO)

Currently a Board Member at NSONE. Previously founded Voxel where they served as the CEO for 12 years. They also served as the Senior Vice President of Technology at Internap Network Services.

Torkel Ödegaard

(Co-Founder)

Graduated from Mälardalen University with a Master's in Computer Science. Founder of Coding Instinct. Previously worked as a Consultant at H&M, Avega, and eBay.

Anthony Wood

(Co-Founder)

Previously worked at Visa and Voxel as a Senior Systems Engineer. They also have Systems Engineering experience at PalVision and iiNet.
