Site Reliability Engineer, Trainline

Salary not provided

AWS

Python

Linux

Terraform

ELK

Flow

Data Flow

Grafana

Teamcity

Junior, Mid and Senior level

London

2+ days a week in office (Holborn, London)

Train ticketing platform

Job no longer available

1001+ employees

B2C, B2B, Travel, Sustainability, Transport, eCommerce

Company mission

We're building the world's no.1 rail platform while empowering greener travel choices, connecting people and places.

Role

Who you are

Experience of SRE concepts such as SLI, SLO and error budgets
Hands-on experience with observability tooling such as New Relic, Elastic Cloud (ELK Stack), Influx and Grafana, with a good understanding of APM and MELT (metrics, events, logs, traces)
Strong understanding of HTTP/TCP (status codes, nuances of headers, cookies, connection/request life cycle)
Understanding of load balancing and reverse proxy concepts, upstream config concepts, upstream health checks, worker & data flow concepts
Application architecture concepts (threading, queuing, readiness checks, health checks, circuit breakers, timeouts, exponential backoff, throttling)
Experience building, maintaining and evolving time series data, with an understanding of retention, cardinality, deviation, moving averages and other functions
Experience working with cloud providers, preferably AWS
Experience with build, deployment & configuration management tooling such as TeamCity, GitHub Actions, and Terraform
Experience troubleshooting Linux operating systems
Experience of scripting in at least one language, preferably Python

What the job involves

Trainline is a fast-growing company that loves utilising new technology to build world-class products for our customers
We run a diverse platform hosted on AWS which, coupled with our own tooling, allows us to embrace CI/CD, DevOps practices, SRE disciplines and cloud native services to their full potential
ReliabilityOps are at the forefront of platform observability, maintaining availability, latency, performance, efficiency, capacity, CI/CD delivery co-ordination, critical incident response, and cloud infrastructure automation and provisioning
We are looking for a Site Reliability Engineer to join the team, helping to own observability and build tooling that supports operational engineering
We are looking for a strong technical team player who has experience implementing SRE practices within a team and advocating SRE principles
Critical incident response in production: from the initial event, participating in rapid response and driving service restoration, through to identifying follow-up measures
Building and implementing tooling to improve observability, identification and resolution of incidents with a strong emphasis on reducing MTTD
Supporting product engineering teams to ensure applications are operationally launch ready and that CI/CD activities are carried out in a safe manner with reliability in mind
Reducing MTTR by working with product engineering teams to understand issues, surface and present the right data to influence change
Contributing to incident retrospectives with deep technical knowledge to explain what may have occurred at HTTP, TCP, DNS layers of the stack
Promoting and expanding SRE concepts to the engineering community in both a consultancy and hands-on fashion, being a champion of observability engineering and reliability principles
Improving platform reliability, identifying metrics to base decisions on, surfacing them if we don’t record them, identifying continuous improvement across our pillars of observability
Seeing data presentation as a socio-technological problem: we need the most pertinent information presented quickly in a human-consumable way to affect the resolution of a real-time incident
Delivering on key roadmap deliverables and ensuring that initiatives contribute to the improvement of the SRE team, the reliability of the platform and the achievement of business OKRs
Participating in the SRE on-call rota and assuming the role of incident commander, ensuring our platform is supported 24/7 for our customers

Our take

Trainline make it super easy to book transport tickets. Their app is intuitive and well designed, and they're even pioneering paperless travel, allowing you to store your tickets in the app.

They're hyper focused on customer experience and have a product culture of fast iteration to deliver great outcomes for customers.

Their customers make 170,000 journeys a day and they sell 200 tickets a minute.

They're continuing to innovate to stay ahead. They're using machine learning to predict price movements and are integrating with Siri and Google Assistant so people can buy tickets just by speaking to their phone.

Steph

Company Specialist

Insights

Some candidates hear back within 2 weeks

42% female employees

14% employee growth in 12 months

Glassdoor (3.73)

Trustpilot (4.3)

Company

Employee endorsements

Challenging work

"The complex nature of the market as we work with different suppliers is an amazing opportunity to create fluid solutions. A very well segmented..."

Funding (1 round)

Jan 2006

$210.5m

LATE VC

Total funding: $210.5m

Company benefits

Flexible working (40% min office attendance)
Cycle to work scheme
Gym memberships
Private health and dental insurance
Pension (matched contributions)
Income Protection
Share Incentive Plan (buy one get one free)
Learning budget
Primary & Secondary Caregiver Leave
Shared Parental Leave
Enhanced sick pay
Free Perkbox subscription

Company values

Own It - We care about every customer, partner and journey
Do Good - We make a positive impact
Travel Together - We're one team
Think Big - We are building the future of rail

Company HQ

Farringdon, London, UK

Articles

How we migrated our CDN to AWS CloudFront at Trainline

Meet the new kids shaping the future of business travel

Founders

Jody Ford

(CEO, not founder)

Previously Non-exec Director of Moonpig and Photobox Group.

Milena Nikolic

(Chief Technology Officer, not founder)

Previously Engineering Director at Google.

Lisa Hillier

(Chief People Officer, not founder)

Previously Chief People Officer at Gousto, The Restaurant Group and Just Eat.

Mike Hyde

(Chief Data Officer, not founder)

Previously at Meta.

Dave Price

(Chief Product Officer, not founder)

Previously at Spotify and GoEuro.

People progressing

Alan

Joined as a Product Manager. Promoted to Product Manager after 1 year. Then promoted again to Principal Product Manager after 7 months and is currently Product Director.

Sarah

Joined as Business Performance Analyst. Promoted to Senior Analyst after 1.5 years. Then promoted again to Commercial Manager after 7 months.

Diversity & Inclusion at Trainline

💚 Diversity drives us forward

We know that having a diverse team is crucial to Trainline’s ongoing success. A team that is diverse in all forms (gender, ethnicity, sexuality, disability, nationality and diversity of thought) is key to Trainline being an inclusive environment where every Trainliner can be their true self. We're committed to creating workplaces where everyone belongs and is celebrated, and where differences are valued, creating an awesome employee and customer experience.

💚 Meet our Networks

Our diversity networks are inclusive communities developed and led by Trainliners, with sponsorship and support from senior leaders. They're all about empowering and supporting underrepresented groups by providing a safe space to talk, a place to come up with new ideas and a channel for voices to be heard.

🚀 Women in Leadership
💜 Artemis (gender equality)
🏳️‍🌈 Rainbow Train (LGBTQIA+)
🌻 Sunflower (accessibility)
🌍 Ethnic Diversity Network
🐣 Parent & Carers
