SRE

Creator

Seonglae Cho

Created

2020 Jul 9 1:53

Editor

Seonglae Cho

Edited

2026 May 1 13:46

Refs

Site Reliability Engineering

Applies software engineering to infrastructure and operations problems

The main goal is to build highly scalable and highly reliable software systems

Lindy effect and Bathtub curve in Reliability engineering

LeSS, SAFe → Lean development rule

The key metrics are as follows

SRE Notion

Reliability Engineering

class SRE implements DevOps

Lessons learned from two decades of Site Reliability Engineering

Site Reliability Engineering, incident management, learning, lessons learned, SRE

https://sre.google/resources/practices-and-processes/twenty-years-of-sre-lessons-learned

How to Build Software like an SRE

I’ve been doing this “reliability” stuff for a little while now (~5 years), at companies ranging from about 20 developers to over 2,000. I’ve always cared primarily about the software elements I describe as living “outside” the application – like, how does it get its configuration? What kinds of instances does it run on, and are those the best kinds to use? What steps does it take on its path from “code in a repository” to “running in production”? And I’ve always kept track of what I liked – which mechanisms allowed fast iteration and which caused frustration, which led to outages and which prevented them.

https://www.willett.io/posts/precepts

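SRE #4: Defining SLIs/SLOs by Example

조대협 (http://bcho.tistory.com) The previous post explained the concepts of SLO and SLI, the key metrics in SRE. This post looks at how SLOs and SLIs are actually defined for a real service. Around 3 to 5 SLIs per user story is appropriate. A user story can be thought of as a single feature, such as login, search, or product details. Suppose we have a simple game service like the one in the figure below.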

https://bcho.tistory.com/1329
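As a rough illustration of the "3 to 5 SLIs per user story" guideline, the sketch below defines three ratio-style SLIs for a hypothetical login user story and compares each against an SLO target. The SLI names, event counts, and targets are all made up for illustration and do not come from the linked post; the error budget calculation is one common convention, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLI:
    """A ratio-style indicator: good events divided by valid events."""
    name: str
    good_events: int
    valid_events: int

    @property
    def value(self) -> float:
        # With no traffic there is nothing to violate, so treat the SLI as met.
        if self.valid_events == 0:
            return 1.0
        return self.good_events / self.valid_events


def error_budget_remaining(sli: SLI, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_bad = (1.0 - slo_target) * sli.valid_events
    actual_bad = sli.valid_events - sli.good_events
    if allowed_bad == 0:
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


# Hypothetical "login" user story: (SLI, SLO target) pairs with invented counts.
login_slis = [
    (SLI("availability", good_events=99_950, valid_events=100_000), 0.999),
    (SLI("latency_under_300ms", good_events=98_000, valid_events=100_000), 0.99),
    (SLI("correctness", good_events=100_000, valid_events=100_000), 0.9999),
]

for sli, target in login_slis:
    status = "met" if sli.value >= target else "violated"
    budget = error_budget_remaining(sli, target)
    print(f"{sli.name}: SLI={sli.value:.4f} SLO={target} {status}, "
          f"budget remaining={budget:.0%}")
```

Keeping each SLI as a good/valid event ratio makes the SLO comparison and the error budget arithmetic uniform across availability, latency, and correctness indicators.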

https://github.com/openai/openai-cookbook/blob/main/articles/techniques_to_improve_reliability.md

OpenSRE AI

OpenSRE | Agentic Alert Investigation for Production Pipelines

OpenSRE investigates the moment an alert fires — correlating signals, testing hypotheses, and recommending fixes so your team can resolve incidents 10× faster.

https://www.opensre.com/
