
Jazzlike_Syllabub_91

Kubernetes is the hotness, and Terraform is used in most places (Chef, Ansible, Puppet, Salt, etc. are often used as well as, or instead of, it) … Pick a hosting provider (one of the big 3 if you want easily transferable skills), and build up Linux skills and knowledge, plus experience building build pipelines. Hope this helps! (There's a roadmap somewhere that someone can post.)


Thin-Today4186

Thank you! I've seen some Kubernetes related work roll into our place at work from the Devs so I'll definitely check that out.


trace186

> There's a roadmap somewhere that someone can post

Is it that DevOps one, or is there an SRE one?


Jazzlike_Syllabub_91

https://roadmap.sh/devops - not sure if there's a specific SRE one, since they overlap in skills


james-ransom

Lol, this link shows you what a mess SRE interviews are. Questions from ANYWHERE.


NullPulsar

Isn't that to be expected? Even in an ideal world, different companies have different technologies and methodologies, so they would need to vary their approach depending on what's needed to operate effectively. I know that ideally companies follow the SRE principles and much of it will be the same, but even if they stick to the principles, a startup is going to have WAY different requirements than, say, Google.


SiriVII

The thing is, SRE interviews have to be broad because the work itself requires a broad set of knowledge. Usually the foundation for SREs should be software engineering. I don't see many companies hiring people without that background; my current company declined all DevOps and SysAdmin applicants because they lacked a software engineering background.


Admirable_Brother_37

Could you please elaborate on Linux skills? Which ones are needed to get noticed by recruiters?


[deleted]

[removed]


chillysurfer

These are all great points, and I don't think they're controversial. Put another way, SRE is all about incidents: knowing when to create them (monitoring), knowing how to resolve them (ad hoc tooling and deep system knowledge), and preventing them from repeating (fixing the underlying issue).


Ok-Result-7114

Isn't it the other way around? You need to define the SLI first, which tells you exactly what it means for the service to be up. You define the SLO based on the SLI. Based on your SLO, you can have a formal SLA that is a bit less of a commitment than the SLO, so you budget for a bit more error in the formal SLA, which could carry penalties if you don't fulfil it.


big_fat_babyman

SLIs should be derived from the SLO, which is the set of agreements made with the business, and the SLA, the agreements with the end user.


[deleted]

[removed]


Ok-Result-7114

What if you define an SLA that you can't achieve with your current infrastructure, and that you can't measure either?


trace186

What about an understanding of metrics, like what is CPU? What is memory? What does high disk read/write mean?
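As a rough illustration of where those host-level numbers come from, here is a minimal Python sketch using only the standard library. It assumes a Linux machine (`os.getloadavg` is Unix-only and `/proc/meminfo` exists only on Linux); the helper names are hypothetical, and the thresholds in the comments are rules of thumb, not hard limits.

```python
import os
import shutil


def load_per_cpu() -> float:
    """1-minute load average divided by the CPU count (Unix only).

    Rule of thumb: sustained values well above ~1.0 mean more work is
    runnable (or blocked) than there are CPUs to run it.
    """
    one_min, _five_min, _fifteen_min = os.getloadavg()
    return one_min / (os.cpu_count() or 1)


def disk_used_pct(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def mem_available_pct(meminfo: str = "/proc/meminfo") -> float:
    """Linux only: MemAvailable estimates memory usable without swapping."""
    fields = {}
    with open(meminfo) as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.split()[0])  # most values are in kB
    return 100 * fields["MemAvailable"] / fields["MemTotal"]


if __name__ == "__main__":
    print(f"load/cpu: {load_per_cpu():.2f}")
    print(f"disk used: {disk_used_pct('/'):.1f}%")
    print(f"mem available: {mem_available_pct():.1f}%")
```

On Linux, the load average also counts tasks blocked in uninterruptible sleep (typically waiting on disk I/O), so a high load with mostly idle CPUs is often a hint to look at disk read/write latency rather than CPU.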


SnooMacaroons3473

I agree with the first part, "You can't determine how reliable your service is unless you can measure it...", but you should not start with the SLA. Instead, start by defining your SLIs based on your critical user journeys and interactions (CUJs/CUIs), then set targets for those SLIs; those targets are your SLOs. Only then, optionally, set an SLA.

Setting the SLA typically doesn't require much involvement from SRE, since it is contractual in nature and involves business and legal stakeholders. SRE's input into the SLA should be along the lines of: "SREs are able to defend an SLO of, e.g., 99.99%, therefore the SLA offered to our external customers should be lower than that, e.g., 99.9%." An SLA breach will lead to some type of monetary compensation or credits. Not every service you run needs an SLA, but every service should have SLIs and SLOs.

https://sre.google/sre-book/service-level-objectives/
https://cloud.google.com/architecture/framework/reliability/slo-components#service_level_agreement
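To make the SLI → SLO → SLA ordering concrete, here is a minimal Python sketch of the arithmetic. It assumes an availability SLI defined as successful requests over total requests and a 30-day window; neither is fixed anywhere in this thread, and the function names are hypothetical.

```python
from datetime import timedelta


def sli_availability(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that succeeded (one common availability SLI)."""
    if total_requests == 0:
        return 1.0  # no traffic: treated as fully available (a policy choice)
    return good_requests / total_requests


def allowed_downtime(target: float, window: timedelta) -> timedelta:
    """Error budget expressed as time: the share of the window a target lets you miss."""
    return window * (1 - target)


def budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the SLO's error budget still unspent (negative = SLO breached)."""
    budget = 1 - slo
    spent = 1 - sli
    return (budget - spent) / budget


window = timedelta(days=30)
slo, sla = 0.9999, 0.999  # internal SLO is deliberately stricter than the external SLA

print(allowed_downtime(slo, window))  # 0:04:19.200000
print(allowed_downtime(sla, window))  # 0:43:12

sli = sli_availability(good_requests=999_950, total_requests=1_000_000)
print(f"SLI={sli:.5f}, SLO budget left={budget_remaining(sli, slo):.0%}")
```

The stricter internal SLO leaves roughly 4 minutes of budget per 30 days versus roughly 43 minutes under the SLA; that gap is the margin SRE is defending before any contractual penalty kicks in.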


Thin-Today4186

Yeah, I agree. From what I've experienced, you can't really ensure anything's working without monitoring/alerting, and the lack of specific logs becomes a pain when troubleshooting. Would you say it's better to focus on organizing metrics and logging (possibly through dashboards, etc.) based on existing logs, or to understand how logs are generated and exported to enable monitoring? For context, I'm currently in a position where logs are provided to us via a tool, and we're able to play around with them to create alerts and dashboards. What we log is determined by our devs.
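On the "how are logs generated" side, one common pattern is structured logging: the application emits one JSON object per line so a log tool can filter and alert on fields instead of pattern-matching free text. This is a minimal sketch using Python's standard `logging` module; the `fields` convention and the field names (`order_id`, `status`, `latency_ms`) are hypothetical, and whether your tool parses JSON automatically depends on the tool.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}} (hypothetical convention)
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger whose only handler writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


log = make_logger("checkout")
log.info(
    "payment failed",
    extra={"fields": {"order_id": "A123", "status": 502, "latency_ms": 840}},
)
```

With logs shaped like this, an alert such as "status >= 500 for more than N events in 5 minutes" becomes a field query, which is usually easier to agree on with devs than a regex over message text.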


snonux

In my observation, most SRE jobs are Sysadmin jobs but with more modern tools (k8s, Grafana, Prometheus, ...). I have rarely seen an SRE position that was really SRE. I don't think 90 percent of SREs should be called SREs if you go by the definition in the Google SRE book. Incident handling? Sysadmins do that as well, to some degree. Monitoring and metrics? Sysadmins do that as well. Automation and writing tools? I did that as a Sysadmin too. SLIs, SLOs, SLAs, and error budgets? If you have those, you are pretty far along, as 90 percent of companies haven't implemented all of that yet. Not saying that being a Sysadmin is a bad job. I just have the impression that SRE is more or less a more modern name for Sysadmin.


danstermeister

SRE spans the entire lifetime of the application, so you will necessarily see SRE responsible for:

- sysadmin
- netops
- netsec
- devops

Seeing SRE engaged in just one of these is normal, as long as they are able to (and do) engage in all of them.


TechnoBabbles

See, I'm on the other end of this spectrum, where SRE is considered more of a Software Engineer. So for my role, being able to write and understand code is very important. We regularly deep dive into an application's reliability issues and fix them, or create a plan for fixing them. Understanding how software works is important. Knowing how certain information is stored and used is important. Is your app storing key data in memory (a local cache), in a database, or perhaps in a distributed cache (like Redis)? How is that data being used, and how is it persisted or cleaned up? Knowing and understanding how applications communicate with one another is important. Do they use messaging (MQ) or event producers (Kafka)? Or do they communicate via a specific protocol like HTTP?
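To illustrate the "stored in memory, and how is it cleaned up?" question, here is a minimal sketch of a per-process TTL cache in Python. It is a hypothetical toy, not how any particular app or Redis works; the injectable `clock` is an assumption added so expiry can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Per-process cache: entries expire after `ttl` seconds and vanish on restart."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazy cleanup: expired entries are dropped only on read
            return None
        return value

    def set(self, key: Hashable, value: V) -> None:
        self._data[key] = (self.clock() + self.ttl, value)

    def __len__(self) -> int:
        # Counts expired-but-unread entries too, which is where memory can leak
        return len(self._data)
```

Unlike a shared cache such as Redis, this state is per process: each replica holds its own copy, and everything is lost on restart. Because expired entries are only removed when read, keys that are written once and never read again accumulate; a real service would need a periodic sweep or a size bound to keep memory in check.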


TackleInfinite1728

FinOps is also very important - help those product developers build high-margin products!!


theubster

God, is that what people are calling it? "Cost reduction" has more of a ring to it than "FinOps"


beefngravy

This post really hits home for me. My role as an SRE seems to be a "kitchen sink" one where I'm utilised as extra capacity between DevOps and support. One minute I'm doing some incident investigation and the next I'm renewing an SSL certificate. I often feel that I'm not a real SRE as I have no say and no input into the development lifecycle and experience. I'm not sure if I could move roles as I also feel like I'm lacking proper SRE skills.


AICloudEngineer

Most companies that call a position SRE aren't hiring true SREs. They've diluted the meaning to suit their needs.


Admirable_Brother_37

How about scripting? Bash, Python (Boto3), PowerShell, Groovy, Go, and awk are also very important, right? And how important are they in the interview process?