StaffEng Podcast

Rich Lafferty (PagerDuty)

Oscillating between the roles of individual contributor and management has been a recurring theme on this show. Our guest today, Rich Lafferty, has some special insights into this pattern that can help anyone looking to improve their work. Rich works as a Staff Site Reliability Engineer at PagerDuty and has spent many years interfacing with various departments and building projects and proposals. In our conversation with Rich, we discuss how his past roles have informed his work at PagerDuty and how he gets the most out of his teams without exploiting the authority that comes with his more senior role. We delve into Rich’s process for building proposals and learn some of his tips and tricks for ensuring the best possible outcome by investing in the foundation and design phase. We also explore the importance of early feedback, why you need to include a diverse group of individuals, and how to gradually grow your feedback group. Tune in as we discuss everything from risk management to high and low context culture, and much more!

Transcript

Note: This transcript was generated using automated transcription and may contain errors.

David: Welcome to the StaffEng podcast, where we interview software engineers who have progressed beyond the career level into staff levels and beyond. We’re interested in the areas of work that set staff-plus level engineers apart from other individual contributors: things like setting technical direction, mentorship and sponsorship, providing engineering perspective to the org, etc. My name is David Noël-Romas and I’m joined by my co-host, Alex Kessinger. We’re both staff engineers who have been working in software for over a decade. Alex, please tell us a bit about today’s guest.

Alex: Yeah, Rich Lafferty is a staff site reliability engineer at PagerDuty, where he builds platforms in the clouds to make PagerDuty reliable and his development teams happy. He calls Toronto home with his wife and two tortoiseshell cats. In his copious spare time, he can be found enjoying craft beer, staring at a wall, or playing bass in one of PagerDuty’s two office bands.

David: All right, Rich, thank you so much for taking the time today. If we could start by having you tell us who you are and what you do.

Rich: Yeah. So my name is Rich Lafferty. I’m a staff site reliability engineer at PagerDuty working out of the Toronto office, although no one’s working from an office right now. I’ve been at PagerDuty for about four years now. Started there as a senior engineer and managed to get myself into the staff title about a year ago. I think I’ve been doing it for a while. Before that I’ve worked at tech companies here and there. I did a little tour of management for a while, decided that wasn’t for me, and when I joined PagerDuty, it was explicitly to switch back from management into an individual contributor role. And the great part about, you know, where I’ve kind of ended up now, why I’m excited to talk to you guys today, is that this whole staff thing has really hit the sweet spot of what I loved about being in a management role and the sweet spot of what I love about being in an IC role all at the same time.

Alex: Nice. What does a typical staff engineer do at PagerDuty?

Rich: Yeah, so at PagerDuty, a staff engineer reports to a director. So the idea is that even though I might be co-located with a particular team for this project or that project, I don’t have, like, a permanent home team necessarily. So for instance, I’m a staff engineer in the whole infrastructure department, even though I tend to work with our cloud infrastructure team right now. So staff engineer, it does depend kind of individual by individual, but it’s a mix. It’s less spending time in the editor and more time working on the high leverage stuff. Tanya Reilly, who is a staff engineer at Squarespace, has a fantastic presentation that’s pretty well known called Being Glue. And if you haven’t seen the talk, you know, if you Google Tanya Reilly Being Glue, you’ll find it. And it’s framed in terms of, here’s a warning for underrepresented groups in junior roles. But you can really turn the talk around and go: all of the stuff that Tanya talks about in that, this glue work, this bringing things together, is really high leverage work for senior engineers. And what I can deliver in terms of sitting in front of an editor and writing code is nothing compared to what I can do by bringing people all onto the same page, finding the highest leverage opportunities and that kind of thing. I have a lot of freedom in my role to figure out, well, what should I be doing? Obviously, sometimes I’ll be asked like, hey, please go solve this particular problem, or please, you know, we’ve got this initiative coming up, we’d like you to kick it off. But really, I’ve got a lot of freedom to go, okay, so what needs my attention and where can I be the highest leverage? I mean, I’ve joked before that my editor of choice is Google Docs.
Because, you know, a lot of what I’m doing is just, well, let’s just write it all down so that everyone’s got a place to start from, whether that be a, you know, a process proposal or a technical proposal, or taking a whole bunch of stuff that a bunch of people are throwing at me and trying to put it in one page so that I can go to people and leverage Cunningham’s Law and say, this is what I think you’re saying, so that they can come in and say, no, no, no, that’s not what I’m saying at all. And eventually get everyone on one page. So it’s a very gluish kind of role. I’ve definitely noticed before that we’ve had some folks that made it to staff and found that they don’t get to write code as much anymore and had a little bit of trouble with that, but it really is. It’s kind of like almost a, not quite director without reports. I think principal at PagerDuty would be director without reports. So it’s kind of a senior manager without reports that still stays in touch with the technical stuff, which is really a sweet spot for me.

Alex: Is there anything specific about your approach to the staff role that you think might be different from other staff engineers at your organization or just more broadly?

Rich: I think the main thing, you know, I mentioned that I spent some time in management and I feel like I leverage that all the time. And that’s kind of one of my little superpowers. Before I was at PagerDuty, I was at a company called FreshBooks. I was there for nine years, which was a little too long, but I started there as one of the early employees in the startup and when I left there I was their director of production operations. And having spent that time in senior management and being exposed to all the parts of the business that aren’t the engineering work, what I use all the time is code switching. If I need to talk to senior management or the executive team, or if I need to talk to legal or finance and stuff like that, then I have had to do that. And I can switch into finance language. Code switching into manager is really useful. But also, if you have a problem and it’s not a purely technical problem, then there are feelings involved. And having been on the management side of things, I can kind of have some empathy for what those problems might be and what those feelings might be and really kind of empathize with that. So I think my approach to this is very much being able to leverage that past experience in management towards being glue between engineers and management, engineers and product, engineers and other engineers, and so forth. And a lot of the things that I do that I find high leverage are things like, you know, I notice this is going a little off the rails, I’m going to unblock it, more so than, you know, I’m going to sit down for a day and I’m going to come up with a top to bottom new proposal for a thing. That said, there’s some of that involved as well, and it’s not all glue.

David: That’s really interesting. I think we’ve talked a couple times on the show already about, like, the idea of the pendulum between management and IC work. And I think when we have senior ICs that have management experience, that ends up being kind of a superpower. I do want to circle back to that. I think there’s a lot to kind of unpack there in terms of how it changes your relationship with the teams that you work with and with the leaders above you. But before doing that, maybe to just sort of paint a picture for people: what are some examples of, like, concretely, it doesn’t have to be specific things that you’ve worked on recently, necessarily, but like, what’s the type of work that you would be doing at PagerDuty on a regular basis?

Rich: Yeah, so we actually just wrapped up a project that went into general availability this past Monday, June 28, which was to launch our presence in the EU. It used to be at PagerDuty, if you wanted it, you could have, you know, one color, black: you get it in one place, which is the US. And a bunch of our European customers were saying, well, you know, that’s not quite what we want to do. We’d much rather host our data in the EU, not for any compliance reasons on us, but because it was important to them for their own risk analyses to keep their data in the EU. And so I’ve been involved with that through and through. And the way that that project kind of unfurled is a really good example, because at the very beginning of it, there were a lot of fundamental technical questions about how we’re going to do this: how are we going to set up our networking, how are we going to set up deploys, how are we going to manage our AWS accounts, and just kind of like core architectural bits. And so at the very beginning, part of what I was doing was writing proposals for that: I think we should do it this way. A lot of what I was doing was finding the people that know those particular areas best, because that’s certainly very rarely me, and not just asking their opinion, but also kind of coaching them through having an opinion sometimes, in terms of like, okay, so here’s what I’m hearing from you. Here’s a bunch of constraints that we’re going to have to work under. Here’s what the business needs. How can we shape this into a decision? Once all that kind of was out of the way, then we kind of moved on to, well, we need to actually deploy and implement all this stuff that we came up with there.
I switched from being a proposal writer and idea gatherer towards being a bit of a project manager, in that I was taking care of making sure that all the teams in the infrastructure department were executing and delivering all of the bits they needed in the right order. And if things got blocked, I would go down to that runway level and figure out, okay, well, where’s the block? What if we do this? Have we tried that? And then once it’s back on the road again, back up to altitude to kind of get an idea of the whole thing going on. Once that section was done, then we had to actually get our services deployed and stuff like that. At which point, just for scalability, I needed to leave that up to the folks responsible for the deployment tooling and the teams that needed to do the deploying. But there’s a lot of just kind of keeping an eye on things and making sure things are going smoothly. The best, you know, the sign of success there is that I don’t have to get involved at all. But that wasn’t always true. Just kind of, you know, dive down and clear this bit up here, clear this bit up there. While this was all happening, there were also, of course, other things going on. So I was having conversations with, you know, the management in the infrastructure group and senior management about how to balance the stuff we’re doing in the EU with the other priorities that come up and incidents that come up and so forth, to make sure that stuff kind of all got balanced. And then towards the end there were a lot of situations where, for instance, now that we have two hosting locations instead of one, we have a concept of global services that we never had before. And so we need to step kind of carefully around those. How can we minimize them, how can we make sure we implement them correctly so that when we do this another time in the future, it’s going to keep working there, and so on.
One of the best things that I did, just in terms of that staffy kind of work, was at the very beginning I worked with a couple people to come up with a set of architectural principles, which were very high level things like not bringing legacy. Like if there were two ways to do things, we wouldn’t support the legacy way, we would only support the new way in the new service region. It was really useful to have that at the very beginning because there were a lot of decisions we kind of ran into throughout where we were able to go, okay, so here’s two options. Either of them will probably work, but way back at the beginning we laid out these principles to say, look, this is kind of the reasoning and the philosophy of how we’re doing this. Can we look at this decision through that lens? And I mean, again, I’ve been talking about leverage a lot here. That was again a high leverage thing, that I can write up and socialize and get buy-in on these general principles. And that can be like a decision making engine, meaning that I don’t need to get pulled into things later on. Through the whole thing, there were also tickets to do, and, you know, it was a combination. I was still embedded with the team while I was doing all this, the cloud infrastructure team that maintains kind of the underlying infrastructure and platform. And so, you know, take some of the tickets that are maybe a little bit more challenging, but make sure there’s still enough challenge for the team. Take some of the tickets that are kind of grindy, not so fun work so that the rest of the team can take the interesting stuff. So there’s still technical work underneath all that as well.

David: I’m really fascinated by the idea of defining architectural principles up front. I’m actually kind of fascinated by the idea of trying to distill sort of any reasoning process into like a list of guiding principles or like strategy docs, vision docs, these sort of things. It seems really nebulous to me. What’s the process that you follow to put that together?

Rich: I would love to say, oh well, here’s the process. But I mean, a lot of the time, it’s funny, sometimes I’ve had, you know, some folks I’m mentoring or whatever say like, well, how do you come up with this proposal? And my answer is like, well, I start with an open editor and when I’m done there’s a proposal on the page and I don’t know what happened in the middle. And it’s the same kind of thing with this. I mean, it’s certainly not my idea. It’s heavily influenced by Amazon’s leadership principles, which I’ve always liked more than sort of your typical company values because they’re extremely specific and they’re written to be a decision making filter. And so I kind of did the same thing here. I think what was happening is basically I had a bunch of ideas for things that we would have to do as we started this project and what kind of decisions were going to be coming. And I never explicitly sat down and went, oh, I better write out some principles so that people can make these decisions easily. But I realized that there was a bunch of stuff where I kind of knew what the right thing was in my gut. You know, I wish I had them in front of me because I could reference more than one of them. But to use this no legacy processes thing as an example: current infrastructure at PagerDuty is terraformed, but PagerDuty has been around for 14 odd years now. And so we haven’t migrated everything over. So in the US there’s still a bunch of stuff that had been created by hand. And so we drew a bright line in the EU to say no, you can only create infrastructure using Terraform. But that was kind of like a gut thing to me. I didn’t have to sit down and reason it out. It’s like, well, if we’re going to do this, we need to do it this way.
Another one, now that I think about it, was around global services: that by default you’re not allowed to talk between the US and EU service regions, because once we start doing that, we can’t really make promises to our customers about where their data is going to reside. But at the same time I recognize that, well, there’s going to be some places where we have to do that. All of this, I don’t know, it’s just from experience. You go into a project like this and you go, okay, well, here are the things that are going to bite us. And you kind of get a gut feel for what all those things are. And it’s like, well, how can I communicate that gut feel to everyone else? Well, I can write down the ideas that I think led me to those gut feel decisions, and those in turn, with a bunch of wordsmithing and running past my peers and the folks on the teams that are going to be impacted by this most, get refined into things that people can actually use for decision making.

Alex: One of the things I think I hear you talking about is you’re working on a lot of things that have business value. I’m curious, how do you go about understanding the work that has the most business value or value to the people that you serve?

Rich: Oh, that’s a tough one. That is a tough one. My instinctive answer to that is the same thing as how I write proposals: there’s an empty editor and then it’s got a proposal in it and something happened in the middle and I don’t know what it was. One model that I really liked, from Will Larson’s book, to tie this back to that, is the idea of snacking. I am terrible for snacking. I can lose an entire day on Slack just helping a bunch of people get unblocked in a bunch of places. And all of those little places are really helpful things. But even though I’m probably in a good place to help unblock people and answer questions and stuff like that, or jump into a conversation when I know that, you know, if I jump into the Slack conversation, then I can probably save them half an hour of going off in the wrong direction, that’s not necessarily the highest leverage thing for me to do. So it’s really a matter of looking at, and I’m saying this, I’m not good at it, but it’s really a matter of not answering, you know, is this something that I could be good at? But rather, what are the things that only I, you know, again, it’s the highest leverage things. And how do you know what they are? You’ve done them a lot. I know that’s a terrible answer for people listening to this going, well, how do I become a staff engineer? Well, you do it a lot. But it’s really, you know, it’s really looking for the things where the smallest amount of work can have the biggest impact. And to bring it back to what we were talking about a second ago, that’s where writing up a set of architectural principles that the other 300 people in engineering can use to make their own decisions amongst themselves, without having to talk to me at all, without having to run through any big process, comes in.
Especially since we want teams at PagerDuty to own their own things, writing those out is a lot higher leverage than walking a bunch of engineers through those decisions one at a time. So it’s really just a case of finding the things where, like, how can we align people? You know, how can we give people the tooling to do the right thing? It’s like the architectural side of having a sort of a golden path for deployment or something like that. You know, one of the things that our delivery team, who focus on deployment tooling and processes, do is they always try to make it really easy to do the right thing. Teams fully own their services at PagerDuty, so if a team wants to deploy their software slightly differently, they’ve got the autonomy to do that. But unless they have a really good reason, it’s probably going to suck because we’ve made the default path so easy. And for instance, we just started using Backstage, which is really cool, to make it easy to spin up a new microservice using our Elixir skeleton. Press a button, receive repo with the skeleton in it, and it’s got some basic Terraform. It’s got your deployment stuff in there, it’s got everything that you would need to get a hello world running in production. And then all you have to do is write your service around it. So having that makes it really easy for a team that wants to deploy a new service to have all of the things that we as a company have decided are the best way to do things. They can then go in and change it. They don’t have to use a skeleton at all, but it’s so easy. And the same kind of thing for architectural stuff, if you make the reasoning behind it really obvious. The highest leverage stuff here is communicating.
Because if you make it really clear to people why we decided X or why we want to keep this in mind, then we’ve basically kind of empowered them to understand where the business is coming from and where the technical leadership is coming from, but still make the right decisions on their own. And again, it’s a leverage thing. Being able to take the time to get those principles just right is a lot higher leverage than having to look at the individual decisions underneath. Of course, I still look at the individual decisions underneath, in case something needs, you know, a little poke in the right direction or stuff like that. I wouldn’t want to come in and say, no, look, I wrote these principles. Why aren’t you following the principles? But I might come in and say, hey, so I noticed you’re thinking about doing it that way. That surprised me because, given this principle, I would maybe design it this other way. So let’s talk about kind of what direction that should go in.

Alex: Yeah, the thing that strikes me about what you’re saying is that there’s a lot of upfront effort that gets put in. It seems like it delivers a great result. But one of the things that I’ve experienced is that upfront effort isn’t always seen as valuable as it should be. How are you communicating up and talking to them about the value of doing that upfront effort?

Rich: I remember learning this from someone else, a really good agile coach that I worked with at a previous gig, explaining that a project is a thing that has a beginning and a middle and an end. And everyone’s tendency is to start at the beginning of the middle and end at the end of the middle. Which is to say, even if you’re doing agile, even if you’re trying to be as non-waterfally, as non-planning-in-advance as possible, you still need that beginning and end. You still need to make sure everyone’s on the same page about what you’re doing, and at the end you still need to figure out what you can learn, and people tend to skip that so they can get on to the next thing. Because once you start cranking the ticket handle, then people start noticing the work a lot more than when you’re doing the preparation stuff that would make cranking the ticket handle go even faster. A lot of it is just demonstrating that it works. One of the things I like about the team that I’m embedded with right now, the cloud infrastructure team, is that we’ve had it come up that we started in the middle a little too often in the past. It’s a really good intent. It’s, you know, this other team is counting on us for getting this thing done, or the company is counting on us for getting this thing done. We want to get it done as quickly as possible. So let’s go straight to implementing. And of course, you know how this turns out. This means rework later, or we’ve gone through some, you know, closed door decisions that we wish we could revisit and we can’t, because now remaking that decision is actually going to turn into a migration and so forth. So the great part about that is that anyone on the team can remind anyone else on the team, hey, as a team we agreed to make sure we take time to design up front.
And that doesn’t necessarily mean at the beginning of the project, that just means there’s a design phase and there’s an implementation phase at any level. Writ a little wider, where you don’t kind of have that working agreement in place, it’s just to remind people, hey, we learned from previous things that this is a bit of an anti-pattern. And here’s the language we use to remind ourselves of the anti-pattern. It really is just kind of demonstrating. Being staff does have a little bit of role authority that comes with it. And I use it as little as possible and I would never want to use it. Like, using that for a technical question is an absolute last resort, because every time you use it, you lose a little bit. Using that for a process, philosophy, how-to-think-about-things thing is a lot easier to do. Especially since I can kind of take on the responsibility for taking the time. I can say to someone, hey, I think we should take some time to write this out first, make sure we’re all on the same page. And if someone comes to you and asks why it’s taking too long, you can point them at me. So that gives them a little bit of space, and it’s kind of using that role authority as a bit of an umbrella rather than using it as a stick. And yeah, I think it doesn’t take too long for teams to notice that when they write things down and they share it with their team or they share it with, we’ve got an architecture working group they can kind of get input from, that people ask questions that they hadn’t thought of, and they can tell at the end of it that they come up with a more refined, a more bulletproof, just a better design. Or worst case, they end up exactly where they started. But now they’ve got confirmation that they’re on the right path and they’ve got an artifact that someone else can look at six months from now to figure out why on earth we did it this way.
So it’s a little bit of take it on faith at the beginning. I think it’s kind of self-evident how it can work. And everyone kind of knows that making sure you have a design doc and writing down decisions and stuff like that are good practices. So it’s giving people room to actually do it, so that they can see that it worked, so that they want to do it next time.

Alex: Yeah, sounds like what you’re saying is there’s sort of like a golden spiral where it’s like, you know, people feel good when they do it, then they deliver good work. I think that makes sense from an engineering perspective. I’m curious though, like when you talk to someone who’s more business minded, are you like, well, the reason why we’re doing this is because the engineers think that it feels good, you know, like, is that how you’re describing it? How are you demonstrating it to the business side?

Rich: So, you know, a lot of it is just that we’re lucky, in that one of the nice parts about working for a company whose customers are engineers and engineering departments and IT departments and stuff like that is that it’s a very technical group across, like even the folks that are in product, the folks that are in management and so on, have, if not technical backgrounds, a good understanding of the space in which we’re working. And so I kind of get it easy, like compared to, you know, a lot of more traditional companies and so forth. I have it easy in that respect. That said, I think a lot of it you can really put in terms of: this is risk management. One of the things I’m actually just kind of starting to get my head around at PagerDuty is the value that we would have in having a more explicit framework about how to think about risks and just training, you know, small t training, individual engineers and teams on how to really methodically think about risk. Because when you go to the business, I find especially senior management loves talking about risk, and that’s really what they want to think of it in terms of. If engineers like to talk about trade offs, management likes to talk about risks. And if you can go to management and say, look, we need to take a little bit of time to do the design on this and just to refine that. Because I think there are some closed door decisions, where by closed door decisions, I mean once you make the decision and you start implementing it, it’s a considerable effort to redo the decision. The opposite is open door, where after you make it, it’s relatively easy to redo it if you need to. We’re going to make some closed door decisions. If we decide those wrong, then the costs are going to be quite high. It might be ongoing maintenance cost, it might be rework, it might be that we’re not meeting some of the non-functional requirements like performance and availability and security and so forth.
It might mean that the whole project is going to be delayed because it’s going to slow us down forever. And so the way that we’re going to mitigate, and that word is very important, the way we’re going to mitigate these risks is by doing this design phase, writing up these docs. That’s also going to mitigate some risk later. When we hire new people, and we’re hiring like always, hiring like crazy, they need to understand where we are coming from. And we want to make sure that there’s documentation that they can read as to not only what the decision was, but what the process was going into the decision. You know, if you write up a proposal in Confluence or Google Docs and it fills up with comments, keeping the comments around is really important, because it’s not just what did we decide, but it’s also, well, what objections did people bring up and how did we satisfy those objections. All that history is really important. I can’t remember where I learned the idea of high context and low context engineering cultures, but the idea is a high context culture is where you find out how things were done by talking to people. And there’s a lot of oral traditions that get passed along and so forth. And a low context is where on your own you can basically figure out how we got here, whether it be from, you know, the architecture or the design of particular code or the documentation that’s around it and so forth, where you can really sit down and look at something and it’s clear why it was done this way. That’s a low context culture. I’m sure a lot of the people listening to this are in high growth companies bringing a lot of people in all the time. Just the nature of the business is that people stay in a role, you know, two to four, six years, which means there’s always going to be some turnover.
And being able to have all those documents and so forth around, and records of past decisions and the reasoning that went into things, means that onboarding those new people into a low context culture is going to be a lot easier than in a high context culture where it’s all oral knowledge passed down and so forth.

David: You mentioned something, the idea that engineers talk about trade offs and managers talk about risk. That’s a fascinating framing. I’m going to keep that and think about it a little while. Another thing you mentioned was that you have an architecture working group and that’s one of the ways that changes, I guess, get reviewed. Can you say more about how that works? Those forums, I have not always seen them function effectively, and so I’m curious how you do it at PagerDuty.

Rich: Ours has not functioned that effectively. It’s hard to pull off.

David: With these kinds of projects, when you’re organizing across a big group of engineers, you’ve already talked a little bit about sort of how you get buy in from management, but how do you get buy in across that group of engineers?

Rich: Yeah, that’s often the hard part, right? A lot of it is, like, my ideas have to... I talked about role authority before, and that’s one thing that you really can’t use. It’s really tempting to go, you know, I’ve got this title, I’m a staff or principal engineer, and I’m saying this is how it is. And there are probably some processes in place where that happens. Especially, you know, as an organization gets larger and larger, you really do need to kind of come out and say, like, look folks, this is how we’re going to do things around here because we need to be consistent. But even if you’re doing that, or when you’re doing it, the ideas kind of have to stand on their own merits. I’d say the main thing there is, first of all, you have to be right. And the best way to be right is to get a ton of input ahead of time. And I think one of the things that maybe some folks in the middle of their career miss is how much conversation happens before something sees the light of day. If I’m going to make a proposal, and I think it doesn’t matter what the proposal is, a technical thing, an architectural thing, a process thing, whatever, I kind of know up front who might object to it. And I also kind of know which people might not necessarily object to it, but would have really good input on the things that I’m missing. And so what I’m probably going to do, not because of politics, but because actually figuring out why people are objecting to things is important, is I’m going to start informal and talk to kind of those key people that I can think of to get their input and refine it and make sure that people are on the same page. And, you know, it’s funny, even when I’m saying this, I’m going, like, oh, people that are listening, they’re gonna be like, oh, no, that’s, you know, backroom politics and stuff like that. But it’s really not. 
It’s that if you want an initiative to succeed, you need to have people on side. And a terrible way to get people on side is to put it out to 200 people and say, what do you think? Or worse, to put it out and say, hey, we’re doing this, and then you start getting their feedback and you need to walk it back. It’s a lot less formal than that. It’s more of, you know, you go out and you go, okay, well, I know these two or three people have been really involved with this, and so I’m going to make sure I loop them in, to make sure it’s not my initiative, it’s our initiative, that they’ve got not only their thoughts put on it, but also their name on it, so that it’s not just kind of one person coming and saying, hey, it’s going to be this now. And then once you get that, those people probably also know some other people who need some input on it. And you can kind of just work this sort of informally. Who has opinions about this? Who would it benefit to have on board in terms of the messaging behind this from the very beginning? The other nice part about that is that it’s all really informal stuff. You can have Slack conversations, you can do an email thread, you can have a really low key meeting to just kind of go, hey, this is where I’m going with this. Is this gonna flop? You know, what things can you folks think of that might go wrong with this? And then as it gets refined and refined, then you can start opening the thing a little bit wider, make it a little bit more public, invite public input. But by the time you’re doing that, a lot of the really obvious things that were wrong, because whatever you came up with is full of obvious things that were wrong, all the obvious things that were wrong will be taken care of. And you’ve got kind of a mix where people will see it and go, yes, this makes sense, but also the wider audience can find the things that were way out there that no one thought of.

David: There’s a couple really interesting themes in what you’ve described. So first of all, I think you’re right to call out that this is something that is sometimes invisible to people in the middle of their careers. And I think it’s actually maybe one of the important keys to unlocking the next step of someone’s progression. But there’s sort of an unfortunate dichotomy there because given that it’s like the process can sometimes be sort of invisible by design, I think there’s a challenge in making sure that people understand how the process works and have their own opportunities for growth. So that’s, that’s sort of one thought that I have. You also mentioned the idea. Oh, okay, yeah.

Rich: Just before you move on, just quickly on that, one thing that’s important is the thing that I’m talking about doesn’t have to happen all at the senior, staff, principal level. Often the folks that are subject matter experts on one specific thing will be all over the org. And being able to pull them in and expose them to this is really good. For that matter, being able to pull them in and expose them to how things work like this is just good in general. And even, you know, if you’re kind of thinking, well, I’m not sure about the value of bringing in so and so on this, sometimes it’s worth doing it just so that they have exposure to how the process works.

David: I 1000% agree. I think there’s still sort of a risk of bias there where, in the people that we choose, we might not be creating a sort of equitable environment. I want to circle back to that. I think we’ll have an opportunity to talk about sort of how we help bring folks up. But you also touched on organizational politics and sort of the ickiness of that. One of my favorite quotes from someone who taught me a few things is that politics are the way that groups of people make decisions. That was an interesting insight for me because it’s like, yeah, if I really think about it, you know, big groups of people, the only way that they can kind of move forward is by forming coalitions and making decisions in smaller groups and then hopefully propagating their ideas or decisions to bigger groups. And there’s still good and bad ways to do that. But I think the interesting thing is that after a certain point, it’s hard to avoid sort of the idea that there is going to be a certain degree of quote, unquote, politics happening inside an organization as it grows. I’ve seen that happen as organizations transition through. And actually the other thing that we’ve been talking about, of having ideas that start out within small groups and then get broadcast to the whole org later, is when an organization gets big enough that you can’t just do everything, you know, inside one room, that’s like an interesting transition that happens, right, where people start to feel left out. And that sort of, I think, breeds the beginning of this feeling of, like, politics happening in an organization. What are your thoughts there? I don’t know if mitigating organizational politics is the right way to think about it, but like basically how to. 
What are some ways that you’ve found to navigate this, both as a leader, in terms of creating culture around it, and also when you’re finding yourself in a political decision making process? Surely not at PagerDuty, but maybe in previous organizations that you’ve been a part of.

Rich: We have our stuff as well. But yeah, no, you mentioned bias and it’s really important. Awareness and intention is a huge part of it. When I mentioned that it’s useful to pull people in, if you’re only pulling in the people that you think are going to agree with you, or if you’re only pulling in the people that are just kind of like you, then you’re actually not going to get the kind of input that you want. The idea of starting with a small group and then enlarging the group as things get more refined, it’s really a scalability thing. You can’t get input from a lot of people right at the very beginning. And so I think the big thing about bias, I’m very lucky at PagerDuty. I think we’re doing quite well in the diversity and equity side of things. I mean, I’m saying that as a straight, white cis male, but compared to some other companies I’ve been at, I think we’re doing all right. Still, obviously, a long way to go just by virtue of being a company in tech. But having some intention in who you bring in is definitely important in that respect. And again, having the sense of who to bring in is one of those skills you kind of develop. I mean, fundamentally, if you’re trying to bring in people that agree with you, you’re wasting your time. Because what you want to do is you want to bring in the people that are going to disagree with you so that you can end up somewhere in the middle. You know, you can figure out together what the best way forward is. And as you start to enlarge that circle, bring in more people that will be able to, in a healthy, friendly, kind way, tear things apart, and you end up with a better proposal and you end up with a better kind of foundation for that proposal. As an example, you know, there’s one of my peers at PagerDuty who is also named Rich. I call him my nemesis because he is the Rich that’s on the security team. And I figured that people on a security team would like having a nemesis. And I was right. 
He and I have some different opinions on how to do things in AWS sometimes. And so one thing I could do is I could go, well, I’m going to write a proposal and I’m going to find a bunch of people who like my way, and I’m going to get them all on side, and then I’m going to go talk to Rich. And that’s not going to work. But right from the beginning, I know that, look, I can go to Rich and I can say, hey, I’m thinking of doing this. I know you’re not going to like it. Let’s talk about it. And we do. And we end up coming, like, maybe we completely disagree and can’t get anywhere on it, but then we can talk about, okay, how are we going to get past this? Who else do we need to bring in? What are we missing? Or maybe the way that I was headed and the way that he would head on whatever the decision was, we’ll find, I don’t want to say a compromise, compromise is the wrong thing, but we’ll be able to use both of our positions to come up with an even better solution. From there, you take it to the next step, and you use the same function to figure out who to bring in and so forth. Really just the idea being you’re finding people who will have a perspective, whether it be a technical perspective or a, you know, a diversity perspective or both, to really look at the idea and go, okay, like, tear this apart, please. Tear this apart. One thing I consider myself very lucky about at PagerDuty is that it is an extremely kind place. People are very intentional, even in disagreement, and especially in disagreement. Everyone understands that the reason that the disagreement is happening is because we want to find the best outcome and produce the best results and so forth. And, yeah, we can get grumpy sometimes about discussing things, but it’s a very healthy kind of grumpy where, you know, if someone’s kind of not moving from the spot, then other people will ask, hey, look, you know, you’ve really dug in here. 
Let’s stop talking about the proposal. Let’s start talking about why, and really kind of dig in to see, well, what’s underneath all that? What are you worried about? What’s keeping you up at night? What’s making you uncomfortable here? And having an environment where you can have those kinds of conversations is really nice to have.

Alex: That sounds awesome. Changing subjects a little bit. You mentioned earlier that you mentor folks, and I was wondering, what’s the process for you for doing that?

Rich: Yeah, it’s funny, I’ve got, at PagerDuty, a bit of an outsider opinion on mentoring. So we do have a formal mentoring program. I participated in it. I love participating in it. We kind of divided it up into technical and leadership mentoring, but I don’t know. When I’m talking to folks that I’m technically mentoring, I’m often giving them kind of leadership coaching. When I’m talking to people that I’m leadership mentoring, then there’s always technical stuff that comes in. So it’s all very blurry. And the reason I say I have a bit of an outsider opinion is that I find that formal mentoring relationships aren’t quite what I think of as mentoring. And what I think of as mentoring definitely spans jobs, you know, having a mentor, to me. And it’s probably just a case that the word means a couple of different things. But there are a couple of people that I have had the opportunity to mentor in my career, and I’ve watched them go, you know, job to job, and we keep in touch and it’s pretty informal, but, you know, someone will check in with me and go, like, hey, you know, you were a lot of help to me way back when. I’m thinking of moving to this other role. Tell me what you think. Or I’ll see that they started a job somewhere, and I’ll reach out and go, hey, you know, congratulations. It’s really great to see their trajectory take off. And it’s a very different framing than the sort of formal mentoring programs that organizations often have. One of the challenges I have with the sort of formal mentoring programs is that I don’t always have a ton of introspection for how my brain works. I mentioned this back when I was talking about writing proposals, that I don’t have a process. I just write shit down and eventually a proposal comes out of it. And I hope other people don’t do it that way because I’m sure it’s really inefficient. But I’ve been doing it that way for a couple of decades, and that’s just what I know. But, yeah, one of the. 
You know, one thing that I find a little bit difficult is, having started at sort of the beginning of the beginning of the web, I started my career in 1994, I had opportunities for learning that just aren’t available these days because no one knew what they were doing back then. And I have a really hard time sometimes working with people that have started their career in the last few years. And they’re saying, well, you know, like, how can I learn about all of this? Like, how do I learn how to learn things that I’ve never seen before? And it’s like the environment is so much different now with uptime, you know, availability requirements and compliance regimes and just the general complexity of things as well. And a lot of cases, especially on the infrastructure side, are more about gluing things together than they are building things from scratch. And so sometimes I really have a hard time putting myself in the shoes of someone that started very recently. You know, one of the things that I consider myself pretty good at is approaching something that I have never seen before and kind of figuring out just enough about it to help someone solve a problem, and to be able to kind of coach someone through developing that skill is hard. I do it all the time. Pairing is huge. I have a monkey brain, right in the middle of my prehistoric brain, aversion to the sort of social parts of pairing. I’m going to sit down and I’m going to talk to this person while I do the work and I’m going to explain and they’re going to drive and stuff like that. I know that it’s a really useful technique, but in the middle of my brain there’s something saying, no, no social, go away. What you do is you go and you sit at a terminal. So I kind of push myself through that and I find that really useful, to actually do stuff rather than have a, you know, a half hour discussion on, you know, so what’s on your agenda today, type of things. 
That said, I find that the mentoring that I do at PagerDuty does tend to be the half hour meeting type thing. And one of the things I’m going to take away from having this conversation is that I’m going to try to stop doing that and start doing more hands on stuff with the people that I mentor. The other thing, of course, is a lot of this stuff, especially again from the infrastructure side, is a little opportunistic, in that developing that ability, the technical ability to bring yourself into something you’ve never seen before and understand it, or on the leadership side to find yourself in a sticky situation, doesn’t lend itself to a recurring meeting. And it’s a lot more, I just need to make myself available so that if the person that I’m mentoring in the formal mentoring program needs to pull me in, then they can.

Alex: One other thing that strikes me is I’ve been reading a lot about lots of things around resilience, but something that comes up a lot is that experts are very rarely good at explaining their expertise. Like, that’s just, it’s hard. And so one of the things, if you’ve ever read the Etsy postmortem facilitation guide, they do a really great job at describing how having that first person account of the situation, the incident, can sometimes allow a less experienced engineer to live that experience through the retelling. And so it’s a way of exposing your expertise in a really interesting way. And I found that really interesting. I too have the same issue where it’s like, I don’t know, I learned this 15 years ago. It’s hard for me to explain it. But when someone says, well, why did you type that command at that moment? I’ll be like, oh, because I saw these 15 thingamabobbers and that led me to do this thing.

Rich: Yeah, that’s it. You get a weird feeling. It’s like that old joke about the repair person that places the X and invoices the company: chalk, $2; knowing where to place the X, $9,998. And yeah, no, that really is a big deal. And one of the things that I’ve been pushing at PagerDuty for the last little while is moving from a prevention approach in incident review and postmortems and so forth to an adaptive capacity approach. And I think that’s exactly what you’re talking about there. The power of narrative is huge. It’s important of course, to record some facts in a timeline. And this many customers were impacted and this is how they were impacted. You can’t not do that, just because you need to communicate that stuff out to the rest of the organization. But if you really want staying power, then you need to tell a story. And hearing those stories, and especially when you get those stories... The idea comes from David Woods about being inside versus outside the tunnel, where the tunnel is kind of the viewpoint of the person in the middle of the incident. What they can see is very different than what you can see when you’re reviewing what happened afterwards. And staying inside that tunnel of a few different participants in an incident has such bigger learning opportunities. There’s definitely something to be said that, you know, a lot of what builds up, again especially in SRE, what builds up that experience is experience in incidents, because you can read a lot about how things go wrong. But it’s something else to be inside the tunnel yourself and have that very limited view of what’s happening and develop that capacity for hunches. 
One thing that actually occurred to me a little while ago when we were talking about trade offs and risk and stuff like that is that I think folks that haven’t been in management, or at least haven’t been exposed to working with senior management as peers in a staff or principal role, think there’s some decision making power that senior folks have, whether on the management side or the engineering side, that less senior folks don’t have. When you’re an intermediate engineer, you’re going, well, I think this is going to work, so I’m going to try it. But if I were more senior, then I would just know it was going to work. And I don’t think that’s true at all. All the way up to the CEO. What really comes with that seniority is comfort with uncertainty and being able to think about risk. And yes, also you have a lot of experience, so you’re right a lot. One of the Amazon principles we were talking about before is “right a lot.” One of the things about being in the tunnel in an incident is that you have a ton of uncertainty and you have to make all the decisions in that context of uncertainty. You can only see what you can see, even though in hindsight a whole bunch of other things will have become obvious. If you’re going to learn how to operate resilient systems and design resilient systems, then really understanding where those limits are is critical.

David: Yeah, that realization of, like, sort of all the way up the chain, everyone is making decisions with uncertainty, and there’s no sort of superhuman. There’s certainly people that have either more experience or more capacity than others, but broadly speaking, everyone is pretty similar, actually. At the end of the day, we’re all, you know, struggling with the same things. And so I think that’s an interesting realization to make as folks get further into their careers. Some people may initially view it as disconcerting, but I think it’s actually empowering. Right. It’s like you are capable of making all the same decisions as everybody else. You have access to the same information as everybody else. And that can be an empowering feeling once you sort of accept it. We’re almost at time. I do want to quickly... there are two questions that we try to ask everybody that comes on the show, and one of them is, you already mentioned Tanya Reilly’s blog, and I think you mentioned a couple of other resources throughout. But are there books or blogs or people that you have learned from and that you would recommend people check out who are listening to the show?

Rich: Oh, boy, that’s a good question. There’s a lot. A lot of the stuff that kind of shaped my thinking about things is probably considered a little bit old at this point. But in terms of more recent stuff, we didn’t spend a ton of time talking about incidents, but I do think a lot in terms of, you know, adaptive capacity and resilience and stuff like that. I wish I had a book list in front of me right now so I could name particular titles. David Woods, The Field Guide to Human Error is great in that respect. There are other ones that are in my ebook library that I can’t think of right now in terms of working with people and that uncertainty and stuff like that. I could do better with some names, maybe, than particular things. John Allspaw, obviously.

Alex: If you would like to just, like, send us an email, what we can do is, like, in our posts, we can put it and we can make sure that people get it.

Rich: One specific thing I will call out there, though: Camille Fournier’s The Manager’s Path. Even if you’re not going into management, especially if you’re not going into management, it is a must read just to get a flavor of what it’s like at each level of management. To tie this back to what we were just talking about, uncertainty, really the understanding that everyone is winging it the same way that you’re winging it is really, like, terrifying and comforting at the same time. People that are in senior roles didn’t get promoted there because of superpowers. They’ve got experience, they’ve got frameworks and stuff like that, but they’re still winging it. We’re all winging it. Yeah.

Alex: Okay, so the last question, and this is tongue in cheek fun, how much time do you find yourself coding nowadays?

Rich: Very little. The flip side of that is, you know, as a site reliability engineer that comes at it from the systems perspective, I didn’t code a lot before. But in terms of just individual work in front of a terminal, you know, it’s almost to the point where, outside of proofs of concept and, honestly, taking on a lot of the stuff to help out the team rather than lead the team in terms of writing code, like, taking on that toil is a great way to build rapport, very, very little. There’s not a ton of leverage I can have in a code editor compared to what I can have in Google Docs or in talking to people and stuff like that. And even though I know a lot of people are going, no, not coding, that’s perfect for me. Being able to kind of do the technical management stuff that I loved when I was in management, but staying closer to the technology and not having people report to me, which I found extremely stressful, is perfect.

David: Awesome. Well, Rich, thank you so much for taking the time today. It has really been a pleasure. That’s it. Thanks so much for listening to Staff Eng. If you enjoyed today’s show, please consider adding a review on iTunes, Spotify, or your podcaster of choice. It helps others find the show and is a really useful signal to us that folks are finding value in this so that we keep doing it.

Alex: You can find the notes from today’s episode at our website, podcast.staffeng.com. The website also has our contact info. Please don’t be shy.