My first six months at SoundCloud as an iOS engineer on the Recommendations team have just finished. In that time, I’ve already contributed to around 15-20 repositories. I have made a couple of changes to our frontend and backend, ranging from a minor improvement in tooling to a newly baked feature. I feel I contribute to the team and grow every day. And I’m happy.
Is this because I’m smart and hardworking? Well, this is just one part of it. 😉 But there’s something about the culture and practices here that enable me to succeed. So I decided to take some time to articulate exactly what these things are.
If you are new to a company, it likely means you have a myriad of random questions every day, perhaps every hour. But I’m an introvert — maybe you can relate. I don’t want to bother people, especially for things I don’t necessarily need to know. And I don’t want to ask questions without knowing how to ask them. Would I go to someone and say “I’m curious about this. I have no idea how it works or what it is for. Could you explain this to me?” No, never.
Even so, I’ve found that something like this is possible at SoundCloud because people here are very supportive. However, what really helped me were the engineering practices in place that allowed me to explore the code base, documentation, and infrastructure on my own, following my curiosity and learning what I could until I needed someone’s help.
So, here’s a collection of random things that I found useful for quickly onboarding. Some of them may be widely known practices in the context of SRE and microservices. Some of them may be so common that no one bothers to mention them. Nevertheless, I’d like to share them while I still maintain an outsider’s view.
SoundCloud is a distributed system built on top of a gazillion microservices, as explained in many previous posts. These microservices enable product teams to work autonomously, but doesn’t that mean individual services diverge so much that I need to start over whenever I look into a new project? Well, that’s not what’s happening at SoundCloud. Instead, I see common patterns appear in many places, again and again.
What do I mean by this? Well, we have shared frameworks, with which one can get many things for free. We have shared scripts that package and deploy applications. Most importantly, we have a dedicated team that continuously applies the latest practices to different services so that none of them falls behind too much and ends up confusing engineers with outdated information.
This means that once I learn one thing, I can transfer it to other projects, so what I learn compounds instead of just adding up linearly.
In addition to the above, we have internal example applications that capture our best practices. These allow us to understand the essentials of how individual apps are configured, deployed, and monitored without being distracted by subtleties.
We can also quickly and easily start a new project using the example applications as templates. For the simplest application, we could start it in a day just by copying an example application and making the relevant tweaks to it.
No matter how good the documentation is, we sometimes still need to mess things up in order to know what we don’t know. But how can we do that without causing havoc?
Here at SoundCloud, we have a hands-on program that everyone can try. This program allows developers to provision a new machine, deploy the aforementioned example application to it, and set up monitoring for it in the same way our production services do.
As a new hire, you may have questions, some of which seem so basic that you are almost ashamed to ask them. In fact, you might not even know where to ask them. So you finally work up the courage to post the question in a chat room that looks relevant. You keep coming back to the thread, hoping your question made sense and that someone will eventually reply, but in the end no one responds. Everyone knows this feeling. Understandably, people are often too busy to spend time clarifying what you want to ask, or maybe they expect that someone else who knows the subject better will answer it.
At SoundCloud, many teams have a “first responder.” This is a person designated on a rotational basis to answer all the questions asked while they are in charge. Having someone like this greatly reduces the mental cost of asking questions because we know who we can bother, and there’s at least one person responsible for answering any questions.
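To make the idea of a rotation concrete, here is a minimal sketch of how a team could work out who is on duty on a given day. The weekly cadence, the `first_responder` function, and the names in the usage note are all assumptions for illustration; the post does not say how SoundCloud teams actually schedule the role.

```python
from datetime import date


def first_responder(team, on, start):
    """Return who is on duty on `on`, rotating weekly from `start`.

    `team` is an ordered list of names; `start` is the first day of the
    first person's shift. A weekly cadence is assumed for illustration.
    """
    if not team:
        raise ValueError("team must not be empty")
    # Whole weeks elapsed since the rotation began (floor division, so
    # dates before `start` wrap backwards through the list).
    weeks_elapsed = (on - start).days // 7
    return team[weeks_elapsed % len(team)]
```

For example, with `team = ["ana", "ben", "chi"]` and a rotation starting on Monday, January 1, 2024, `first_responder(team, date(2024, 1, 8), date(2024, 1, 1))` returns `"ben"`. Publishing the result somewhere visible, such as a channel topic, is what tells a newcomer exactly who they are allowed to bother.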
If your company serves a fairly large number of users, you probably have an internal admin tool where you can manage users and their content. In my past experience, such tools were open only to operational teams.
At SoundCloud, everyone can access this tool, though its capabilities are limited by the permissions set for the individual user. When selecting an account, one can quickly tell which experimental features it is getting, what its contents are, when it was created, which program it subscribes to, and so on. This information often explains why a problem happens only to specific users, and it helps facilitate communication with other teams while debugging.
It’s common for new people to face permission issues. If you have to track down a different person every time you need access to something, it will likely discourage you from exploring new things.
Here at SoundCloud, in most cases, the way systems are provisioned can be found as code in a single place. With an internal code search tool that enables fast cross-repository search, we can easily find what we’re looking for by searching for an error message or our account name, and then fix the problem ourselves by updating the relevant configuration.
When you start at a new company, it’s a great opportunity to find issues that many people have gotten used to and ignore. However, if you can’t narrow them down, your report probably won’t help. Imagine that you are a mobile engineer and find a 500 error that happens intermittently. If you report it ambiguously, like “I got a 500 error when I opened screen X and it never happened again,” it probably just adds another ticket that no one ever looks into.
We have been making a rigorous effort to make debugging information useful and user friendly. For example, you can find an improvement we made to the tracing metadata in the Using Kubernetes Pod Metadata to Improve Zipkin Traces blog post. In a scenario like this, I can check the trace and tell exactly where things go wrong and who is in charge of the issue. With this information, I can rewrite the occasional 500 error report with more specific information, such as “Upstream service X times out when requesting information about user Y.”
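As a rough illustration of that narrowing-down step, the sketch below takes the spans of one failed request and picks the deepest span that reported an error, which is usually a better thing to name in a report than the 500 seen at the edge. The `Span` fields, the `owner` tag, and both function names are hypothetical; this is plain Python data, not Zipkin’s data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    owner: str                # hypothetical tag naming the owning team
    error: Optional[str] = None


def root_cause(spans: List[Span]) -> Optional[Span]:
    """Return a failed span that has no failed children, if any."""
    failed = [s for s in spans if s.error]
    # Parents of failed spans only propagated the error further up.
    failed_parents = {s.parent_id for s in failed}
    leaves = [s for s in failed if s.span_id not in failed_parents]
    return leaves[0] if leaves else None


def describe(span: Span) -> str:
    """Turn the culprit span into a one-line, actionable report."""
    return f"{span.service} (owned by {span.owner}) failed: {span.error}"
```

Given a trace in which the gateway, a recommendations service, and a user-profile service all report errors along one call chain, `root_cause` returns the user-profile span, and `describe` produces a sentence that names both the failing service and the team to contact.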
Allowing people to work on side projects during working hours has been a popular concept since Google shared more about its 20-percent time policy, but it easily becomes window dressing, and people skip it for many reasons. For example, they can’t find a suitable project or people to work with, or they can’t find time for it because project deadlines are always tight. And once the majority of people stop doing it, it’s hard for other people to justify continuing.
We have this same concept here at SoundCloud, but we refer to it as SAT, or self-allocated time. And I’m happy to report that I see many people actually taking time on Fridays for this. I myself use this time to work on the things I know the least about. It also gives me a chance to collaborate with people whom I don’t usually work with. It often pays off because it adds a new tool to one’s toolbox, which, sooner or later, becomes useful in regular projects too.
I put this at the end of the list because this looks fundamental, but it works as collective knowledge only if the subject can be accessed and tested by everyone, and if it’s supported by the things I mentioned above.
What I like about how we update the documentation at SoundCloud is that it’s not authoritative. Although people ask for others’ reviews when they are not too sure about something or when they think other people should be aware of a change, the documentation can always be updated straight away. This removes the mental cost of updating documents and encourages more people to do so, which, in the end, also means that any minor errors introduced by this non-authoritative model get fixed quickly.
As engineers, we’re all familiar with the idea of being driven by our curiosity so profoundly that we forget to sleep or feel like we can work forever (which of course is untrue). And while smarter people often perform well, there are many high performers who do so because they’re the ones most excited about learning and putting that learning into practice.
What I’d like to emphasize with this post is that you and your company can leverage those people by having good engineering practices in place and allowing people to explore things freely without barriers to learning.
Or, more simply put, let good people do their jobs.