When we were developing our SoundCloud app for Xbox One, something became very obvious during usability testing: signing in with a game controller really sucks. Entering text requires navigating a virtual keyboard to individual letters, numbers, and characters one at a time – such a nightmare! Plus, letters, numbers, and special characters are spread across three screens. The more secure your password is, the worse the experience is.
We needed some other way for people to sign in to the app: something secure, simple, and fast.
The person wants to sign in to SoundCloud on Xbox One. They visit soundcloud.com/activate on another device, such as their phone or laptop, and enter a short code. Boom, they are now signed in to SoundCloud on Xbox One. Magic!
We needed to consider that the person might initiate this flow on a laptop, tablet or a mobile phone. We hoped to take advantage of the person already being signed in on their second device.
Consider a person who has two devices:
The person wants to be signed in to your service on Device A. The person has Device B nearby.
The app on Device A requests a short easy-to-read code from your service and displays it along with an easy-to-read URL.
The app on Device A polls your service every few seconds to see if the code is associated with a user.
The person pulls Device B from between the couch cushions, opens a browser, and goes to that easy-to-read URL. On Device B, the person might already be signed in to your service. If they aren’t, they either sign in or create an account.
Once authenticated, the person is prompted to enter the short easy-to-read code.
Device B asks your service if the code that was entered is valid. For security reasons (outlined in a later section), the person is asked to confirm that they want to allow Device A to be signed in as them.
Upon confirmation by the person, Device B updates the code using the authenticated user’s access token, to associate the code with the target user.
At the moment of confirmation, the code is associated with the user. The next time Device A polls, it sees that it can use the code to request an access token for that user. Device A then makes a request for an access token using the code.
Et voilà! Now Device A has an access token for the target user. In other words, the person is signed in to Device A as the desired user.
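The steps above can be sketched as a small in-memory model of the service. This is only an illustration under our own assumptions, not SoundCloud's implementation: the class name `ActivationService`, its method names, the 5-minute lifetime, the single-use exchange, and the alphabet are all hypothetical, and it deliberately leaves out the rate limiting and polling token discussed in the security considerations.

```python
import secrets
import time

# Hypothetical alphabet: uppercase letters and digits minus O, 0, I and 1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


class ActivationService:
    """In-memory stand-in for the service that Device A and Device B talk to."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self.codes = {}  # code -> {"expires_at": float, "user": str or None}

    def _live_entry(self, code, now):
        entry = self.codes.get(code)
        if entry is None or entry["expires_at"] <= now:
            return None
        return entry

    def create_code(self, now=None):
        """Device A requests a short code to display."""
        now = time.time() if now is None else now
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(6))
            if self._live_entry(code, now) is None:
                break
        self.codes[code] = {"expires_at": now + self.ttl_seconds, "user": None}
        return code

    def poll(self, code, now=None):
        """Device A polls to see whether the code has a user yet."""
        now = time.time() if now is None else now
        entry = self._live_entry(code, now)
        if entry is None:
            return "unknown"
        return "activated" if entry["user"] is not None else "pending"

    def activate(self, code, user, now=None):
        """Device B, already authenticated as `user`, confirms the code."""
        now = time.time() if now is None else now
        entry = self._live_entry(code, now)
        if entry is None or entry["user"] is not None:
            return False
        entry["user"] = user
        return True

    def exchange(self, code, now=None):
        """Device A trades an activated code for an access token (single use)."""
        now = time.time() if now is None else now
        entry = self._live_entry(code, now)
        if entry is None or entry["user"] is None:
            return None
        del self.codes[code]
        return {"user": entry["user"], "access_token": secrets.token_urlsafe(32)}
```

In this model Device A would call `create_code()`, then `poll()` every few seconds until it returns `"activated"`, then `exchange()`. Device B only ever calls `activate()`, and only after the person has signed in and confirmed.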
This pattern is not something we invented. Our flow is influenced by a similar flow used by YouTube on TVs and Google Sign-In for TVs and Devices.
This method of signing in means the person does not have to enter their password on Device A. If they are signed in to your service on Device B, it is possible for them to avoid needing to sign in again at all.
At SoundCloud, we have a web app at soundcloud.com, and we have native apps available for Android and iOS.
If the person visits soundcloud.com/activate on a laptop, we use our standard means of requiring authentication before showing the prompt for the person to enter the activation code.
On mobile, we chose to reuse the same “activate” web app opened in a web view in the native app. If the person visits soundcloud.com/activate on a mobile device, we invite them to open our native app. Inside the app, we use our standard means of requiring authentication, and then we open a web view with the user’s access token appended in a URL fragment to an oauth2-style callback web address:
This allows the web app to get the access token from the URL fragment. We put secrets in a fragment instead of a query parameter so that secrets are not sent to the web server and therefore not stored in the web server logs. This is the recommended way of passing access tokens and other secrets in a web address.
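To illustrate why the fragment is the safer place for a secret, here is a minimal sketch that splits a callback address into the part a browser sends to the server and the part it keeps to itself. The address, the `view` query parameter, and the `access_token` fragment key are made up for the example; they are not our real callback address.

```python
from urllib.parse import parse_qs, urlsplit


def split_callback_url(url):
    """Return (what the web server sees, what only the page's script sees)."""
    parts = urlsplit(url)
    # Browsers send the path and query in the HTTP request, never the fragment.
    sent_to_server = parts.path + ("?" + parts.query if parts.query else "")
    # The fragment uses the same key=value&key=value encoding as a query string.
    fragment_params = {key: values[0] for key, values in parse_qs(parts.fragment).items()}
    return sent_to_server, fragment_params


# Hypothetical address, for illustration only.
url = "https://example.com/callback?view=demo#access_token=abc123"
sent_to_server, fragment_params = split_callback_url(url)
print(sent_to_server)                   # /callback?view=demo
print(fragment_params["access_token"])  # abc123
```

The token never appears in `sent_to_server`, which is the part that would end up in web server logs.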
We also instruct the web app how to display the page using a query parameter:
You could of course build this form into your native app directly. We like that our implementation allows us to iterate quickly on this brand new feature. We can easily make security or usability improvements to the interface without shipping a new version of our native apps.
You want the code to be as short as possible while still having enough possible codes available so that everyone who wants to sign in at the same time can.
You also need to have a sparse enough usage of the possible codes such that it is unlikely that a typo will result in the person entering a valid code that has been issued at the same time to someone else.
Imagine if your codes were one number long, ranging from 0 through 9. The risk of an unnoticed typo is very low, but you could still only issue 10 codes at one time.
If your codes were two letters long, then you could issue 26 x 26 = 676 codes at the same time. There is now a risk of an unnoticed typo, which means you can’t safely issue all 676 codes at the same time.
If you use just numbers, and your codes are 6 digits long, then there are one million codes available, assuming you allow leading zeros. Pretty good! That might be enough for your needs.
If you use just letters, and your codes are 4 letters long, then there are just under half a million codes available. Still pretty good, and the codes are shorter. Not bad!
If you choose to use a mix of letters and numbers, then do not use characters that are hard to tell apart. Specifically, the letter O and the number 0 are hard to tell apart, as are the letter I and the number 1. Even if you use fonts that make these characters distinct, you should avoid these altogether.
Definitely don’t use special characters. That’s just asking for usability problems. If you use letters, display the codes in uppercase for readability, but verify the codes with case insensitivity. Our form for entering the code uses CSS to transform the characters into uppercase for ease of entry.
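On the server side, case-insensitive verification could look something like the sketch below. The function names are hypothetical, and stripping whitespace is an extra assumption on our part for people who type a space in the middle of the code.

```python
def normalize_code(raw):
    """Drop all whitespace and uppercase what is left."""
    return "".join(raw.split()).upper()


def codes_match(entered, issued):
    """Compare what the person typed with the code that was issued."""
    return normalize_code(entered) == normalize_code(issued)


print(normalize_code(" ab2 c9x "))      # AB2C9X
print(codes_match("ab2c9x", "AB2C9X"))  # True
```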
We chose to use a mix of letters and numbers, and we chose to use 6 characters, which gives us over a billion codes that we can issue at one time. If a billion seems like way more than one needs, read on for further security considerations.
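The arithmetic behind these numbers is easy to check. The sketch below assumes an alphabet of uppercase letters and digits with O, 0, I and 1 removed, which leaves 32 characters; that exact set is our assumption for the example. It also shows one way to generate a code using Python's `secrets` module.

```python
import secrets
import string

AMBIGUOUS = set("O0I1")
# 26 letters + 10 digits - 4 ambiguous characters = 32 characters.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in AMBIGUOUS
)


def code_space(alphabet_size, length):
    """Number of distinct codes of the given length."""
    return alphabet_size ** length


def generate_code(length=6):
    """Pick each character with a cryptographically secure random choice."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(code_space(10, 6))             # 1000000
print(code_space(26, 4))             # 456976
print(code_space(len(ALPHABET), 6))  # 1073741824
print(generate_code())               # a random 6-character code
```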
In its most basic form, this flow has a number of security problems ranging from mild and unlikely to wildly irresponsible. Mitigating these security considerations requires adding details to the basic implementation.
When the person visits your service on Device B, a user may already be signed in there. People who share a device, or those with more than one account on your service, may want to sign in with a different user than the one that is currently authenticated. If they are not informed which user is authenticated on Device B, they might accidentally grant Device A access to the wrong user.
Show the person which user is authenticated on Device B, or display several authenticated users, and allow them to choose which user will be granted access.
Imagine that two similar codes, such as XXN and XXM, were issued at the same time. XXN was issued to Device AN and shown to Person N. XXM was issued to Device AM and shown to Person M. Person N is authenticated as User N. If they mistype and enter code XXM before Person M does, then Device AM will become authenticated as User N. Oops!
Have a big enough range of possible codes that typo collisions are unlikely. If you have a very large range of possible codes, the chance that a typo is also a valid code is very small.
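A rough estimate shows how quickly the risk drops as the range grows. This sketch assumes codes are issued uniformly at random and that a typo produces some other string of the same length; the function names and the figure of 10,000 live codes are made up for the example.

```python
def single_typo_neighbors(alphabet_size, length):
    """How many strings differ from a given code in exactly one position."""
    return length * (alphabet_size - 1)


def typo_collision_probability(alphabet_size, length, other_live_codes):
    """Chance that one specific mistyped string is another currently live code."""
    total = alphabet_size ** length
    # The mistyped string is one of the (total - 1) strings that are not yours.
    return other_live_codes / (total - 1)


# Two-letter codes with every other code issued: any typo hits a live code.
print(typo_collision_probability(26, 2, 675))  # 1.0

# Six characters from a 32-character alphabet, 10,000 other codes live.
print(single_typo_neighbors(32, 6))            # 186
print(typo_collision_probability(32, 6, 10_000))
```

With the larger range, the last figure comes out below one in a hundred thousand.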
If possible, collect information from Device A when you create the code that the person could recognize when they are granting access. For example, if the device has a name like “Deejay’s TV”, then collect that from Device A, and show this information to the person when they are confirming that they want to grant the device access. We found that we did not have access to such data on Xbox One, however, so the best we could do was capture the type of device.
An attacker could destroy the ability for people to sign in by exhausting all the available codes. If your codes are generated on request, then as your possible codes are being used up, your service may have a near impossible time finding one it can issue.
Implement a rate limit for creating codes. Limit based on all the facets of the request that you can, such as the IP and any unique device identifier. You will need to investigate to determine what the peak rate would be for legitimate traffic.
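One simple way to do this is a sliding-window limiter with one instance per facet. The sketch below is an in-memory illustration only; the class name, the limits, and the choice of IP plus device identifier are assumptions, and a real deployment would need shared storage and limits tuned to your own traffic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within the last window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key, now):
        recent = self.hits[key]
        # Forget requests that have fallen out of the window.
        while recent and recent[0] <= now - self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


def allow_code_creation(ip_limiter, device_limiter, ip, device_id, now):
    """A code may be created only if every facet is within its limit."""
    # Evaluate both so each facet records the attempt it allowed.
    ip_ok = ip_limiter.allow(ip, now)
    device_ok = device_limiter.allow(device_id, now)
    return ip_ok and device_ok
```

For example, with `SlidingWindowLimiter(2, 60)` on the IP, a third request from the same address inside a minute is refused even if it claims a new device identifier.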
Your codes should expire so you can reuse codes after a time. If you reuse codes, then you need a large enough pool of codes so that you don’t re-issue the same code too soon between uses. Your pool of codes should be able to carry you through at least several days before you need to start re-using them.
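Expiry and reuse can be tracked with two separate time spans: how long a code is valid, and how long you wait before issuing it again. The sketch below assumes a 5-minute lifetime and a 3-day cooldown as example values; `CodeRegistry` and its methods are hypothetical names.

```python
class CodeRegistry:
    """Tracks when each code was last issued."""

    def __init__(self, ttl_seconds, reuse_after_seconds):
        self.ttl_seconds = ttl_seconds
        self.reuse_after_seconds = reuse_after_seconds
        self.last_issued = {}  # code -> time it was last issued

    def is_active(self, code, now):
        """A code can be entered only within its lifetime."""
        issued_at = self.last_issued.get(code)
        return issued_at is not None and now < issued_at + self.ttl_seconds

    def can_issue(self, code, now):
        """A code can be handed out again only after the cooldown."""
        issued_at = self.last_issued.get(code)
        return issued_at is None or now >= issued_at + self.reuse_after_seconds

    def issue(self, code, now):
        if not self.can_issue(code, now):
            return False
        self.last_issued[code] = now
        return True


registry = CodeRegistry(ttl_seconds=300, reuse_after_seconds=3 * 24 * 3600)
print(registry.issue("AB2C9X", now=0))        # True
print(registry.is_active("AB2C9X", now=299))  # True
print(registry.is_active("AB2C9X", now=300))  # False
print(registry.issue("AB2C9X", now=301))      # False
```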
An attacker could try to poll for random codes, hoping to find one that is activated so they can exchange the code for a user access token.
You could have such a large number of possible codes that guessing a valid one is nearly impossible. However, a code that works for a person needs to be short and use a limited character set, and a bot can quickly guess its way through codes that are easy for people to type.
You could add rate limiting for polling, and then cross your fingers and hope that you have enough possible codes, and your rate limits are good enough that the attacker won’t be able to guess a valid code while staying within your rate limits. You should add rate limiting for polling, but there is a much more secure way to mitigate this risk.
When you issue a code, you should also issue a very long, hard to guess polling token. You should require this polling token for unauthenticated requests for the code. You can see this secure feature described in the Google Sign-In for TVs and Devices documentation. They call the easy-to-read code “user_code”, and they call the polling token “
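As a minimal sketch of the idea, the service can generate the polling token alongside the short code and refuse any poll that does not present it. The `PollingGuard` class and its method names are hypothetical, and the in-memory dictionary stands in for whatever storage your service uses.

```python
import hmac
import secrets


class PollingGuard:
    """Pairs each short code with a long secret that only Device A knows."""

    def __init__(self):
        self.tokens = {}  # short code -> polling token

    def register(self, user_code):
        """Called when the code is created; the token is returned to Device A only."""
        polling_token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
        self.tokens[user_code] = polling_token
        return polling_token

    def may_poll(self, user_code, presented_token):
        """Allow polling only when the presented token matches the stored one."""
        expected = self.tokens.get(user_code)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode(), presented_token.encode())
```

The short code is shown on screen and typed on Device B, while the polling token never leaves Device A. An attacker who guesses a short code still cannot poll for it or exchange it.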
There is a category of attacks where an attacker attempts to influence their victim to take an action that is not in their best interest. These are called “social engineering attacks”.
For example, the attacker could take advantage of your UI design choices and their ability to supply some aspect of the display text, and use it to trick people into thinking the UI does something different than it does.
Take this example, where “Deejay’s TV” is an open text field called device_name that is supplied when the code is created:
From a design perspective, this UI seems succinct and clear.
Now imagine an attacker sends a victim an email saying “Visit this URL and enter this code to get a FREE subscription!” When the victim gets to the page and enters the code it looks like this:
The attacker has created a code with the device_name “SoundCIoud Premium Add-on”. The design makes it possible to fool people into giving away access to their account.
Use wording and design elements that make it clear what is happening. Some labels and text might seem redundant in the standard scenario, but are necessary to prevent this type of attack.
When the code is created, you could store the requesting IP or resolved geography, and then provide a warning if the code is entered in suspiciously different geography.
Have tight enough expirations on your codes such that this kind of attack would need to occur in a tight time frame. If your codes are only valid for 5 minutes, then the attacker has to find a way to provoke an action from their victim within 5 minutes.
From the very beginning, we included colleagues from our Security Team to perform a Threat Model analysis. We built out our API design using this analysis as a basis, instead of waiting until later to consider the security implications. During development, we had an eye on how to secure our new feature at all times. We adjusted the graphical interface based on a security review as well.
Our team really enjoyed thinking through this problem. We were able to make the painful experience of entering text with a game controller seamless for our users. We had a lot of fun delivering a shiny new authentication experience that’s secure and feels like magic.