This article explains why password hashing is important and how to do it properly on Android. The primary audience are software engineers working with passwords.

Password hashing or password-based key derivation takes a password from the user as input and generates key material as output. This is a helpful thing because many people struggle to memorize 256-bit encryption keys. The standard procedure is to pass the user password through the password hashing function, and then use the derived key for cryptographic operations. These operations can be symmetrical encryption/decryption of files or the generation of login tokens.

An attacker who tries to bruteforce such a system has two points of attack. Either they can try out all passwords or they can directly try all encryption keys. Generally, the search space for the passwords is much smaller and therefore preferred. This is because choosing high-entropy passwords is hard – as illustrated by this XKCD comic.

Therefore, password hashing is used as a defence mechanism that slows down the attacker. For every password attempt, the attacker needs to compute the password hash before being able to verify it. High costs for the password hashing step make up for the smaller search space of the passwords. The increase should be large enough to still hold if the attacker uses specialised hardware.

Salts and Iterations

One of the most straight-forward implementation would be to use a standard hash function such as SHA-256. The password acts as the direct input to the hash function, and the output is truncated to the desired key length. This approach works, but it has two important flaws.

First, on all devices the same passwords will produce the same output. This property allows an attacker to pre-compute a look-up table (or better: rainbow tables) in advance. They then can use it efficiently to find the password for a hash – and even reuse it for more attacks. The common solution for this issue is to include a salt value. A salt value is a randomly generated value that is stored with the password. It becomes a keying parameter to the hashing function in a construction such as HMAC.

Second, the computational cost of a simple hash function is not high and can be easily accelerated by purpose-built hardware. This issue can be improved by calling the hash function for multiple iterations with each round depending on the output of the previous one.

Both improvements have been combined in a scheme called password-based key derivation function - commonly referred by the acronym PBKDF2. In addition to the password, it also takes a salt and iteration count as input. It runs the underlying hash construction (e.g. HMAC) for iteratively with the final output depending on the initial password, salt, and the previous rounds.

PBKDF2 is still an okay choice and it is easy to use since it is part of the Android SDK. I suggest to use PBKDF2WithHmacSha1, because PBKDF2WithHmacSha256 is only available on API 26+. The code below runs in about 50ms on a Google Pixel 3.

import javax.crypto.SecretKey
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec

fun pbkdf2(
    password: CharArray,
    salt: ByteArray,
    iterationCount: Int = 4096,
    keyLength: Int = 256
): SecretKey {
    val factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSha1")
    val spec = PBEKeySpec(password, salt, iterationCount, keyLength)
    return factory.generateSecret(spec)!!
}

However, PBKDF2 solely relies on CPU cycles for making its computation hard. This means that an attacker can scale up quickly by using GPUs and application-specific integrated circuits (ASICs). Therefore, newer password hashing also impose memory costs.

I will skip over some history (bcrypt, scrypt) that is less relevant for Android and jump directly to Argon2.

Argon2

In 2015 the Password Hashing Competition looked for a better password hashing function and found Argon2. Argon2 defends against specialized computation hardware by enforcing memory requirements. It does so by filling a fixed amount of memory and performing pseudo-random look-ups based on its internal state. Emulating this computation is very hard for devices that have less memory available, as they would have to backtrack a lot. Both the CPU cycle costs (iterations) and memory costs can be specified independently.

Unfortunately, Argon2 is not (yet) available in the Android SDK. Therefore, I have created a light-weight library for Android called Argon2Kt. It bundles the Argon2 C-Library and calls it via the Java Native Interface (JNI). The following shows a sample call that takes about 60ms on a Google Pixel 3.

import com.lambdapioneer.argon2kt.Argon2Kt
import com.lambdapioneer.argon2kt.Argon2Mode

fun argon2(
    password: ByteArray,
    salt: ByteArray
): ByteArray {
    val argon2Kt = Argon2Kt()
    return argon2Kt.hash(
        mode = Argon2Mode.ARGON2_ID,
        password = password, salt = salt,
        tCostInIterations = 1, mCostInKibibyte = 37888
    ).rawHashAsByteArray()
}

Parameters to choose

Both PBKDF2 and Argon2 come with various parameters to choose and tune. The Open Web Application Security Project (OWASP) recommends to use Argon2 in its ID mode. The ID mode uses both password-dependent memory access rounds (good for memory hardness) and password-independent rounds (good for side-channel resistance). Following, I provide wall-time measurements on a Google Pixel 3 for the minimum parameters recommended by OWASP for both PBKDF2 and Argon2.

The suggested iteration counts for PBKDF2 are high because its security just relies on CPU cycle costs. For PBKDF2WithHmacSha1 it is 710,000 iterations (about 9500ms) and for PBKDF2WithHmacSha256 it is 310,000 iterations (about 2200ms).

Argon2 can get away with much lower iterations counts because of its memory hardness. Based on available memory, one of the following is a good baseline to choose: Memory costs of 37 MiB and 1 iteration (about 60ms) or memory costs of 15 MiB and 2 iterations (about 45ms). In comparison to PBKDF2, Argon2 runs significantly faster. This gives plenty of room for even more conservative parameters.

Faster is Better

It might sound unintuitive at first that we want the password hashing implementation to run as quickly as possible. Have we not said that it should slow to make it expensive for the attacker? Yes, but we expect the attacker to have the fastest theoretical possible implementation available (potentially using custom chips). They do not gain anything if your local password hashing runs faster. But having a fast version ourself allows choosing stronger parameters while keeping our apps fast.

Credits: cover photo by Anton Maksimov juvnsky on Unsplash