Encryption / Useful Notes

https://static.tvtropes.org/pmwiki/pub/images/security_3437.png

See, this is how you spot the truly evil: those who go beyond rubber hoses and use metal wrenches instead.

In cryptography, encryption is the process of transforming information (the plaintext) using an algorithm (the cipher) and a secret (the key) into something unreadable to anyone except those possessing the key (the ciphertext).

What it comes down to is that information is scrambled and that another person can only unscramble it if they know the key used to scramble it. Cracking the encryption means figuring out how the information was scrambled. Encryption can be done both by hand and with a computer. Note that doing encryption by hand has some limitations; for instance, it is impractical to encrypt large amounts of information by hand due to the time it would take.

Symmetric vs. asymmetric encryption

There are two main forms of encryption today: symmetric and asymmetric.

Symmetric encryption is the classic form of encryption. The plaintext is encoded into ciphertext using a secret key; the recipient, to decode the message, must know the secret key that was used to encode it. The name refers to the fact that encryption and decryption are inverse functions and both use the same key to work (thus symmetrical). This was the first form devised: primitive versions (simple substitutions of one letter for another, or shift-by-x codes like the Caesar Cipher) go back to ancient times, with increasingly sophisticated variations being developed as the arts of codemaking and codebreaking advanced. Eventually, the algorithms became so complex that machines (such as the Enigma device used in World War II) were required to encrypt and decrypt messages with reasonable speed and accuracy.
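To make the "same key both ways" idea concrete, here is a minimal sketch of a shift-by-x (Caesar) cipher. The function name and the sample message are made up for illustration, and a cipher this simple is trivially breakable; it only shows that decryption is encryption run in reverse with the same key.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` places; leave everything else alone."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

key = 3  # the shared secret
ciphertext = caesar("ATTACK AT DAWN", key)   # encrypt with the key
plaintext = caesar(ciphertext, -key)         # decrypt by undoing the same shift
print(ciphertext, "->", plaintext)
```

Anyone who learns the shift can read every message, which is exactly the key-distribution problem that asymmetric encryption tries to solve.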

Asymmetric encryption is a newer form of encryption, devised in the '70s; in this form, the key used to encrypt the message and the one used to decrypt it are not the same. In an asymmetric cipher, each party has a pair of keys: a public key and a private key. If Alice wants to send Bob a message, she uses Bob's public key to encrypt the plaintext, and Bob uses his private key to decrypt it. Public keys, as the name indicates, are not required to be secret; private keys are. In short, encryption and decryption use two different keys in asymmetric encryption schemes, hence the name.

The advantage of asymmetric encryption is that there is no need for the sender and recipient to know a shared secret key. Suppose you wanted to send an encrypted message to somebody, and you tried to do so using a symmetric cipher. How would you send them the secret key if you're concerned that somebody might eavesdrop? To send them the key, you need to use a special, secure channel that is resistant to eavesdropping—for example, an in-person meeting.

Another advantage of asymmetric encryption is that it can be used in reverse, encrypting a message with your private key to create a ciphertext that anyone can decrypt using your public key. This gives the message a digital signature that proves that only the private key owner could have written it, because it was encrypted with a key nobody else knows. The two processes can be combined, so that Alice can send Bob a message that nobody else could have written and nobody else can read.
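As a rough illustration of both directions (encrypting with the public key, signing with the private key), here is a toy sketch using RSA-style arithmetic. Everything in it is an assumption for demonstration: the primes are tiny, the "message" is just a small number, and there is no padding, so this is nowhere near a secure implementation. It needs Python 3.8 or newer for the modular inverse.

```python
# Toy key pair built from tiny primes (real keys use primes hundreds of digits long).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: (n, e) is the public key
d = pow(e, -1, phi)        # private exponent: (n, d) is the private key

def apply_key(value: int, exponent: int) -> int:
    """Raise value to the exponent modulo n; used for all four operations."""
    return pow(value, exponent, n)

message = 65

# Alice encrypts with Bob's public key; only Bob's private key undoes it.
ciphertext = apply_key(message, e)
recovered = apply_key(ciphertext, d)

# Bob "encrypts" with his private key to sign; anyone can check with the public key.
signature = apply_key(message, d)
verified = apply_key(signature, e) == message

print(ciphertext, recovered, verified)
```

Real systems sign a hash of the message rather than the message itself and add padding on both paths; use an established library rather than arithmetic like this.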

The biggest practical disadvantage of asymmetric encryption is that you need to "trust" that what you think is the recipient's public key really is theirs, and that their private key has not been disclosed — we can pretend to be the President of the United States and send you a public key, and if you mistakenly believe us, you might unwittingly send your top secret messages to us instead of the President and accept our digitally signed messages as if they came from the President. The normal method to verify this is for a third-party public key repository to digitally sign and store usernames and public keys. This does require a third party trusted to be impartial and an accurate record-keeper. Another alternative is a "web of trust", in which people sign each other's keys, so that (for example) Alice can verify that Carol and Dave have signed Bob's key, vouching for the fact that it actually belongs to Bob. Alice then decides whether or not to take their word for it (much as she would if Carol and Dave were vouching for Bob in-person).

Another disadvantage of asymmetric encryption is that it is more computationally expensive than symmetric systems. Most secure encrypted channel schemes use a hybrid of asymmetric and symmetric encryption to get around this problem: the asymmetric encryption is used solely to transmit a randomly generated one-time symmetric encryption key, which is then used in symmetric encryption for the bulk of the transaction proper.

Security certificates, the bits of data that tell us that individuals online are in fact who they say they are, use the digital-signature technique described above to generate a "seal of approval" that can be read by everyone, but only manufactured by the issuing authority.

There is also one other consideration for asymmetric encryption: it must be impractical to calculate a private key from a public key, or the system can easily be broken. Quite a few algorithms amount to what are essentially very difficult mathematical problems, such as working out the factors of a number which is the product of multiplying two large prime numbers together — easy to multiply them, but very hard to work out which numbers you used from the final number. Quantum computers can potentially solve some of these problems that would never be practical on a traditional computer, and ensuring a system is secure against quantum attacks has been a consideration for some time now.
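The "easy to multiply, hard to factor" point can be sketched with a deliberately tiny example. The public key values below (n = 3233, e = 17) are made-up toy numbers; with a modulus this small, trial division recovers the private exponent instantly, whereas the same loop against a real-sized modulus would not finish in any useful amount of time.

```python
def factor_semiprime(n: int):
    """Find p, q with p * q == n by trial division (only viable for tiny n)."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 1
    raise ValueError("n has no nontrivial factors")

# A toy public key: the attacker sees only n and e.
n, e = 3233, 17

p, q = factor_semiprime(n)           # the hard step, trivial at this size
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent rebuilt from the factors
print(p, q, d)
```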

One major concern with encryption of these types is diffusion. With simple encryption, while the data may look different from the original, the patterns within the original can still remain, which is considered low diffusion. For example, if you encrypt a picture of the TV Tropes logo, the output will still resemble the logo, even though none of the colors are the same as the original (another example can also be seen in Figure 2 in this article). To get around this, various methods use some form of scrambling the input in a reversible way such that no easily discernible pattern can be made in the output, which is considered high diffusion.
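A quick way to see low diffusion is to encrypt every block of the input independently with the same key. The sketch below uses a deliberately weak stand-in cipher (XOR with a repeating four-byte key, chosen only for illustration), but the same effect shows up whenever identical plaintext blocks always produce identical ciphertext blocks.

```python
BLOCK = 4  # block size in bytes for this toy example

def toy_block_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR each block with the same key: reversible, but patterns survive."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"\x13\x37\xc0\xde"
plaintext = b"AAAABBBBAAAABBBB"       # a repeating pattern, like flat areas of a logo
ciphertext = toy_block_encrypt(plaintext, key)

blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
for block in blocks:
    print(block.hex())
```

The bytes have all changed, yet the first and third blocks still match, as do the second and fourth, so the shape of the original is visible to an observer.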

One-time pad

The one-time pad is a special kind of cipher that is completely unbreakable if used correctly, BUT is very weak if used incorrectly, and also very impractical. The trick is that the key must be at least as long as the plaintext, must be completely random, and must never, ever be reused.

The reason one-time pads are unbreakable is that for any conceivable plaintext, there exists a possible key that would produce that plaintext from the encrypted message. This means that if you try to guess what the key is, there are exponentially many more false positives than the real message, and no way to tell a false positive from a true positive. The only information an observer can gain is the length of the message or, if padding is used, its maximum length. Otherwise, with perfect use, you gain no information on the message contents.

But if the users of a one-time pad get sloppy and reuse a key for more than one message, it becomes trivial to break. If the keys are not truly randomly generated, it can be broken, too. A number of historical codebreaking successes resulted because somebody tried to use one-time pads but either reused the keys or generated them in a non-random fashion.
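Here is a minimal sketch of a one-time pad over bytes, using XOR as the combining step (a common choice, assumed here). It shows the three claims above: decryption works with the right pad, some pad exists that "decrypts" the ciphertext to any message of the same length, and reusing a pad leaks the XOR of the two plaintexts. The messages are made up for the example.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length byte strings with XOR."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

msg1 = b"ATTACK AT DAWN"
msg2 = b"RETREAT AT SIX"
pad = secrets.token_bytes(len(msg1))   # random, as long as the message

ct1 = xor_bytes(msg1, pad)
decrypted = xor_bytes(ct1, pad)        # XOR with the same pad undoes it

# For any same-length guess there is a pad that "explains" the ciphertext.
decoy = b"HAPPY BIRTHDAY"
fake_pad = xor_bytes(ct1, decoy)
decoy_decrypt = xor_bytes(ct1, fake_pad)

# Reusing the pad: the pad cancels out, leaving msg1 XOR msg2.
ct2 = xor_bytes(msg2, pad)
leak = xor_bytes(ct1, ct2)
```

The `leak` value contains no key material at all, which is why a reused pad can be attacked with ordinary guesswork about likely words.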

Then there is also the problem of communicating the keys, which is even harder than in the normal case because: (a) you need as many keys as you have messages, and (b) the keys are at least as long as the messages.

While using one-time pads to protect a whole conversation is cumbersome, a related idea of throw-away cipher input is the cryptographic nonce, which prevents replay attacks from happening. A replay attack is when an attacker can take encrypted data (such as login credentials or an order request) and send it to the server to achieve some goal, without ever needing to know what the data actually contains.

To do this, for login authentication, Alice first gives Bob a nonce. Bob combines this nonce with his password, runs it through a hashing function (described later), and sends the result to Alice. Alice then applies the nonce she generated with the password she got from Bob when Bob created his account and runs it through the same hash function. If Alice gets a matching output, she knows it's Bob. Every time Bob wants to log in, Alice gives Bob a new nonce.

However, there's a slight flaw to this. Eve, an eavesdropper, can pretend to be Alice to ask Bob, Charlie, and others for their passwords using the same nonce. From their responses, Eve can try to figure out the login information using cryptanalysis. To combat this, Bob can also generate a nonce, then run the hashing function with his nonce, Alice's nonce, and the password. Then Bob sends his nonce along with the hash function output, and Alice will use Bob's nonce along with hers to generate what should be expected. Since Alice and Bob are both using nonces, it's virtually impossible for Eve to decode the data.
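The two-nonce exchange described above can be sketched as follows. This is an illustration under assumptions, not a production protocol: SHA-256 stands in for "the hashing function", both nonces are fixed at 16 bytes so that simple concatenation is unambiguous, the password is a placeholder, and Alice is assumed to hold the password itself, as in the description.

```python
import hashlib
import hmac
import secrets

NONCE_BYTES = 16

def response(password: bytes, server_nonce: bytes, client_nonce: bytes) -> bytes:
    """Hash both nonces together with the password."""
    return hashlib.sha256(server_nonce + client_nonce + password).digest()

stored_password = b"correct horse"            # what Alice has on file for Bob

# Alice -> Bob: a fresh nonce for this login attempt.
server_nonce = secrets.token_bytes(NONCE_BYTES)

# Bob -> Alice: his own nonce plus the hash of everything.
client_nonce = secrets.token_bytes(NONCE_BYTES)
bob_reply = response(b"correct horse", server_nonce, client_nonce)

# Alice recomputes the expected value and compares in constant time.
expected = response(stored_password, server_nonce, client_nonce)
accepted = hmac.compare_digest(bob_reply, expected)

# Eve replays Bob's old reply later, but Alice has issued a new nonce.
new_server_nonce = secrets.token_bytes(NONCE_BYTES)
replay_accepted = hmac.compare_digest(
    bob_reply, response(stored_password, new_server_nonce, client_nonce)
)
print(accepted, replay_accepted)
```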

Another use of the one-time idea is in two-factor authentication systems. This can be either in the form of a generated throw-away key similar to a nonce, such as when a login system sends you a code to type in, or in the form of a continuously changing key, such as the token fobs or authentication apps found on smartphones.
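Many authentication apps produce their continuously changing codes with the HOTP/TOTP scheme (RFC 4226 and RFC 6238); whether a given fob or app uses exactly this is an assumption. A minimal sketch: both sides share a secret, and the code is derived from that secret plus either a counter or the current 30-second time window.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time code (HMAC-SHA1 with dynamic truncation)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # low nibble picks a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, now: float, step: int = 30) -> str:
    """Time-based variant: the counter is the number of elapsed time steps."""
    return hotp(secret, int(now // step))

shared_secret = b"12345678901234567890"   # test secret from the RFC, not a real one
print(hotp(shared_secret, 0))
print(totp(shared_secret, now=59))
```

In practice `now` would be `time.time()`, and the server usually accepts codes from a window or two on either side to allow for clock drift.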

Cryptographic hash functions (One-way Encryption)

Suppose you want to store sensitive information for a challenge/response, like a password to an account. Obviously, storing passwords in a database in plaintext is highly insecure. You could encrypt the password and store the encrypted version, but this presents several issues:

You need a key to encrypt something, and that key must be stored in a place where the database can access it.
If someone dumps the database of the encrypted passwords, you can easily see when multiple people have used the same password.
Most encryption methods output varying sizes based on the input. AES-128, for example, outputs 16 bytes for every block of 16 bytes you input. An attacker could use this to deduce roughly how many characters the plaintext has.

This is where the cryptographic hash function comes into play. A hash function takes the bytes that make up the data, adds them together in a convoluted manner, and spits out a number of fixed size called a digest. That is, for any input, the function will spit out a value that has the same number of bytes. Hashing, as it's called, is one-way, hence "one-way encryption". That is, theoretically, you cannot map a digest back to its original content because a digest has an infinite number of things that could map to it. This characteristic obviously has the problem that there's an infinite number of things that can map to the same digest, called collisions. So for a hash function to be suitable for cryptography, it must have a reasonably large digest size to make the chance of a collision prohibitively small for someone to guess via brute force. Another characteristic is that for a small change in input, the function must create an output so that no obvious relationship can be made between the outputs for a given change.
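Both properties, fixed digest size and unrelated-looking outputs for similar inputs, are easy to check with a standard hash function. The sketch below uses SHA-256 from Python's standard library as the example function; the inputs are arbitrary.

```python
import hashlib

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest as 64 hexadecimal characters (32 bytes)."""
    return hashlib.sha256(data).hexdigest()

# One byte or a hundred thousand: the digest is the same size.
short = digest_hex(b"a")
long = digest_hex(b"a" * 100_000)
print(len(short), len(long))

# Change one character and count how many of the 256 output bits flip.
d1 = hashlib.sha256(b"password1").digest()
d2 = hashlib.sha256(b"password2").digest()
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(d1, d2))
print(differing_bits, "of 256 bits differ")
```

For a good hash function, about half the bits should differ on average, which is what "no obvious relationship" looks like in practice.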

However, the hash function itself is not enough. If you input the same thing in a hash function, it'll spit out the same output. To combat this, salt is added. Salt is a random value that's included with the input to a hash function and is also stored with the user account. This way, in the password storing database, even if two or more people use the same password, their hashes will be different due to the salt.

Running the password with salt through a cryptographic hash function solves the three issues noted before with encryption:

No key is necessary.
If someone dumps the database of hashed passwords, you can't tell if people used the same password.
The outputs are of same size, so you can't tell if someone used a 6-letter password or a 100-letter one.

There is also another benefit to using salting: it prevents calculating hash values offline in advance. If it wasn't used, an attacker could build up a list of hash values, and all they would need to do is compare them to a leaked database to have the passwords; essentially, these were already cracked. Given that one way of making hash functions more secure is to make them take longer, so that an attacker can make fewer guesses a second, this would be undermined if someone could do all the hard work before stealing hashed passwords, allowing them to be immediately exploited. Salting effectively means an attacker has to start from the beginning every single time, slowing them down.
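A minimal sketch of salted, deliberately slow password hashing, using PBKDF2 from Python's standard library. The iteration count, salt length, and function names are assumptions for illustration; pick a work factor based on current guidance and your own hardware.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000   # assumed work factor for this example only

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is made for new passwords."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users pick the same password; the stored hashes still differ.
alice_salt, alice_hash = hash_password("hunter2")
bob_salt, bob_hash = hash_password("hunter2")
print(alice_hash != bob_hash)
print(verify_password("hunter2", alice_salt, alice_hash))
```

Both the salt and the digest are stored with the account; the salt is not a secret, it only has to be unique.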

Aside from storing passwords securely, cryptographic hash functions are more generally used to verify the integrity of any given piece of data. For example, when downloading files off the internet, you want to make sure that the files weren't tampered with. The provider of the file can say what the hash digest is supposed to be, and then you can verify it by running the file through the same hash function. It would be extremely difficult for an attacker to change the file in such a way that the hash function spits out the same hash value. Hashing is also used in cryptocurrency to maintain the integrity of the blockchains.
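Checking a download against a published digest might look like the sketch below. The "file" here is an in-memory stand-in and SHA-256 is an assumed choice of hash; with a real download you would open the file in binary mode and pass it to the same function.

```python
import hashlib
import io

def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a file-like object in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

original = b"pretend this is an installer" * 1000
published_digest = sha256_of_stream(io.BytesIO(original))   # what the provider posts

tampered = original[:-1] + b"!"                              # a single changed byte

download_ok = sha256_of_stream(io.BytesIO(original)) == published_digest
tampered_ok = sha256_of_stream(io.BytesIO(tampered)) == published_digest
print(download_ok, tampered_ok)
```

This only helps if the published digest itself comes from a channel the attacker cannot alter.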

Homomorphic encryption

A problem with encryption is that the encrypted data cannot be changed, otherwise decrypting it will result in garbage. But what if you can make modifications to the encrypted data and when you decrypt, those changes carry over? That is, say Alice wants to know the answer to 2 + 5 but doesn't want Bob to know she's using 2. She encrypts 2 as 5, Bob adds 5 to the encrypted value, and then she decrypts the value to get 7. This is the idea behind homomorphic encryption: allowing people to operate on encrypted data and have the results still be valid upon decryption, without knowing what the data actually is.

Like encryption in general, homomorphic encryption starts with a key to encrypt and decrypt the data. However, instead of hiding the information by swapping bits around, it encodes the data by using a math function. For example, let's say Alice has an array of numbers: 9, 0, 2, 6, 7, 2, 8, 1, 6, 8. She can apply a simple "add 2" cipher to it to get 11, 2, 4, 8, 9, 4, 10, 3, 8, 10. Then she asks Bob to add these numbers together. Bob does so and gives Alice the answer "69". As far as Bob knows, Alice just asked him to add a bunch of numbers, but he doesn't know what Alice actually wanted. Alice can undo the "add 2" cipher by subtracting 2 for each of the ten numbers (20 in total), achieving the real answer of 49. Of course, this is a highly simplified example; more complicated functions are used in practice. Also, for certain schemes not every math operation can be used on the encrypted data. If that's the case, then the encryption scheme is known as Partially Homomorphic Encryption (PHE). If any math operation can be done, it's known as Fully Homomorphic Encryption (FHE).
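The "add 2" example can be written out directly. The function names are made up, and the scheme is of course not secure; it only shows the shape of the idea, where Bob computes on values he cannot interpret and Alice corrects the result with her key.

```python
KEY = 2   # Alice's secret offset

def encrypt(values, key=KEY):
    """Alice's toy cipher: add the key to every number."""
    return [v + key for v in values]

def decrypt_sum(encrypted_sum, count, key=KEY):
    """Undo the cipher on a sum: the key was added once per number."""
    return encrypted_sum - key * count

data = [9, 0, 2, 6, 7, 2, 8, 1, 6, 8]
encrypted = encrypt(data)           # what Bob receives

bob_total = sum(encrypted)          # Bob's work, done without seeing `data`
alice_total = decrypt_sum(bob_total, len(data))

print(bob_total, alice_total)
```

Note that decryption needs to know how many numbers were summed, a small hint of why real homomorphic schemes restrict or carefully track the operations performed.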

One area where homomorphic encryption is being considered is where data needs to be processed, but data privacy is a concern. The medical field is one such area. By using homomorphic encryption, a doctor can send encrypted data about a patient to a third party for predictive analysis. Should the third party have a security breach, the attacker can't get anything useful out of the data the third party has, because the third party doesn't even know the actual values of the data it was working on.

Cryptanalysis

Cryptanalysis is the act of analyzing the cipher and the ciphertext in order to retrieve the original plaintext. It is not true that any ciphertext can be cracked. Using a wrong key can sometimes result in a valid-looking plaintext that is in fact not the correct plaintext (one-time pads are all about this).

To recover a plaintext from a ciphertext, the key and the algorithm used are required. Having only the ciphertext is the hardest problem: the cryptanalyst must guess both the algorithm and the key. This is called a "ciphertext-only attack" and it requires the experience and the intuition of the analyst, knowledge of the circumstances, the sender, the receiver, current events, etc... While statistical analysis of the ciphertexts could provide information about the algorithm, it requires plenty of ciphertexts or it doesn't give any meaningful information. With modern encryption algorithms, ciphertext-only cryptanalysis is basically impossible no matter how much data you have.

If the algorithm is known, the recovery can be easier: only the key (usually a password, though other things can be considered as keys) is required. When evaluating the security of an encryption system, it is prudent to assume that the attacker knows the algorithm (a dictum known as Kerckhoffs's principle, named after cryptographer Auguste Kerckhoffs).

The simplest method of cracking a password is known as "brute force": trying every possible password. The problem with this is that it can take a very long time to find the right password. The number of possibilities for a password increases with every character added to the length of the password and every character added to the range of options. For example, if you wanted to find a password that was six (uppercase only) letters long, you might have to try 26⁶ = 308,915,776 possible passwords. At the rate of a thousand guesses per second, it would take three and a half days to run through the list. Trying every seven-letter password at the same rate would take three months. If, instead of uppercase letters only, the passwords use lowercase letters, uppercase letters, and digits (26 + 26 + 10 = 62 options for each character), a six-character password requires 1.8 years to exhaustively search at this rate, and a seven-character one requires 111.5 years. It's also worth noting that on average, an attacker would only have to try half the passwords to succeed.
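The arithmetic above is easy to reproduce. The helper below is a small sketch (the names are made up) that computes the worst-case search time for a given alphabet size, password length, and guess rate; halve the result for the average case.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def search_seconds(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case time to try every password of exactly this length."""
    return alphabet_size ** length / guesses_per_second

RATE = 1000  # guesses per second, as in the example above

days_upper6 = search_seconds(26, 6, RATE) / SECONDS_PER_DAY
days_upper7 = search_seconds(26, 7, RATE) / SECONDS_PER_DAY
years_mixed6 = search_seconds(62, 6, RATE) / SECONDS_PER_YEAR
years_mixed7 = search_seconds(62, 7, RATE) / SECONDS_PER_YEAR

print(f"{26 ** 6:,} six-letter uppercase passwords")
print(f"{days_upper6:.1f} days, {days_upper7:.0f} days")
print(f"{years_mixed6:.1f} years, {years_mixed7:.1f} years")
```

Each extra character multiplies the time by the alphabet size, which is why length matters more than clever substitutions.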

The problem for the user is that memorizing a truly random string of characters is very difficult. It's easier to use actual words as passwords. However, this is more vulnerable to guessing: the number of words in the dictionary is much smaller than the number of random combinations of characters, so an attacker can simply try them all (a "dictionary attack"). Using odd spelling (such as "leetspeak" substitutions of other characters for letters) and using unusual words makes a dictionary attack more difficult; however, sophisticated attackers will use an exhaustive vocabulary and try a range of variations for each word. Ultimately, this adds very little password strength (in technical terms, information entropy) and makes the passwords harder to remember, meaning people are more likely to use a simpler password. Any trick you can think of to make a 'strong' but memorable password is one that will be accounted for.

Another problem is the incredible growth of computing power over the past few decades. The example above for a seven letter password has around 8 billion combinations. Sounds like a lot, right? And at a thousand guesses a second, it would be. Except we can do a lot more than that. For example, if we have our passwords secured with an MD5 hash (don't do this!), a single RTX 4090 can calculate 164 billion hashes a second. A cluster of 12 GPUs is just shy of 2 trillion a second. To say nothing of the resources available with cloud computing, for those willing to open their wallets — and for the right target, it can be worth it.
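Plugging the quoted GPU rate into the same kind of calculation shows how quickly short passwords fall. The rate is the figure given above for MD5; the twelve-character case is an extra data point added here for comparison.

```python
SECONDS_PER_YEAR = 365.25 * 86_400

MD5_RATE_ONE_GPU = 164e9                  # hashes per second, figure quoted above
CLUSTER_RATE = 12 * MD5_RATE_ONE_GPU      # twelve such GPUs

def seconds_to_exhaust(alphabet_size: int, length: int, rate: float) -> float:
    """Worst-case time to try every password of exactly this length."""
    return alphabet_size ** length / rate

seven_upper = seconds_to_exhaust(26, 7, MD5_RATE_ONE_GPU)   # uppercase only
seven_mixed = seconds_to_exhaust(62, 7, MD5_RATE_ONE_GPU)   # letters and digits
twelve_mixed_years = seconds_to_exhaust(62, 12, CLUSTER_RATE) / SECONDS_PER_YEAR

print(f"7 uppercase letters: {seven_upper:.3f} s")
print(f"7 mixed characters:  {seven_mixed:.1f} s")
print(f"12 mixed characters on the cluster: {twelve_mixed_years:.0f} years")
```

The same search that took decades at a thousand guesses per second finishes in seconds against a fast hash, which is the case for using slow password hashing functions and longer passwords.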

It is possible to combine randomness and easy memorization using tricks such as remembering a phrase and using the first letter of each word (e.g. "This website will ruin your life" becomes "Twwryl"). For a strong password, the phrase would need to be significantly longer, and you should avoid any well-known phrase, since those will be caught by a dictionary attack. Another option is to use a password manager program to store an encrypted database of passwords; the user then only needs to remember one master password to access all the others.
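The first-letter trick is simple enough to write down as a one-line function (the name is made up for the example):

```python
def first_letters(phrase: str) -> str:
    """Build a password from the first character of each word in a phrase."""
    return "".join(word[0] for word in phrase.split())

password = first_letters("This website will ruin your life")
print(password, len(password))
```

Six letters is far too short, as the brute-force figures show, so this only becomes useful with a much longer and more obscure phrase.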

Of course, if the encryption algorithm itself is weak, even an unguessable password won't help you. Cryptographers consider an algorithm broken if there is a way to figure out the key faster than brute forcing it. Sometimes, this is only of theoretical interest (like if, say, it would still take longer than the age of the universe, with or without the faster speed). Other times, the algorithm is so broken and/or outdated that the key can be recovered quickly and easily (as was the case with the DES cipher, which was designed in the '70s and proved unable to keep up with the rise in computing power by the late '90s, which forced people to resort to the more expensive Triple DES while also spurring calls for a replacement that eventually gave us the current AES standard). There is a large variety of attack techniques using advanced math, and new cryptosystems are expected to show evidence of resistance to them. If, after years of analysis by expert cryptographers, there aren't any practical attacks discovered, then it's considered probably secure. That little code you created yourself, however, doesn't stand a chance.

As mentioned above, the key doesn't have to be a password. For example, in Cryptonomicon, two people communicate using the "Solitaire cypher". The cypher uses a deck of cards; their initial arrangement is the key, leaving 54! (54 factorial, 54×53×52×...×2×1 = about 2.3 × 10⁷¹) possible keys and no dictionary to use.
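The size of that key space is quick to check, and converting it to bits makes it comparable to ordinary key lengths. This assumes every ordering of the deck is equally likely, which only holds if the deck is shuffled thoroughly.

```python
import math

deck_orderings = math.factorial(54)      # 52 cards plus two jokers
key_bits = math.log2(deck_orderings)     # equivalent key length in bits

print(f"{deck_orderings:.2e} possible keys, about {key_bits:.0f} bits")
```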

The knowledge of the plaintext or parts of the plaintext (so-called "cribs") can make a cryptanalysis problem exponentially easier. The plaintext, or parts of it, could be acquired by old-fashioned spying or, more inventively, by feeding the mole. This is called a "known plaintext attack".

And then (as the xkcd comic at the top of the page illustrates) there's the age-old standby of rubber-hose cryptanalysis: beating/torturing the key out of a holder. (The name comes from the rather vivid image of the keyholder being beaten across their bare feet with a rubber hose.) This does not have a direct counter, but many applications (such as VeraCrypt) allow a defense based on plausible deniability: an encrypted volume decrypts to a 'decoy', which hides a second encrypted volume with a different key. Thus, someone coerced into giving up a key can reveal one secret while hiding a bigger one. The interrogator may suspect the presence of a hidden inner volume, but its existence cannot be proved or disproved.

Of course, no encryption can protect you from stupidity. If you ever find yourself in a situation where the Secret Service is digging through your trash and anything you say might spell your doom if it ever gets in the wrong hands (because, be honest, who doesn't get into situations like this?), remember the following:

Use good passwords. Single words that can be easily guessed will easily fold under a dictionary attack, and short passwords are relatively easy to brute-force. There are lots of resources regarding strong password generation on the web.
Keep the keys secret! This is pretty obvious; if someone knows the key, your encryption is fucked.
Don't re-use your keys. If, for example, you use the same password to log into multiple websites, your key is only as safe as the weakest protection it's under.
Choose the algorithm carefully! Don't use any algorithm that has been cracked (such as the Enigma)!
- On the developer side, NIST and OWASP regularly publish lists of algorithms recommended for application usage. At the time of this edit, some recommended algorithms are (1) any proven AEAD implementation of AES (for symmetric crypto), (2) RSA and ECDH (even better) for asymmetric crypto, (3) SHA-256 at a minimum for hashing, (4) Argon2 or bcrypt for password hashing (use PBKDF2 with a high iteration count if you can't use either) and (5) HSMs for crypto operations if available (if not, rely on the underlying OS's functions). Algorithms like DES, MD5 and SHA-1 are already declared insecure because attacks on them are feasible, so don't use them unless you have to provide compatibility with legacy systems.
  - A special note on using Argon2 and bcrypt for password storage. These two algorithms are known as slow hashing algorithms. Which begs the question, why would you want something slow? It's because running these algorithms to generate rainbow tables is prohibitively expensive for an attacker, while the cost to the server running them is small. The server only needs to run these algorithms when either storing a new password or checking a submitted one. Assuming nobody's actively trying to attack the system by repeatedly submitting passwords (and your server should rate-limit clients anyway), the server isn't going to be busy with these. But for an attacker who wants to generate rainbow tables or try to brute force someone's password, with a high enough work factor, even with a powerful GPU, the number of guesses per second drops dramatically. This is also why, if you don't have access to these two for some reason, PBKDF2 with a high iteration count works. But the overall takeaway is: make it hard for the attacker to generate guesses, because 1 second for a server to generate a response is nothing compared to the 1 billion seconds an attacker could spend trying to make a rainbow table.
- On the user side, if you need to protect your documents, use properly implemented, well-backed products like LUKS, VeraCrypt, BitLocker, and any decent OpenPGP implementation.
If you're a developer, whatever you do, NEVER make up your own encryption. For that matter, try to avoid writing your own code to implement existing cryptosystems, too, and use existing protocols and libraries as much as possible. Encryption is notoriously difficult to get right, and you almost certainly won't.
Be wary of tells, habits, and other repeated phrases you use. What allowed code breakers to defeat Enigma (among other things) was that the German military always sent the same type of message at specific times and ended each message the same way.

And finally, a few bonus tips:

If you are going to keep any password secure, your email account is a damn good one to use a unique password for. Why? Because many services will allow you to do a password reset using only your email account. If an attacker has access to your email, they might not know your password to a particular account if it's not shared... but they can reset the password and gain access, meaning your email password is effectively a master password. Additional security measures like two-factor authentication (2FA) can help prevent this. If you want to be especially security conscious, you can use multiple email accounts.
"Security questions" are essentially anything but. In many implementations they act as weaker second passwords, and as a backup way of authenticating a user. There is no sense in using a near uncrackable password like "xBzZxTEDCem6JQGnbwEf" if an attacker can claim to have 'forgotten' the password, and then have to answer the question on your birthplace to gain access to the account, information that can often be guessed or retrieved from social media, or in many cases is already available due to data breaches (and there have been a lot of data breaches). If you are forced to use a "security question", the best solution is to make up the answer using another randomly generated password and store this information in a password manager along with your main password.
Biometrics can provide convenient access to devices and services, but for security you're probably better off without it. In addition to making you directly a target (consider the story of someone having a car stolen by having their finger cut off to unlock it), if that data is lost or stolen, you can't exactly go and order a new set of fingerprints. Some jurisdictions will allow the authorities to force users to grant access to a device using biometric data, or forcibly obtain that data during the course of their work (e.g. taking fingerprints upon arrest at a police station), and then try to use that information to gain access to a confiscated device, but users are not forced to disclose keys and passwords. This means that there are places where biometrics provides a huge backdoor that would not otherwise exist if using normal passwords or a PIN. You also can't claim to have forgotten your fingerprint.
Tying into the above, this also applies to using physical tokens as the only means of protecting access, as these can often again be legally confiscated. However, most instances will use them as part of multi-factor authentication, alongside a password or similar.
Note the difference between multi-factor authentication and mere multi-step authentication. Multi-factor authentication requires at least two of the following: what you know (passwords, PIN, etc.), what you have (smart card, security token), and what you are (biometrics, something else unique about you). If for instance, you log in via your smartphone to a place that sends you a verification code via text message, this isn't multi-factor authentication per se. Sure you need a phone to receive text messages, something a PC normally can't do, which might suggest it satisfies the "what you have" requirement. But authenticating via a smartphone means it's the same device, so it doesn't count. Not to mention, it's relatively easy to intercept text messages as long as the attacker knows your phone number. To have actual multi-factor authentication, you should use something like a security key fob (such as a Yubikey), and while the previous points about biometrics are valid, they also count as "what you are," which can increase security if used correctly.

See Hollywood Encryption for the usual treatment of cryptography in fiction (which generally involves a lot less detailed analysis and a lot more technobabble).

Wait, they are after me... IBUDHRYKPSSRCGCSXDHGRECTRHNZMFZUMLPOAPUNPBXHJFIIMKQMQDLPRVEXYUXKOKJJATCNHTTJOLPBXCEYNYITDZWFHXHJ
