# Encryption

## What is encryption?

Encryption is the process of converting a piece of plaintext into a ciphertext using a key.

Plaintext - the original, unencrypted message. This is the data that's being encrypted.

`This is some readable plaintext. It's a really big secret so I had better be sure to encrypt it properly.`

Ciphertext - the encrypted version of the plaintext after it's completed encryption.

Once plaintext is encrypted into ciphertext, it looks like this:

`U2FsdGVkX1+H7BSzYgumzI4SfcqHpp9KxqAPsPTZ1TU13gC6dEBYnP2Q5q0r7wRRR1WxGMvsYFVGJlV6/atZQfC6XiaiMZUafJyhCvf/h52gzR7qv2o+G76XBaAItir+ZrcqDaCkLvKtWbEGkS44LsDVBU4lEqnTrA==`

This ciphertext is no longer human readable text, but given the right key, it can be converted back into the original plaintext.

Key - the data used to encrypt, and sometimes decrypt, a message (some methods of encryption use a different key for encrypting than for decrypting). For our purposes "key" and "password" are used interchangeably.

## Demonstration

Let's make an encryption algorithm. For real use cases it's not a good idea to build your own algorithm, but for a demonstration it's OK.

We'll start with some assumptions about our input plaintext. For simplicity, we'll limit it to lowercase letters (`a-z`).

For the password (or key), you pick any number, and each character gets incremented by that amount.

So, if we have the plaintext: `here is my secret message`

And for the key we'll pick: `13`

Our first letter, `h`, becomes `u`, one step at a time:

```
here is my secret message <-- plaintext
ifsf jt nz tfdsfu nfttbhf
jgtg ku oa ugetgv oguucig
khuh lv pb vhfuhw phvvdjh
livi mw qc wigvix qiwweki
mjwj nx rd xjhwjy rjxxflj
nkxk oy se ykixkz skyygmk
olyl pz tf zljyla tlzzhnl
pmzm qa ug amkzmb umaaiom
qnan rb vh bnlanc vnbbjpn
robo sc wi combod wocckqo
spcp td xj dpncpe xpddlrp
tqdq ue yk eqodqf yqeemsq
urer vf zl frperg zrffntr <-- ciphertext
```

Our final encrypted message would look like this:

`urer vf zl frperg zrffntr`

To decrypt this, someone would need to know the ciphertext and the key used to encrypt it, `13`.

So is our message safe? No, not really. This is known as a Caesar cipher, which was used by Julius Caesar over 2000 years ago, and it has some major weaknesses. The version we did, called ROT13, has a unique property: the same operation both encrypts and decrypts. Because the Latin alphabet has 26 characters, shifting by 13 characters twice takes you back to the original message.
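The shifting described above can be sketched in a few lines of Python. This is only an illustration of the demo algorithm; the function name `shift` is made up for this example.

```python
import string

ALPHABET = string.ascii_lowercase  # the demo only handles a-z


def shift(text: str, key: int) -> str:
    """Move every letter `key` places along the alphabet, wrapping z -> a."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # spaces are left untouched, as in the demo
    return "".join(out)


ciphertext = shift("here is my secret message", 13)
print(ciphertext)               # urer vf zl frperg zrffntr
print(shift(ciphertext, 13))    # here is my secret message (13 + 13 = 26)
print(shift(ciphertext, -13))   # same result; works for any key, not just 13
```

For any key other than 13, decrypting means shifting backwards by the same amount, which is what the negative key does.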

You can try encrypting and decrypting your own messages at rot13.com.

## Cracking 'Caesar's Cipher'

So why is our algorithm not secure? It's very vulnerable to an attack called frequency analysis. Letters appear in roughly the same ratios in any reasonably long text written in the same language.

I've taken the first chapter of Sherlock Holmes by Arthur Conan Doyle, and counted the number of times each letter appears, then ordered them.

Here are the letters (`e` appears the most, and `z` the least):

`etoaisnhrdlumcwyfgpbvkxjqz`

I took the first paragraph of Chapter 2, put it through our algorithm, and then counted the frequency of each letter. Remember this is encrypted, so `r` isn't the most frequently used letter, but whatever has been encrypted as `r` is.

`rgvnufbaeqypjzslhtoicxdk`

Because we know how to decrypt our algorithm, let's take a peek at what these letters are when shifted 13 characters back to the original letters.

`etiahsonrdlcwmfyugbvpkqx`

Here are the letters from Chapter 1 and from our encrypted paragraph, side by side:

```
etoaisnhrdlumcwyfgpbvkxjqz <--Ch. 1
etiahsonrdlcwmfyugbvpkqx__ <-- Ch. 2 Paragraph
```

It's not identical, but it's close. If someone didn't know how to decrypt the text, or that the key was `13`, they could just replace each letter in the text with the corresponding letter in the "true" letter frequency order. So `r` becomes `e`, etc.

```
rgvnufbaeqypjzslhtoicxdk__ <- replace each one of these letters
etoaisnhrdlumcwyfgpbvkxjqz <- with the corresponding letter here
```
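The replacement step above could be automated roughly like this. It is a sketch, not the exact script used for the counts in this article: the reference order is the Chapter 1 order shown earlier, and the function names are made up for the example.

```python
from collections import Counter

# Letter order taken from the Chapter 1 count above (most to least frequent).
REFERENCE_ORDER = "etoaisnhrdlumcwyfgpbvkxjqz"


def frequency_order(text: str) -> str:
    """Return the letters of `text`, most frequent first."""
    counts = Counter(ch for ch in text if ch.isalpha())
    return "".join(letter for letter, _ in counts.most_common())


def guess_substitution(ciphertext: str) -> dict[str, str]:
    """Pair the n-th most common cipher letter with the n-th reference letter."""
    return dict(zip(frequency_order(ciphertext), REFERENCE_ORDER))


def apply_guess(ciphertext: str, mapping: dict[str, str]) -> str:
    """Replace every mapped letter; leave spaces and unknown letters alone."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext)


sample = "rrr gg v"  # tiny stand-in for a real ciphertext
mapping = guess_substitution(sample)
print(mapping)                       # {'r': 'e', 'g': 't', 'v': 'o'}
print(apply_guess(sample, mapping))  # eee tt o
```

On a short text the guess will contain mistakes, which is why the manual corrections that follow are still needed.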

Let's see the original text next to our "cracked" text.

```
at three oclock precisely i was at baker street but holmes had not yet returned the landlady....
at tiree nulnuk vreuosely o mas at paker street pft inlces iad hnt yet retfrhed tie lahdlady...
```

It's not quite perfect, but there are some obvious changes we could make:

1. `h` -> `n`
`lahdlady` is clearly `landlady`, so we know `h` should really be `n`
2. `o` -> `i`
The `at` in the beginning is correct, so we know `o mas` should be `i mas`, rather than `a mas`
3. `m` -> `w`
We can guess that `m` in `mas` should be a `w`

I made these substitutions, and some other obvious ones that appeared after making the above substitutions, and within about 6 substitutions we get to:

```
at three oclock precisely i was at baker street but holmes had not yet returned the landlady...
at three oflofk vrefisely i was at baker street but holces had not yet returned the landlady...
```

Ok, so we missed `oclock`, `precisely`, and `holmes`, but if these were the plans of some enemy, we would still have most of the information we need to thwart their attack. And the longer the text gets, the more likely it is to align with the "true" letter frequency, and the more words we have to help us find obvious fixes for any errors.

# Hashing

## What is hashing?

Hashing is another part of the world of cryptography, but it's different from encryption. With encryption, the important part was that the data was preserved and could be recovered. With a hash, we can't get back the information we put in, but the hash can be used to verify that a later input is the same as the original one.

Hash Function - instructions used for a hashing operation.

Digest - the output from the hash function.

## Encryption vs Hashing

##### Encryption

Encryption is just one direction of a cyclical process; the other direction is decryption.

`PLAINTEXT` -> ENCRYPTION (with KEY) -> `CIPHERTEXT`

`CIPHERTEXT` -> DECRYPTION (with KEY) -> `PLAINTEXT`

##### Hashing

With hashing it's a one-way operation, and the hash is the final result.

`INPUT` -> HASH FUNCTION -> `DIGEST`

#### Examples(SHA256 hashing algorithm)

| Input | SHA-256 Hash |
| --- | --- |
| `password` | `5E884898DA28047151D0E56F8DC6292773603D0D6AABBDD62A11EF721D1542D8` |
| A | `559AEAD08264D5795D3909718CDD05ABD49572E84FE55590EEF31A88A08FDFFD` |
| Hello, world!(repeated 10k times) | `CE9FE3447A34D159CBF59C8B01688AFEF4EDAFD32D5A2DB20EC4F002C8C43BDC` |
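These digests can be reproduced with Python's standard `hashlib` module. The `5E88...` digest above is the SHA-256 of the word `password`. The helper name `sha256_hex` is made up for this sketch, and the inputs are assumed to be UTF-8 text with no trailing newline.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as uppercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()


print(sha256_hex("password"))
print(sha256_hex("A"))
# However long the input is, the digest is always 64 hex characters.
print(len(sha256_hex("A")), len(sha256_hex("Hello, world!" * 10_000)))
```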

## Real-world usage

Say we want to store usernames and passwords in a database to use for user sign up/sign in. You sign up with username: `bob` and password: `password`. We could just store `bob`/`password` in the database and quite easily use it to sign you in. But what if the database gets hacked or leaked? Your username and password, which you likely use in other places too, are now out there, in plaintext.

But this is taking unneeded risks. I don't really need to know your actual password to achieve my goal of authenticating you. I don't care that you typed `password`; what I really need to know is "did you send the same thing this time that you sent when you signed up?" For that, we can replace the plaintext password with the hash of the password.

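| Value from user | User value conversion | Comparison | Value from database |
| --- | --- | --- | --- |
| `password` | none | `password` == `password` | `password` |
| `password` | hash function -> `5e884898` | `5e884898` == `5e884898` | `5e884898` |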
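Here is a minimal sketch of that sign up/sign in flow, assuming SHA-256 as the hash function. The `users` dictionary and the function names are hypothetical stand-ins for a real database and real endpoints.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> hex digest of the password.
users: dict[str, str] = {}


def digest(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def sign_up(username: str, password: str) -> None:
    users[username] = digest(password)  # the plaintext is never stored


def sign_in(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Hash what the user sent this time and compare it to the stored digest.
    return hmac.compare_digest(stored, digest(password))


sign_up("bob", "password")
print(users["bob"][:8])            # 5e884898
print(sign_in("bob", "password"))  # True
print(sign_in("bob", "passw0rd"))  # False
```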

## Demonstration

##### Step 1: password -> numbers

• Letters are replaced with their position in the alphabet

`a` -> `1`, `b` -> `2`, `c` -> `3`, `d` -> `4`, `e` -> `5`, ...

• Numbers stay as numbers

..., `2` -> `2`, `3` -> `3`, `4` -> `4`, ...

• Non-alphanumeric characters are replaced with "0"

..., `/` -> `0`, `:` -> `0`, `!` -> `0`, `(` -> `0`, ...

##### Step 2: Calculate the hash
```
a2c4e6            <- the raw input
123456            <- input converted to numbers
12 34 56          <- split numbers into groups of two
12 + 34 + 56      <- sum the numbers

102               <- if length < 5:
10200             <- pad from the right with `0` until 5 digits
10200             <- final hash digest

10234567          <- if length > 5:
10234567          <- cut digits from the left until 5 digits
34567             <- final hash digest
```

We are cutting/padding the result because hashes tend to be a fixed output length. Whether you input a ".", or the entire dictionary, you'll get back values of equal length, but different contents.

Here's the process of hashing the password: `hello`

```
he l l o
85121215        <- replacing 'h' with '8', 'e' with '5', etc.
85 12 12 15     <- split the number into 2-digit numbers
124             <- sum the numbers
12400           <- pad with "0" until 5 numbers long
```

So the hash (or digest) of `hello`, for this hashing algorithm, is `12400`. Any time `hello` is given as an input to this function, the digest will always be `12400`.

And for the password: `password.1.2.3.password.4.5.6.`

```
 pa s s w o rd.1.2.3. pa s s w o rd.4.5.6.
161191923151840102030161191923151840405060
16 11 91 92 31 51 84 01 02 03 01 61 19 19 23 15 18 40 40 50 60
728
72800
```
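The two steps can be written out as a short Python sketch. The function names are made up for this example, and treating uppercase letters like lowercase ones is an assumption, since the demo only uses lowercase input.

```python
def to_digits(text: str) -> str:
    """Step 1: letters -> alphabet position, digits stay, anything else -> 0."""
    parts = []
    for ch in text.lower():  # assumption: uppercase is treated like lowercase
        if "a" <= ch <= "z":
            parts.append(str(ord(ch) - ord("a") + 1))
        elif "0" <= ch <= "9":
            parts.append(ch)
        else:
            parts.append("0")
    return "".join(parts)


def toy_hash(text: str) -> str:
    """Step 2: sum the 2-digit groups, then pad or cut to 5 digits."""
    digits = to_digits(text)
    groups = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    total = str(sum(int(group) for group in groups))
    if len(total) < 5:
        return total.ljust(5, "0")  # pad from the right with 0
    return total[-5:]               # cut digits from the left


print(toy_hash("a2c4e6"))                          # 10200
print(toy_hash("hello"))                           # 12400
print(toy_hash("password.1.2.3.password.4.5.6."))  # 72800
```

If the digit string has an odd length, the last group in this sketch is a single digit.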

Can we take `12400` and use it to get back to the input, `hello`? No. The only way to determine the original input is to calculate the hash of every possible combination we can think of and compare the results (well, at least for a more secure hashing algorithm than the one above). This is called a "brute force" attempt, and how long it takes depends on the input length and the hashing algorithm. But if we had `hello` and `12400` (and knowledge of the hashing algorithm), could we quickly tell if `hello` is the password that hashed to `12400`? Yes. So what should we store in the database, `hello` or `12400`? Definitely the digest.
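A brute force attempt is nothing more than hashing guesses until one matches. The sketch below uses SHA-256 and the `5e88...` digest from the examples earlier; the guess list is a hypothetical stand-in for the very large lists a real attempt would need.

```python
import hashlib
from typing import Optional

# Digest of an unknown password, e.g. taken from a leaked database.
TARGET = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"

# Hypothetical guess list, kept tiny for the example.
GUESSES = ["123456", "letmein", "hello", "password", "qwerty"]


def brute_force(target: str, guesses: list[str]) -> Optional[str]:
    """Hash each guess and return the first one whose digest matches."""
    for guess in guesses:
        if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target:
            return guess
    return None


print(brute_force(TARGET, GUESSES))  # password
```

Checking one guess is fast; what makes brute force slow is the number of guesses needed when the input is long and unpredictable.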

# Client v Server

Before we go on, it's useful to understand all the factors at play, and which person/computer/server has access to what data. The requests we'll be talking about happen between the `client` and the `server`.

Client - the service requester. When you visit Wikipedia, the client is not you; it's usually your web browser, or a mobile/desktop app.

Server - the computer that is providing a resource or service. When you visit a Wikipedia link, you send a request that gets routed to one of Wikipedia's many servers, which responds with the information you requested (or maybe some error message).

`Code / Algorithms` represents the code and encryption/decryption tools that TLWSD sends to your browser.

The `KEY` is passed outside of the channel where the encrypted text is sent.

##### Encryption(Client)

This is the initiator of the message.

###### Starts with
PLAINTEXT, KEY, CODE / ALGORITHMS

##### Storage(Server)

This is a TLWSD server. The `KEY` is never sent to nor seen by the server.

###### Starts with
NOTHING

##### Decryption(Client)

The end user trying to receive a message.

###### Starts with
KEY, CODE / ALGORITHMS

# Encryption + Hashing

## Why do we need both?

Ok, we now have two tools, encryption and hashing. But we haven't really discussed the problem we're trying to solve. The problem is, I want to take a message from you and deliver it to your friend. I want to provide a seamless experience where your friend will know if they got the password wrong, so they can try again. They'll also know if they got it right, and then see your original message in plaintext. However, I don't want at any time, even for a moment, to have your plaintext, your password, or anything else from which the plaintext or password could easily be derived.

Encryption gets us most of the way there. I won't know what your plaintext or password is, but your friend won't get the certainty that they've correctly decrypted the message.

Let's look at how we can use encryption + hashing to solve this. In trying to simplify things, I've realized our encryption algorithm requires a number, while our hashing algorithm can take more complex inputs. Since neither of us will ever use these algorithms to actually encrypt anything, we can make the arbitrary rule that the alphabet position of the first letter of the password/key is used as the key for our encryption algorithm.

## Demonstration

We need a message to encrypt. We also need a password.

plaintext: `we need a message to encrypt` (lower case, no punctuation for simplicity)

password: `wealsoneedapassword`. w is the 23rd letter, so we'll use `23` to encrypt

First we'll encrypt the message with the password(or "key")

```
we need a message to encrypt
xfaoffeabanfttbhfaupafodszqu
........20 more rows........
h9v099zvwv19ddw79ve v90ycjae
i8w 889wxw08eex68wfaw8 zdkbf
```
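The rows above can be reproduced with a small variation of the earlier shift. Note that the alphabet here is an assumption inferred from the rows shown (`a-z`, then `9` down to `0`, then the space character), and the function names are made up for this sketch.

```python
# Alphabet inferred from the rows above (an assumption): a-z, then 9 down to 0,
# then the space character, wrapping back around to "a".
ALPHABET = "abcdefghijklmnopqrstuvwxyz9876543210 "


def shift(text: str, key: int) -> str:
    """Move every character `key` places along ALPHABET, wrapping around."""
    size = len(ALPHABET)
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % size] for ch in text)


def key_from_password(password: str) -> int:
    """Alphabet position of the first letter of the password: "w" -> 23."""
    return ord(password[0]) - ord("a") + 1


key = key_from_password("wealsoneedapassword")
ciphertext = shift("we need a message to encrypt", key)
print(key)                      # 23
print(ciphertext)               # i8w 889wxw08eex68wfaw8 zdkbf
print(shift(ciphertext, -key))  # we need a message to encrypt
```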

Now here's the trick that enables me to "know" if your friend has the right or wrong password, without ever knowing it.

We're going to take the ciphertext and the password, and take the hash of them together.

Why does that solve the problem? Because the decryption code will have access to the ciphertext, and when your friend guesses a password, if their guess is the same as your password, then the hash will be the same too.

`hash(ciphertext + password)` => `somehash`

`hash(ciphertext + password_guess)` => `??` <- if this is anything but `somehash`, it's the wrong password. If it is `somehash`, then your friend got the password correct.
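The check itself is a single comparison. In this sketch SHA-256 stands in for the hash function (the toy hash from earlier would work the same way), and the names `tag` and `is_correct_guess` are made up for the example.

```python
import hashlib


def tag(ciphertext: str, password: str) -> str:
    """hash(ciphertext + password); SHA-256 stands in for the hash function."""
    return hashlib.sha256((ciphertext + password).encode("utf-8")).hexdigest()


def is_correct_guess(ciphertext: str, guess: str, expected: str) -> bool:
    """True only if the guess reproduces the hash the sender computed."""
    return tag(ciphertext, guess) == expected


ciphertext = "i8w 889wxw08eex68wfaw8 zdkbf"
somehash = tag(ciphertext, "wealsoneedapassword")  # computed by the sender

print(is_correct_guess(ciphertext, "wealsoneedapassword", somehash))  # True
print(is_correct_guess(ciphertext, "notthepassword", somehash))       # False
```

Only `ciphertext` and `somehash` need to be stored or transported; the password itself never appears in either.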

Let's hash: `i8w 889wxw08eex68wfaw8 zdkbf` + `wealsoneedapassword`

```
i8w 889wxw08eex68wfaw8 zdkbfwealsoneedapassword                                       <- ciphertext + password
"23701657421213719211952201707145180162201015250321721220228111921144116119192315184" <- converted to numbers
[23, 70, 16, 57, 42, 12, 13, 71, 92, 11, 95, 22, 1, 70, 71, 45, 18, 1, 62, 20,        <- split into 2 digit numbers
10, 15, 25, 3, 21, 72, 12, 20, 22, 81, 11, 92, 11, 44, 11, 61, 19, 19, 23, 15,
18, 4]
1421                                                                                  <- summed
"14210"                                                                               <- 0 padded, we have our hash!
```

Finally, our problem is solved with just two pieces of information:

ciphertext: `i8w 889wxw08eex68wfaw8 zdkbf`

hash: `14210`

And with this pattern and these two pieces of data, we get some important benefits:

• We transport only encrypted/obscured data. Nothing in the ciphertext or hash digest can be used to derive the password or plaintext
• The decryption code on your friend's computer doesn't need to have access to the plaintext or password/key in order to determine if a password guess is correct
• When your friend sees the unencrypted plaintext, they can be confident they got the password/key correct
• When your friend sees the unencrypted plaintext, they can be confident they are reading the same message that was encrypted

That last one is important and worth an extra bit of explanation.

## Why `hash(ciphertext + password)`?

##### Why can't we just hash the password `hash(password)`?

Originally the goal was to be able to give a "Wrong password" message to the user trying to read the message. If we just hashed the password, and not the ciphertext + password, we would achieve this goal. So why include the ciphertext? We want to ensure not just that the password is correct but, more importantly, that the decrypted plaintext message is the same as the plaintext that was originally encrypted.

If we only hash the password, we also leak information. Or at least make it easier for an attacker. Below are 5 messages that were sent; 3 of them are from you to a friend, using the same password.

##### Using `ciphertext.hash(ciphertext + password)`

• `811C19.39`
• `3F0F0B.B0`
• `01311E.23`
• `7714D5.91`
• `548087.86`

##### Using `ciphertext.hash(password)`

• `811C19.2C`
• `3F0F0B.16`
• `01311E.2C`
• `7714D5.E7`
• `548087.2C`

It becomes much easier for a malicious actor to know which messages are related, or have the same password. But worse than that, let's say the message `811C19.2C` is the bank account number your friend is supposed to send \$1m to. If a malicious actor got hold of this message and changed it to `911C19.2C`, just changing the 8 to a 9 would change the decrypted output, but your friend would have no idea. They would send that \$1m to "223" instead of "123".

What would happen in the `hash(ciphertext + password)` case? Because the `39` digest is `hash("811C19" + password)`, even with the right password the ciphertext is wrong, so the hash digest will no longer match. The message would essentially be "broken": even the right password would fail like a wrong password. That's a bit annoying, but it's by design. What's more annoying: seeing an "incorrect password" message when you know you put the right password in, or sending \$1m to the wrong bank account? This is designed to only show the plaintext when 1. the password is correct AND 2. the message has not been altered.
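Both problems can be seen in a few lines. This is a sketch under assumptions: SHA-256 stands in for the hash function, the password is hypothetical, and the ciphertexts are the short example values from the lists above.

```python
import hashlib


def h(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def verify_weak(ciphertext: str, stored: str, guess: str) -> bool:
    """stored = hash(password): the ciphertext plays no part in the check."""
    return h(guess) == stored


def verify_strong(ciphertext: str, stored: str, guess: str) -> bool:
    """stored = hash(ciphertext + password): both must be unchanged."""
    return h(ciphertext + guess) == stored


password = "wealsoneedapassword"  # hypothetical shared password
messages = ["811C19", "01311E", "548087"]

# Same password -> identical hash(password), so the messages can be linked.
print(len({h(password) for _ in messages}))       # 1
print(len({h(c + password) for c in messages}))   # 3

# An attacker changes the first character of the ciphertext in transit.
original, tampered = "811C19", "911C19"
print(verify_weak(tampered, h(password), password))               # True
print(verify_strong(tampered, h(original + password), password))  # False
```

For real use, Python's standard `hmac` module implements HMAC, a standardized way of building a MAC from a hash function and a key.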

This is a basic explanation of Message Authentication Codes(MAC). You can read more about MACs here.

# TLWSD vs Others

## Others

##### Isn't there a simpler method?

There may be simpler methods, but they lose some or all of the benefits of doing the encryption and decryption on the client. Let's look at how some other sites do "encrypted messages", and why it's flawed.

The method I've seen on some other sites is to send the plaintext and the password to the server. The data that gets sent from your computer to the server might look something like this:

```
example.com/?utf8=%E2%9C%93&authenticity_token=4n%2BMwYx4iMcggjmRiaiF%2BKUYbrW8otsUMybeduiXB0M%3D&message%5Bbody%5D=This+is+my+message&message%5Bpassword%5D=This+is+my+password&message%5Bterms_of_service%5D=0&message%5Bterms_of_service%5D=1&commit=SAVE+THIS+MESSAGE
```

This URL is currently encoded. Encoding is different from encryption. Encoding lets a character that is reserved for special use also appear in user data. For example, `/` is a reserved character in a URL. It has a special meaning, but if I want to submit some text that contains a `/` character, I can: the browser will encode it as `%2F`. The goal here is not to distort or hide data, just to separate what is user data from what is used internally as a control character.
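Python's standard `urllib.parse` module shows the same encoding. The sample text is made up for this sketch.

```python
from urllib.parse import quote, unquote

user_text = "either/or"
encoded = quote(user_text, safe="")  # safe="" so that "/" is encoded too
print(encoded)           # either%2For
print(unquote(encoded))  # either/or
```

Anyone can reverse this without a key, which is exactly why encoding is not encryption.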

We want to decode this URL and see the data that's being sent. To do that we need something that can decode the encoding and parse the `query string` for us. A query string is just the part of the URL that contains the parameter data. It could be what you entered on a form, or the language/timezone/etc. you have set.

Here's the URL query string decoded and parsed:

```
{
"authenticity_token": "4n+MwYx4iMcggjmRiaiF+KUYbrW8otsUMybeduiXB0M=",
"commit": "SAVE THIS MESSAGE",
"message[body]": "This is my message",
"message[password]": "This is my password",
"message[terms_of_service]": "1",
"utf8": "✓"
}
```
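The decoding and parsing can be done with the standard `urllib.parse` module. This is a sketch, not necessarily the tool used for the listing above. `parse_qs` returns a list per parameter because a name can repeat, as `message[terms_of_service]` does in this URL.

```python
from urllib.parse import parse_qs

url = ("example.com/?utf8=%E2%9C%93&authenticity_token=4n%2BMwYx4iMcggjmRiaiF"
       "%2BKUYbrW8otsUMybeduiXB0M%3D&message%5Bbody%5D=This+is+my+message"
       "&message%5Bpassword%5D=This+is+my+password"
       "&message%5Bterms_of_service%5D=0&message%5Bterms_of_service%5D=1"
       "&commit=SAVE+THIS+MESSAGE")

query = url.split("?", 1)[1]  # everything after the "?" is the query string
params = parse_qs(query)      # decodes %XX escapes and "+" (space) as well

for name in sorted(params):
    print(name, params[name])
```

Nothing secret was needed to read the message body and the password back out of the URL.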

The `authenticity_token` param might look similar to an encrypted message, but it isn't one. It's a token generated by the site, typically used to check that the form being submitted was one the site itself generated.

The parts here that should alarm you are `message[body]` and `message[password]`: both leave your computer in plaintext. It's possible that the message gets encrypted on the server, and the password gets hashed on the server. But how do we know? Can we trust the person who created the site? Maybe, but even trustworthy people make mistakes. And what about all the steps in between you and the server? Do you trust AT&T? Comcast? The only way to make sure your data is secure is to never let it leave your computer unless it's encrypted.

The end user's plaintext password guesses also get sent to the server, where they're compared against the original password (or a hash of that password). If a guess is correct, the plaintext is sent to the end user.

Here's the parsed query string from a password guess:

```
{
"authenticity_token": "gPNLW/31XYeI3MMJvQztSCASg8m1K/s0Ot1OEcnFSEM=",
"commit": "RETRIEVE MESSAGE",
"utf8": "✓"
}
```

And when I get the password correct, the response is a bunch of HTML, but buried in there is, of course, the plaintext message:

```
...
<pre id="retrieved-message">This is my message</pre>
...
```

Hopefully it's clear why this is not secure. Two main reasons:

1. The text(plain or cipher) and password are being sent over the same channel, at the same time. This is like locking your front door, and tying the key to the doorknob.
2. The plaintext and password are leaving your domain(the client), and being sent to another server. Assume that any information that is sent from your computer, to the server, could be seen by anyone.

## TLWSD

##### What do we send to the server?

I've shown the data that other sites send to the server, so let's look at ours. Here is the data:

```
csrfmiddlewaretoken=bzah36gpB9pwrj3VACqZdfGYWS7xYdbvZXcpjkKsDwIIjBlfVLjPNIYpmvMMd8N6&msg_text=eyJpdiI6IjAxTjJCeVNIUEZ4OXFMK2hYTURUalE9PSIsInYiOjEsIml0ZXIiOjEwMDAsImtzIjoxMjgsInRzIjo2NCwibW9kZSI6ImdjbSIsImFkYXRhIjoiIiwiY2lwaGVyIjoiYWVzIiwic2FsdCI6IlY3T0plNjFaU25jPSIsImN0IjoiWnBGOUp1dnMvQ0JJRnpTTWZaajNycTk5WXp2V21hV3lwdTQ9In0%3D&access_count_remaining=1&max_view_time=0&max_view_time_units=seconds&ttl=0&ttl_units=hours&desc_text=&has_password=true&password_hint=
```

Here it is parsed:

```
{
"access_count_remaining": "1",
"csrfmiddlewaretoken": "bzah36gpB9pwrj3VACqZdfGYWS7xYdbvZXcpjkKsDwIIjBlfVLjPNIYpmvMMd8N6",
"desc_text": "",
"has_password": "true",
"max_view_time": "0",
"max_view_time_units": "seconds",
"msg_text": "eyJpdiI6IjAxTjJCeVNIUEZ4OXFMK2hYTURUalE9PSIsInYiOjEsIml0ZXIiOjEwMDAsImtzIjoxMjgsInRzIjo2NCwibW9kZSI6ImdjbSIsImFkYXRhIjoiIiwiY2lwaGVyIjoiYWVzIiwic2FsdCI6IlY3T0plNjFaU25jPSIsImN0IjoiWnBGOUp1dnMvQ0JJRnpTTWZaajNycTk5WXp2V21hV3lwdTQ9In0=",
"password_hint": "",
"ttl": "0",
"ttl_units": "hours"
}
```
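The `msg_text` value is not the ciphertext alone. It appears to be Base64-encoded JSON, which can be unpacked without any key to see what the server receives. This is a sketch using only the standard library; what each field means is not covered here.

```python
import base64
import json

# The msg_text value from the parsed data above.
msg_text = "eyJpdiI6IjAxTjJCeVNIUEZ4OXFMK2hYTURUalE9PSIsInYiOjEsIml0ZXIiOjEwMDAsImtzIjoxMjgsInRzIjo2NCwibW9kZSI6ImdjbSIsImFkYXRhIjoiIiwiY2lwaGVyIjoiYWVzIiwic2FsdCI6IlY3T0plNjFaU25jPSIsImN0IjoiWnBGOUp1dnMvQ0JJRnpTTWZaajNycTk5WXp2V21hV3lwdTQ9In0="

payload = json.loads(base64.b64decode(msg_text))

for name, value in payload.items():
    print(name, "=", value)
```

The output lists the encryption parameters (for example `cipher` and `mode`) and the ciphertext itself under `ct`. There is no password and no plaintext anywhere in it.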

The `csrfmiddlewaretoken` is similar to the `authenticity_token` above. It just ensures that the form being submitted was one that was generated by our site.

All the other fields are options you can add to your link. I used the same message and password as I did in the previous example. `msg_text` is where we see what the message looks like when it's being sent to the server.

We also include a `has_password` parameter. Why? Because we only send `msg_text` to the server, and never have or send a `password` parameter, a password-encrypted message and a plaintext message look the same to the server. The only place this really matters is when the end user opens the link: do we show them the `msg_text` as is, or do we prompt them for a password and use that to decrypt the `msg_text`? (This still happens client-side; nothing leaves the browser while the user attempts to decrypt the message.)

## HTTPS

##### Doesn't HTTPS take care of all this?

You've probably seen the padlock on the left of the address bar, or a related notice or warning from your browser when trying to visit a website.

These are all related to HTTPS or Hypertext Transfer Protocol Secure. HTTPS is a secure way of transmitting data between a client and server. HTTPS provides a few protections:

• It encrypts the data you send to the server, and the data the server sends to you. This stops anyone who may be listening on a public network from seeing the data you're sending (well, at least the plaintext data; they can see the encrypted data, but that's useless to them)
• It also protects against `man-in-the-middle` attacks, where someone/some machine sits between you and the website you want to visit and mimics the actual website, meanwhile having access to the data you're sending.

Sure, HTTPS is encryption between you and the server, but HTTPS is not the same thing as the client-side encryption we're talking about.

With HTTPS alone, you still have to trust that the server you're sending that plaintext to is trustworthy, capable, flawless, superhuman, incapable of mistakes, etc. And I can assure you they're not all of those things.

## Isn't Server Side/HTTPS good enough?

##### I'm not sending nuclear launch codes or anything, just a Netflix password

Maybe you're a generally trusting person, and you think "Hey, not-tlwsd.com has cooler fonts than tlwsd.com, so I'll just use them, I trust them not to share my data". That's fine, I'm not saying other people aren't to be trusted, or that we're more trustworthy. The purpose of client-side encryption is that you don't have to trust anyone that doesn't have your secret key. You don't have to trust that they'll encrypt your message. You don't have to trust that their code is watertight and bug free, or that their database could never accidentally get leaked or hacked. The point is, even if the website doesn't do what they said they'll do, or their database does get hacked, you don't need to worry. If you never sent data in a form that was sensitive, it doesn't really matter who sees the ciphertext.*

* The type of encryption used at TLWSD (AES-GCM 256-bit) is infeasible to crack. This article does a good job explaining how long it would take. But that doesn't mean once it's encrypted, you're off the hook. If you use `password1234` as your password, it's not going to take trillions of trillions ... of trillions of years to crack, it'll take seconds. Likewise, if you use the `password hint` feature for your TLWSD link, and put `Password is ThisIsSoSecure:)101?` it'll take seconds.

Here's the way I see it: either what you're sending is sensitive, or it's not. If you want to tell your friend the name of that TV show you were talking about, then e-mail it, yell it over a loudspeaker in front of their house, or print it on a million sheets of paper and drop them out of a plane over their office (maybe stick to email). But if it's at all sensitive, whether it's a Netflix password, your credit card number, nuclear launch codes, or your jeans size, then use the most secure method you can, with the lowest number of people with access to the data except for you and the person you're sending it to (if that number is 1 or more, it should be 0).