From b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9 Mon Sep 17 00:00:00 2001
From: Rob Landley
Date: Sun, 29 Jan 2006 06:29:01 +0000
Subject: [PATCH] Add explanations of encrypted passwords, and fork vs vfork.

---
 docs/busybox.net/programming.html | 115 ++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)

diff --git a/docs/busybox.net/programming.html b/docs/busybox.net/programming.html
index e44f291b3..f77f3c3a6 100644
--- a/docs/busybox.net/programming.html
+++ b/docs/busybox.net/programming.html
@@ -12,6 +12,11 @@
  • Adding an applet to busybox
  • What standards does busybox adhere to?
  • +
  • Tips and tricks.
  • +

    What are the goals of busybox?

    @@ -172,6 +177,116 @@
    applet is otherwise finished. When polishing and testing a busybox applet, we ensure we have at least the option of full standards compliance, or else document where we (intentionally) fall short.

    +

    Programming tips and tricks.

    + +

    Various things busybox uses that aren't particularly well documented +elsewhere.

    + +

    Encrypted Passwords

    + +

    Password fields in /etc/passwd and /etc/shadow are in a special format. +If the first character isn't '$', then it's an old DES style password. If +the first character is '$' then the password is actually three fields +separated by '$' characters:

    +
    +  $type$salt$encrypted_password
    +
    + +

    The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.

    + +

    The "salt" is a bunch of random characters (generally 8) the encryption +algorithm uses to perturb the password in a known and reproducible way (such +as by appending the random data to the unencrypted password, or combining +them with exclusive or). Salt is randomly generated when setting a password, +and then the same salt value is re-used when checking the password. (Salt is +thus stored unencrypted.)

    + +

    The advantage of using salt is that the same cleartext password encrypted +with a different salt value produces a different encrypted value. +If each encrypted password uses a different salt value, an attacker is forced +to do the cryptographic math all over again for each password they want to +check. Without salt, they could simply produce a big dictionary of commonly +used passwords ahead of time, and look up each password in a stolen password +file to see if it's a known value. (Even if there are billions of possible +passwords in the dictionary, checking each one is just a binary search against +a file only a few gigabytes long.) With salt they can't even tell if two +different users share the same password without guessing what that password +is and encrypting the guess with each user's salt. They also can't precompute the attack dictionary for +a specific password until they know what the salt value is.

    + +

    The third field is the encrypted password (computed from the cleartext +plus the salt). For MD5 this is 22 characters.
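
    As a rough illustration of the field layout described above, the sketch below splits a "$type$salt$encrypted_password" string into its parts. The names split_pw_field and struct pw_fields are hypothetical (they are not busybox functions), and the buffer sizes are arbitrary assumptions chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parsed fields (not a busybox type). */
struct pw_fields {
    char type[8];      /* e.g. "1" for MD5 */
    char salt[32];     /* generally 8 characters */
    const char *hash;  /* points into the input string, or NULL if absent */
};

/*
 * Split "$type$salt$hash" into its fields.
 * Returns 0 on success, -1 if the string does not start with '$'
 * (an old DES style password) or a field does not fit.
 * A missing third '$' is treated as the end of the string.
 */
int split_pw_field(const char *field, struct pw_fields *out)
{
    const char *type_end, *salt_start, *salt_end;
    size_t n;

    if (field[0] != '$')
        return -1;

    type_end = strchr(field + 1, '$');
    if (!type_end)
        return -1;
    n = (size_t)(type_end - (field + 1));
    if (n == 0 || n >= sizeof(out->type))
        return -1;
    memcpy(out->type, field + 1, n);
    out->type[n] = '\0';

    salt_start = type_end + 1;
    salt_end = strchr(salt_start, '$');
    n = salt_end ? (size_t)(salt_end - salt_start) : strlen(salt_start);
    if (n >= sizeof(out->salt))
        return -1;
    memcpy(out->salt, salt_start, n);
    out->salt[n] = '\0';

    out->hash = salt_end ? salt_end + 1 : NULL;
    return 0;
}
```

    Calling split_pw_field() on a string with only the first two fields leaves hash set to NULL, which mirrors the way the salt argument is described as being usable without the third field.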

    + +

    The busybox function to handle all this is pw_encrypt(clear, salt) in +"libbb/pw_encrypt.c". The first argument is the clear text password to be +encrypted, and the second is a string in "$type$salt$password" format, from +which the "type" and "salt" fields will be extracted to produce an encrypted +value. (Only the first two fields are needed; the third $ is treated as +the end of the string.) The return value is an encrypted password in +/etc/passwd format, with all three $ separated fields. It's stored in +a static buffer, 128 bytes long, so it is overwritten by the next call.

    + +

    So when checking an existing password, if pw_encrypt(text, +old_encrypted_password) returns a string that compares identical to +old_encrypted_password, you've got the right password. When setting a new +password, generate a random 8 character salt string, put it in the right +format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the +second argument to pw_encrypt(text,buffer).
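
    The two steps above might look roughly like the following sketch. It is not taken from busybox: make_salt_spec, password_matches and the encrypt_fn type are hypothetical names, reading /dev/urandom is an assumed source of randomness, and the 64 character salt alphabet is the one conventionally used by crypt() style hashes. Inside busybox, pw_encrypt would be passed as the encrypt argument.

```c
#include <stdio.h>
#include <string.h>

/* Alphabet conventionally used for crypt()-style salts (64 characters). */
static const char salt_chars[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/*
 * Hypothetical helper: write "$<type>$<8 random salt chars>" into buf.
 * Assumes /dev/urandom is available. Returns 0 on success, -1 on failure.
 */
int make_salt_spec(char *buf, size_t len, char type)
{
    unsigned char raw[8];
    char salt[9];
    size_t i;
    int n;
    FILE *f = fopen("/dev/urandom", "rb");

    if (!f)
        return -1;
    if (fread(raw, 1, sizeof(raw), f) != sizeof(raw)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Map each random byte onto one of the 64 salt characters. */
    for (i = 0; i < sizeof(raw); i++)
        salt[i] = salt_chars[raw[i] & 63];
    salt[8] = '\0';

    n = snprintf(buf, len, "$%c$%s", type, salt);
    if (n < 0 || (size_t)n >= len)
        return -1;  /* buffer too small */
    return 0;
}

/* Signature matching pw_encrypt(clear, salt) as described in the text. */
typedef char *(*encrypt_fn)(const char *clear, const char *salt);

/*
 * Hypothetical helper: re-encrypt the candidate password using the
 * stored string as the salt argument and compare the results.
 */
int password_matches(encrypt_fn encrypt, const char *clear, const char *stored)
{
    const char *result = encrypt(clear, stored);

    return result != NULL && strcmp(result, stored) == 0;
}
```

    Because the encrypted result lives in a static buffer, a caller that needs to keep it (for example, to write it to the password file) would copy it out before encrypting anything else.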

    + +

    Fork and vfork

    + +

    On systems that haven't got a Memory Management Unit, fork() is unreasonably +expensive to implement, so a less capable function called vfork() is used +instead.

    + +

    The reason vfork() exists is that if you haven't got an MMU then you can't +simply set up a second set of page tables and share the physical memory via +copy-on-write, which is what fork() normally does. This means that actually +forking has to copy all the parent's memory (which could easily be tens of +megabytes). And you have to do this even though that memory gets freed again +as soon as the exec happens, so it's probably all a big waste of time.

    + +

    This is not only slow and a waste of space, it also causes totally +unnecessary memory usage spikes based on how big the _parent_ process is (not +the child), and these spikes are quite likely to trigger an out of memory +condition on small systems (which is where nommu is common anyway). So +although you _can_ emulate a real fork on a nommu system, you really don't +want to.

    + +

    In theory, vfork() is just a fork() that writably shares the heap and stack +rather than copying them (so what one process writes the other one sees). In +practice, vfork() has to suspend the parent process until the child does exec (or exits), +at which point the parent wakes up and resumes by returning from the call to +vfork(). All modern kernel/libc combinations implement vfork() to put the +parent to sleep until the child does its exec. There's just no other way to +make it work: they're sharing the same stack, so if either one returns from its +function it stomps on the call stack so that when the other process returns, +hilarity ensues. In fact, without suspending the parent there's no way to even +store separate copies of the return value (the pid) from the vfork() call +itself: both assignments write into the same memory location.

    + +

    One way to understand (and in fact implement) vfork() is this: imagine +the parent does a setjmp and then continues on (pretending to be the child) +until the exec() comes around, then the _exec_ does the actual fork, and the +parent does a longjmp back to the original vfork call and continues on from +there. (It thus becomes obvious why the child can't return, or modify +local variables it doesn't want the parent to see changed when it resumes.)

    + +

    Note a common mistake: the need for vfork doesn't mean you can't have two +processes running at the same time. It means you can't have two processes +sharing the same memory without stomping all over each other. As soon as +the child calls exec(), the parent resumes.
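
    A minimal sketch of the usual vfork() pattern, under the constraints described above: the child does nothing except exec (or _exit if the exec fails), and never returns from the function that called vfork(). The name spawn_and_wait is hypothetical, and the exit code 127 for a failed exec is a shell convention adopted here as an assumption.

```c
#define _DEFAULT_SOURCE  /* glibc: make vfork() visible in strict modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (searched for in PATH) and wait for it to finish.
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * Child: we are borrowing the parent's memory and stack, so
         * touch nothing, do not return, and call _exit() rather than
         * exit() if the exec fails.
         */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: resumes here once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    After the exec the two processes run concurrently; the waitpid() call here is only because this particular helper wants the exit status, not because vfork() requires it.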

    + +

    (Now in theory, a nommu system could just copy the _stack_ when it forks +(which presumably is much shorter than the heap), and leave the heap shared. +In practice, you've just wound up in a multi-threaded situation and you can't +do a malloc() or free() on your heap without freeing the other process's memory +(and if you don't have the proper locking for being threaded, corrupting the +heap if both of you try to do it at the same time and wind up stomping on +each other while traversing the free memory lists). The thing about vfork is +that it's a big red flag warning "there be dragons here" rather than +something subtle and thus even more dangerous.)

    +