Plaintext Unbiasing using Secure Hash Algorithms

Adding correlation-attack resistance to data.

Presented as a step-by-step tutorial for C programmers.

Aaron Logue, Jul 2013

Recurring patterns are common throughout the sequences of bits that encode higher level data. Pseudorandom XOR pads can be used to make any sequence of bits virtually indistinguishable from randomness, which helps to defend cryptosystems that encrypt that data against correlation attacks.
unbias.tar.gz

--------------------------------  0  -------------------------------

  1. Downloading and compiling Skein - a secure hash algorithm
  2. Hashing a byte to check interoperability and endianness
  3. Generating an XOR unbiasing pad with a hash, and how it works
  4. Adding an Initialization Vector to the pad, and why one is needed
  5. Test program pseudocode, demonstration of unbiasing with debug output
  6. Source code of an example implementation with base64 encoding added
  7. Additional comments

--------------------------------  1  -------------------------------

On occasion, I am tasked with writing software to keep something private.
It might be customer account data or credit cards or SSNs or logfile
backups or client contracts, but no matter what it is, it's usually hard
to get right and is an area in which a programmer should never feel truly
confident.  I try to add something new to my geek toolkit from time to time,
and recently decided to take a look at the Skein hash by assigning myself
the project of using it to write a file unbiasing utility.  I've done this
sort of thing before, so I thought I'd take notes and share in the event
that it might be useful to others.

Recurring patterns can be found throughout the long sequences of zeros and
ones that encode higher level information.  For example, GIF files always
begin with 010001110100100101000110 ("GIF") and ZIP files always begin with
0101000001001011 ("PK").  Specific patterns in specific locations are often
the focus of correlation attacks by cryptanalysts against an encryption
system.  While the primary intent of this page is to serve as a beginner's
guide to using one of several secure hash algorithms available online, the
example application developed here removes bit bias from data and makes it as
indistinguishable as possible from randomness.  Such a preprocessing step
would, without the use of keys and without itself performing encryption,
strengthen any cryptosystem that is subsequently used to encrypt that data.
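The bit strings above are just the ASCII codes of the header characters written out in binary. A minimal helper for checking such patterns yourself might look like this (string_to_bits() is a hypothetical name, not part of the code in this tutorial):

```c
#include <string.h>

/* Write the bits of each character of s, most significant bit first,
   into out as '0'/'1' characters.  out must have room for
   8 * strlen(s) + 1 bytes. */
void string_to_bits(const char *s, char *out) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        for (int b = 7; b >= 0; b--) {
            *out++ = ((c >> b) & 1) ? '1' : '0';
        }
    }
    *out = '\0';
}
```

Calling string_to_bits("GIF", buf) fills buf with the same 24-bit pattern quoted above for GIF files.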

                               -  -  -

I downloaded the Skein V1.3 source code from http://www.skein-hash.info/
http://www.skein-hash.info/sites/default/files/NIST_CD_102610.zip
and after unzipping into a temporary directory and poking around a bit,
settled on the following seven files from its optimized 32-bit directory:

[gorf@box unbias]$ ls -al *.h *.c
-rw-r--r--    1 gorf     users        6141 Jun 30 14:43 brg_endian.h
-rw-r--r--    1 gorf     users        6921 Jun 30 14:21 brg_types.h
-rw-r--r--    1 gorf     users       26877 Jun 30 14:38 skein_block.c
-rw-r--r--    1 gorf     users       35534 Jun 29 08:03 skein.c
-rw-r--r--    1 gorf     users       16551 Jun 29 08:02 skein.h
-rw-r--r--    1 gorf     users        5856 Jun 30 14:23 skein_iv.h
-rw-r--r--    1 gorf     users        4576 Jun 30 14:43 skein_port.h

Those contained the minimum of what I needed to use the Skein hash
without having to change them.  Next, I tried compiling it with a
simple "Hello, World!" program.  Here's the program and a simple
Makefile for use with the GNU make utility:

[gorf@box unbias]$ ls -al Makefile test1.c
-rw-r--r--    1 gorf     users         196 Jun 30 14:39 Makefile
-rw-r--r--    1 gorf     users         117 Jul  1 12:07 test1.c

[gorf@box unbias]$ cat test1.c
#include <stdio.h>
#include "skein.h"
int main(int argc, char *argv[]) {
   printf("Hello, World!\n");
   return 0;
}

[gorf@box unbias]$ cat Makefile
BINS= test1
OBJS= skein.o skein_block.o
CC=   gcc
CFLAGS= -s -O
all: $(BINS)
test1: $(OBJS) test1.c skein.h
        gcc $(CFLAGS) -o test1 $(OBJS) test1.c
clean:
        rm -f $(BINS) $(OBJS)

If you copy and paste the above Makefile, watch out for the indentation on
the lines that begin with gcc and rm.  Those need to be tabs, not spaces.
The -s switch strips symbol info from the executable, making it a fair bit
smaller, and -O turns on basic optimization (the same as -O1; GCC's default
is no optimization, -O0).  Build it by running make:

[gorf@box unbias]$ make
gcc -s -O   -c -o skein.o skein.c
gcc -s -O   -c -o skein_block.o skein_block.c
gcc -s -O -o test1 skein.o skein_block.o test1.c

[gorf@box unbias]$ ./test1
Hello, World!

--------------------------------  2  -------------------------------

Next, I wanted to try hashing something. Hashing typically takes
an input of an arbitrary length and produces an output value of a
predetermined length.  A cryptographically secure hash does that in
such a way that it is very difficult to determine the input value
from the output value.  A cryptographic hash has other output
characteristics that we'll take advantage of: the odds of any one
output bit being 1 are about 50/50, and it is hard to tell what any
output bits should be by looking at any other output bits.
If any particular collection of bits tends to have more 0s than 1s
(or vice versa) then it is said to be "biased".  Removing bit bias
from any particular sampling of data is what we're up to here.
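A crude way to see bias is to count 1 bits. The sketch below (count_one_bits() and ones_fraction() are illustrative names, not from the tutorial's code) only measures overall bit balance; real randomness test suites also look for patterns and correlations between bits, so treat it as a rough illustration:

```c
#include <stddef.h>

/* Count how many bits are set to 1 in buf[0..len-1].  An unbiased
   sample should have roughly half of its 8*len bits set. */
size_t count_one_bits(const unsigned char *buf, size_t len) {
    size_t ones = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char v = buf[i];
        while (v) {
            ones += v & 1u;
            v >>= 1;
        }
    }
    return ones;
}

/* Fraction of 1 bits, from 0.0 (all zeros) to 1.0 (all ones). */
double ones_fraction(const unsigned char *buf, size_t len) {
    if (len == 0) {
        return 0.0;
    }
    return (double)count_one_bits(buf, len) / (8.0 * (double)len);
}
```

For example, every 7-bit ASCII character has its top bit clear, so in plain text one bit position in every eight is always 0: an obvious bias before even looking at the other bits.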

I looked at skein.h and skein.c to figure out what I needed to do
to hash something and saw the Skein_256_*, Skein_512_* and Skein1024_* functions.
https://www.schneier.com/skein1.3.pdf gave me the output
that I should see if I hashed a single byte 0xFF using the Skein_512
functions, so I modified test1.c to do that.  I also wrote a little
function called hexdump() to display the contents of memory in hex:

[gorf@box unbias]$ cat test1.c
#include <stdio.h>
#include <string.h>
#include "skein.h"

void hexdump(unsigned char *ptr, int len) {
 int i;
 unsigned int addr = 0;   /* offset of the next byte to print */
   while (len > 0) {
      printf("%08x  ", addr);
      /* print up to 16 bytes per line */
      for (i=0; i < 16 && len > 0; i++) {
         printf("%02x ", *ptr++);
         addr++;
         len--;
      }
      printf("\n");
   }
}

int main(int argc, char *argv[]) {
 Skein_512_Ctxt_t ctx;
 unsigned char data[1];
 unsigned char hash[64];

   data[0] = 0xff;

   Skein_512_Init(&ctx, 512);        /* request a 512-bit output */
   Skein_512_Update(&ctx, data, 1);  /* hash the single byte 0xFF */
   Skein_512_Final(&ctx, hash);

   hexdump(hash, 64);
   return 0;
}

[gorf@box unbias]$ ./test1
00000000  71 b7 bc e6 fe 64 52 22 7b 9c ed 60 14 24 9e 5b
00000010  f9 a9 75 4c 3a d6 18 cc c4 e0 aa e1 6b 31 6c c8
00000020  ca 69 8d 86 43 07 ed 3e 80 b6 ef 15 70 81 2a c5
00000030  27 2d c4 09 b5 a0 12 df 2a 57 91 02 f3 40 61 7a

Success!  The output matches the expected test values listed in the
PDF when hashing the single input byte 0xFF using a 512-bit hash.

This step is important because it verifies that the hash code is
working the same way on the target platform as it works on other
platforms.  What can sometimes happen is that everything appears
to be working just fine, but when the same values are hashed on a
platform with a different endianness or different native integer
sizes, the output is different.
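One common way for such a mismatch to creep in is converting multi-byte integers to bytes with memcpy() or a pointer cast, which copies whatever byte order the host CPU happens to use. Skein is specified in terms of little-endian byte order, and its reference code takes care of the conversion internally. The sketch below (put64_le() and host_is_little_endian() are hypothetical helpers, not part of the Skein sources) shows the portable way to do the same in your own code:

```c
#include <stdint.h>
#include <string.h>

/* Store a 64-bit word least-significant byte first, regardless of the
   host CPU's byte order.  Shifts operate on values, not memory, so the
   result is the same on every platform. */
void put64_le(unsigned char out[8], uint64_t w) {
    for (int i = 0; i < 8; i++) {
        out[i] = (unsigned char)(w >> (8 * i));
    }
}

/* Return 1 if this host stores integers least-significant byte first. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}
```

A plain memcpy() of the same word only agrees with put64_le() on little-endian hosts, which is exactly the kind of difference a known-answer test vector catches.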

--------------------------------  3  ---------------------------------

Next up is to turn this into a Cryptographically Secure Pseudorandom
Number Generator (CSPRNG).  What we are after is the ability to
reproduce, from a given starting point, an arbitrarily long stream
of what appear to be random bits.  An observer of these bits should
not be able to predict what the next bit in the stream will be,
and the chances of the next bit being a particular value should be as
close to 50/50 as possible.  I have neither the mathematical chops
nor the network of cryptology peers to have a chance of creating
a CSPRNG with those characteristics from scratch on my own, but I
can use something like the Skein hash to make one.

One way to turn a cryptographic hash like Skein into a CSPRNG is
to use part of each hash block output as input for the next hashing
round.  After moving the hexdump() function into its own hexdump.c
module and adding test2 to the Makefile, my files looked like this:

[gorf@box unbias]$ cat hexdump.h
void hexdump(unsigned char *ptr, int len);

[gorf@box unbias]$ cat hexdump.c
#include <stdio.h>
void hexdump(unsigned char *ptr, int len) {
 int i;
 unsigned int addr = 0;   /* offset of the next byte to print */
   while (len > 0) {
      printf("%08x  ", addr);
      /* print up to 16 bytes per line */
      for (i=0; i < 16 && len > 0; i++) {
         printf("%02x ", *ptr++);
         addr++;
         len--;
      }
      printf("\n");
   }
}

[gorf@box unbias]$ cat test2.c
#include <stdio.h>
#include <string.h>
#include "skein.h"
#include "hexdump.h"

int main(int argc, char *argv[]) {
 int i;
 Skein_512_Ctxt_t ctx;
 unsigned char hash[64];
 const char *starting_point = "static starting point";

   Skein_512_Init(&ctx, 512);
   Skein_512_Update(&ctx, (const unsigned char *)starting_point,
                    strlen(starting_point));
   Skein_512_Final(&ctx, hash);
   hexdump(&hash[32], 32);
   for (i=0; i < 7; i++) {
      /* Recycle the first half; print the second half as pad */
      Skein_512_Update(&ctx, hash, 32);
      Skein_512_Final(&ctx, hash);
      hexdump(&hash[32], 32);
   }
   return 0;
}

This CSPRNG works by performing a 512-bit hash on an arbitrarily-chosen
static starting point, then treating the hash output as two 256-bit parts.
The first part is recycled as the input to the hash to produce the next
512-bit block, and the second part is output as actual CSPRNG stream data.
Note that we are not resetting the starting state between hashes, nor are
we passing any of the recycled part directly to the output stream.
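The same recycle-one-half, emit-the-other-half structure can be written independently of any particular hash, which makes the data flow easier to see. In this sketch generate_pad(), hash512_fn and toy_hash512() are hypothetical names. toy_hash512() is a deliberately weak, non-cryptographic stand-in (FNV-1a run over eight lanes) included only so the sketch compiles and can be tested; a real pad generator would plug in Skein or another secure hash. Unlike test2, which keeps using the same ctx after each Skein_512_Final(), this sketch hashes each recycled half from a fresh start, so its output would not match test2's even with Skein plugged in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD_HASH_BYTES 64   /* full hash output (512 bits) */
#define PAD_HALF       32   /* size of the recycled half and emitted half */

/* Any function that hashes len bytes of in into 64 bytes of out. */
typedef void (*hash512_fn)(const unsigned char *in, size_t len,
                           unsigned char out[PAD_HASH_BYTES]);

/* Fill pad[0..padlen-1] from seed: hash the seed, then repeatedly emit
   the second half of the latest output and hash the first half to
   produce the next output. */
void generate_pad(hash512_fn hash, const unsigned char *seed, size_t seedlen,
                  unsigned char *pad, size_t padlen) {
    unsigned char block[PAD_HASH_BYTES];
    unsigned char recycled[PAD_HALF];

    if (padlen == 0) {
        return;
    }
    hash(seed, seedlen, block);
    for (;;) {
        size_t n = padlen < PAD_HALF ? padlen : PAD_HALF;
        memcpy(pad, block + PAD_HALF, n);    /* emit the second half */
        pad += n;
        padlen -= n;
        if (padlen == 0) {
            break;
        }
        memcpy(recycled, block, PAD_HALF);   /* first half is fed back in */
        hash(recycled, PAD_HALF, block);
    }
}

/* TEST STAND-IN ONLY, NOT A CRYPTOGRAPHIC HASH: FNV-1a over the input,
   repeated with eight different starting values to fill 64 bytes. */
void toy_hash512(const unsigned char *in, size_t len,
                 unsigned char out[PAD_HASH_BYTES]) {
    for (int lane = 0; lane < 8; lane++) {
        uint64_t h = 14695981039346656037ULL ^ (uint64_t)lane;
        for (size_t i = 0; i < len; i++) {
            h ^= in[i];
            h *= 1099511628211ULL;
        }
        for (int j = 0; j < 8; j++) {
            out[lane * 8 + j] = (unsigned char)(h >> (8 * j));
        }
    }
}
```

Calling generate_pad() again with the same seed and hash function reproduces the same pad, which is the property the restore step depends on.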

In this particular test, I'm only producing 256 bytes of unbiasing pad,
but it could go on for terabytes.  Running the program looks like this:

[gorf@box unbias]$ ./test2
00000000  5b 99 cb ca d2 0c 5e 18 a0 bc c0 6e ef 92 08 69
00000010  e9 c8 fc ed 36 a1 d6 22 d2 d5 6f 2f 87 fd 66 1f
00000000  9a 24 26 4b bb 49 72 42 0e 2b e6 a6 8e 54 20 70
00000010  5d 26 fe 06 ea 2e 4c 85 e8 71 81 0d 1d 4a d6 60
00000000  d0 88 e5 a1 95 e3 7b d5 77 44 6e 3b 7a f4 2f f5
00000010  38 38 f9 f6 46 e9 af 53 a6 09 58 cc 78 58 61 97
00000000  3c 79 ac fd 92 e7 c0 e0 a4 52 9b ac 6a 8f c4 b0
00000010  3d 53 91 37 7e b9 d2 8e e6 62 9a 3b 71 b5 50 5e
00000000  8f f8 96 40 8f e2 61 94 8e 89 49 f8 b4 3f bb 47
00000010  d5 74 1e 18 b1 2a 42 bd 49 9b e6 5d 2f 46 c5 cc
00000000  3c ff be 63 71 62 01 8b 03 66 de 53 b5 23 f4 f5
00000010  1a 2f c6 a9 ce 42 1e b4 21 7d 61 1f 3a 43 99 e6
00000000  2a cb af d4 33 73 49 8e fb 39 ec 7d 4c 7f ec 05
00000010  52 fb 7e bd 8b 5d f2 29 60 40 84 be 13 96 ab b0
00000000  1d 81 6f a6 c9 21 f8 34 11 8d 45 5e 74 ae 22 26
00000010  4b ed 80 de f5 ff 86 cf c0 d7 39 a6 2c c1 f2 6b

We've got a working CSPRNG!  So how can something like this
be used to unbias and restore data?

Above, we see the hex value 5B is the first 8 bits of our
unbiasing pad, which is 01011011 in binary.  Suppose that we
want to unbias the letter 'A', which has an ASCII value of
hex 41, binary 01000001.  We can apply the unbiasing pad
to what we want to unbias using the XOR operator.  All XOR
does is compare two bits, and if they are the same, it outputs
0.  If they are different (so one or the other is 1, but
not both - hence the name "Exclusive OR") then it outputs 1.
Thus, 01011011 XOR 01000001 is 00011010.
In other words, if we unbias the letter 'A' by XORing it with
the pad value hex 5B, the result is the ciphertext hex 1A.

Restoration is easy: Just reproduce the pad and XOR with the ciphertext.
If you have the ciphertext hex 1A, and you XOR it with the same
pad value hex 5B, you get hex 41 back, or the original letter 'A'.
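The XOR arithmetic above fits in one small function; xor_with_pad() is an illustrative name, not taken from the tutorial's programs. Because XOR undoes itself ((x ^ p) ^ p == x), the same call both unbiases and restores, as long as it is given the same pad bytes:

```c
#include <stddef.h>

/* XOR len bytes of data with the same number of pad bytes, in place.
   Applying it twice with the same pad gives back the original data. */
void xor_with_pad(unsigned char *data, const unsigned char *pad, size_t len) {
    for (size_t i = 0; i < len; i++) {
        data[i] ^= pad[i];
    }
}
```

With data = 'A' (hex 41) and pad = hex 5B, one call leaves hex 1A and a second call restores hex 41.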

That's where we're headed with all this; we're going to include
this CSPRNG in a program that generates as much pseudorandom XOR pad
as it needs to unbias an entire file, then reproduces the exact same
pad later to reverse the unbias operation and restore the file.

--------------------------------  4  ---------------------------------

Even though we're going to use the same CSPRNG algorithm repeatedly
to unbias input data, we want to use a random starting point so that
the CSPRNG outputs a different pad every time.  Never repeating a pad
keeps an observer from spotting bits that stay the same when the same
data is unbiased over and over.

The random number that we'll use to do this is called an Initialization
Vector (IV).  The IV should contain a sufficient number of truly random
bits, known as entropy, to ensure that it cannot be guessed or predicted.
To generate the IV, we'll need to use a true Random Number Generator (RNG).
This is the only point in the process where we require a truly random number.
Restoring the file later is a deterministic process that relies on the IV,
which is stored at the beginning of the unbiased data.

Here's an example of an IV generator based on the Linux urandom device.
The Linux urandom device is itself a CSPRNG (built around a secure hash
in the kernels of the time), but it continuously stirs its internal
state by mixing in any truly random bits that it can find, such as the
timing of disk accesses and of keyboard and mouse interrupts.  That
makes its output unpredictable in a way that a purely deterministic
CSPRNG's output is not.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include "skein.h"
#include "hexdump.h"

int get_iv(unsigned char *iv, int len) {
 int fd, num;
   num = 0;
   fd = open("/dev/urandom", O_RDONLY);
   if (fd >= 0) {
      num = read(fd, iv, len);
      close(fd);
   }
   return (num == len);
}

int main(int argc, char *argv[]) {
 int i;
 Skein_512_Ctxt_t ctx;
 unsigned char hash[64];
 unsigned char iv[32];

   if (!get_iv(iv, 32)) {
      printf("Could not obtain random number for IV.\n");
      return 1;
   }
   printf("IV:\n");
   hexdump(iv, 32);
   printf("Pad:\n");
   Skein_512_Init(&ctx, 512);
   Skein_512_Update(&ctx, iv, 32);
   Skein_512_Final(&ctx, hash);
   hexdump(&hash[32], 32);
   for (i=0; i < 7; i++) {
      Skein_512_Update(&ctx, hash, 32);
      Skein_512_Final(&ctx, hash);
      hexdump(&hash[32], 32);
   }
   return 0;
}

Running it should yield a different IV and pad each time:

[gorf@box unbias]$ ./test3
IV:
00000000  dd 26 91 d5 72 dc 0b b4 54 90 0f 8f f4 c5 67 dd
00000010  2f a6 5c 54 91 43 64 a5 a3 24 4d 01 0c 31 0a 92
Pad:
00000000  a5 25 30 45 ad 59 72 54 55 36 04 70 99 c8 e2 16
00000010  6e af 93 f7 35 d3 86 73 e9 30 d0 69 7b 6b b2 6c
00000000  94 a1 c4 dd 69 c2 97 64 e8 fe 2f 59 99 ad 01 b1
00000010  f5 f8 b9 7b a7 70 c9 bb 70 bc c4 51 fd 60 d1 e2
00000000  95 90 5f e2 b5 ec 95 04 34 cb e7 c0 56 99 0c aa
00000010  17 0d 8f ec 4b 09 ca b8 f6 44 6c 47 48 79 90 72
00000000  f8 6c ed 99 9f f9 ba ba 13 eb da 99 5b 06 f2 33
00000010  d1 38 ac 39 9d da bd d2 ef 85 11 58 10 c0 4a 01

By storing and using the IV to initialize the CSPRNG later, a restorer
should be able to reproduce the pad.
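The get_iv() function above assumes that a single read() returns every requested byte. For a 32-byte read from /dev/urandom that is usually what happens, but POSIX allows read() to return fewer bytes than requested or to fail with EINTR, so a more defensive variant loops until it has everything. This is a sketch; get_iv_full() is a hypothetical name. On newer Linux systems the getrandom() system call (declared in <sys/random.h> with glibc 2.25 and later) is another way to obtain the same bytes without opening a file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Fill iv[0..len-1] from /dev/urandom.  Returns 1 on success, 0 on
   failure.  Keeps reading until all bytes have arrived and retries
   reads that were interrupted by a signal. */
int get_iv_full(unsigned char *iv, size_t len) {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        return 0;
    }
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, iv + got, len - got);
        if (n < 0 && errno == EINTR) {
            continue;            /* interrupted by a signal: try again */
        }
        if (n <= 0) {
            close(fd);
            return 0;            /* read error or unexpected end of file */
        }
        got += (size_t)n;
    }
    close(fd);
    return 1;
}
```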

Another useful addition is a check value, so that the restorer has a
way to know for sure whether the restore succeeded.  We'll call it CHK
and compute it as a 256-bit hash of the file plaintext.  CHK will itself
be unbiased using the beginning of the pad, so that the stored CHK value
differs even when the same file is unbiased more than once.

Our unbiased file format will look like this:

32-byte Initialization Vector, straight from /dev/urandom.
32-byte CHK value, unbiased by XORing with bytes 0-31 of the unbias pad.
N bytes of ciphertext, XORed with bytes 32-onward of the unbias pad.
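The layout above amounts to a fixed 64-byte header followed by the ciphertext. A sketch of the offsets for locating each part in an unbiased file that has been read into memory (the UNB_* macros and unbiased_split() are hypothetical names, not from the tutorial's source):

```c
#include <stddef.h>

/* Layout of an unbiased file, as described above. */
#define UNB_IV_SIZE     32
#define UNB_CHK_SIZE    32
#define UNB_IV_OFFSET   0
#define UNB_CHK_OFFSET  (UNB_IV_OFFSET + UNB_IV_SIZE)     /* 32 */
#define UNB_DATA_OFFSET (UNB_CHK_OFFSET + UNB_CHK_SIZE)   /* 64 */

/* Point iv, chk and data at the three parts of an unbiased file held
   in buf.  Returns the ciphertext length, or -1 if buf is too short
   to hold the 64-byte header. */
long unbiased_split(const unsigned char *buf, size_t len,
                    const unsigned char **iv, const unsigned char **chk,
                    const unsigned char **data) {
    if (len < UNB_DATA_OFFSET) {
        return -1;
    }
    *iv = buf + UNB_IV_OFFSET;
    *chk = buf + UNB_CHK_OFFSET;
    *data = buf + UNB_DATA_OFFSET;
    return (long)(len - UNB_DATA_OFFSET);
}
```

Since the header is always 64 bytes, an unbiased file is exactly 64 bytes longer than the original; in the sample run in the next section, the 6-byte file foo grows to 70 bytes (hex 46).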

--------------------------------  5  ---------------------------------

A few other things that are needed, such as parsing command-line switches
and reading and writing files, are not specific to cryptography, so rather
than going into detail on those I'll point readers to the source code that
follows.  There are two programs below: test4, which has been instrumented
with debug output of some relevant memory contents, and unbias, which shows
how base64 encoding was added.  Base64 encoding isn't particularly useful in
this context, but it looks cool and is handy if you want to do something
like email unbiased binary data as text.

Here's the pseudocode for program test4.c:

Read input filename and whether we're unbiasing from command line
If unbiasing, make up a random IV
Generate a unique name for output file
Open the input file
If unbiasing, compute hash (biased CHK) of entire file
   otherwise, read IV and unbiased CHK hash from unbiased input file
Open the output file
Initialize the CSPRNG using IV
XOR CHK with the first 32 bytes of the pad to unbias or restore it
If unbiasing, write IV and the unbiased CHK to the output file
   otherwise, init the plaintext hash that we'll use to compare with CHK
Process the input file {
   Read 32 bytes from the file
   Generate 32 bytes of XOR pad
   XOR the bytes just read with the pad
   Write the results to the output file
   If restoring, update the plaintext hash that we'll compare with CHK
}
Close both the input and output files
If unbiasing, overwrite the input file with zeroes!
   otherwise, finalize the plaintext hash and compare with CHK
              If plaintext hash does not match CHK, print error and stop!
Delete the input file
Rename the output file to the input file
If unbiasing, print Input File unbiased.
   otherwise, print Input File restored.

Here's the source code with some debugging output:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "skein.h"
#include "hexdump.h"

#define HASHBITS  (512)
#define HASHSIZE  (HASHBITS/8)
#define BLOCKBITS (HASHBITS/2)
#define BLOCKSIZE (BLOCKBITS/8)
#define PIDLEN 20

int get_iv(unsigned char *iv, int len) {
 int fd, num;
   num = 0; 
   fd = open("/dev/urandom", O_RDONLY);
   if (fd >= 0) {
      num = read(fd, iv, len);
      close(fd);
   }
   return (num == len);
}

void show_usage(char *me) {
   printf("Usage: %s [-u] <filename>\n", me);
   printf("\t-u\tUnbias file (default is to restore)\n");
   return;
}

int main(int argc, char *argv[]) {
 int i, num;
 int unbiasing;
 Skein_512_Ctxt_t ctx;
 Skein_512_Ctxt_t ctx_chk;
 unsigned char pad[HASHSIZE];
 unsigned char iv[BLOCKSIZE];
 unsigned char chk[BLOCKSIZE];
 unsigned char buffer[BLOCKSIZE];
 FILE *fp, *fo;
 char *infile, *outfile;
 struct stat info;

   unbiasing = 0;
   infile   = NULL;
   outfile  = NULL;
   i = 1;
   while (i < argc) {
      if (!strcmp(argv[i], "-u")) {
         unbiasing = 1;
         if (!get_iv(iv, BLOCKSIZE)) {
            printf("Could not obtain random number for IV.\n");
            return EXIT_FAILURE;
         }
      } else {
         if (!infile) {
            infile = (char *)malloc(strlen(argv[i]) + 1);
            if (!infile) {
               printf("Could not allocate buffer for input filename.\n");
               return EXIT_FAILURE;
            }
            strcpy(infile, argv[i]);
            /* Temporary output name; assumes infile has no directory part */
            outfile = (char *)malloc(strlen(infile) + PIDLEN);
            if (!outfile) {
               printf("Could not allocate buffer for output filename.\n");
               return EXIT_FAILURE;
            }
            sprintf(outfile, "sk%d_%s", (int)getpid(), infile);
         } else {
            show_usage(argv[0]);
            return EXIT_FAILURE;
         }
      }
      i++;
   }
   if (!infile) {
      show_usage(argv[0]);
      return EXIT_FAILURE;
   }

   fp = fopen(infile, "rb");
   if (fp == NULL) {
      printf("Could not open %s for input.\n", infile);
      free(outfile);
      free(infile);
      return EXIT_FAILURE;
   }
   if (unbiasing) {
      /* Hash plaintext to compute CHK value with, then fseek back to start */
      Skein_512_Init(&ctx, BLOCKBITS);
      while ((num = fread(buffer, 1, BLOCKSIZE, fp)) > 0) {
         Skein_512_Update(&ctx, buffer, num);
      }
      Skein_512_Final(&ctx, chk);
printf("hash of plaintext:\n");
hexdump(chk, BLOCKSIZE);
      fseek(fp, 0, SEEK_SET);
   } else {
      num = fread(iv, 1, BLOCKSIZE, fp);
      if (num == BLOCKSIZE) {
         num = fread(chk, 1, BLOCKSIZE, fp);
      }
      if (num != BLOCKSIZE) {
         printf("Could not read IV and CHK from %s\n", infile);
         fclose(fp);
         free(outfile);
         free(infile);
         return EXIT_FAILURE;
      }
   }
   fo = fopen(outfile, "wb");
   if (fo == NULL) {
      printf("Could not open %s for output.\n", outfile);
      fclose(fp);
      free(outfile);
      free(infile);
      return EXIT_FAILURE;
   }

   Skein_512_Init(&ctx, HASHBITS);
   Skein_512_Update(&ctx, iv, BLOCKSIZE);
   Skein_512_Final(&ctx, pad);
printf("Beginning of xor pad:\n");
hexdump(&pad[BLOCKSIZE], BLOCKSIZE);
   /* Unbias or restore the chk value using the start of the xor pad */
   for (i=0; i < BLOCKSIZE; i++) {
      chk[i] ^= pad[i+BLOCKSIZE];
   }
   if (unbiasing) {
      fwrite(iv, 1, BLOCKSIZE, fo);
      fwrite(chk, 1, BLOCKSIZE, fo);
   } else {
      Skein_512_Init(&ctx_chk, BLOCKBITS);
   }
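   /* Loop on fread's result rather than feof(), which would spin
      forever on a read error */
   while ((num = fread(buffer, 1, BLOCKSIZE, fp)) > 0) {
      /* Recycle the first half of the last hash; pad is the second half */
      Skein_512_Update(&ctx, pad, BLOCKSIZE);
      Skein_512_Final(&ctx, pad);
printf("buffer in:\n");
hexdump(buffer, num);
printf("xor pad:\n");
hexdump(&pad[BLOCKSIZE], num);
      for (i=0; i < num; i++) {
         buffer[i] ^= pad[i+BLOCKSIZE];
      }
printf("buffer out:\n");
hexdump(buffer, num);
      fwrite(buffer, 1, num, fo);
      if (!unbiasing) {
         Skein_512_Update(&ctx_chk, buffer, num);
      }
   }
   if (ferror(fp)) {
      /* Don't go on to wipe or replace the input after a failed read */
      printf("Error reading %s.\n", infile);
      fclose(fp);
      fclose(fo);
      remove(outfile);
      free(outfile);
      free(infile);
      return EXIT_FAILURE;
   }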
   fclose(fp);
   fclose(fo);
   if (!unbiasing) {
      /* Overload pad with computed chk value to test against read value */
      Skein_512_Final(&ctx_chk, pad);
printf("hash of restored plaintext:\n");
hexdump(pad, BLOCKSIZE);
      for (i=0; i < BLOCKSIZE; i++) {
         if (chk[i] != pad[i]) {
            printf("The restored plaintext does not match the original. The unbiased file may be corrupt.\n");
            remove(outfile);
            free(outfile);
            free(infile);
            return EXIT_FAILURE;
         }
      }
   } else {
      /* Overwrite the original file with zeros.  "r+b" rewrites the
         existing contents in place, where "w" would truncate the file
         first.  Journaling or copy-on-write filesystems and SSDs may
         still retain old copies of the data elsewhere. */
      memset(buffer, 0, BLOCKSIZE);
      if (!stat(infile, &info)) {
         i = (info.st_size + BLOCKSIZE - 1) / BLOCKSIZE;
         fp = fopen(infile, "r+b");
         if (fp) {
            while (i--) {
               fwrite(buffer, 1, BLOCKSIZE, fp);
            }
            fflush(fp);
            fsync(fileno(fp));   /* push the zeros out to the device */
            fclose(fp);
         }
      }
   }
   remove(infile);
   if (rename(outfile, infile) != 0) {
      printf("Could not rename %s to %s.\n", outfile, infile);
      free(outfile);
      free(infile);
      return EXIT_FAILURE;
   }
   if (unbiasing) {
      printf("%s unbiased.\n", infile);
   } else {
      printf("%s restored.\n", infile);
   }
   free(outfile);
   free(infile);
   return EXIT_SUCCESS;
}

And here's a sample run of the above program showing the file
foo being unbiased and restored:

[gorf@box unbias]$ ls -al foo
-rw-r--r--    1 gorf     users           6 Jul 22 00:16 foo
[gorf@box unbias]$ hexdump foo
   0: 41 42 43 44 45 0A      -                        ABCDE.
[gorf@box unbias]$ ./test4 -u foo
hash of plaintext:
00000000  9d de 76 60 b4 42 1c 51 34 69 39 c8 42 b8 77 aa
00000010  94 95 a2 cb e1 f1 49 e2 4c 70 9d f8 20 cb 5b e6
Beginning of xor pad:
00000000  9c 85 f7 12 e8 0a d4 fb fa fa 89 fc ce 26 d7 28
00000010  28 fb 75 f6 2a 08 59 df 9a c3 82 80 35 72 1d d9
buffer in:
00000000  41 42 43 44 45 0a
xor pad:
00000000  bf fa 70 44 42 bf
buffer out:
00000000  fe b8 33 00 07 b5
foo unbiased.
[gorf@box unbias]$ hexdump foo
   0: 3C F6 DF 2E DB 84 1C 7F-16 F9 76 21 80 94 04 EF <.........v!....
  10: E2 F6 57 49 35 53 4E EF-67 CE 0C D4 D8 CA 2C 7C ..WI5SN.g.....,|
  20: 01 5B 81 72 5C 48 C8 AA-CE 93 B0 34 8C 9E A0 82 .[.r\H.....4....
  30: BC 6E D7 3D CB F9 10 3D-D6 B3 1F 78 15 B9 46 3F .n.=...=...x..F?
  40: FE B8 33 00 07 B5      -                        ..3...

[gorf@box unbias]$ ./test4 foo
Beginning of xor pad:
00000000  9c 85 f7 12 e8 0a d4 fb fa fa 89 fc ce 26 d7 28
00000010  28 fb 75 f6 2a 08 59 df 9a c3 82 80 35 72 1d d9
buffer in:
00000000  fe b8 33 00 07 b5
xor pad:
00000000  bf fa 70 44 42 bf
buffer out:
00000000  41 42 43 44 45 0a
hash of restored plaintext:
00000000  9d de 76 60 b4 42 1c 51 34 69 39 c8 42 b8 77 aa
00000010  94 95 a2 cb e1 f1 49 e2 4c 70 9d f8 20 cb 5b e6
foo restored.
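The numbers in this sample run can be checked by hand. The sketch below copies bytes from the debug output above (xor_bytes() and the sample_* arrays are illustrative names, not part of test4.c) and XORs them: the plaintext hash XORed with the beginning of the pad gives the stored CHK at offset 0x20 of the unbiased file (01 5B 81 72 ...), and the file's contents XORed with the next pad block give the ciphertext at offset 0x40 (FE B8 33 00 07 B5).

```c
#include <stddef.h>

/* out[i] = a[i] ^ b[i] for i in [0, len) */
void xor_bytes(unsigned char *out, const unsigned char *a,
               const unsigned char *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        out[i] = a[i] ^ b[i];
    }
}

/* First 8 bytes of "hash of plaintext" and "Beginning of xor pad",
   then the contents of foo and the "xor pad" applied to them. */
const unsigned char sample_chk[8]  = {0x9d,0xde,0x76,0x60,0xb4,0x42,0x1c,0x51};
const unsigned char sample_pad0[8] = {0x9c,0x85,0xf7,0x12,0xe8,0x0a,0xd4,0xfb};
const unsigned char sample_text[6] = {0x41,0x42,0x43,0x44,0x45,0x0a};
const unsigned char sample_pad1[6] = {0xbf,0xfa,0x70,0x44,0x42,0xbf};
```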

--------------------------------  6  ---------------------------------

Finally, an example of how one might implement the above with the
inclusion of additional features such as base64 encoding is shown
here.  It would thus become possible to unbias small binary files
and paste unbiased output into the body of an email or to
send short text messages as unbiased base64, for example.

This program takes the name of a file and optional switches to select
whether it's unbiasing or not (the default is to restore) and whether
it should use base64 encoding for the output file:

[gorf@box unbias]$ ./unbias
Usage: ./unbias [-u] [-b] <filename>
        -u      Unbias file (default is to restore)
        -b      Unbias to or restore from Base64

[gorf@box unbias]$ cat foo
Here's some ASCII text, and if we unbias it with a random
IV and base64-encode it, we'll get lots of awesome password
suggestions for free!

[gorf@box unbias]$ ./unbias -u foo
foo unbiased.
[gorf@box unbias]$ hexdump foo
   0: D7 D5 8C B7 03 EB B1 89-75 FA 44 3F 97 4B 07 B0 ........u.D?.K..
  10: B4 CF 15 B2 77 A3 DF D1-23 B3 07 F4 27 18 AB E7 ....w...#...'...
  20: 29 7C 75 81 F9 32 E2 5E-DD D6 F9 41 63 9B AA 41 )|u..2.^...Ac..A
  30: 4E 91 A5 20 B4 24 8A 12-4B 0A 20 90 BC 65 C9 60 N.. .$..K. ..e.`
  40: 86 20 6C 0B 40 8C CA EA-CD 63 CD AD F0 58 EF 8B . l.@....c...X..
  50: 2B 4B 1F 7B 8A 2E CF 83-22 D4 6D 25 C5 9B FF 68 +K.{....".m%...h
  60: 14 7B DB 79 FF 7A 61 3E-74 66 A4 ED AF 09 E5 20 .{.y.za>tf.....
  70: 6D C9 D4 B4 C4 02 09 4F-40 EA E0 7F 57 BB 22 9F m......O@...W.".
  80: D7 76 B8 45 4E 8C 9A C7-C6 15 64 9A 1B 38 D4 23 .v.EN.....d..8.#
  90: E5 52 7E 18 D8 E0 75 21-B1 6C 26 10 F1 43 53 89 .R~...u!.l&..CS.
  A0: 10 8C 6F 7C C4 1F F8 C0-02 07 6D 20 E0 ED 65 A4 ..o|......m ..e.
  B0: D0 93 41 55 9A 33 D9 02-0D 22 DD E0 93 BB 51 BC ..AU.3..."....Q.
  C0: 97 E1 24 1F C7 BB 8C CD-75 98 73 F3             ..$.....u.s.
[gorf@box unbias]$ ./unbias foo
foo restored.
[gorf@box unbias]$ ./unbias -u -b foo
foo unbiased.
[gorf@box unbias]$ cat foo
tEWtLPLjL2PZxI/53h683v6VlEvQ6hludnX//SpqSYcxu0RVLK9wl130tbd56f4g
CcRzqmlHhn/v2QN0+t3Uu7/x69UApzjz3ZED+d9OPz1iRcFKju+49rY8mNAv4PuW
DtI9+XN6n+Vsg51fXC3c6wQ2XPb6v7mKucjghcwc235jjAXBnTejWm1txzzmRxtP
y/Y1gKc+x97REXliEzzEmRrk0De8Q2rh4kA9ku2qSQEjn9ULvlkXo2SdiD4ZkuTR
8qmO/S8Y6HG3vq6l=

Base64 encoding has a 3:4 information density ratio to 8-bit binary:
base64 carries only 6 bits per character, so every 3 bytes of binary
data become 4 characters, and 48 bytes of 8-bit binary data require 64
base64 characters.  A buffering mechanism is helpful when converting
between different information densities, and this implementation uses
buffer sizes of 48 bytes and 64 characters, because 48 bytes of binary
encode to exactly one 64-character row of base64.  A high-throughput
implementation might use larger buffers for improved performance.
Similarly, this implementation's disk reads and writes are done in the
32-byte unbiasing block size, where a high-throughput implementation
might use larger buffers.

It may also be desirable to change the number of bits fed back into
the hashing algorithm for pad production.  This implementation recycles
fully half of the hash's 512-bit output; recycling 128 bits and using
the other 384 bits as pad would yield 50% more pad per hash computation.
Finally, note the two different hash widths being used for the overall
plaintext hash calculation (the CHK value, a 256-bit hash) and for the
pad generator (a 512-bit hash):

[gorf@box unbias]$ cat base64.h
#ifndef BASE64_INCLUDE
#define BASE64_INCLUDE 1

#define BASE64_EQTERM   1
#define BASE64_NULLTERM 2
#define BASE64_CRLF     4

#define BASE64          "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

void base64_enc(unsigned char * src, int len, unsigned char * dst, int opts);
int  base64_dec(unsigned char * src, unsigned char * dst, int len);

#endif

[gorf@box unbias]$ cat base64.c
/*
   base64.c
   Aaron Logue 2009
*/
#include <stdio.h>
#include "base64.h"

/*
   base64_enc(source, length, destination, options)

   The caller can enable '=' termination with BASE64_EQTERM
   The caller can enable null termination with BASE64_NULLTERM
   Both terminations can be specified with BASE64_EQTERM+BASE64_NULLTERM
   The caller can enable a CRLF after every 64 output characters with BASE64_CRLF
   The caller must provide a buffer large enough to receive the output.
   Four bytes of output are generated for every three bytes of input
   plus up to five more bytes of output depending on termination options
   and how many bytes of input were left over, plus two bytes for each
   line break when BASE64_CRLF is used.
   The two buffers must be different.
*/
void base64_enc(unsigned char * src, int len, unsigned char * dst, int opts) {
   char base64[] = BASE64;
   int i;
   int n;
   int l = 0;   // line length

   /* Process all complete groups of 3 */
   for (i = 0; i <= (len-3); i = i + 3) {
      *dst++ = base64[((src[i]&0xFC) >> 2)];
      *dst++ = base64[((src[i]&0x03) << 4)   + ((src[i+1]&0xF0) >> 4)];
      *dst++ = base64[((src[i+1]&0x0F) << 2) + ((src[i+2]&0xC0) >> 6)];
      *dst++ = base64[src[i+2]&0x3F];
      if (opts & BASE64_CRLF) {
         l += 4;
         if (l >= 64) {
             *dst++ = '\r';
             *dst++ = '\n';
             l = 0;
         }
      }
   }
   /* Process leftovers */
   n = len - i;
   if (n > 0) {
      *dst++ = base64[(src[i]&0xFC) >> 2];
      l++;
      if (n == 1) {
         *dst++ = base64[(src[i]&0x03) << 4];
         l++;
         if (opts & BASE64_EQTERM) {
            *dst++ = '=';
            l++;
         }
      } else {
         *dst++ = base64[((src[i]&0x03) << 4) + ((src[i+1]&0xF0) >> 4)];
         l++;
         *dst++ = base64[(src[i+1]&0x0F) << 2];
         l++;
      }
   }
   if (opts & BASE64_EQTERM) {
      *dst++ = '=';
      l++;
   }
   if (opts & BASE64_CRLF) {
      if (l > 0) {
         *dst++ = '\r';
         *dst++ = '\n';
      }
   }
   if (opts & BASE64_NULLTERM) {
      *dst = 0;
   }
   return;
}

/*
   base64_dec(source, destination, len)

   Three bytes of output are generated for every four bytes of input.
   The source and destination buffers can be the same.
   Characters outside the base64 alphabet are skipped.
   Decoding stops when a null or a '=' is encountered, or if len is
   not equal to -1, after len input characters have been processed.
   Return value: The number of bytes that were fully produced.
*/
int base64_dec(unsigned char * src, unsigned char * dst, int len) {
   unsigned char unbase64[] = { /* 64s are ignored, 65s kick us out */
      65,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,62,64,64,64,63,
      52,53,54,55,56,57,58,59,60,61,64,64,64,65,64,64,
      64, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,
      15,16,17,18,19,20,21,22,23,24,25,64,64,64,64,64,
      64,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,
      41,42,43,44,45,46,47,48,49,50,51,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
      64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,
   };
   unsigned char bits;
   int i = 0;
   int state = 0;
   unsigned char * start = dst;

   /* Check the length limit before reading src[i], not after */
   while (len == -1 || i < len) {
      bits = unbase64[src[i]];
      if (bits > 64) {
         break;                    /* null or '=' ends the input */
      }
      if (bits < 64) {
         switch (state) {
            case 0: *dst  = (bits & 0x3F) << 2; state++; break;
            case 1: *dst |= (bits & 0x30) >> 4; dst++;
                    *dst  = (bits & 0x0F) << 4; state++; break;
            case 2: *dst |= (bits & 0x3C) >> 2; dst++;
                    *dst  = (bits & 0x03) << 6; state++; break;
            case 3: *dst |= (bits & 0x3F);      dst++; state = 0; break;
         }
      }
      i++;                         /* ignored characters are skipped too */
   }
   return (int)(dst - start);
}
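
The decode table above is typed in by hand, and a single wrong entry
silently corrupts decoded output.  As an alternative sketch, the same
encoding (0..63 = sextet value, 64 = ignore, 65 = stop) can be built at
startup from the alphabet string.  build_unbase64 and B64_ALPHABET are
hypothetical names, and the alphabet is assumed to be the standard
RFC 4648 one that the hand-typed table encodes.

```c
#include <assert.h>
#include <string.h>

/* Assumed: standard RFC 4648 base64 alphabet (hypothetical name). */
static const char B64_ALPHABET[] =
   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Fill table[256] using the same convention as base64_dec():
   0..63 = value of the character, 64 = ignore it, 65 = stop decoding. */
void build_unbase64(unsigned char table[256]) {
   int i;
   assert(strlen(B64_ALPHABET) == 64);
   memset(table, 64, 256);            /* ignore everything by default */
   for (i = 0; i < 64; i++) {
      table[(unsigned char)B64_ALPHABET[i]] = (unsigned char)i;
   }
   table[0]   = 65;                   /* null terminator ends decoding */
   table['='] = 65;                   /* padding ends decoding */
}
```

Spot-checking a few entries of the generated table against a hand-written
one (for example '+' = 62, '/' = 63, '=' = 65) is a cheap way to catch a
typo.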

[gorf@box unbias]$ cat unbias.c
/****************************************************************************
 * Aaron Logue 2013 www.cryogenius.com
 * File Unbiasing / Restoration using the V1.3 Skein hash from
 * http://www.skein-hash.info/sites/default/files/NIST_CD_102610.zip
 * Copy the following files from the Optimized_32bit directory:
 *   brg_endian.h brg_types.h skein_block.c skein.c skein.h
 *   skein_iv.h skein_port.h
 * Also copy base64.c and base64.h from wherever you got this,
 * and build with:
 *   gcc -s -O   -c -o skein.o skein.c
 *   gcc -s -O   -c -o skein_block.o skein_block.c
 *   gcc -s -O   -c -o base64.o base64.c
 *   gcc -s -O -o unbias skein.o skein_block.o base64.o unbias.c
 * NOTE: This program wipes and deletes, so be careful of what you test it on!
 ***************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "skein.h"
#include "base64.h"

#define HASHBITS  (512)
#define HASHSIZE  (HASHBITS/8)
#define BLOCKBITS (HASHBITS/2)
#define BLOCKSIZE (BLOCKBITS/8)
#define PIDLEN    20

#define WRITEBUF  (BLOCKSIZE*3)
#define ROWLEN    64
#define ROWBYTES  48

typedef struct {
   FILE *fp;
   unsigned char *buf;
   int start;
   int remaining;
} filebuf_t;

int get_iv(unsigned char *iv, int len) {
 int fd, num;
   num = 0;
   fd = open("/dev/urandom", O_RDONLY);
   if (fd >= 0) {
      num = read(fd, iv, len);
      close(fd);
   }
   return (num == len);
}

int count64(unsigned char *buf) {
 int num = 0;
   while (*buf) {
      if (strchr(BASE64, *buf)) {
         num++;
      }
      buf++;
   }
   return num;
}

/****************************************************************************
 * init_file_buffer()
 ***************************************************************************/
filebuf_t * init_file_buffer(FILE *fp) {
 filebuf_t * fbuf;
   fbuf = (filebuf_t *)malloc(sizeof(filebuf_t));
   if (fbuf) {
      fbuf->fp = fp;
      fbuf->buf = (unsigned char *)malloc(WRITEBUF);
      if (!fbuf->buf) {
         free(fbuf);
         return NULL;
      }
      fbuf->start = 0;
      fbuf->remaining = 0;
   }
   return fbuf;
}

/****************************************************************************
 * base64_read()
 * Load up to ROWLEN characters of base64 and decode ROWBYTES bytes at a time.
 * ROWLEN = 64, ROWBYTES = 48, and len almost always = 32.  This code takes
 * advantage of that to reduce data copies.
 ***************************************************************************/
int base64_read(filebuf_t *fbuf, unsigned char *buf, int len) {
 int num, count, start;
 unsigned char scratch[ROWLEN + 20]; /* 1 row of base64 input plus room for line breaks */

   if (fbuf->remaining < len) {
      /* load one 64-character row of base64 into scratch buffer */
      start = 0;
      num = 0;
      count = 0;
      scratch[start] = 0;
      while ((num = fread(&scratch[start], 1, ROWLEN - count, fbuf->fp)) > 0) {
         scratch[start + num] = 0;
         count += count64(&scratch[start]);
         start += num;
         if (count >= ROWLEN) {
            break;
         }
         /* Defend against too much non-base64 garbage in file: the next
            fread() plus its null terminator must still fit in scratch */
         if (start + ROWLEN - count >= (ROWLEN+20)) {
            num = 0;
            start = 0;
            while (scratch[num]) {
               if (strchr(BASE64, scratch[num])) {
                  scratch[start] = scratch[num];
                  start++;
               }
               num++;
            }
            scratch[start] = 0;
         }
      }

      if (!fbuf->remaining) {
         fbuf->start = 0;
      } else {
         if (fbuf->start + fbuf->remaining + ROWBYTES > WRITEBUF) {
            /* Prevent fbuf->buf overrun by sliding data up to make room */
            /* We should only get here if BLOCKSIZE != 32 or ROWLEN != 64 */
            memmove(fbuf->buf, &fbuf->buf[fbuf->start], fbuf->remaining); /* regions may overlap */
            fbuf->start = 0;
         }
      }

      /* Decode up to ROWBYTES bytes to fbuf->buf[fbuf->start + fbuf->remaining] */
      num = base64_dec(scratch, &fbuf->buf[fbuf->start + fbuf->remaining], -1);
      fbuf->remaining += num;
   }

   /* Return what we've got, advance start, reduce remaining. */
   num = len;
   if (fbuf->remaining < num) {
      num = fbuf->remaining;
   }
   if (num) {
      memcpy(buf, &fbuf->buf[fbuf->start], num);
      fbuf->start += num;
      fbuf->remaining -= num;
   }
   return num;
}

/****************************************************************************
 * base64_write()
 * len is assumed to be BLOCKSIZE or a one-time final value which can be less.
 * The receiving buffer is 3*BLOCKSIZE.  BLOCKSIZE is < ROWBYTES, so the first
 * write does nothing but store the data in fbuf->buf.  The second write
 * appends the block and triggers a base64 output, which bumps start and
 * leaves 16 bytes remaining to be written in fbuf->buf.  The third write
 * appends the block to the 16 and triggers another base64 output, which
 * consumes the remaining data and resets start to 0. This saves a data copy.
 ***************************************************************************/
int base64_write(filebuf_t *fbuf, unsigned char *buf, int len) {
 unsigned char scratch[80]; /* 1 row of base64 output */

   /* We are unbiasing to base64. Copy len bytes to buffer */
   memcpy(&fbuf->buf[fbuf->start + fbuf->remaining], buf, len);
   fbuf->remaining += len;
   if (fbuf->remaining >= ROWBYTES) {
      base64_enc(&fbuf->buf[fbuf->start], ROWBYTES, scratch, BASE64_CRLF);
      fwrite(scratch, 1, ROWLEN + 2, fbuf->fp); /* 64 characters + CR LF */
      fbuf->start += ROWBYTES;
      fbuf->remaining -= ROWBYTES;
      if (!fbuf->remaining) {
         fbuf->start = 0;
      } else {
         /* Defend against fbuf->buf overrun */
         if (fbuf->start + fbuf->remaining + BLOCKSIZE > WRITEBUF) {
            /* We should only get here if BLOCKSIZE != 32 */
            memmove(fbuf->buf, &fbuf->buf[fbuf->start], fbuf->remaining);
            fbuf->start = 0;
         }
      }
   }
   return 1;
}

/****************************************************************************
 * flush_and_close()
 * If outputting unbiased base64, perform final write of partial row.
 ***************************************************************************/
int flush_and_close(filebuf_t *fbuf) {
 unsigned char scratch[80]; /* 1 row of base64 output */

   if (fbuf->remaining) {
      base64_enc(&fbuf->buf[fbuf->start], fbuf->remaining, scratch,
                 BASE64_EQTERM | BASE64_CRLF | BASE64_NULLTERM);
      fwrite(scratch, 1, strlen((char *)scratch), fbuf->fp);
   }
   fclose(fbuf->fp);
   free(fbuf->buf);
   free(fbuf);
   return 1;
}

void show_usage(char *me) {
   printf("Usage: %s [-u] [-b] <filename>\n", me);
   printf("\t-u\tUnbias file (default is to restore)\n");
   printf("\t-b\tUnbias to or restore from Base64\n");
   return;
}

int main(int argc, char *argv[]) {
 int i, num;
 int unbiasing, base64;
 Skein_512_Ctxt_t ctx;
 Skein_512_Ctxt_t ctx_chk;
 unsigned char pad[HASHSIZE];
 unsigned char iv[BLOCKSIZE];
 unsigned char chk[BLOCKSIZE];
 unsigned char buffer[BLOCKSIZE];
 FILE *fp, *fo;
 char *infile, *outfile;
 struct stat info;
 filebuf_t *base64buf;

   unbiasing = 0;
   base64 = 0;
   infile   = NULL;
   i = 1;
   while (i < argc) {
      if (!strcmp(argv[i], "-u")) {
         unbiasing = 1;
         if (!get_iv(iv, BLOCKSIZE)) {
            printf("Could not obtain random number for IV.\n");
            return 0;
         }
      } else if (!strcmp(argv[i], "-b")) {
         base64 = 1;
      } else {
         if (!infile) {
            infile = (char *)malloc(strlen(argv[i]) + 1);
            if (infile) {
               strcpy(infile, argv[i]);
               outfile = (char *)malloc(strlen(infile) + PIDLEN);
               if (!outfile) {
                  printf("Could not allocate buffer for output filename.\n");
                  return 0;
               }
               snprintf(outfile, strlen(infile) + PIDLEN, "sk%d_%s", (int)getpid(), infile);
            }
         } else {
            show_usage(argv[0]);
            return 0;
         }
      }
      i++;
   }
   if (!infile) {
      show_usage(argv[0]);
      return 0;
   }

   fp = fopen(infile, "r");
   if (fp == NULL) {
      printf("Could not open %s for input.\n", infile);
      free(outfile);
      free(infile);
      return 0;
   }
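   if (!unbiasing && base64) {
      base64buf = init_file_buffer(fp);
      if (!base64buf) {
         printf("Could not allocate base64 buffer.\n");
         fclose(fp);
         free(outfile);
         free(infile);
         return 0;
      }
   }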
   if (unbiasing) {
      /* Hash plaintext to compute CHK value with, then fseek back to start */
      Skein_512_Init(&ctx, BLOCKBITS);
      while ((num = fread(buffer, 1, BLOCKSIZE, fp)) > 0) {
         Skein_512_Update(&ctx, buffer, num);
      }
      Skein_512_Final(&ctx, chk);
      fseek(fp, 0, SEEK_SET);
   } else {
      if (!unbiasing && base64) {
         num = base64_read(base64buf, iv, BLOCKSIZE);
         if (num == BLOCKSIZE) {
            num = base64_read(base64buf, chk, BLOCKSIZE);
         }
      } else {
         num = fread(iv, 1, BLOCKSIZE, fp);
         if (num == BLOCKSIZE) {
            num = fread(chk, 1, BLOCKSIZE, fp);
         }
      }
      if (num != BLOCKSIZE) {
         printf("Could not read IV and CHK from %s\n", infile);
         fclose(fp);
         free(outfile);
         free(infile);
         return 0;
      }
   }
   fo = fopen(outfile, "wb");
   if (fo == NULL) {
      printf("Could not open %s for output.\n", outfile);
      fclose(fp);
      free(outfile);
      free(infile);
      return 0;
   }
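   if (unbiasing && base64) {
      base64buf = init_file_buffer(fo);
      if (!base64buf) {
         printf("Could not allocate base64 buffer.\n");
         fclose(fo);
         remove(outfile);
         fclose(fp);
         free(outfile);
         free(infile);
         return 0;
      }
   }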

   Skein_512_Init(&ctx, HASHBITS);
   Skein_512_Update(&ctx, iv, BLOCKSIZE);
   Skein_512_Final(&ctx, pad);
   /* Unbias or restore the chk value using the upper half of the first xor pad */
   for (i=0; i < BLOCKSIZE; i++) {
      chk[i] ^= pad[i+BLOCKSIZE];
   }
   if (unbiasing) {
      if (base64) {
         base64_write(base64buf, iv, BLOCKSIZE);
         base64_write(base64buf, chk, BLOCKSIZE);
      } else {
         fwrite(iv, 1, BLOCKSIZE, fo);
         fwrite(chk, 1, BLOCKSIZE, fo);
      }
   } else {
      Skein_512_Init(&ctx_chk, BLOCKBITS);
   }
   do {
      if (!unbiasing && base64) {
         num = base64_read(base64buf, buffer, BLOCKSIZE);
      } else {
         num = fread(buffer, 1, BLOCKSIZE, fp);
      }
      Skein_512_Update(&ctx, pad, BLOCKSIZE);
      Skein_512_Final(&ctx, pad);
      for (i=0; i < num; i++) {
         buffer[i] ^= pad[i+BLOCKSIZE];
      }
      if (unbiasing && base64) {
         base64_write(base64buf, buffer, num);
      } else {
         fwrite(buffer, 1, num, fo);
      }
      if (!unbiasing) {
         Skein_512_Update(&ctx_chk, buffer, num);
      }
   } while (num);

   fclose(fp);
   if (!unbiasing && base64) {
      free(base64buf->buf);
      free(base64buf);
   }
   if (unbiasing && base64) {
      flush_and_close(base64buf);
   } else {
      fclose(fo);
   }
   /* If restoring, make sure CHK matches before clobbering input file */
   if (!unbiasing) {
      /* Overload pad with computed chk value to test against read value */
      Skein_512_Final(&ctx_chk, pad);
      for (i=0; i < BLOCKSIZE; i++) {
         if (chk[i] != pad[i]) {
            printf("The restored plaintext does not match the original. The IV may be corrupt.\n");
            remove(outfile);
            free(outfile);
            free(infile);
            return 0;
         }
      }
   } else {
      /* overwrite the original file with zeros */
      for (i=0; i < BLOCKSIZE; i++) {
         buffer[i] = 0;
      }
      if (!stat(infile, &info)) {
         i = info.st_size / BLOCKSIZE + 1;
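         /* "r+b" overwrites the existing bytes in place; "w" would truncate
            first and let the filesystem hand out different blocks */
         fp = fopen(infile, "r+b");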
         if (fp) {
            while (i--) {
               fwrite(buffer, 1, BLOCKSIZE, fp);
            }
            fclose(fp);
         }
      }
   }
   remove(infile);
   rename(outfile, infile);
   if (unbiasing) {
      printf("%s unbiased.\n", infile);
   } else {
      printf("%s restored.\n", infile);
   }
   free(outfile);
   free(infile);
   return 1;
}
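
The heart of main() is a pad chain: the first pad is the hash of the IV,
each later pad comes from feeding the lower half of the previous pad back
into the hash, and only the upper half is ever XORed with data (the upper
half of the first pad covers CHK).  Because XOR is its own inverse,
restoring is the same operation as unbiasing.  Here is a minimal sketch of
that structure, assuming a 64-bit stand-in mixing function in place of
Skein-512 (so its output is NOT compatible with unbias files) and simply
rehashing instead of continuing one Skein context as the listing does.
mix64, toy_pad and toy_unbias are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Skein: the SplitMix64 finalizer.  NOT cryptographic. */
static uint64_t mix64(uint64_t x) {
   x += 0x9E3779B97F4A7C15ULL;
   x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
   x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
   return x ^ (x >> 31);
}

/* Toy two-half "pad": lo plays the role of pad[0..31] (fed back into the
   chain), hi plays the role of pad[32..63] (XORed with the data). */
static void toy_pad(uint64_t in, uint64_t *lo, uint64_t *hi) {
   *lo = mix64(in);
   *hi = mix64(~in);
}

/* Unbias or restore len bytes in place; running it twice is a no-op. */
void toy_unbias(uint64_t iv, unsigned char *buf, size_t len) {
   uint64_t lo, hi;
   size_t off, j;
   assert(buf != NULL || len == 0);
   toy_pad(iv, &lo, &hi);          /* first pad; unbias.c uses its hi half for CHK */
   for (off = 0; off < len; off += 8) {
      toy_pad(lo, &lo, &hi);       /* next pad from the previous lower half */
      for (j = 0; j < 8 && off + j < len; j++) {
         buf[off + j] ^= (unsigned char)(hi >> (8 * j));
      }
   }
}
```

Keeping the fed-back half separate from the XORed half means the bytes an
observer can recover by XORing known plaintext with unbiased output are
never the bytes that seed the next pad.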

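The zero-overwrite step near the end of main() is meant to destroy the
plaintext before the file is deleted.  A minimal POSIX sketch of doing
that in place, assuming a Unix-like system: open the file without
truncating it, write zeros over its current length, and call fsync() so
the zeros reach the device before the file is removed.  wipe_in_place is
a hypothetical helper, not part of unbias.c.  Journaling or copy-on-write
filesystems and SSD wear leveling can still leave old copies of the data
elsewhere, so this is best effort.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite every byte of an existing file with zeros, in place.
   Returns 1 on success, 0 on failure. */
int wipe_in_place(const char *path) {
   unsigned char zeros[4096];
   struct stat st;
   off_t done = 0;
   int fd, ok = 1;

   assert(path != NULL);
   memset(zeros, 0, sizeof zeros);
   fd = open(path, O_WRONLY);           /* no O_TRUNC: keep the same blocks */
   if (fd < 0) {
      return 0;
   }
   if (fstat(fd, &st) != 0) {
      close(fd);
      return 0;
   }
   while (done < st.st_size) {
      size_t chunk = sizeof zeros;
      ssize_t n;
      if ((off_t)chunk > st.st_size - done) {
         chunk = (size_t)(st.st_size - done);
      }
      n = write(fd, zeros, chunk);
      if (n <= 0) {
         ok = 0;
         break;
      }
      done += n;
   }
   if (fsync(fd) != 0) ok = 0;          /* push the zeros to the device */
   if (close(fd) != 0) ok = 0;
   return ok;
}
```
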
--------------------------------  7  ---------------------------------

Notes that I made while working on this project, in no particular order:

One way that the use of cryptographically secure hashes can be
compromised is by not providing enough input bits.  No matter
how good a hash algorithm is, an attacker can hash a series of
guesses until one of them produces a match.  If you are using a
strong hash algorithm, make sure each hash input is drawn from a
space too large for an attacker to search exhaustively.  In the
unbiaser, the hash-generated XOR pad is seeded by a 256-bit IV
read from /dev/urandom, so an attacker would face up to 2^256
candidate seeds.
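
To make the point concrete, here is a sketch of what happens when a pad
is seeded from only 16 bits.  It uses the SplitMix64 finalizer as a
stand-in for a secure hash (it is not one, but the attack would work just
as well against a perfect hash), and toy_xor6/guess_iv are hypothetical
names.  An attacker who knows a file begins with "GIF89a" simply tries
all 65536 seeds and keeps the ones that reproduce the header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for H(): SplitMix64 finalizer.  NOT cryptographic. */
static uint64_t mix64(uint64_t x) {
   x += 0x9E3779B97F4A7C15ULL;
   x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
   x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
   return x ^ (x >> 31);
}

/* Hypothetical weak scheme: XOR the first 6 bytes with a pad
   derived from a 16-bit IV. */
void toy_xor6(uint16_t iv, unsigned char buf[6]) {
   uint64_t h = mix64(iv);
   int j;
   for (j = 0; j < 6; j++) {
      buf[j] ^= (unsigned char)(h >> (8 * j));
   }
}

/* Attacker: try every possible IV against the known header.
   Returns the number of matches and stores up to max of them in out[]. */
int guess_iv(const unsigned char ct[6], uint16_t *out, int max) {
   int hits = 0;
   uint32_t g;
   assert(out != NULL || max == 0);
   for (g = 0; g <= 0xFFFF; g++) {
      unsigned char tmp[6];
      memcpy(tmp, ct, 6);
      toy_xor6((uint16_t)g, tmp);       /* undo the pad for this guess */
      if (memcmp(tmp, "GIF89a", 6) == 0) {
         if (hits < max) {
            out[hits] = (uint16_t)g;
         }
         hits++;
      }
   }
   return hits;
}
```

The loop finishes almost instantly; the same search over a 256-bit IV is
out of reach no matter how fast the hash is.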

Consider the unbiasing of ASCII data, where the high bit of every
byte is 0.  If the same pad were used to unbias multiple files, the
high bit at a given position would be the same in every unbiased
file (it would simply be the pad's bit), so an observer would see
that some bits never change, giving them a potential start on a
correlation attack.  Defending against this is the purpose of the
IV.  Ideally, you want observers to measure a 50/50 split between
zeros and ones in every bit, no matter how they try to correlate
them.  This includes metadata (which, in the case of unbias, means
the IV and CHK values).
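
A small sketch of that observation, assuming a reused pad: because
c1 = p1 XOR pad and c2 = p2 XOR pad, XORing two unbiased files cancels
the pad and leaves p1 XOR p2, whose high bits are all zero for ASCII
text.  xor_pad and high_bits_agree are illustrative names, not part of
unbias.c.

```c
#include <assert.h>
#include <stddef.h>

/* The unbiasing step: dst[i] = src[i] XOR pad[i]. */
void xor_pad(const unsigned char *src, const unsigned char *pad,
             unsigned char *dst, size_t n) {
   size_t i;
   for (i = 0; i < n; i++) {
      dst[i] = (unsigned char)(src[i] ^ pad[i]);
   }
}

/* Returns 1 if bit 7 agrees at every position of c1 and c2.  With a
   reused pad, c1 ^ c2 == p1 ^ p2, so for two 7-bit ASCII plaintexts
   this is always true, whatever the pad is. */
int high_bits_agree(const unsigned char *c1, const unsigned char *c2,
                    size_t n) {
   size_t i;
   assert(c1 != NULL && c2 != NULL);
   for (i = 0; i < n; i++) {
      if ((c1[i] ^ c2[i]) & 0x80) {
         return 0;
      }
   }
   return 1;
}
```

With a fresh IV per file, the two pads differ, their high bits no longer
cancel, and the check behaves like a coin flip at each position.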

Beware that any deterministic system's RNG is of particular interest to a
cryptanalyst because the RNG tends to be the source of the bits that
establish the system's initial state.

True RNGs are harder to implement than one might think because of the
need to base them on bits from the physical world.  Their implementation
will vary from platform to platform.  For any given platform, you should
take a look at the RNG to be sure it isn't only providing 20 or 30 truly
random bits when you're expecting a lot more than that.  Windows platforms
have a long history of being problematic in this regard.

For Linux systems, /dev/urandom is similar to /dev/random, except that if
the RNG runs out of random bits gathered by measuring various real-world
events, /dev/urandom keeps going by falling back on a CSPRNG.  /dev/random
does not fall back; instead it blocks your program until more real-world
bits show up.  You probably don't want your program to just stop and wait.
Fortunately, /dev/urandom is pretty good to use in terms of security
because it's based on a CSPRNG, unless your system lacks sufficient
entropy sources, in which case its output may be predictable.  On a Linux
system, you can try reading from /dev/random, and if it's working, it
should be okay to use /dev/urandom.  If you absolutely positively must be
sure about it, dig into the code that implements the "entropy pool" and
instrument it to tell you what its rate of entropy gathering is.

Programmers writing security-related code that could run at bootup should
be mindful of a ramification of the above.  If a system does not maintain
some form of persistent entropy pool that it can seed its PRNG with
immediately after a reboot, then there will be a period after each reboot
when its RNG does not hold enough entropy to generate secure keys.  If a
system can be crashed remotely, and it starts generating keys from
/dev/urandom immediately after rebooting, those keys may come from one of
a relatively small number of possible states.  Such a system may be
vulnerable to attacks that use a crash and reboot as a stepping stone to
gain access.
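
On Linux 3.17 and later, with glibc 2.25 or later providing the wrapper,
the getrandom() system call addresses both concerns: called with flags
set to 0, it draws from the same source as /dev/urandom but blocks until
the kernel's CSPRNG has been seeded, and does not block after that.  The
get_iv() in the listing reads /dev/urandom, which on older kernels will
return output even before the pool is seeded.  A sketch of an alternative,
assuming such a system (get_iv_getrandom is a hypothetical name):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf with len random bytes.  Returns 1 on success, 0 on failure. */
int get_iv_getrandom(unsigned char *buf, size_t len) {
   size_t got = 0;
   assert(buf != NULL || len == 0);
   while (got < len) {
      /* flags = 0: wait until the CSPRNG is initialized, then read */
      ssize_t n = getrandom(buf + got, len - got, 0);
      if (n < 0) {
         if (errno == EINTR) {
            continue;          /* interrupted by a signal: try again */
         }
         return 0;             /* e.g. ENOSYS on kernels older than 3.17 */
      }
      got += (size_t)n;        /* large requests may be filled in pieces */
   }
   return 1;
}
```

Calling get_iv_getrandom(iv, BLOCKSIZE) would then take the place of
get_iv(iv, BLOCKSIZE) in main().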

----------------------------------------------------------------------

W9BcVSwTSrTYM1INpz6/1iJ9svVAXq26O4HdFcM7+Bm2PDUkbWZSdm0p9N0MvGC7
l3zI/KLE0aHmBJV+f13n64lnx60mVocFRdSoD+jgLVs1WP/CCNoER0pwTCPRsY2G
xQ==