How to succinctly, portably, and thoroughly seed the mt19937 PRNG?
An answer to this question on Stack Overflow.
Question
I seem to see many answers in which someone suggests using <random> to generate random numbers, usually along with code like this:
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 5);
dis(gen);
Usually this replaces some kind of "unholy abomination" such as:
srand(time(NULL));
rand()%6;
We might criticize the old way by arguing that time(NULL) provides low entropy, time(NULL) is predictable, and the end result is non-uniform.
But all of that is true of the new way: it just has a shinier veneer.
rd()returns a singleunsigned int. This has at least 16 bits and probably 32. That's not enough to seed MT's 19937 bits of state.Using
std::mt19937 gen(rd());gen()(seeding with 32 bits and looking at the first output) doesn't give a good output distribution. 7 and 13 can never be the first output. Two seeds produce 0. Twelve seeds produce 1226181350. (Link)std::random_devicecan be, and sometimes is, implemented as a simple PRNG with a fixed seed. It might therefore produce the same sequence on every run. (Link) This is even worse thantime(NULL).
Worse yet, it is very easy to copy and paste the foregoing code snippets, despite the problems they contain. Some solutions to the this require acquiring largish libraries which may not be suitable to everyone.
In light of this, my question is How can one succinctly, portably, and thoroughly seed the mt19937 PRNG in C++?
Given the issues above, a good answer:
- Must fully seed the mt19937/mt19937_64.
- Cannot rely solely on
std::random_deviceortime(NULL)as a source of entropy. - Should not rely on Boost or other libaries.
- Should fit in a small number of lines such that it would look nice copy-pasted into an answer.
Thoughts
My current thought is that outputs from
std::random_devicecan be mashed up (perhaps via XOR) withtime(NULL), values derived from address space randomization, and a hard-coded constant (which could be set during distribution) to get a best-effort shot at entropy.std::random_device::entropy()does not give a good indication of whatstd::random_devicemight or might not do.
Answer
Here's my own stab at the question:
#include <random>
#include <chrono>
#include <cstdint>
#include <algorithm>
#include <functional>
#include <iostream>
uint32_t LilEntropy(){
//Gather many potential forms of entropy and XOR them
const uint32_t my_seed = 1273498732; //Change during distribution
static uint32_t i = 0;
static std::random_device rd;
const auto hrclock = std::chrono::high_resolution_clock::now().time_since_epoch().count();
const auto sclock = std::chrono::system_clock::now().time_since_epoch().count();
auto *heap = malloc(1);
const auto mash = my_seed + rd() + hrclock + sclock + (i++) +
reinterpret_cast<intptr_t>(heap) + reinterpret_cast<intptr_t>(&hrclock) +
reinterpret_cast<intptr_t>(&i) + reinterpret_cast<intptr_t>(&malloc) +
reinterpret_cast<intptr_t>(&LilEntropy);
free(heap);
return mash;
}
//Fully seed the mt19937 engine using as much entropy as we can get our
//hands on
void SeedGenerator(std::mt19937 &mt){
std::uint_least32_t seed_data[std::mt19937::state_size];
std::generate_n(seed_data, std::mt19937::state_size, std::ref(LilEntropy));
std::seed_seq q(std::begin(seed_data), std::end(seed_data));
mt.seed(q);
}
int main(){
std::mt19937 mt;
SeedGenerator(mt);
for(int i=0;i<100;i++)
std::cout<<mt()<<std::endl;
}
The idea here is to use XOR to combine many potential sources of entropy (fast time, slow time, std::random-device, static variable locations, heap locations, function locations, library locations, program-specific values) to make a best-effort attempt at initializing the mt19937. As long as at least once source is "good", the result will be at least that "good".
This answer is not as short as would be preferable and may contain one or more mistakes of logic. So I'm considering it a work in progress. Please comment if you have feedback.