How to compute a signature for a changing `std::map`
An answer to this question on Stack Overflow.
Question
We have 5 servers each running the same service that produces a std::map. Each item in the std::map is composed of a unique integer as key and a double as its corresponding value. In order to check the consistency across different machines, we need to constantly check the equality of the std::map among five servers.
Each std::map stores 2 millions different items and it keeps changing during the day. The naive way to compare the value is as follows:
compare S1 with S2, S3, S4, S5
compare S2 with S3, S4, S5
compare S3 with S4, S5
compare S4 with S5
This is N*N complexity and whenever a single value in the map is changed, the O(N) comparison has to be redone.
A better idea is to build a signature for each map and the map comparison is reduced to float number comparison in the end. There are two challenges here:
- how to build a single value based on a huge map
- how to compute this value incrementally based on the changing map
Any suggestion is appreciated.
Answer
The incremental hash is clearly the way to go.
Unless, for whatever reason, it isn't. Another way to do this is to serialize and hash the whole tree. Boost provides a handy serialization function and there is an MD5 hash generator available here. Combining these two gives you a way to hash any map at any time. The code below will compile with:
g++ test.cpp md5.cpp --std=c++11 -lboost_serialization
You can, of course, substitute the MD5 hash for another of your choosing.
#include <map>
#include <sstream>
#include <iostream>
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/map.hpp>
#include <boost/archive/text_oarchive.hpp>
#include "md5.h"
int main()
{
std::map<int, int> map = {{1,2}, {2,1}};
std::stringstream ss;
boost::archive::text_oarchive oarch(ss);
oarch << map;
std::cout<<md5(ss.str())<<std::endl;
}