Generating a Unique Number
Weird requirement at work today: we needed a number that was alphanumeric, exactly 11 digits long, unique, and non-sequential. At first we thought of using a hash, but MD5 and SHA-1 give you way too many digits. We could truncate to 11, but since I don't know much about hashing, I was pretty nervous that we'd get collisions. After a bunch of discussion, we decided on this:
MAXIMUM_FOR_10_DIGITS = 0xffffffffff # largest 10-digit hex number (2**40 - 1)
# Adding any positive amount gives at least 0x10000000000, the smallest 11-digit hex number
(MAXIMUM_FOR_10_DIGITS + cart_id + Time.now.to_i + rand(100_000_000)).to_s(16)
I was thinking about using hex, but I was worried about what would happen if the number got too big and went to 12 digits. So I did this calculation:
irb(main):001:0> 0xfffffffffff - 0x10000000000
=> 16492674416639
Turns out there's all sorts of room between the lowest and highest 11-digit hex numbers. Paul and Schubert did some quick calculations assuming lots of carts per day and figured out that we have about 250,000 years before we run out of space. Since we use a random number, there is a chance that we could get two matching numbers, so we do a database lookup to make sure that a generated number hasn't been used before. There is still a small chance that two identical numbers could be generated in the time between generation and saving to the database, but I think we can live with that.
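A minimal sketch of the whole approach follows. It assumes hypothetical helpers `candidate_code` and `generate_code` and a hypothetical `MAXIMUM_FOR_11_DIGITS` constant, and it uses an in-memory `Set` as a stand-in for the database lookup:

```ruby
require "set"

MAXIMUM_FOR_10_DIGITS = 0xffffffffff  # 2**40 - 1, the largest 10-digit hex number
MAXIMUM_FOR_11_DIGITS = 0xfffffffffff # 2**44 - 1, the largest 11-digit hex number (hypothetical name)

# Hypothetical helper: builds one candidate code from a cart id and the current time.
def candidate_code(cart_id, now = Time.now.to_i)
  value = MAXIMUM_FOR_10_DIGITS + cart_id + now + rand(100_000_000)
  raise "ran out of 11-digit space" if value > MAXIMUM_FOR_11_DIGITS
  value.to_s(16)
end

# Hypothetical generate-and-check loop; `used` stands in for the database lookup.
def generate_code(cart_id, used)
  loop do
    code = candidate_code(cart_id)
    next if used.include?(code) # already taken: draw again
    used << code
    return code
  end
end

used = Set.new
codes = (1..1000).map { |id| generate_code(id, used) }
all_eleven = codes.all? { |c| c.length == 11 }
puts codes.first
puts all_eleven # prints true
```

The smallest possible sum is already above 0xffffffffff, so every code has exactly 11 hex digits until the sum passes 0xfffffffffff, which is the ceiling the capacity estimate is about. Like the real lookup, the `Set` check does nothing about two processes racing between generation and save.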
Update: After some interesting comments, I wrote a follow-up post. Check it out for more math goodness.
Comments
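PRIME = 2654435761   # odd multiplier, close to 2**32 divided by the golden ratio
MAX = 2**44 - 1      # 44-bit mask: exactly 11 hex digits
# value is a sequential id below 2**44; an odd multiplier mod 2**44 never maps two ids to the same code
"%.11X" % (value * PRIME & MAX) # zero-padded to 11 uppercase hex digits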
I'm not quite sure about that, but I think the average number of draws before you get a collision in a set of N elements is about sqrt(N). So you'll probably need to wait until you have more than 300,000 ids before you see one!
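The rule of thumb in this comment is the birthday bound. More precisely, when drawing uniformly from N values, the expected number of draws until the first repeat is about sqrt(πN/2), roughly 1.25·sqrt(N). The small simulation below checks that. It is an illustration with a toy N = 10,000 and a fixed seed, not the real id space:

```ruby
# Draw uniformly from n values until one repeats; return how many draws that took.
def draws_until_repeat(n, rng)
  seen = {}
  count = 0
  loop do
    x = rng.rand(n)
    count += 1
    return count if seen[x]
    seen[x] = true
  end
end

n = 10_000
rng = Random.new(42) # fixed seed so the run is reproducible
trials = 2_000
average = Array.new(trials) { draws_until_repeat(n, rng) }.sum / trials.to_f
estimate = Math.sqrt(Math::PI * n / 2)

puts "simulated: #{average.round(1)}, sqrt(pi*N/2): #{estimate.round(1)}, sqrt(N): #{Math.sqrt(n)}"
```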
Why not just start at 0 and count up sequentially (or the opposite)?
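Counting up sequentially can still yield non-sequential codes if each id is passed through the PRIME/MAX multiplication from the first comment. Because 2654435761 is odd, multiplying by it modulo 2**44 is a bijection on 44-bit values. Distinct ids therefore can never collide, and the mapping can be undone with the modular inverse. The sketch below assumes integer ids below 2**44. The `encode`/`decode`/`mod_inverse` names are illustrative and do not come from the comment:

```ruby
PRIME = 2654435761
MAX = 2**44 - 1      # mask for 44 bits = 11 hex digits
MODULUS = MAX + 1    # 2**44

# Modular inverse via the extended Euclidean algorithm (exists because PRIME is odd).
def mod_inverse(a, m)
  old_r, r = a, m
  old_s, s = 1, 0
  while r != 0
    q = old_r / r
    old_r, r = r, old_r - q * r
    old_s, s = s, old_s - q * s
  end
  raise ArgumentError, "not invertible" unless old_r == 1
  old_s % m
end

INVERSE = mod_inverse(PRIME, MODULUS)

# Hypothetical helpers: scramble a sequential id into an 11-digit hex code and back.
def encode(value)
  "%.11X" % (value * PRIME & MAX)
end

def decode(code)
  code.to_i(16) * INVERSE & MAX
end

codes = (1..5).map { |id| encode(id) }
p codes
p codes.map { |c| decode(c) } # prints [1, 2, 3, 4, 5]
```

The database then only has to hand out the next integer, and uniqueness of the codes follows from the math rather than from a lookup.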
2) You do not need a hash; you need a random number. As you are already checking for collisions, you could just call rand(max) and then convert the result to your base of choice (which leads me to my next point).
When you have about 15,000 keys in your database, the probability of a collision is over 50%. When you have 50,000 keys, the probability of a match is 99.999%! So collisions become almost unavoidable.
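These figures match the standard birthday approximation p ≈ 1 − exp(−k(k−1)/(2N)) if N is the 10**8 values that rand(100_000_000) can return. Treating that as the relevant space is an assumption about what the comment means. The approximation gives the chance that *some* pair among k keys collides. The chance that one new draw hits an existing key is only about k/N, which is why a check-and-retry loop keeps working. A quick check:

```ruby
# Birthday approximation: probability that at least one pair among k uniform draws from n values collides.
def collision_probability(k, n)
  1 - Math.exp(-k * (k - 1) / (2.0 * n))
end

n = 100_000_000 # values produced by rand(100_000_000)

[15_000, 50_000].each do |k|
  printf("%6d keys: P(some collision) = %.6f, P(next draw collides) = %.6f\n",
         k, collision_probability(k, n), k.fdiv(n))
end
```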
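I think you should try to use the maximum possible randomness to get around this problem. The largest 11-character base-36 number (26 letters + 10 digits) is:
largest = ("Z" * 11).to_i(36)                   # 36**11 - 1
# pick uniformly from 0..largest, write it in base 36, and pad so it is always 11 characters
id = rand(largest + 1).to_s(36).rjust(11, "0")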
Then you have nearly 57 bits of randomness, which is a lot more. The chance of a collision here is about 50% when you have 400 million keys. With 800 million keys the probability is 91%.
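The same birthday approximation reproduces the numbers in this comment if N = 36**11, the count of every 11-character base-36 string. The check below also computes how many keys give exactly a 50% chance, using k ≈ sqrt(2N ln 2):

```ruby
# Birthday approximation: probability that at least one pair among k uniform draws from n values collides.
def collision_probability(k, n)
  1 - Math.exp(-k * (k - 1) / (2.0 * n))
end

n = 36**11 # number of distinct 11-character base-36 ids

bits = Math.log2(n)
half = Math.sqrt(2 * n * Math.log(2)) # keys needed for a 50% chance of some collision

puts "bits of randomness: #{bits.round(2)}"
puts "keys for 50%: #{half.round}"
[400_000_000, 800_000_000].each do |k|
  puts "#{k} keys: #{(collision_probability(k, n) * 100).round(1)}%"
end
```

That puts the 50% point near 427 million keys, consistent with the comment's "about 400 million", and confirms roughly 91% at 800 million.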
# nanoseconds since the Unix epoch, times 1000, plus a random suffix in 0..999
Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond) * 1000 + rand(1000)
Your only chance of a collision with this number is 1 in 1000, and only if two requests for an ID are made within the same nanosecond. You will have several thousand years before this value wraps around, and even then you have very little chance of a collision. This will not tie your order ID to a sequential number and should satisfy the business requirement. You can also avoid any predictability by obfuscating the result: pick a secret prime number < MAX_ORDER_ID that does not divide MAX_ORDER_ID (so the two are coprime) and do:
(orderID * PRIME) % MAX_ORDER_ID
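The sketch below illustrates this comment. It assumes the obfuscation step means multiplying the order id by a secret multiplier modulo MAX_ORDER_ID. That is a permutation exactly when the multiplier and the modulus are coprime, which is what the Coprime link is about. A prime that does not divide MAX_ORDER_ID satisfies this, but so does any other coprime multiplier. The values of `MAX_ORDER_ID` and `SECRET_MULTIPLIER` and the helper names are hypothetical. The sketch also shows that the raw timestamp id needs more than 11 characters even in base 36:

```ruby
# The comment's id: nanoseconds since the Unix epoch, times 1000, plus a random 0..999 suffix.
def timestamp_id
  Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond) * 1000 + rand(1000)
end

# Hypothetical values: MAX_ORDER_ID = 36**11 so every result fits in 11 base-36 characters.
# SECRET_MULTIPLIER is an arbitrary odd number whose digit sum is not divisible by 3;
# since 36**11 = 2**22 * 3**22, that makes the two coprime.
MAX_ORDER_ID = 36**11
SECRET_MULTIPLIER = 61_203_948_123_456_791
raise "multiplier must be coprime to MAX_ORDER_ID" unless SECRET_MULTIPLIER.gcd(MAX_ORDER_ID) == 1

# Multiplying by a coprime number modulo MAX_ORDER_ID permutes 0...MAX_ORDER_ID,
# so distinct order ids below MAX_ORDER_ID always map to distinct outputs.
def obfuscate(order_id)
  ((order_id * SECRET_MULTIPLIER) % MAX_ORDER_ID).to_s(36).rjust(11, "0")
end

puts "timestamp id uses #{timestamp_id.to_s(36).length} base-36 characters"
obfuscated = (1..3).map { |order_id| obfuscate(order_id) }
p obfuscated
```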
Check out: http://en.wikipedia.org/wiki/Coprime