Return to Generating a Unique Number
In a previous post I discussed generating a unique id number for an order in our database. Read the original post for the full story, but to sum up, I decided that playing around with rand(100_000_000) was good enough. Martin Ankerl posted a comment pointing out that I was vulnerable to the Birthday Problem. So after a quick trip to wikipedia, I found out that if you have 23 people in a room you might think you have a 23/365 chance of having 2 people with the same birthdays, but no, your chances are closer to 50%! This is because in a group of 23 people there are 253 pairs each of which have a 1/365ish chance to have the same birthday. The formula to calculate how many are needed to give you a 50% chance of having a birthday collision is:
.5 + sqrt(.25 + 2 * 365 * ln(2) ) = 22.9999
Which means the 50% chance of collision mark for 100,000,000 is:
.5 + sqrt(.25 + 2 * 100,000,000 * ln(2) ) = 11774.600235771
Holy crap! I'm screwed if I can only have 12,000 orders before I get a 50% chance of collision. But wait, I think I'm ok because every time I insert a key into the database I check to make sure it isn't already in there. This means that when I'm at 12,000 orders the next key has only a 1 / (100,000,000 - 12,000) chance of being a collision. I think I've changed from the "Birthday problem" to the "Same birthday as you" problem, which is way more intuitive. If I'm wrong I rely on you, the blogoshpere, to tell me so.
If you haven't already, go read the wikipedia article on the Birthday problem. It's pretty cool.
.5 + sqrt(.25 + 2 * 365 * ln(2) ) = 22.9999
Which means the 50% chance of collision mark for 100,000,000 is:
.5 + sqrt(.25 + 2 * 100,000,000 * ln(2) ) = 11774.600235771
Holy crap! I'm screwed if I can only have 12,000 orders before I get a 50% chance of collision. But wait, I think I'm ok because every time I insert a key into the database I check to make sure it isn't already in there. This means that when I'm at 12,000 orders the next key has only a 1 / (100,000,000 - 12,000) chance of being a collision. I think I've changed from the "Birthday problem" to the "Same birthday as you" problem, which is way more intuitive. If I'm wrong I rely on you, the blogoshpere, to tell me so.
If you haven't already, go read the wikipedia article on the Birthday problem. It's pretty cool.
Comments
I'm not much on doing the math unless I have to, so I thought I'd just try your algorithm with different numbers of carts. Here are my results:
15,000 carts only generated a couple of collisions on average (0.013%).
150,000 carts generated about 90 collisions on average (0.06%).
1,500,000 carts generated 10,000 to 12,000 collisions (0.73%).
15,000,000 carts generated 1,180,524 collisions (7.87%).
The most collisions I saw while trying to find a unique key was 5, and that was quite exceptional, more often there was one or no collisions before a unique key was found. It gets lots worse as you approach your ramdom factor, so my guess is that this thing will get inefficient as you get close to 100,000,000 carts, but it will still work. I tried generating 150,000 carts when the random factor and it was working, but it would take sometimes more than 10 attempts to find a unique key.
I also tried replacing your algo with truncated MD5. At 150,000 carts there were no collisions, but generating 150,000 MD5 hashes is really, really, really slow, In terms of performance, your system is much faster and imho good enough.
But with only 11 digits you're definitely giving up a lot of the 32 hex digits that make up a normal GUID.
If the business accepts any alphanumeric character for this ID, I would use a random generator that selects from all the allowed characters, that will give you the least chance of collision. The implementation would be very similar to a Random Password Generator solution (generate random characters from an allowed character set).