Design Unique ID Generator

Ishan Aggarwal
4 min readJan 21, 2023

--

In this problem, we are required to generate IDs (numeric values — 8 bytes long) that have the following properties :

  1. The ID should be unique.
  2. The value of the ID must be incremental.

What does incremental mean?

Let us suppose two IDs (numeric values), v1 and v2 are generated at time t1 and t2, respectively. So, incremental means here that if t1 < t2, it should imply that value v1 < v2.

Questions: What options do we have to construct these IDs?

  1. Auto Increment in SQL
  2. UUID
  3. Timestamp
  4. Timestamp + Server ID
  5. Multi-master

Ques1: Why can’t we directly use IDs as 1,2,3,… i.e., the auto-increment feature in SQL to generate the IDs?

Sequential IDs work well on local systems but not in distributed systems. Two different systems might assign the same ID to two different requests in distributed system environment. So, the uniqueness property will be violated here.

Ques2: Why can’t we use UUID?

We cannot use UUID here because

  1. UUID is random.
  2. UUID is not numeric.
  3. The property of being incremental is not followed, although the values are unique.

Ques3: Why can’t we use timestamp values to generate the IDs?

The reason is again same — the failure to assign unique ID values in distributed system environment. It may be possible that two requests land on two different systems at the same timestamp. So, if the timestamp parameter is utilized, both requests will be assigned the same ID based on epoch values (the time elapsed between the current timestamp and a pre-decided given timestamp). So, here the 1st property for uniqueness gets violated.

Ques4: Timestamp + ServerID

The question here would be how many bits are to be assigned to the Timestamp and ServerID. This situation would be tricky.

Ques5: Multi-Master

Let us assume there are three different machines in a distributed computing environment. Let us number the machines as M1, M2, and M3. Now suppose the machine M1 is set to assign the IDs, which are a multiple of 3n, machine M2 is set to assign IDs that are a multiple of 3n+1, and machine M3 is set to assign IDs that are a multiple of 3n+2.

This situation can be represented diagrammatically as follows:

If we assign IDs using this technique, the uniqueness problem will be solved. But it might violate the 2nd property i.e. assign IDs (values) in incremental nature in a distributed system.

For example, let us suppose that request keeps coming consecutively on M1. In this case, it would assign the IDs as 99,102,105,… . Now, after some time, the request comes on some other machine (e.g., M2), then the ID assigned would have a numeric value lower than what was assigned previously.

Twitter Snowflakes Algorithm For Unique ID Generator

In this algorithm, we have to design a 64-bit solution. The structure of the 64 bits looks as follows :

Refer below for each of the section details one by one -

1. Sign Bit:

The sign bit is never used. Its value is always zero. This bit is just taken as a backup that will be used at some time in the future.

2. Timestamp Bits:

This time is the epoch time. Previously the benchmark time for calculating the time elapsed was from 1st Jan 1970. But Twitter changed this benchmark time to 4th November 2010.

3. Data center Bits:

5 bits are reserved for this, which means that there can be (2⁵) = 32 data centers.

4. Machine ID Bits:

5 bits are reserved for this, which again means that there can be (2⁵) = 32 machines per data center.

5. Sequence No Bits:

These bits are kept to ensure the uniqueness property to be maintained when multiple requests land on the same machine on the same data center at the same timestamp. So, these bits are used for generating sequence numbers for IDs that are generated at the same timestamp. The sequence number is reset to zero at every millisecond. Since we have reserved 12 bits for this, we can have (2¹²) = 4096 sequence numbers which are certainly more than the IDs that are generated every millisecond by every single machine.

Thank you so much for reading this article. If you found this helpful, please do share it with others and on social media. It would mean a lot to me.

--

--

Ishan Aggarwal
Ishan Aggarwal

Written by Ishan Aggarwal

Consulting Principal MTS @ Oracle Cloud Infrastructures | Works on designing highly Scalable and Distributed Systems

Responses (4)