MySQL: Insert Performance - UUID vs Ordered UUID vs Integer Primary Key
I've recently considered the use of UUIDs as a primary key on a project. Every fibre of my being says "no" to UUIDs as a primary key. The type-1 UUIDs used in MySQL do have the golden ticket of being genuinely unique, but they're cumbersome, slow and just nasty from a DBA perspective. Integers are smaller, faster, and easily passed between humans:
- It came up with error reading 15826
- It came up with error reading c416bfe0-a01f-11e5-9e96-5254001bb2ad
(and you know that'd arrive as a screenshot saved in a Word document.)
After a read up on UUID performance problems, it seems to come down indexing. UUIDs can't index sequentially, causing massive rebuilds of the index to insert them and long seeks to find them. This gets worse as the index grows and once it becomes disk bound, you've basically had it. In comparison, integers simply append to the index and increment a counter.
Since MySQL's UUID() returns a Type-1 UUID, there are solutions out there which suggest re-ordering the timestamp part of the UUID to allow for sequential indexing. I decided to put this to the test against InnoDB. A simple test: a stored procedure that inserts 25 million rows in transactions of 1m was run 4 times in parallel, to insert a total of 100m rows into a table containing two fields: the PK and a timestamp. This was then plotted onto the graph below. Click for a larger view.
As you can see, the ordered UUID smashes the unordered UUID, providing consistent performance as the table grows. This is because when ordered, the UUID only needs to append to the index like INT32. The counter check and increment is replaced with an index lookup, where it can quickly determine the new record can be appended. It still has more data to store, and there is a small CPU overhead in the re-ordering, so it will always be slower than INT32.
The unordered UUID result is as I expected and what I have witnessed at work where type-4 UUIDs are used. While they may work for small tables, once the index becomes disk bound at roughly 10 minutes in, the performance drops rapidly as the index grows and more seeks/rebuilds are needed.
Also interesting was the final size of the database, with the INT32 AI using 1.9GB, ordered UUID using 4.5GB and the unordered UUID coming in at 8.2GB. The difference between INT32 and the ordered UUID is down to the data size. While the difference between ordered and unordered UUIDs is down to the rebuilds that happen on the unordered index.
In summary, ordered UUIDs are twice as slow for inserts as INT32, but they are predictable and consistent in their insert performance. If you're going to use UUIDs in MySQL, order them.
The functions used to convert the UUID() into the ordered UUID() are below. These are based off the advice on the StackOverflow question linked above.
Possible future plans: I'm re-ordering on strings, I'd like to try doing the re-ordering at the binary level. Does MySQL have a way to measure the overhead of these functions? If high, could I offload the re-ordering to a GPU for silly speeds?
CREATE FUNCTION `fnGuidToBinary`( $uuid CHAR(36) ) RETURNS BINARY(16) NO SQL DETERMINISTIC BEGIN DECLARE $result BINARY(16) DEFAULT NULL; DECLARE $noDashes CHAR(32) DEFAULT NULL; IF $uuid IS NOT NULL THEN SET $noDashes = REPLACE($uuid,'-',''); SET $result = UNHEX(CONCAT( -- Move most significant bytes infront for sequential time based indexing. -- Char Index: 12345678901234567890123456789012 -- Input UUID: 123456789ABCDEFGHIJKLMNOPQRSTUVW -- Out UUID: DEFG9ABC12345678HIJKLMNOPQRSTUVW SUBSTRING($noDashes, 13, 1), -- D SUBSTRING($noDashes, 14, 3), -- EFG SUBSTRING($noDashes, 9, 4), -- 9ABC SUBSTRING($noDashes, 1, 8), -- 12345678 SUBSTRING($noDashes, 17, 16) -- H-W )); END IF; RETURN $result; END
CREATE FUNCTION `fnBinaryToGuid`( $binaryUuid BINARY(16) ) RETURNS CHAR(36) CHARSET utf8 NO SQL DETERMINISTIC BEGIN DECLARE $result CHAR(36) DEFAULT NULL; DECLARE $stringUuid CHAR(36) DEFAULT NULL; IF $binaryUuid IS NOT NULL THEN -- Reverse significant bit ordering done in GuidToBinary() and -- restore dashes. -- Char Index: 12345678901234567890123456789012 -- Input UUID: EFG9ABC12345678DHIJKLMNOPQRSTUVW -- Out UUID: 123456789ABCDEFGHIJKLMNOPQRSTUVW SET $stringUuid = HEX($binaryUuid); SET $result = LOWER(CONCAT( SUBSTRING($stringUuid, 9, 8), '-', -- 12345678 SUBSTRING($stringUuid, 5, 1), -- 9 SUBSTRING($stringUuid, 6, 3), '-', -- ABC SUBSTRING($stringUuid, 1, 1), -- D SUBSTRING($stringUuid, 2, 3), '-', -- EFG SUBSTRING($stringUuid, 17, 4), '-', -- H-K SUBSTRING($stringUuid, 21, 12) -- L-W )); END IF; RETURN $result; END