Recently, during an interview, I got into an argument defending Radix sort as the best way to sort. My reason: it runs in nearly constant time. Unbelievable? Well, let me explain –
- Radix sort is linear in operation.
- The computation time depends on –
  - the highest power of 10 (the base of the decimal number system) that the largest number corresponds to,
  - and, of course, on the base of the number system we're operating in (which is decimal in our case).
As Wikipedia puts it, the time taken for Radix sort is O(wN), where w is the word size – the number of digits in the keys being sorted (for machine integers, this is bounded by what a CPU register can hold). And w tends to be constant for large N.
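To see why w barely moves with N, here's a quick sanity check (a sketch in Python; the cap of 1000 on key values is my own assumption for illustration):

```python
import random

def digit_count(x):
    # number of base-10 digits in x, i.e. the w for that key
    return len(str(x))

# w depends only on the largest key, not on how many elements we sort:
for n in (10, 1_000, 100_000):
    data = [random.randrange(1, 1000) for _ in range(n)]
    # keys stay below 1000, so w never exceeds 3 digits, whatever n is
    print(n, digit_count(max(data)))
```

However many elements we add, as long as the keys stay in the same range, w (and hence the number of passes) doesn't grow.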
In the video below, the collection contains the numbers [ 9, 179, 139, 38, 10, 5, 36 ], the highest being 179. The loop runs 3 times, accessing each array element on every pass.
Explaining the operations in the video above:
- There are 3 iterations in total.
- The highest power of 10 that fits in the largest number, 179, is 2
  - 179 lies between 10^2 = 100 and 10^3 = 1000.
- And, max iterations = power + 1, which in this case is 3.
- Had the largest number been in the range of the 1000s, e.g. for a collection like [ 9, 179, 139, 38, 10, 5, 36, 1037 ], the loop would have run 4 times (3 from 10^3, plus 1).
- And why do we choose powers of 10?
  - Because that's the base of the number system we're using, i.e. decimal.
  - Had we used binary, this would have been computed in terms of powers of 2; 8 for octal, 16 for hexadecimal, and so on.
- The number of iterations does not depend on the number of elements.
- The time complexity depends entirely on
- the number system
- and the magnitude of the highest number in the collection
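The walkthrough above can be sketched as a base-10 LSD radix sort (a minimal sketch, names are mine); the pass counter confirms that the loop count tracks the magnitude of the largest number, not the element count:

```python
def radix_sort(nums, base=10):
    """LSD radix sort for non-negative integers.
    Returns the sorted list and the number of passes made."""
    if not nums:
        return nums, 0
    passes, exp, largest = 0, 1, max(nums)
    while largest // exp > 0:          # one pass per digit of the largest number
        buckets = [[] for _ in range(base)]
        for n in nums:                 # drop each number into its digit's bucket
            buckets[(n // exp) % base].append(n)
        nums = [n for b in buckets for n in b]  # collect, keeping the order stable
        exp *= base
        passes += 1
    return nums, passes

print(radix_sort([9, 179, 139, 38, 10, 5, 36]))
# 3 passes, since 179 has 3 digits; appending 1037 would make it 4
```

Note that no element is ever compared to another; each pass only reads one digit of each key.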
So, considering that we only ever have to deal with numbers up to a certain maximum and not beyond, wouldn't it be safe to conclude that it is nearly constant time?
- No comparison operations – Radix sort only ever inspects the digits, from LSB to MSB, never comparing whole elements.
- The number of input elements doesn’t matter
- For the average case, the complexity is linear in n, the number of input elements.
- Added space complexity – it maintains a separate collection to store the processed numbers.
- Doesn’t exploit hardware caches well (Quicksort is the most efficient in this regard, since it works on data already sitting in registers and cache, although I’ve not verified this claim so far).
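The extra space mentioned above is easiest to see in a single counting pass, which is how each radix iteration is often implemented (a sketch; the function name is mine):

```python
def counting_pass(nums, exp, base=10):
    """Stably reorder nums by the digit at the place value given by exp."""
    count = [0] * base            # histogram of digit values: O(base) extra
    output = [0] * len(nums)      # the separate collection: O(n) extra space
    for n in nums:
        count[(n // exp) % base] += 1
    for d in range(1, base):
        count[d] += count[d - 1]  # prefix sums give each digit's end position
    for n in reversed(nums):      # walking backwards keeps equal digits stable
        d = (n // exp) % base
        count[d] -= 1
        output[count[d]] = n
    return output

print(counting_pass([9, 179, 139, 38, 10, 5, 36], exp=1))
# ordered by the last digit: [10, 5, 36, 38, 9, 179, 139]
```

Unlike an in-place Quicksort partition, every pass writes into a fresh output buffer, which is exactly the space overhead (and the cache-unfriendly scatter) noted above.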