Radix sort, signed Ints, Floats and Strings(Nothing to fill)

Started by Sinjin, May 17, 2023, 23:52:01

Previous topic - Next topic

Sinjin

I recently came to Radix-sort and I noticed people seem to fill strings with zeroes to get a unified length. Thats not neccessary.
function radixsortint%[](arr%[])
  local pos%
  local arr2%[arr.length]
  local cnt%[256]
  repeat
    for local a%=0 until arr.length
      local m%=arr[a] shr pos&255 ~$80 'adjust negative numbers
'      local m%=(arr[a]~$80000000) shr pos&255 'adjust negative numbers
      cnt[m]:+1
    next

    for local a%=1 until 256
      cnt[a]:+cnt[a-1]
    next

    for local a%=arr.length-1 to 0 step-1
      local m%=arr[a] shr pos&255 ~$80 'adjust negative numbers
'      local m%=(arr[a]~$80000000) shr pos&255 'adjust negative numbers
      cnt[m]:-1
      arr2[cnt[m]]=arr[a]
    next
    pos:+8
    if (pos>=32) then return arr2
    local swp%[]=arr
    arr=arr2
    arr2=swp

'    fillint int ptr cnt,0,256 'this is my assembler code but...
'    setmem cnt,0,256*4 'would do it as a C function but..
    for local a%=0 until cnt.length 'this does it as well
      cnt[a]=0
    next
  forever
endfunction

function radixsortfloat#[](arr#[])
  local pos%
  local arr2#[arr.length]',swp#[]
  local arr3%ptr=int ptr varptr arr[0]
  local arr4%ptr=int ptr varptr arr2[0]
  local cnt%[256]
  repeat
    for local a%=0 until arr.length
      local m%=-arr3[a] shr pos&255 ~$7f shr (arr3[a] shr 28&$8) 'adjust negative numbers
'                                   ~$7f*(arr3[a]<0)
      cnt[m]:+1
    next

    for local a%=1 until 256
      cnt[a]:+cnt[a-1]
    next

    for local a%=0 until arr.length
      local m%=-arr3[a] shr pos&255 ~$7f shr (arr3[a] shr 28&$8) 'adjust negative numbers
      cnt[m]:-1
      arr4[cnt[m]]=arr3[a]
    next
    pos:+8
    if (pos>=32) then return arr2
    local swp2#[]=arr
    arr=arr2
    arr2=swp2
    arr3=int ptr varptr arr[0]
    arr4=int ptr varptr arr2[0]
'    local swp%ptr=arr3
'    arr3=arr4
'    arr4=swp

'    fillint int ptr cnt,0,256
    for local a%=0 until cnt.length 'this does it as well
      cnt[a]=0
    next
  forever
endfunction

function radixsortstr$[](arr$[],maxlen%=-1) 'maxlen is the same as "pos" in the other function but this name is for better understanding here
  if (maxlen<0) then for local a%=0 until arr.length
    if (maxlen<arr[a].length) then maxlen=arr[a].length
  next
  if not maxlen then return arr
  maxlen:-1
  local arr2$[arr.length]
  local cnt%[256]
  repeat
    for local a%=0 until arr.length
      local m%
      if arr[a] then
        m=arr[a][min(maxlen,arr[a].length-1)]&255
        if (m>=asc"A") and (m<=asc"Z") then m:+asc"a"-asc"A"
      endif
      cnt[m]:+1
    next

    for local a%=1 until 256
      cnt[a]:+cnt[a-1]
    next

    for local a%=0 until arr.length
      local m%
      if arr[a] then
        m=arr[a][min(maxlen,arr[a].length-1)]&255
        if (m>=asc"A") and (m<=asc"Z") then m:+asc"a"-asc"A"
      endif
      cnt[m]:-1
      arr2[cnt[m]]=arr[a]
    next
    maxlen:-1
    if (maxlen<0) then return arr2
    local swp$[]=arr
    arr=arr2
    arr2=swp

'    fillint int ptr cnt,0,256
    for local a%=0 until cnt.length 'this does it as well
      cnt[a]=0
    next
  forever
endfunction
I hope i didnt make any mistakes :D
This was my inital code so you canb sort even bits:
function radixsortinitial%[](arr%[],bits%=8)
  local pos%
  local arr2%[arr.length]',swp%[]
  local mask%=1 shl bits
  local cnt%[mask]
  mask:-1
  repeat
    local ender%
    for local a%=0 until arr.length
      local m%=arr[a] shr pos
      ender:|m
      m:&mask
      cnt[m]:+1
    next
    if not ender then return arr

    for local a%=1 until cnt.length
      cnt[a]:+cnt[a-1]
    next

    for local a%=arr.length-1 to 0 step-1
      local m%=arr[a] shr pos&mask
      cnt[m]:-1
      arr2[cnt[m]]=arr[a]
    next
    pos:+bits
    if (pos>=32) then return arr2
    local swp%[]=arr
    arr=arr2
    arr2=swp
'    arr=arr2

    fillint int ptr cnt,0,cnt.length
  forever
'  until (pos>=32)
endfunction

Midimaster

Complete Runable Example RADIX Sort

Sinjin wrote a awesome function to sort a real big amout of FLOATs in a extrem short time. His  code enables to sort an array of 1,000,000 FLOATs within 45 msec!!!

Here is a runable code example in BlitzMax based on his function:

Code (vb) Select
' Radix-Sort for FLOAT-Variables, PD by Sinjin 2023-05-12
SuperStrict
Global Size:Int = 1000000
Global Raw:Float[Size], Result:Float[]

For Local i:Int=0 Until Size
Raw[i] = Rnd(-1000,1000)
Next

Global time:Int = MilliSecs()
Result = RadixSortFloat( Raw)
time   = MilliSecs() - time
Print "time=" + time

For Local i:Int=0 Until 30
Print Result[i]
Next


Function RadixSortFloat:Float[]( Source:Float[])
Local Pos:Int
Local Result:Float[Source.length]
Repeat
Local SourcePointer:Int Ptr = Varptr Source[0]
Local ResultPointer:Int Ptr = Varptr Result[0]
Local cnt:Int[256]

For Local a:Int=0 Until Source.length
Local m:Int = (SourcePointer[a]~$80000000) Shr pos&255 ~$7f Shr (~SourcePointer[a] Shr 28&$8)
cnt[m]:+1
Next

For Local a:Int=1 Until 256
cnt[a]:+cnt[a-1]
Next

For Local a:Int = Source.length-1 To 0 Step-1
Local m:Int =(SourcePointer[a]~$80000000) Shr pos&255 ~$7f Shr (~SourcePointer[a] Shr 28&$8)
cnt[m]:-1
ResultPointer[cnt[m]] = SourcePointer[a]
Next
Pos:+8
If (pos>=32) Then  Return Result

Local Swap:Float[] = Source
Source = Result
Result = Swap
Forever
End Function
...back from Egypt

Sinjin

Whats wrong with
local m%=-arr3[a] shr pos&255 ~$7f shr (arr3[a] shr 28&$8)Why do you use
SourcePointer[a]~$80000000instead of
-SourcePointer[a]and how do I get rid of the code block here? Its one line :D
Besides its not really Source and Destination/Result, you see its swapped.
NVM I see you just swapped the negative part with the other part ^^
I like my part better so in circumstances you have to swap a bit in a byte, not an integer.
Well, in assembler xor eax,$80000000 might be slower then use just neg eax :D but its ok
Imagine 64 bit so that would be xor rax,$8000000000000000  puh too much for the cpu.
Im not sure if that would even be a valid statement in Asm, could be the compiler has to do that in 32bits in 64 so, mov rbx,rax; shr rax,32; xor eax,$80000000...and so on...you see?

DaiHard

Quote from: Sinjin on May 17, 2023, 23:52:01... and I noticed people seem to fill strings with zeroes to get a unified length...
I think it is so that when you are sorting numbers, small numbers sort before large ones - so "8" sorts before "23" - which it won't, if you just use the first character for your bins.

Derron

Speedwise I think there could be things optimized in Midimasters Blitzmax code:

Local SourcePointer:Int Ptr = Varptr Source ..
-> so an integer pointer to an float array? Is this on purpose?

Local cnt:Int[256]
-> this one is recreated on each "repeat". Maybe creating a memblock outside of the loop and using "memclear" is faster (and has less to do for the GC). Array access could be done similar with the pointers.


Question:
For Local a:Int=1 Until 256
    cnt[a]:+cnt[a-1]
Next
-> can this lead to an overflow of the signed integer limit?


bye
Ron


Midimaster

Quote from: Sinjin on May 18, 2023, 00:22:43...Why do you use ...
SourcePointer[a]~$80000000

I did not recognize, that you improved your algo. So I pasted the version from one week ago from the german blitz forum...
...back from Egypt

Midimaster

Quote from: Derron on May 18, 2023, 08:46:40Local SourcePointer:Int Ptr = Varptr Source ..
-> so an integer pointer to an float array? Is this on purpose?

Yes, that's the kernel idea. I was also skeptical, but it looks like working...

QuoteLocal cnt:Int[256]
-> this one is recreated on each "repeat". Maybe creating a memblock outside of the loop and using "memclear" is faster (and has less to do for the GC). Array access could be done similar with the pointers.

yes he tested memclear, but no significant speed improvement

QuoteQuestion:
For Local a:Int=1 Until 256
    cnt[a]:+cnt[a-1]
Next
-> can this lead to an overflow of the signed integer limit?

No the array is 256 elements, the loop runs between 1 and 255, so a-1 is allowed


To Everybody: Feel free to improve the code and report the optimized algo. You need to beat 41msec (on my system)
...back from Egypt

col

Peeps,

I found this quite interesting so quickly used your example(s) and something appears to be very broken.
Using @Sinjin code I get this output (the asterisks show where there is a break in the sequence when expecting the values go from lower to higher) :
-515.989563
-515.998657 *
-515.968933
-515.970520 *
-515.971558 *
-515.972107 *
-515.977783 *
-515.979187 *
-515.979553 *
-515.981018 *
-515.981506 *
-515.982300 *
-515.983032 *
-515.983948 *
-515.953186
-515.953186
-515.953369 *
-515.953430 *
-515.953491 *
-515.958313 *
-515.959534 *
-515.963684 *
-515.965271 *
-515.966431 *
-515.966858 *
-515.967712 *
-515.967773 *
-515.967834 *
-515.967957 *
-515.968079 *


Using @Midimaster code I get this output
time=38
-997.991089
-997.989258
-997.988770
-997.988281
-997.987488
-997.986694
-997.998474 *
-997.996887
-997.995056
-997.993774
-997.993530
-997.993408
-997.971863
-997.971558
-997.971436
-997.969116
-997.983032 *
-997.981873
-997.980347
-997.980164
-997.979736
-997.977112
-997.960876
-997.956299
-997.953308
-997.953308
-997.967346 *
-997.967285
-997.961914
-997.944092
https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

Derron

Quote from: Midimaster on May 18, 2023, 09:49:57
Quote from: Derron on May 18, 2023, 08:46:40Question:
For Local a:Int=1 Until 256
    cnt[a]:+cnt[a-1]
Next
-> can this lead to an overflow of the signed integer limit?

No the array is 256 elements, the loop runs between 1 and 255, so a-1 is allowed

I was not talking about the indexes (array out of bounds) but the values.
See "cnt[a]:+cnt[a-1]" - can "cnt[a]" eg be "maximum integer value" - and now you add a value to it - what happens? the integer will "overflow" (wrap over).


bye
Ron

Sinjin

Hmm I didnt test it with very small numbers and so far I didnt implement it anywhere to put it to use really. But true, its not sorting correctly with floats. Maybe you can modify it so that you sort mantissa and exponents seperately? Or fix it somehow else?
Well, and yes, in theory you can get an overflow with large arrays. In Quicksort you can have a stackoverflow too ^^
Wait, no, it cannot overflow. Lets say you have 3 elements and all are..lets say 1. So in cnt[1] is a 3, adding all together is still 3. Lets say you have elements 1,1 and a 2 so cnt[1] would be 2 and cnt[2] would be 1. adding all together would be 3.
Sry my old brain...

Sinjin

I have this code now but now Im not even sure Im alive haha
function radixsortfloat#[](arr#[])
  const bits%=1,mask%=(1 shl bits)-1
  local pos%
  local arr2#[arr.length]',swp#[]
  local arr3%ptr=int ptr varptr arr[0]
  local arr4%ptr=int ptr varptr arr2[0]
  local cnt%[mask+1]
  repeat
    for local a%=0 until arr.length
      local m%=arr3[a] shr pos&mask' ~(mask+1) shr 1*(arr3[a]<0)
'      local m%=-arr3[a] shr pos&255 ~$7f shr (arr3[a] shr 28&$8) 'adjust negative numbers
'      local m%=(arr3[a]~$80000000) shr pos&255 ~$7f shr (~arr3[a] shr 28&$8) 'adjust negative numbers
'      local m%=(arr3[a]~$80000000) shr pos&255 ~$7f*(arr3[a]<0) 'adjust negative numbers
      cnt[m]:+1
    next

    for local a%=1 until mask+1
      cnt[a]:+cnt[a-1]
    next

    for local a%=arr.length-1 to 0 step-1
'    for local a%=0 until arr.length
      local m%=arr3[a] shr pos&mask' ~(mask+1) shr 1*(arr3[a]<0)
'      local m%=-arr3[a] shr pos&255 ~$7f shr (arr3[a] shr 28&$8) 'adjust negative numbers
'      local m%=(arr3[a]~$80000000) shr pos&255 ~$7f shr (~arr3[a] shr 28&$8) 'adjust negative numbers
'      local m%=(arr3[a]~$80000000) shr pos&255 ~$7f*(arr3[a]<0) 'adjust negative numbers
      cnt[m]:-1
      arr4[cnt[m]]=arr3[a]
    next
    pos:+bits
    if (pos>=32) then return arr2
    local swp2#[]=arr
    arr=arr2
    arr2=swp2
    arr3=int ptr varptr arr[0]
    arr4=int ptr varptr arr2[0]

    for local a%=0 to mask
      cnt[a]=0
    next
  forever
endfunction
And yup, this does sort it somehow, im just unsure now how to sort it correctly, I guess sorting just by bits would still be fast. Im glad I didnt put this in showcase, again I myself didnt put it to use yet.

col

I think it would better to deal with the negative numbers separately, as this version doesn't work with negative values.

I use the array that Midimaster used that has a million values ranging from -1000.0 to 1000.0. It's reasonable test data as a starting point.

As Derron correctly notes, it could overflow depending on the values and how many of them there are. This means the data set would need to contain large values and/or a large number of values. It looks like you're assuming that the values are always going to start low, but what if the values are very large to begin with, and with a large number of them.

This is interesting and I'm sure it can made to work correctly for all float values, but as I suggest... dealing with negatives separately may make the algo easier to read and may even offer a small improvement in speed due to not flipping all of the bits in the lower 24bits... is that really required when not dealing with the negative bit? possibly but im not sure at the moment.
Also instead of looping through each value using a single array for each 8 bits, have you tried 4 arrays so as to deal with all 32bits in one pass?
https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

col

I thought I'd have go at this and came up with this function. Feel free to optimize (runs on average ~15% faster on my machine at approx 30ms, was ~35ms).

SuperStrict

Local Count:Size_T = 1000000
Local Initial:Float[Count]

For Local i:Size_T=0 Until Initial.length
Initial[i] = Rnd(-1000,1000)
Next

Global TimeTaken:Int = MilliSecs()
Global Result:Float[] = RadixSortFloat(Initial)
TimeTaken = MilliSecs() - TimeTaken
Print "time=" + TimeTaken

For Local i:Size_T=0 Until Min(Result.length, 20)
Print Result[i]
Next

Function RadixSortFloat:Float[](Source:Float[])
Local Result:Float[Source.length]
Local NegativeCount:UInt = 0
Local BitPos:UInt = 0
 
Repeat
Local SourcePtr:UInt Ptr = Varptr Source

Local ResultPtr:UInt Ptr = Varptr Result
Local Counters:UInt[256]

For Local i:Long = 0 Until Source.length
Local Radix:UInt = (SourcePtr[i] Shr BitPos) & $FF
Counters[Radix] :+ 1
If BitPos = 24 NegativeCount :+ (Radix Shr 7)
Next

For Local i:UInt = 1 Until Counters.length
Counters[i] :+ Counters[i - 1]
Next

For Local i:Long = Source.length - 1 To 0 Step -1
Local Radix:UInt = SourcePtr[i] Shr BitPos & $FF
Counters[Radix] :- 1

If BitPos = 24
If Radix Shr 7
Local Index:UInt = Source.length - Counters[Radix] - 1
ResultPtr[Index] = SourcePtr[i]
Else
ResultPtr[Counters[Radix] + NegativeCount] = SourcePtr[i]
EndIf
Else
ResultPtr[Counters[Radix]] = SourcePtr[i]
EndIf
Next

BitPos :+ 8
If BitPos >= 32 Exit

Local Temp:Float[] = Source
Source = Result
Result = Temp
Forever

Return Result
EndFunction
https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

Derron

@col you might have a look at staticarrays :)

Local Counters:UInt[256]
'changed to
Local staticarray Counters:UInt[256]

And times went down from 20ms to 13ms.
(this is similar to what I proposed to @Midimaster with the memblock+clear to avoid the clutter of initializing an empty array over and over with each repetition)


bye
Ron

col

@Derron 
I forget that Ng has StaticArray :P 
Using StaticArray shaves off another 10% giving me ~27ms for me.
https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."