prenex/magyarsort

Fork 0

Richard Thier 0521ddd52d added neargoodsort idea (and some merge space optimization ideas that I think are known)

2023-07-20 21:09:57 +02:00

7.1 KiB

Raw Blame History

Sorting for "nearly sorted data"

Algorithm:

Go over the data and like in Stalin-sort keep only those who are in order
BUT: Unlike stalin-sort we partition!
in[], outs[], outns[]
The in is the input array
The outs is the "sorted part" of the separation (Stalin would keep them)
The outns is the "outliers" part of the separation (Stalin would kill them)
Use the same algorithm to recursively sort the outns part
Use the merge sort's merge algoritm to merge outs[] and outns[] back into in[]

This works because we know for sure that outs has at least a single element!

When it only has one element we get worst case O(n^2) runtime!

When the data is nearly sorted, we get nearly O(n) runtime!

Can be used to "keep an array/list sorted" with an "update" method on it that iterates over and update pos/key similar to kismap.

Idea: decide if we go from top or bottom based on which is smaller - hopefully mitigates worst case being descending case!

Example

------------------------- Split 0
in0:
3 7 5 8 9 5 8 9 5 9 9 3 1

outs0:
3 7 8 9 9 9 9
outns0:
5 5 8 5 3 1
------------------------- Split 1
in1:
5 5 8 5 3 1

outs1:
5 5 8
outns1:
5 3 1
------------------------- Split 2
in2:
5 3 1

outs2:
5
outns2:
3 1
------------------------- Split 3
in3:
3 1

outs2:
3
outns2:
1
------------------------- Merge 3
outs2:
3
outns2:
1

in3 [merge-out]:
1 3
------------------------- Merge 2
outs2:
5
outns2:
1 3      == in3

in2 [merge-out]:
1 3 5
------------------------- Merge 1
outs1:
5 5 8
outns1:
1 3 5    == in2

in1 [merge-out]:
1 3 5 5 5 8
------------------------- Merge 0
outs0:
3 7 8 9 9 9 9
outns0:
1 3 5 5 5 8  == in1

in0:
1 3 3 5 5 5 7 8 8 9 9 9 9

Which is - as you can see the sort result of the input array!

3 7 5 8 9 5 8 9 5 9 9 3 1

Time and space analysis

On random data this sounds to be close to the O(n*logn) amortized runtime statistically I think but did not go after it.

On the worst case its clearly O(n^2) because we always just get a single element to outsi means that...

Space analysis is roughly same as the non-optimized merge sort - see below for space optimized merge steps - maybe useful for this to!

A random bad inplace-merge idea

Example

Lets say we have this two lists

1 3 3 5 7 9
2 3 4 5 6 7

But represented in the same array, partitioned into two parts:

1 3 3 5 7 9|2 3 4 5 6 7

We can go with two pointers and try to make this work with SWAPs:

1 3 3 5 7 9|2 3 4 5 6 7
^           ^         ~

Where: swap* means swap element on left with right, but on the right list put it in its right place (binary search + memcpy)

Maybe: The second part should be heapified! Then we can get log(n) pop&insert, but issue is then it does not stay sorted :-(

Runtime: O(n^2) worst case which is extreme slow...

Rem.: Likely swap + bubble is better here for the second side...

Better, but still slow random inplace merge idea

1 3 3 5 7 9|2 3 4 5 6 7
^           ^
    (<=)
1 3 3 5 7 9|2 3 4 5 6 7
  ^           ^
    (<=)
1 3 3 5 7 9|2 3 4 5 6 7
    ^           ^
    (<=)
1 3 3 5 7 9|2 3 4 5 6 7
      ^           ^
    (<=)
1 3 3 5 7 9|2 3 4 5 6 7
        ^           ^
    (>)
1 3 3 5 6 9|2 3 4 5 7 7
          ^           ^
    (>)
1 3 3 5 6 7|2 3 4 5 7 9
            ^           ^
    (!!)
1 3 3 5 6 7|2 3 4 5 7 9
^ !         ^
    (logsearch: ~)
1 3 3 5 6 7|2 3 4 5 7 9
^ ^         ^ ~
    (tmpvec)
1 3 3 5 6 7|. . 4 5 7 9
^ ^         ^ ~
    tmp: 2 3
    (memcpy)
1 . . 3 3 5|6 7 4 5 7 9
^ ^         ^ ~
    tmp: 2 3
    (backwrite)
1 2 3 3 3 5 6 7|4 5 7 9
      ^ ^       ^
    tmp: nil
    (not(3 <= 4 < 3))
1 2 3 3 3 5 6 7|4 5 7 9
        ^ ^     ^
    tmp: nil
    (logsearch: ~)
    (not(3 <= 4 < 3))
1 2 3 3 3 5 6 7|4 5 7 9
        ^ ^     ^ ~
    (tmpvec)
1 2 3 3 3 5 6 7|. . 7 9
        ^ ^     ^ ~
    tmp: 4 5
    (memcpy)
1 2 3 3 3 . . 5|6 7 7 9
        ^ ^     ^ ~
    tmp: 4 5
    (backwrite)
1 2 3 3 3 4 5 5|6 7|7 9
          ^ ^     ^ ~
    tmp: nil

    (not(3 <= 4 < 3))
    (not(3 <= 4 < 3))
    (not(3 <= 4 < 3))
[END]

This sounds like O(nlogn) for the merge operation - which would make a merge sort slower than nlog*n still, but not so bad as above

This is not totally in-place because can use worst case a lot of mem, but averagely less than regular merge

But just using n/2 element tmp array for "regular" alg works if you think about it so not sure if beating that one...

Doing n/2 element tmp array

From: arr: 1 3 3 5 7 9|2 3 4 5 6 7

To: arr: . . . . . .|2 3 4 5 6 7 tmp: 1 3 3 5 7 9

And then we just always pick the smaller between the two piecewise: _ arr: . . . . . .|2 3 4 5 6 7 tmp: 1 3 3 5 7 9 ^ ^ _ arr: 1 . . . . .|2 3 4 5 6 7 tmp: . 3 3 5 7 9 ^ ^ _ arr: 1 2 . . . .|. 3 4 5 6 7 tmp: . 3 3 5 7 9 ^ ^ (rem.: tmp is preferred to keep order of elements unchanged for same keys!) _ arr: 1 2 3 . . .|. 3 4 5 6 7 tmp: . . 3 5 7 9 ^ ^ (rem.: tmp is preferred to keep order of elements unchanged for same keys!) _ arr: 1 2 3 3 . .|. 3 4 5 6 7 tmp: . . . 5 7 9 ^ ^ _ arr: 1 2 3 3 3 .|. . 4 5 6 7 tmp: . . . 5 7 9 ^ ^ _ arr: 1 2 3 3 3 4|. . . 5 6 7 tmp: . . . 5 7 9 ^ ^ (rem.: tmp is preferred to keep order of elements unchanged for same keys!) _ arr: 1 2 3 3 3 4|5 . . 5 6 7 tmp: . . . . 7 9 ^ ^ _ arr: 1 2 3 3 3 4|5 5 . . 6 7 tmp: . . . . 7 9 ^ ^ _ arr: 1 2 3 3 3 4|5 5 6 . . 7 tmp: . . . . 7 9 ^ ^ (rem.: tmp is preferred to keep order of elements unchanged for same keys!) _ arr: 1 2 3 3 3 4|5 5 6 7 . 7 tmp: . . . . . 9 ^ ^ _ arr: 1 2 3 3 3 4|5 5 6 7 7 . tmp: . . . . . 9 ^ ^ _ arr: 1 2 3 3 3 4|5 5 6 7 7 9 tmp: . . . . . . ^ ^

And this ends the merge algorithm!

7.1 KiB Raw Blame History

Sorting for "nearly sorted data"

Algorithm:

This works because we know for sure that outs has at least a single element!

When it only has one element we get worst case O(n^2) runtime!

When the data is nearly sorted, we get nearly O(n) runtime!

Can be used to "keep an array/list sorted" with an "update" method on it that iterates over and update pos/key similar to kismap.

Idea: decide if we go from top or bottom based on which is smaller - hopefully mitigates worst case being descending case!

Example

Time and space analysis

A random bad inplace-merge idea

Example

Where: swap* means swap element on left with right, but on the right list put it in its right place (binary search + memcpy)

Maybe: The second part should be heapified! Then we can get log(n) pop&insert, but issue is then it does not stay sorted :-(

Runtime: O(n^2) worst case which is extreme slow...

Rem.: Likely swap + bubble is better here for the second side...

Better, but still slow random inplace merge idea

This sounds like O(nlogn) for the merge operation - which would make a merge sort slower than nlog*n still, but not so bad as above

This is not totally in-place because can use worst case a lot of mem, but averagely less than regular merge

But just using n/2 element tmp array for "regular" alg works if you think about it so not sure if beating that one...

Doing n/2 element tmp array

7.1 KiB

Raw Blame History