# Sorting for "nearly sorted data"

## Algorithm:

* Go over the data and, like in Stalin sort, keep only the elements that are in order
* BUT: unlike Stalin sort, we partition!
* in[], outs[], outns[]
* in[] is the input array
* outs[] is the "sorted part" of the separation (Stalin would keep these)
* outns[] is the "outliers" part of the separation (Stalin would kill these)
* Use the same algorithm to recursively sort the outns[] part
* Use merge sort's merge algorithm to merge outs[] and outns[] back into in[]

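The steps in the list can be sketched in Python roughly like this. It is only a minimal sketch: the function names (`partition`, `merge`, `nearly_sorted_sort`) are made up here, it returns a new list instead of writing back into in[], and it treats "in order" as non-decreasing.

```python
from typing import List, Tuple


def partition(data: List[int]) -> Tuple[List[int], List[int]]:
    """Split data into the in-order part (outs) and the outliers (outns)."""
    outs: List[int] = []
    outns: List[int] = []
    for x in data:
        # Keep x if it does not break the non-decreasing order of outs.
        if not outs or x >= outs[-1]:
            outs.append(x)
        else:
            outns.append(x)
    return outs, outns


def merge(a: List[int], b: List[int]) -> List[int]:
    """Plain two-pointer merge of two sorted lists (ties take from a)."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def nearly_sorted_sort(data: List[int]) -> List[int]:
    """Partition, recursively sort the outliers, then merge both parts."""
    if len(data) <= 1:
        return list(data)
    outs, outns = partition(data)
    return merge(outs, nearly_sorted_sort(outns))
```

Because the recursion goes one level deeper per split, a long descending input would hit Python's recursion limit in this form; a loop over the splits would avoid that.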
## This works because we know for sure that outs[] has at least one element!

## When it only ever gets one element, we hit the worst case: O(n^2) runtime!

## When the data is nearly sorted, we get nearly O(n) runtime!

## Can be used to "keep an array/list sorted": give it an "update" method that iterates over the elements and updates pos/key, similar to kismap.

## Idea: decide whether we go from the top or from the bottom based on which is smaller - hopefully this mitigates the worst case being the descending case!

## Example

    ------------------------- Split 0
    in0:
    3 7 5 8 9 5 8 9 5 9 9 3 1

    outs0:
    3 7 8 9 9 9 9
    outns0:
    5 5 8 5 3 1
    ------------------------- Split 1
    in1:
    5 5 8 5 3 1

    outs1:
    5 5 8
    outns1:
    5 3 1
    ------------------------- Split 2
    in2:
    5 3 1

    outs2:
    5
    outns2:
    3 1
    ------------------------- Split 3
    in3:
    3 1

    outs3:
    3
    outns3:
    1
    ------------------------- Merge 3
    outs3:
    3
    outns3:
    1

    in3 [merge-out]:
    1 3
    ------------------------- Merge 2
    outs2:
    5
    outns2:
    1 3 == in3

    in2 [merge-out]:
    1 3 5
    ------------------------- Merge 1
    outs1:
    5 5 8
    outns1:
    1 3 5 == in2

    in1 [merge-out]:
    1 3 5 5 5 8
    ------------------------- Merge 0
    outs0:
    3 7 8 9 9 9 9
    outns0:
    1 3 5 5 5 8 == in1

    in0 [merge-out]:
    1 3 3 5 5 5 7 8 8 9 9 9 9

Which is - as you can see - the sorted result of the input array:

    3 7 5 8 9 5 8 9 5 9 9 3 1

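The split levels of the example can be reproduced with a few lines of Python. This is just a sketch for checking the trace by hand; `split_levels` is a name invented here and it only does the splitting, not the merging.

```python
def split_levels(data):
    """Return the (outs, outns) pair produced at each split level."""
    levels = []
    while len(data) > 1:
        outs, outns = [], []
        for x in data:
            # Same rule as in the example: keep x if it is still in order.
            if not outs or x >= outs[-1]:
                outs.append(x)
            else:
                outns.append(x)
        levels.append((outs, outns))
        data = outns  # the next split works on the outliers only
    return levels


example = [3, 7, 5, 8, 9, 5, 8, 9, 5, 9, 9, 3, 1]
for depth, (outs, outns) in enumerate(split_levels(example)):
    print(f"Split {depth}: outs={outs} outns={outns}")
```

For the example input this gives four levels (Split 0 to Split 3), the same as in the trace.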
## Time and space analysis

On random data I first guessed something close to O(n*logn) on average, but that is probably too optimistic: outs[] only collects the running maxima, and a random permutation of n distinct keys has only about ln(n) of those on average, so each split removes very few elements. I did not go after it further.

In the worst case it is clearly O(n^2): when every split puts just a single element into outs[i], we need about n splits and each one scans everything that is left.

The space analysis is roughly the same as for the non-optimized merge sort - see below for space-optimized merge steps, which may be useful for this too!

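To get a feeling for the two extremes, one can count how many splits happen and how many elements the splits scan in total. A small sketch under the same "keep if non-decreasing" assumption; `partition_work` is a hypothetical helper and it ignores the cost of the merges.

```python
def partition_work(data):
    """Return (levels, scanned): split count and total elements scanned."""
    levels = 0
    scanned = 0
    while len(data) > 1:
        outs, outns = [], []
        for x in data:
            if not outs or x >= outs[-1]:
                outs.append(x)
            else:
                outns.append(x)
        levels += 1
        scanned += len(data)
        data = outns  # only the outliers go to the next level
    return levels, scanned


print(partition_work(list(range(100))))         # already sorted input
print(partition_work(list(range(100, 0, -1))))  # descending input
```

The sorted input needs a single split over 100 elements, while the descending one needs 99 splits and scans 100 + 99 + ... + 2 elements, which is the quadratic behaviour described above.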
# A random bad inplace-merge idea

## Example

Let's say we have these two lists:

    1 3 3 5 7 9
    2 3 4 5 6 7

But represented in the same array, partitioned into two parts:

    1 3 3 5 7 9|2 3 4 5 6 7

We can go with two pointers and try to make this work with SWAPs:

    1 3 3 5 7 9|2 3 4 5 6 7
    ^           ^ ~
    (noswap)
    1 3 3 5 7 9|2 3 4 5 6 7
      ^         ^
    (swap*)
    1 2 3 5 7 9|3 3 4 5 6 7
        ^       ^
    (noswap)
    1 2 3 5 7 9|3 3 4 5 6 7
          ^     ^
    (swap*)
    1 2 3 3 7 9|3 4 5 5 6 7
            ^   ^
    (swap*)
    1 2 3 3 3 9|4 5 5 6 7 7
              ^ ^
    (swap*)
    1 2 3 3 3 4|5 5 6 7 7 9
              ^ ^

## Where: swap* means swap the element on the left with the one on the right, but on the right list put it in its right place (binary search + memcpy)

## Maybe: the second part should be heapified! Then we get log(n) pop & insert, but the issue is that it then does not stay sorted :-(

## Runtime: O(n^2) worst case, which is extremely slow...

## Rem.: Likely swap + bubble is better here for the second side...

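A hedged Python sketch of the swap* idea, assuming the array holds two sorted runs split at index `mid`. The name `swap_merge` is made up here, and the slice assignment stands in for the memcpy.

```python
from bisect import bisect_left


def swap_merge(a, mid):
    """Merge the sorted runs a[:mid] and a[mid:] in place using swap*."""
    n = len(a)
    if mid >= n:
        return a  # empty right run, nothing to merge
    for i in range(mid):
        if a[i] <= a[mid]:
            continue  # (noswap): the left element is already the smaller one
        # (swap*): the head of the right run goes into the left slot ...
        displaced = a[i]
        a[i] = a[mid]
        # ... and the displaced value is put back into the right run at its
        # sorted position: binary search, then shift the smaller ones left.
        pos = bisect_left(a, displaced, mid + 1, n)
        a[mid:pos - 1] = a[mid + 1:pos]
        a[pos - 1] = displaced
    return a


print(swap_merge([1, 3, 3, 5, 7, 9, 2, 3, 4, 5, 6, 7], 6))
```

Every swap* can shift almost the whole right run, which is where the O(n^2) worst case comes from.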
# Better, but still slow random inplace merge idea

    1 3 3 5 7 9|2 3 4 5 6 7
    ^           ^
    (<=)
    1 3 3 5 7 9|2 3 4 5 6 7
      ^           ^
    (<=)
    1 3 3 5 7 9|2 3 4 5 6 7
        ^           ^
    (<=)
    1 3 3 5 7 9|2 3 4 5 6 7
          ^           ^
    (<=)
    1 3 3 5 7 9|2 3 4 5 6 7
            ^           ^
    (>)
    1 3 3 5 6 9|2 3 4 5 7 7
              ^           ^
    (>)
    1 3 3 5 6 7|2 3 4 5 7 9
    ^ ^
    (!!)
    1 3 3 5 6 7|2 3 4 5 7 9
    ^ ! ^
    (logsearch: ~)
    1 3 3 5 6 7|2 3 4 5 7 9
    ^ ^ ^ ~
    (tmpvec)
    1 3 3 5 6 7|. . 4 5 7 9
    ^ ^ ^ ~
    tmp: 2 3
    (memcpy)
    1 . . 3 3 5|6 7 4 5 7 9
    ^ ^ ^ ~
    tmp: 2 3
    (backwrite)
    1 2 3 3 3 5 6 7|4 5 7 9
    ^ ^ ^
    tmp: nil
    (not(3 <= 4 < 3))
    1 2 3 3 3 5 6 7|4 5 7 9
    ^ ^ ^
    tmp: nil
    (logsearch: ~)
    (not(3 <= 4 < 3))
    1 2 3 3 3 5 6 7|4 5 7 9
    ^ ^ ^ ~
    (tmpvec)
    1 2 3 3 3 5 6 7|. . 7 9
    ^ ^ ^ ~
    tmp: 4 5
    (memcpy)
    1 2 3 3 3 . . 5|6 7 7 9
    ^ ^ ^ ~
    tmp: 4 5
    (backwrite)
    1 2 3 3 3 4 5 5|6 7|7 9
    ^ ^ ^ ~
    tmp: nil

    (not(3 <= 4 < 3))
    (not(3 <= 4 < 3))
    (not(3 <= 4 < 3))
    [END]

## This sounds like O(n*logn) for the merge operation - which would still make a merge sort slower than O(n*logn), but not as bad as above

## This is not totally in-place, because in the worst case it can use a lot of memory, but on average less than a regular merge

## But just using an n/2-element tmp array for the "regular" algorithm works if you think about it, so not sure this beats that one...

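One possible reading of the logsearch / tmpvec / memcpy / backwrite steps, as a hedged Python sketch. It deliberately skips the pairwise (<=)/(>) pre-pass, and `run_merge` and its parameters are names invented here. `bisect_right` is an assumption chosen so that equal keys from the right side move too (matching `tmp: 2 3` in the trace), which also means this version is not stable.

```python
from bisect import bisect_right


def run_merge(a, mid):
    """Merge sorted runs a[:mid] and a[mid:] by moving whole right-side runs."""
    n = len(a)
    i = 0
    while i < mid < n:
        if a[i] <= a[mid]:
            i += 1  # left element is already in place
            continue
        # logsearch: end of the right-side run of elements <= a[i]
        k = bisect_right(a, a[i], mid, n)
        tmp = a[mid:k]                # tmpvec: lift that run out
        a[i + len(tmp):k] = a[i:mid]  # memcpy: shift the rest of the left run
        a[i:i + len(tmp)] = tmp       # backwrite: drop the run into the gap
        i += len(tmp)
        mid = k                       # the boundary moves right by len(tmp)
    return a


print(run_merge([1, 3, 3, 5, 6, 7, 2, 3, 4, 5, 7, 9], 6))
```

The tmp vector only needs to hold one run at a time, but a single run can still be the whole right side, which fits the remark about the worst-case memory use.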
# Doing n/2 element tmp array

From:

    arr: 1 3 3 5 7 9|2 3 4 5 6 7

To:

    arr: . . . . . .|2 3 4 5 6 7
    tmp: 1 3 3 5 7 9

And then we just always pick the smaller between the two piecewise:

         _
    arr: . . . . . .|2 3 4 5 6 7
    tmp: 1 3 3 5 7 9 ^
         ^
           _
    arr: 1 . . . . .|2 3 4 5 6 7
    tmp: . 3 3 5 7 9 ^
           ^
             _
    arr: 1 2 . . . .|. 3 4 5 6 7
    tmp: . 3 3 5 7 9   ^
           ^
    (rem.: tmp is preferred to keep order of elements unchanged for same keys!)
               _
    arr: 1 2 3 . . .|. 3 4 5 6 7
    tmp: . . 3 5 7 9   ^
             ^
    (rem.: tmp is preferred to keep order of elements unchanged for same keys!)
                 _
    arr: 1 2 3 3 . .|. 3 4 5 6 7
    tmp: . . . 5 7 9   ^
               ^
                   _
    arr: 1 2 3 3 3 .|. . 4 5 6 7
    tmp: . . . 5 7 9     ^
               ^
                     _
    arr: 1 2 3 3 3 4|. . . 5 6 7
    tmp: . . . 5 7 9       ^
               ^
    (rem.: tmp is preferred to keep order of elements unchanged for same keys!)
                       _
    arr: 1 2 3 3 3 4|5 . . 5 6 7
    tmp: . . . . 7 9       ^
                 ^
                         _
    arr: 1 2 3 3 3 4|5 5 . . 6 7
    tmp: . . . . 7 9         ^
                 ^
                           _
    arr: 1 2 3 3 3 4|5 5 6 . . 7
    tmp: . . . . 7 9           ^
                 ^
    (rem.: tmp is preferred to keep order of elements unchanged for same keys!)
                             _
    arr: 1 2 3 3 3 4|5 5 6 7 . 7
    tmp: . . . . . 9           ^
                   ^
                               _
    arr: 1 2 3 3 3 4|5 5 6 7 7 .
    tmp: . . . . . 9             ^
                   ^
                                 _
    arr: 1 2 3 3 3 4|5 5 6 7 7 9
    tmp: . . . . . .             ^
                     ^

And this ends the merge algorithm!
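The same steps as a short Python sketch, assuming the two sorted runs are split at index `mid`. The name `merge_with_half_tmp` is made up here; `w`, `i` and `j` correspond to the `_` marker and the two `^` markers.

```python
def merge_with_half_tmp(arr, mid):
    """Merge sorted runs arr[:mid] and arr[mid:] using a tmp copy of the left run."""
    tmp = arr[:mid]  # the only extra memory: n/2 elements for equal halves
    i = 0            # read position in tmp
    j = mid          # read position in the right run (still inside arr)
    w = 0            # write position in arr, never overtakes j
    while i < len(tmp) and j < len(arr):
        if tmp[i] <= arr[j]:
            # tmp is preferred on equal keys, which keeps the merge stable
            arr[w] = tmp[i]
            i += 1
        else:
            arr[w] = arr[j]
            j += 1
        w += 1
    # Leftover tmp elements go to the end; a leftover right run is already
    # sitting in its final place.
    arr[w:w + len(tmp) - i] = tmp[i:]
    return arr


print(merge_with_half_tmp([1, 3, 3, 5, 7, 9, 2, 3, 4, 5, 6, 7], 6))
```

Copying the left run (and not the right one) is what makes the in-place write safe: the write position is always `i + (j - mid)`, so it can reach `j` but never pass it.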