Richard Thier
|
d0fa5c5b48
|
simplification + fixing right side of array not sorted because -1 counts
|
2022-08-16 12:25:15 +02:00 |
|
Richard Thier
|
6073c03f81
|
temporarily do naive array separate code for debugging
|
2022-08-16 12:16:31 +02:00 |
|
Richard Thier
|
fbea1e607c
|
factored out internal_array_separate(..) to check if it has the errors or not
|
2022-08-16 11:44:34 +02:00 |
|
Richard Thier
|
680936f50a
|
still buggy sp code but differently...
|
2022-08-16 04:13:12 +02:00 |
|
Richard Thier
|
e83392ebaa
|
added "sp" to tests - buggy for now, but at least in const works and inc nearly ok
|
2022-08-16 03:29:08 +02:00 |
|
Richard Thier
|
fad7345a80
|
first versions of space partitioning sort - buggy, but nearly good
|
2022-08-16 03:28:06 +02:00 |
|
Richard Thier
|
ee08930cae
|
add -DNDEBUG
|
2022-08-15 23:02:30 +02:00 |
|
Richard Thier
|
ec0f73af01
|
remove session.vim
|
2022-08-15 23:01:55 +02:00 |
|
Richard Thier
|
a2ee3cdb8c
|
make: better makefile
|
2022-01-25 20:04:25 +01:00 |
|
Richard Thier
|
ff05bc2688
|
added if constexpr(..) where it could be
|
2021-12-20 13:29:47 +01:00 |
|
Richard Thier
|
d858f39708
|
Merge branch 'tmp' into ilp-radix-1
|
2021-12-19 22:53:09 +01:00 |
|
Richard Thier
|
c77e592a84
|
mlocks and frewr algorithm both added
|
2021-12-19 21:55:48 +01:00 |
|
|
|
efa2c7bc26
|
old 2007 laptop linux results
|
2021-12-18 20:44:03 +01:00 |
|
|
|
1658e5abbe
|
fine-tuning on linux laptop (just parameters)
|
2021-12-18 20:41:30 +01:00 |
|
Richard Thier
|
a4d50c3309
|
input reduction for testing on less capable machines
|
2021-12-18 19:54:14 +01:00 |
|
Richard Thier
|
da4d122ee1
|
more of latest changes - random weird shiiit
|
2021-12-18 03:49:52 +01:00 |
|
Richard Thier
|
f24b3987c0
|
improved indirections
|
2021-12-18 02:34:22 +01:00 |
|
Richard Thier
|
3b413fcba0
|
removed reference to pointer parameter - a bit better indirections
|
2021-12-18 02:20:42 +01:00 |
|
Richard Thier
|
298edba5d2
|
minor unroll
|
2021-12-18 01:48:42 +01:00 |
|
Richard Thier
|
e7b677e4db
|
basic prefetch optimizations
|
2021-12-18 01:23:06 +01:00 |
|
Richard Thier
|
e5d4ff74ad
|
more manual unrolls
|
2021-12-17 23:37:48 +01:00 |
|
Richard Thier
|
645bc19f19
|
Manual occurrence unrolling
|
2021-12-17 22:48:38 +01:00 |
|
Richard Thier
|
be450086b5
|
took out prefetch and added commented out pragmas - they do not help
|
2021-12-17 22:09:35 +01:00 |
|
Richard Thier
|
3fdcaad537
|
trying some prefetch - not that good yet
|
2021-12-17 21:42:35 +01:00 |
|
Richard Thier
|
0b4eb5e5a6
|
minor speed tweaks by being able to define the counter type
|
2021-12-17 21:17:53 +01:00 |
|
Richard Thier
|
1686967f10
|
minor tweaks to 4pasu and added 4rot
|
2021-12-17 19:20:58 +01:00 |
|
Richard Thier
|
a878f20100
|
ypsu's 4passu method optimized a bit
|
2021-12-15 16:09:40 +01:00 |
|
Richard Thier
|
a947cda58d
|
Revert "vsort version that got slower, but is really funny template code"
This reverts commit fd35dbc51b63fa97ff5a9d7a823cdfa271b99a43.
|
2021-12-15 14:48:27 +01:00 |
|
Richard Thier
|
fd35dbc51b
|
vsort version that got slower, but is really funny template code
|
2021-12-15 14:48:14 +01:00 |
|
Richard Thier
|
bff96c8f7f
|
upgraded vsort a bit (50-100ms)
|
2021-12-15 12:53:00 +01:00 |
|
Richard Thier
|
520db7049d
|
added ypsu-variants of radix-like things
|
2021-12-15 12:52:33 +01:00 |
|
Richard Thier
|
a044787846
|
finally a real optimization again, plus an API for reuse - even faster for the non-reused case
|
2021-12-15 03:14:35 +01:00 |
|
Richard Thier
|
3490201420
|
further optimization - the API change however is not a no-cost abstraction: it makes clang slower than the original heap variant, and g++, albeit faster than the original, is not as fast as the hardcoded version - will investigate the API change
|
2021-12-15 00:43:25 +01:00 |
|
Richard Thier
|
a1d6e96f5a
|
back to regular perf / measure run
|
2021-12-14 17:32:43 +01:00 |
|
Richard Thier
|
05235e269f
|
added simd-sort - basically the whole repo, but I haxed-in magyarsort as measure
|
2021-12-14 17:30:07 +01:00 |
|
Richard Thier
|
c4ed2994ea
|
Setting up causal profiling with "coz"
|
2021-12-14 17:29:33 +01:00 |
|
Richard Thier
|
675b90c0d8
|
make: release_debug_sym for better perfing
|
2021-12-13 04:20:05 +01:00 |
|
Richard Thier
|
11ceee29a1
|
minor tweaking for more ILP
|
2021-12-13 03:48:17 +01:00 |
|
Richard Thier
|
c2fc962766
|
test can now be used with perf, valgrind --tool=cachegrind and similar tools
|
2021-12-13 03:48:01 +01:00 |
|
Richard Thier
|
bcdb905748
|
added better test by rlblaster / ypsu / kbalazs
|
2021-12-13 02:30:12 +01:00 |
|
Richard Thier
|
62dcda6bf2
|
minor tweaks
|
2021-12-13 02:18:08 +01:00 |
|
Richard Thier
|
76ba29018d
|
tweak: added ska_sort for measuring against because Ypsu told me about it... mixed results on my machine (on small numbers below 100 mine wins always, above it is really mixed and close)
|
2021-12-13 00:51:26 +01:00 |
|
Richard Thier
|
860cc4e702
|
added some result measurements - why image? will likely look better on git when someone shares it on social media if I add to readme...
|
2021-03-13 17:11:34 +01:00 |
|
Richard Thier
|
68684f7fb0
|
Implemented ILP and cache optimized simple radix variant - surprisingly good already!
|
2021-03-13 15:51:24 +01:00 |
|
Richard Thier
|
4199393153
|
make: c++14 and -O2
|
2021-03-13 11:12:29 +01:00 |
|
Richard Thier
|
1d0ba81e49
|
tried whether it works with nibbles too: it actually seems easier to debug in this mode
|
2021-03-11 23:20:03 +01:00 |
|
Richard Thier
|
151b8f398b
|
Likely better ILP and no manual digit counts in code
|
2021-03-11 23:13:53 +01:00 |
|
Richard Thier
|
22e80d4cd5
|
indent
|
2021-03-11 22:40:37 +01:00 |
|
Richard Thier
|
f30b5056cc
|
noexcept
|
2021-03-11 22:39:53 +01:00 |
|
Richard Thier
|
83ae455c34
|
rename
|
2021-03-11 22:38:23 +01:00 |
|