32 Commits

Author SHA1 Message Date
Richard Thier
55583bcb4a mormordsort - buggy version (I actually think it's some of the Magyarsort 2.x in this form - but needs fixing) 2024-04-11 06:13:51 +02:00
Richard Thier
d0843cbc40 Revert "trying with the restrict keyword - unsuccessful"
This reverts commit 1908687002c085c628853711d26b95e9bc92a1ad.
2022-09-01 02:15:43 +02:00
Richard Thier
1908687002 trying with the restrict keyword - unsuccessful 2022-09-01 02:14:30 +02:00
Richard Thier
7e8aa96a39 duplication counting, word-based radix made possible (commented out), modulus impuit, vectorize makefile example 2022-09-01 01:56:15 +02:00
Richard Thier
ff05bc2688 added if constexpr(..) where it could be 2021-12-20 13:29:47 +01:00
Richard Thier
c77e592a84 mlocks and frewr algorithm both added 2021-12-19 21:55:48 +01:00
Richard Thier
da4d122ee1 more of latest changes - random weird shiiit 2021-12-18 03:49:52 +01:00
Richard Thier
f24b3987c0 improved indirections 2021-12-18 02:34:22 +01:00
Richard Thier
3b413fcba0 removed reference to pointer parameter - a bit better indirections 2021-12-18 02:20:42 +01:00
Richard Thier
298edba5d2 minor unroll 2021-12-18 01:48:42 +01:00
Richard Thier
e7b677e4db basic prefetch optimizations 2021-12-18 01:23:06 +01:00
Richard Thier
e5d4ff74ad more manual unrolls 2021-12-17 23:37:48 +01:00
Richard Thier
645bc19f19 Manual occurrence unrolling 2021-12-17 22:48:38 +01:00
Richard Thier
be450086b5 took out prefetch and added commented out pragmas - they do not help 2021-12-17 22:09:35 +01:00
Richard Thier
3fdcaad537 trying some prefetch - not that good yet 2021-12-17 21:42:35 +01:00
Richard Thier
0b4eb5e5a6 minor speed tweaks by being able to define the counter type 2021-12-17 21:17:53 +01:00
Richard Thier
a044787846 finally again a real optimization and API for reuse - even faster for non-reused 2021-12-15 03:14:35 +01:00
Richard Thier
3490201420 further optimization - the API change, however, is not a no-cost abstraction: it makes clang slower than the original heap variant, and g++, albeit faster than the original, is not as fast as hardcoded - will investigate API change 2021-12-15 00:43:25 +01:00
Richard Thier
c4ed2994ea Setting up causal profiling with "coz" 2021-12-14 17:29:33 +01:00
Richard Thier
11ceee29a1 minor tweaking for more ILP 2021-12-13 03:48:17 +01:00
Richard Thier
62dcda6bf2 minor tweaks 2021-12-13 02:18:08 +01:00
Richard Thier
68684f7fb0 Implemented ILP and cache optimized simple radix variant - surprisingly good already! 2021-03-13 15:51:24 +01:00
Richard Thier
1d0ba81e49 tried if it works with nibbles too: seems like easier to debug actually in this mode 2021-03-11 23:20:03 +01:00
Richard Thier
151b8f398b Likely better ILP and no manual digit counts in code 2021-03-11 23:13:53 +01:00
Richard Thier
22e80d4cd5 indent 2021-03-11 22:40:37 +01:00
Richard Thier
f30b5056cc noexcept 2021-03-11 22:39:53 +01:00
Richard Thier
83ae455c34 rename 2021-03-11 22:38:23 +01:00
Richard Thier
cfc9a050e4 removed manual digit usage by recursive template trickz 2021-03-11 22:34:44 +01:00
Richard Thier
c7fe2f0507 little refactor to maybe avoid manual digit misery 2021-03-11 22:19:29 +01:00
Richard Thier
e076ab662b prefix sum 2021-03-11 22:06:50 +01:00
Richard Thier
f8d4f597c6 fix dumb mistakes 2021-03-11 21:38:06 +01:00
Richard Thier
33910b7e50 project init 2021-03-11 21:23:50 +01:00