179 Commits

Author SHA1 Message Date
Richard Thier
e7a4f24a87 re-enabled 4pasu for upcoming youtube-ing 2025-10-02 18:18:26 +02:00
Richard Thier
66376651a3 Revert "thier3: tricky rotation based state storing..."
This reverts commit 1d1f151c0730314ee4370eb288bf1f8c09824b02.
2025-10-02 08:09:57 +02:00
Richard Thier
74e24486f4 Revert "tiny (hopefully) optimization for tpxb"
This reverts commit 2aa7de0d40bca5c8713c5edb41a2ff3995a2ea01.
2025-10-02 08:09:44 +02:00
Richard Thier
2aa7de0d40 tiny (hopefully) optimization for tpxb 2025-10-02 08:09:27 +02:00
Richard Thier
ce121571ca more stuff in makefile in case of releases (-march and -fschedule-insns) 2025-10-02 08:08:58 +02:00
Richard Thier
1d1f151c07 thier3: tricky rotation based state storing... 2025-10-02 05:48:24 +02:00
Richard Thier
7ef63734a1 magyarsort: comment about GC totally not working in my opinion 2025-10-02 04:52:13 +02:00
Richard Thier
12431f229e rthier randomized only above threshold 2025-10-02 04:51:33 +02:00
Richard Thier
9b9997cbdb Revert "simpler occurrence template"
This reverts commit d487bb111b93f4ab186147fd876373f46eff0e59.
2025-10-02 02:28:54 +02:00
Richard Thier
d487bb111b simpler occurrence template 2025-10-02 02:28:46 +02:00
Richard Thier
b5aeaa1bdb added frewr comment - because it became the fastest on Oct 1, 2025 2025-10-01 19:09:18 +02:00
Richard Thier
fb0b8ce255 added licences - this is first commit that I will push upstream online to my gitea local repo! 2025-10-01 17:26:18 +02:00
Richard Thier
27873e06fe 7 relative randomization + a full random / 8 element 2025-10-01 17:12:58 +02:00
Richard Thier
603e689de7 added various shell script helpers 2025-10-01 16:49:30 +02:00
Richard Thier
7d407000fe added pre-randomized sorts (not so great so far - probably too much cache misses) 2025-10-01 16:49:00 +02:00
Richard Thier
d43b55f065 thier3: added mlock/munlock for array and its temporary (you can turn this off) 2025-10-01 04:36:18 +02:00
Richard Thier
ccdf991824 Revert "tpxb: 16-wide manual unroll - but it does not seem to be faster"
This reverts commit 6d794612624b445c8e4dae4ea3ee3b42b6a4c92f.
2025-10-01 04:26:44 +02:00
Richard Thier
100de9bc67 Revert "32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x"
This reverts commit 18b734a6e70e989168c94d051bf2da5c08560790.
2025-10-01 04:26:32 +02:00
Richard Thier
18b734a6e7 32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x 2025-10-01 04:18:04 +02:00
Richard Thier
6d79461262 tpxb: 16-wide manual unroll - but it does not seem to be faster 2025-10-01 04:02:08 +02:00
Richard Thier
036725611b removed non-temporal writes as too random patterns for it 2025-10-01 03:24:08 +02:00
Richard Thier
a16917830f add back "make release_ypsu_noinline_debug_sym" for flamegraphs 2025-10-01 02:37:56 +02:00
Richard Thier
0beb389c50 Revert "more uint->int, but these seem to make it slower a bit so will be reverted"
This reverts commit ef9e4f799b4f73e9319264a82fc89e885ef455ac.
2025-10-01 02:16:42 +02:00
Richard Thier
ef9e4f799b more uint->int, but these seem to make it slower a bit so will be reverted 2025-10-01 02:16:34 +02:00
Richard Thier
478d87e148 bugfix: remaining 4095 in code after mass changing 4096 to 256 2025-10-01 02:07:10 +02:00
Richard Thier
31dd239ad3 Revert "thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways"
This reverts commit 808b87f266b2ce8a058b94d9183d100362abe1b4.
2025-10-01 02:06:23 +02:00
Richard Thier
808b87f266 thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways 2025-10-01 02:06:14 +02:00
Richard Thier
c032109110 Revert "tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial"
This reverts commit 5ecb48815b57c51527f2c55c3555fb40ffe48f6b.
2025-10-01 01:53:38 +02:00
Richard Thier
5ecb48815b tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial 2025-10-01 01:53:28 +02:00
Richard Thier
98222d4494 tpxb: tried non-temporal writes (bad for random writes) 2025-10-01 01:28:49 +02:00
Richard Thier
22ec030116 thier3: micro-optimized some of the unrolls 2025-10-01 01:00:07 +02:00
Richard Thier
69d1432721 thier3: no unnecessary 4096 loops and storage because last commit makes it not necessary 2025-10-01 00:34:08 +02:00
Richard Thier
1f6ef0f2ea thier3: realized that I can run with 256 bucketed float prepass (many us faster!) + experimented with a 1bit split trick (too much overhead to win, despite less cache misses) 2025-10-01 00:29:32 +02:00
Richard Thier
45820cf81c re-added FlameGraph submodule 2025-09-30 22:22:30 +02:00
Richard Thier
08cb90bb1b Revert "prepared for flame graph analysis"
This reverts commit ac873f7123c0dd23ff9d73668e005c71944a8afa.
2025-09-30 22:18:10 +02:00
Richard Thier
a849b01fa8 Reapply "adds cache_miss_flamegraph.sh"
This reverts commit 2a507e9f54e8478a2aa0fd2116e98c2aeb5579bd.
2025-09-30 22:17:53 +02:00
Richard Thier
2a507e9f54 Revert "adds cache_miss_flamegraph.sh"
This reverts commit 78266ef34577eaf88ff5507e4a10e3ba2459bfe8.
2025-09-30 22:17:38 +02:00
Richard Thier
52fc14b0f6 Revert "thier3: write caching queues fixed - bug just makes it slower despite less cache misses"
This reverts commit 967c7c19b54fd0db820bbfa1cbe199a8ac9f5419.
2025-09-30 22:17:30 +02:00
Richard Thier
967c7c19b5 thier3: write caching queues fixed - bug just makes it slower despite less cache misses 2025-09-30 22:12:22 +02:00
Richard Thier
78266ef345 adds cache_miss_flamegraph.sh 2025-09-30 17:22:19 +02:00
Richard Thier
ac873f7123 prepared for flame graph analysis 2025-09-30 17:19:47 +02:00
Richard Thier
da0c024a32 add results/ 2025-09-30 13:53:52 +02:00
Richard Thier
0a199b9d72 Revert "hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert"
This reverts commit 523605e8d841733d7c398131ea50e356b35b88e3.
2025-09-29 18:52:02 +02:00
Richard Thier
523605e8d8 hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert 2025-09-29 18:51:53 +02:00
Richard Thier
a5cb0995e3 added missing headers for thiersort3 2025-09-29 18:21:16 +02:00
Richard Thier
7ca9a19c5d tests for thier3 - works and very fast 2025-09-29 18:18:37 +02:00
Richard Thier
86f81d2a1c minor threepass optimizations and thier2 variant that uses threepass (but does unnecessary work in that case: allocation, extra copies, extra step for partitioning, etc) 2025-09-29 03:31:06 +02:00
Richard Thier
a17b284c8a added three-plus-one pass radix which performs very well, but there is 0.8 ILP only because of lot of cache misses. worse perf on random than magyarsort, but better than ska_copy and best worst cases - might hook into thier2? 2025-09-29 02:24:50 +02:00
Richard Thier
f4e4db43f9 4096-wise thiersort2 2025-09-27 01:43:55 +02:00
Richard Thier
76001efd98 2048-wise thiersort2 2025-09-27 01:41:52 +02:00