Richard Thier
|
74e24486f4
|
Revert "tiny (hopefully) optimization for tpxb"
This reverts commit 2aa7de0d40bca5c8713c5edb41a2ff3995a2ea01.
|
2025-10-02 08:09:44 +02:00 |
|
Richard Thier
|
2aa7de0d40
|
tiny (hopefully) optimization for tpxb
|
2025-10-02 08:09:27 +02:00 |
|
Richard Thier
|
ce121571ca
|
more stuff in makefile in case of releases (-march and -fschedule-insns)
|
2025-10-02 08:08:58 +02:00 |
|
Richard Thier
|
1d1f151c07
|
thier3: tricky rotation based state storing...
|
2025-10-02 05:48:24 +02:00 |
|
Richard Thier
|
7ef63734a1
|
magyarsort: comment about GC totally not working in my opinion
|
2025-10-02 04:52:13 +02:00 |
|
Richard Thier
|
12431f229e
|
rthier randomized only above threshold
|
2025-10-02 04:51:33 +02:00 |
|
Richard Thier
|
9b9997cbdb
|
Revert "simpler occurence template"
This reverts commit d487bb111b93f4ab186147fd876373f46eff0e59.
|
2025-10-02 02:28:54 +02:00 |
|
Richard Thier
|
d487bb111b
|
simpler occurence template
|
2025-10-02 02:28:46 +02:00 |
|
Richard Thier
|
b5aeaa1bdb
|
added frewr comment - because it became the fastest on Oct 1, 2025
|
2025-10-01 19:09:18 +02:00 |
|
Richard Thier
|
fb0b8ce255
|
added licences - this is the first commit that I will push upstream online to my local gitea repo!
|
2025-10-01 17:26:18 +02:00 |
|
Richard Thier
|
27873e06fe
|
7 relative randomization + a full random / 8 element
|
2025-10-01 17:12:58 +02:00 |
|
Richard Thier
|
603e689de7
|
added various shell script helpers
|
2025-10-01 16:49:30 +02:00 |
|
Richard Thier
|
7d407000fe
|
added pre-randomized sorts (not so great so far - probably too many cache misses)
|
2025-10-01 16:49:00 +02:00 |
|
Richard Thier
|
d43b55f065
|
thier3: added mlock/munlock for array and its temporary (you can turn this off)
|
2025-10-01 04:36:18 +02:00 |
|
Richard Thier
|
ccdf991824
|
Revert "tpxb: 16-wide manual unroll - but it does not seem to be faster"
This reverts commit 6d794612624b445c8e4dae4ea3ee3b42b6a4c92f.
|
2025-10-01 04:26:44 +02:00 |
|
Richard Thier
|
100de9bc67
|
Revert "32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x"
This reverts commit 18b734a6e70e989168c94d051bf2da5c08560790.
|
2025-10-01 04:26:32 +02:00 |
|
Richard Thier
|
18b734a6e7
|
32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x
|
2025-10-01 04:18:04 +02:00 |
|
Richard Thier
|
6d79461262
|
tpxb: 16-wide manual unroll - but it does not seem to be faster
|
2025-10-01 04:02:08 +02:00 |
|
Richard Thier
|
036725611b
|
removed non-temporal writes as too random patterns for it
|
2025-10-01 03:24:08 +02:00 |
|
Richard Thier
|
a16917830f
|
add back "make release_ypsu_noinline_debug_sym" for flamegraphs
|
2025-10-01 02:37:56 +02:00 |
|
Richard Thier
|
0beb389c50
|
Revert "more uint->int, but these seem to make it slower a bit so will be reverted"
This reverts commit ef9e4f799b4f73e9319264a82fc89e885ef455ac.
|
2025-10-01 02:16:42 +02:00 |
|
Richard Thier
|
ef9e4f799b
|
more uint->int, but these seem to make it slower a bit so will be reverted
|
2025-10-01 02:16:34 +02:00 |
|
Richard Thier
|
478d87e148
|
bugfix: remaining 4095 in code after mass changing 4096 to 256
|
2025-10-01 02:07:10 +02:00 |
|
Richard Thier
|
31dd239ad3
|
Revert "thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways"
This reverts commit 808b87f266b2ce8a058b94d9183d100362abe1b4.
|
2025-10-01 02:06:23 +02:00 |
|
Richard Thier
|
808b87f266
|
thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways
|
2025-10-01 02:06:14 +02:00 |
|
Richard Thier
|
c032109110
|
Revert "tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial"
This reverts commit 5ecb48815b57c51527f2c55c3555fb40ffe48f6b.
|
2025-10-01 01:53:38 +02:00 |
|
Richard Thier
|
5ecb48815b
|
tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial
|
2025-10-01 01:53:28 +02:00 |
|
Richard Thier
|
98222d4494
|
tpxb: tried non-temporal writes (bad for random writes)
|
2025-10-01 01:28:49 +02:00 |
|
Richard Thier
|
22ec030116
|
thier3: micro-optimized some of the unrolls
|
2025-10-01 01:00:07 +02:00 |
|
Richard Thier
|
69d1432721
|
thier3: no unnecessary 4096 loops and storage because last commit makes it not necessary
|
2025-10-01 00:34:08 +02:00 |
|
Richard Thier
|
1f6ef0f2ea
|
thier3: realized that I can run with 256 bucketed float prepass (many us faster!) + experimented with a 1bit split trick (too much overhead to win, despite less cache misses)
|
2025-10-01 00:29:32 +02:00 |
|
Richard Thier
|
45820cf81c
|
re-added FlameGraph submodule
|
2025-09-30 22:22:30 +02:00 |
|
Richard Thier
|
08cb90bb1b
|
Revert "prepared for flame graph analysis"
This reverts commit ac873f7123c0dd23ff9d73668e005c71944a8afa.
|
2025-09-30 22:18:10 +02:00 |
|
Richard Thier
|
a849b01fa8
|
Reapply "adds cache_miss_flamegraph.sh"
This reverts commit 2a507e9f54e8478a2aa0fd2116e98c2aeb5579bd.
|
2025-09-30 22:17:53 +02:00 |
|
Richard Thier
|
2a507e9f54
|
Revert "adds cache_miss_flamegraph.sh"
This reverts commit 78266ef34577eaf88ff5507e4a10e3ba2459bfe8.
|
2025-09-30 22:17:38 +02:00 |
|
Richard Thier
|
52fc14b0f6
|
Revert "thier3: write caching queues fixed - bug just makes it slower despite less cache misses"
This reverts commit 967c7c19b54fd0db820bbfa1cbe199a8ac9f5419.
|
2025-09-30 22:17:30 +02:00 |
|
Richard Thier
|
967c7c19b5
|
thier3: write caching queues fixed - bug just makes it slower despite less cache misses
|
2025-09-30 22:12:22 +02:00 |
|
Richard Thier
|
78266ef345
|
adds cache_miss_flamegraph.sh
|
2025-09-30 17:22:19 +02:00 |
|
Richard Thier
|
ac873f7123
|
prepared for flame graph analysis
|
2025-09-30 17:19:47 +02:00 |
|
Richard Thier
|
da0c024a32
|
add results/
|
2025-09-30 13:53:52 +02:00 |
|
Richard Thier
|
0a199b9d72
|
Revert "hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert"
This reverts commit 523605e8d841733d7c398131ea50e356b35b88e3.
|
2025-09-29 18:52:02 +02:00 |
|
Richard Thier
|
523605e8d8
|
hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert
|
2025-09-29 18:51:53 +02:00 |
|
Richard Thier
|
a5cb0995e3
|
added missing headers for thiersort3
|
2025-09-29 18:21:16 +02:00 |
|
Richard Thier
|
7ca9a19c5d
|
tests for thier3 - works and very fast
|
2025-09-29 18:18:37 +02:00 |
|
Richard Thier
|
86f81d2a1c
|
minor threepass optimizations and thier2 variant that uses threepass (but does unnecessary work in that case: allocation, extra copies, extra step for partitioning, etc)
|
2025-09-29 03:31:06 +02:00 |
|
Richard Thier
|
a17b284c8a
|
added three-plus-one pass radix which performs very well, but there is only 0.8 ILP because of a lot of cache misses. worse perf on random than magyarsort, but better than ska_copy and best worst cases - might hook into thier2?
|
2025-09-29 02:24:50 +02:00 |
|
Richard Thier
|
f4e4db43f9
|
4096-wise thiersort2
|
2025-09-27 01:43:55 +02:00 |
|
Richard Thier
|
76001efd98
|
2048-wise thiersort2
|
2025-09-27 01:41:52 +02:00 |
|
Richard Thier
|
dcef96fee8
|
512-bucketed thiersort2
|
2025-09-27 01:24:18 +02:00 |
|
Richard Thier
|
5fc08c6fae
|
unlikely optimization in thiersort + measurements
|
2025-09-12 02:25:57 +02:00 |
|