Richard Thier
|
100de9bc67
|
Revert "32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x"
This reverts commit 18b734a6e70e989168c94d051bf2da5c08560790.
|
2025-10-01 04:26:32 +02:00 |
|
Richard Thier
|
18b734a6e7
|
32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x
|
2025-10-01 04:18:04 +02:00 |
|
Richard Thier
|
6d79461262
|
tpxb: 16-wide manual unroll - but it does not seem to be faster
|
2025-10-01 04:02:08 +02:00 |
|
Richard Thier
|
036725611b
|
removed non-temporal writes as the write patterns are too random for them
|
2025-10-01 03:24:08 +02:00 |
|
Richard Thier
|
a16917830f
|
add back "make release_ypsu_noinline_debug_sym" for flamegraphs
|
2025-10-01 02:37:56 +02:00 |
|
Richard Thier
|
0beb389c50
|
Revert "more uint->int, but these seem to make it slower a bit so will be reverted"
This reverts commit ef9e4f799b4f73e9319264a82fc89e885ef455ac.
|
2025-10-01 02:16:42 +02:00 |
|
Richard Thier
|
ef9e4f799b
|
more uint->int, but these seem to make it slower a bit so will be reverted
|
2025-10-01 02:16:34 +02:00 |
|
Richard Thier
|
478d87e148
|
bugfix: remaining 4095 in code after mass changing 4096 to 256
|
2025-10-01 02:07:10 +02:00 |
|
Richard Thier
|
31dd239ad3
|
Revert "thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways"
This reverts commit 808b87f266b2ce8a058b94d9183d100362abe1b4.
|
2025-10-01 02:06:23 +02:00 |
|
Richard Thier
|
808b87f266
|
thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways
|
2025-10-01 02:06:14 +02:00 |
|
Richard Thier
|
c032109110
|
Revert "tpxb: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial"
This reverts commit 5ecb48815b57c51527f2c55c3555fb40ffe48f6b.
|
2025-10-01 01:53:38 +02:00 |
|
Richard Thier
|
5ecb48815b
|
tpxb: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial
|
2025-10-01 01:53:28 +02:00 |
|
Richard Thier
|
98222d4494
|
tpxb: tried non-temporal writes (bad for random writes)
|
2025-10-01 01:28:49 +02:00 |
|
Richard Thier
|
22ec030116
|
thier3: micro-optimized some of the unrolls
|
2025-10-01 01:00:07 +02:00 |
|
Richard Thier
|
69d1432721
|
thier3: no unnecessary 4096 loops and storage because last commit makes it not necessary
|
2025-10-01 00:34:08 +02:00 |
|
Richard Thier
|
1f6ef0f2ea
|
thier3: realized that I can run with 256 bucketed float prepass (many us faster!) + experimented with a 1bit split trick (too much overhead to win, despite less cache misses)
|
2025-10-01 00:29:32 +02:00 |
|
Richard Thier
|
45820cf81c
|
re-added FlameGraph submodule
|
2025-09-30 22:22:30 +02:00 |
|
Richard Thier
|
08cb90bb1b
|
Revert "prepared for flame graph analysis"
This reverts commit ac873f7123c0dd23ff9d73668e005c71944a8afa.
|
2025-09-30 22:18:10 +02:00 |
|
Richard Thier
|
a849b01fa8
|
Reapply "adds cache_miss_flamegraph.sh"
This reverts commit 2a507e9f54e8478a2aa0fd2116e98c2aeb5579bd.
|
2025-09-30 22:17:53 +02:00 |
|
Richard Thier
|
2a507e9f54
|
Revert "adds cache_miss_flamegraph.sh"
This reverts commit 78266ef34577eaf88ff5507e4a10e3ba2459bfe8.
|
2025-09-30 22:17:38 +02:00 |
|
Richard Thier
|
52fc14b0f6
|
Revert "thier3: write caching queues fixed - bug just makes it slower despite less cache misses"
This reverts commit 967c7c19b54fd0db820bbfa1cbe199a8ac9f5419.
|
2025-09-30 22:17:30 +02:00 |
|
Richard Thier
|
967c7c19b5
|
thier3: write caching queues fixed - bug just makes it slower despite less cache misses
|
2025-09-30 22:12:22 +02:00 |
|
Richard Thier
|
78266ef345
|
adds cache_miss_flamegraph.sh
|
2025-09-30 17:22:19 +02:00 |
|
Richard Thier
|
ac873f7123
|
prepared for flame graph analysis
|
2025-09-30 17:19:47 +02:00 |
|
Richard Thier
|
da0c024a32
|
add results/
|
2025-09-30 13:53:52 +02:00 |
|
Richard Thier
|
0a199b9d72
|
Revert "hand unrolled thiersort3 - I think it's slower than gcc unrolling and surely more complex so I will revert"
This reverts commit 523605e8d841733d7c398131ea50e356b35b88e3.
|
2025-09-29 18:52:02 +02:00 |
|
Richard Thier
|
523605e8d8
|
hand unrolled thiersort3 - I think it's slower than gcc unrolling and surely more complex so I will revert
|
2025-09-29 18:51:53 +02:00 |
|
Richard Thier
|
a5cb0995e3
|
added missing headers for thiersort3
|
2025-09-29 18:21:16 +02:00 |
|
Richard Thier
|
7ca9a19c5d
|
tests for thier3 - works and very fast
|
2025-09-29 18:18:37 +02:00 |
|
Richard Thier
|
86f81d2a1c
|
minor threepass optimizations and thier2 variant that uses threepass (but does unnecessary work in that case: allocation, extra copies, extra step for partitioning, etc)
|
2025-09-29 03:31:06 +02:00 |
|
Richard Thier
|
a17b284c8a
|
added three-plus-one pass radix which performs very well, but ILP is only 0.8 because of a lot of cache misses. worse perf on random than magyarsort, but better than ska_copy and best worst cases - might hook into thier2?
|
2025-09-29 02:24:50 +02:00 |
|
Richard Thier
|
f4e4db43f9
|
4096-wise thiersort2
|
2025-09-27 01:43:55 +02:00 |
|
Richard Thier
|
76001efd98
|
2048-wise thiersort2
|
2025-09-27 01:41:52 +02:00 |
|
Richard Thier
|
dcef96fee8
|
512-bucketed thiersort2
|
2025-09-27 01:24:18 +02:00 |
|
Richard Thier
|
5fc08c6fae
|
unlikely optimization in thiersort + measurements
|
2025-09-12 02:25:57 +02:00 |
|
Richard Thier
|
30e868d154
|
tried fewer but simpler bucketing
|
2025-09-12 01:58:28 +02:00 |
|
Richard Thier
|
2c5b0b1177
|
minor optimization
|
2025-09-12 01:49:20 +02:00 |
|
Richard Thier
|
5a8f34efa0
|
fixed thiersort2
|
2025-09-12 01:42:11 +02:00 |
|
Richard Thier
|
a3643eba9b
|
added thiersort2 - better than std, somewhat similar to schwab in perf but is a bucket sort - very interestingly not huge boost in bucketing speed
|
2025-09-11 20:42:04 +02:00 |
|
Richard Thier
|
85aaf4b1a1
|
testing schwab_sort
|
2025-05-09 01:10:12 +02:00 |
|
Richard Thier
|
707ab1eb81
|
neoqs, meanqs and various quicksort variants
|
2025-05-06 03:06:37 +02:00 |
|
Richard Thier
|
e38a76c0c4
|
added vergesort
|
2025-04-04 20:36:32 +02:00 |
|
Richard Thier
|
b2c4e7082b
|
mormord ILP-variant "nearly sorting properly" but some values buggy
|
2024-04-12 01:09:59 +02:00 |
|
Richard Thier
|
b2d66b7fd0
|
some fixes for mormord-ilp-richi
|
2024-04-12 00:37:50 +02:00 |
|
Richard Thier
|
23a5bb1d55
|
mormordsort ILP version by me - probably with a lot of bugs
|
2024-04-11 23:59:13 +02:00 |
|
Richard Thier
|
0f716e912c
|
bit_partition function added - it's like quicksort, but different
|
2024-04-11 21:43:18 +02:00 |
|
Richard Thier
|
3f0ae7ae77
|
Revert "mormord sort more branchless plus extra edge-case handling for empty sized calls" - speed was not great...
This reverts commit 2d2cad2c5a4fbae0d2f008b4164ffb1a49ba3a88.
|
2024-04-11 20:02:23 +02:00 |
|
Richard Thier
|
9894f6c6d4
|
Revert "less branchless mor... not good I think"
This reverts commit 8e8d4257bc8c62064eee677788a81c6b42d9a796.
|
2024-04-11 19:58:08 +02:00 |
|
Richard Thier
|
8e8d4257bc
|
less branchless mor... not good I think
|
2024-04-11 19:57:58 +02:00 |
|
Richard Thier
|
3f4b17f0ef
|
Revert "more branchless mormord - slower"
This reverts commit f4ceffe6e248b7f97763b59a200c303a77ef2836.
|
2024-04-11 19:54:07 +02:00 |
|