Richard Thier
|
69d1432721
|
thier3: no unnecessary 4096 loops and storage because last commit makes it not necessary
|
2025-10-01 00:34:08 +02:00 |
|
Richard Thier
|
1f6ef0f2ea
|
thier3: realized that I can run with 256 bucketed float prepass (many us faster!) + experimented with a 1bit split trick (too much overhead to win, despite less cache misses)
|
2025-10-01 00:29:32 +02:00 |
|
Richard Thier
|
45820cf81c
|
re-added FlameGraph submodule
|
2025-09-30 22:22:30 +02:00 |
|
Richard Thier
|
08cb90bb1b
|
Revert "prepared for flame graph analysis"
This reverts commit ac873f7123c0dd23ff9d73668e005c71944a8afa.
|
2025-09-30 22:18:10 +02:00 |
|
Richard Thier
|
a849b01fa8
|
Reapply "adds cache_miss_flamegraph.sh"
This reverts commit 2a507e9f54e8478a2aa0fd2116e98c2aeb5579bd.
|
2025-09-30 22:17:53 +02:00 |
|
Richard Thier
|
2a507e9f54
|
Revert "adds cache_miss_flamegraph.sh"
This reverts commit 78266ef34577eaf88ff5507e4a10e3ba2459bfe8.
|
2025-09-30 22:17:38 +02:00 |
|
Richard Thier
|
52fc14b0f6
|
Revert "thier3: write caching queues fixed - bug just makes it slower despite less cache misses"
This reverts commit 967c7c19b54fd0db820bbfa1cbe199a8ac9f5419.
|
2025-09-30 22:17:30 +02:00 |
|
Richard Thier
|
967c7c19b5
|
thier3: write caching queues fixed - bug just makes it slower despite less cache misses
|
2025-09-30 22:12:22 +02:00 |
|
Richard Thier
|
78266ef345
|
adds cache_miss_flamegraph.sh
|
2025-09-30 17:22:19 +02:00 |
|
Richard Thier
|
ac873f7123
|
prepared for flame graph analysis
|
2025-09-30 17:19:47 +02:00 |
|
Richard Thier
|
da0c024a32
|
add results/
|
2025-09-30 13:53:52 +02:00 |
|
Richard Thier
|
0a199b9d72
|
Revert "hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert"
This reverts commit 523605e8d841733d7c398131ea50e356b35b88e3.
|
2025-09-29 18:52:02 +02:00 |
|
Richard Thier
|
523605e8d8
|
hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert
|
2025-09-29 18:51:53 +02:00 |
|
Richard Thier
|
a5cb0995e3
|
added missing headers for thiersort3
|
2025-09-29 18:21:16 +02:00 |
|
Richard Thier
|
7ca9a19c5d
|
tests for thier3 - works and very fast
|
2025-09-29 18:18:37 +02:00 |
|
Richard Thier
|
86f81d2a1c
|
minor threepass optimizations and thier2 variant that uses threepass (but does unnecessary work in that case: allocation, extra copies, extra step for partitioning, etc)
|
2025-09-29 03:31:06 +02:00 |
|
Richard Thier
|
a17b284c8a
|
added three-plus-one pass radix which performs very well, but there is only 0.8 ILP because of a lot of cache misses. worse perf on random than magyarsort, but better than ska_copy and best worst cases - might hook into thier2?
|
2025-09-29 02:24:50 +02:00 |
|
Richard Thier
|
f4e4db43f9
|
4096-wise thiersort2
|
2025-09-27 01:43:55 +02:00 |
|
Richard Thier
|
76001efd98
|
2048-wise thiersort2
|
2025-09-27 01:41:52 +02:00 |
|
Richard Thier
|
dcef96fee8
|
512-bucketed thiersort2
|
2025-09-27 01:24:18 +02:00 |
|
Richard Thier
|
5fc08c6fae
|
unlikely optimization in thiersort + measurements
|
2025-09-12 02:25:57 +02:00 |
|
Richard Thier
|
30e868d154
|
tried fewer but simpler bucketing
|
2025-09-12 01:58:28 +02:00 |
|
Richard Thier
|
2c5b0b1177
|
minor optimization
|
2025-09-12 01:49:20 +02:00 |
|
Richard Thier
|
5a8f34efa0
|
fixed thiersort2
|
2025-09-12 01:42:11 +02:00 |
|
Richard Thier
|
a3643eba9b
|
added thiersort2 - better than std, somewhat similar to schwab in perf but is a bucket sort - very interestingly not huge boost in bucketing speed
|
2025-09-11 20:42:04 +02:00 |
|
Richard Thier
|
85aaf4b1a1
|
testing schwab_sort
|
2025-05-09 01:10:12 +02:00 |
|
Richard Thier
|
707ab1eb81
|
neoqs, meanqs and various quicksort variants
|
2025-05-06 03:06:37 +02:00 |
|
Richard Thier
|
e38a76c0c4
|
added vergesort
|
2025-04-04 20:36:32 +02:00 |
|
Richard Thier
|
b2c4e7082b
|
mormord ILP-variant "nearly sorting properly" but some values buggy
|
2024-04-12 01:09:59 +02:00 |
|
Richard Thier
|
b2d66b7fd0
|
some fixes for mormord-ilp-richi
|
2024-04-12 00:37:50 +02:00 |
|
Richard Thier
|
23a5bb1d55
|
mormordsort ILP version by me - probably with a lot of bugs
|
2024-04-11 23:59:13 +02:00 |
|
Richard Thier
|
0f716e912c
|
bit_partition function added - it's like quicksort, but different
|
2024-04-11 21:43:18 +02:00 |
|
Richard Thier
|
3f0ae7ae77
|
Revert "mormord sort more branchless plus extra edge-case handling for empty sized calls" - speed was not great...
This reverts commit 2d2cad2c5a4fbae0d2f008b4164ffb1a49ba3a88.
|
2024-04-11 20:02:23 +02:00 |
|
Richard Thier
|
9894f6c6d4
|
Revert "less branchless mor... not good I think"
This reverts commit 8e8d4257bc8c62064eee677788a81c6b42d9a796.
|
2024-04-11 19:58:08 +02:00 |
|
Richard Thier
|
8e8d4257bc
|
less branchless mor... not good I think
|
2024-04-11 19:57:58 +02:00 |
|
Richard Thier
|
3f4b17f0ef
|
Revert "more branchless mormord - slower"
This reverts commit f4ceffe6e248b7f97763b59a200c303a77ef2836.
|
2024-04-11 19:54:07 +02:00 |
|
Richard Thier
|
f4ceffe6e2
|
more branchless mormord - slower
|
2024-04-11 19:53:58 +02:00 |
|
Richard Thier
|
2d2cad2c5a
|
mormord sort more branchless plus extra edge-case handling for empty sized calls
|
2024-04-11 19:45:30 +02:00 |
|
Richard Thier
|
d16505a297
|
mormordsort got template recursion for 33% speedup (I think it still has 2x maybe)
|
2024-04-11 19:00:52 +02:00 |
|
Richard Thier
|
ae2cd09452
|
removed unnecessary mormordsort if
|
2024-04-11 17:52:58 +02:00 |
|
Richard Thier
|
32e98de308
|
Revert "mormord sort further optimizations (for me slower - btw it might need to be called mormord-prenex-magyarsort at this point? I added a lot to it tbh like copied parts of thiersort for this to work)"
This reverts commit bccb1d070333a2d67eeba548bf8dca1db9b67fa5.
|
2024-04-11 17:19:10 +02:00 |
|
Richard Thier
|
bccb1d0703
|
mormord sort further optimizations (for me slower - btw it might need to be called mormord-prenex-magyarsort at this point? I added a lot to it tbh like copied parts of thiersort for this to work)
|
2024-04-11 17:18:39 +02:00 |
|
Richard Thier
|
02bad1f59f
|
minor optimization on mormord sort
|
2024-04-11 16:59:09 +02:00 |
|
Richard Thier
|
b2d700f127
|
mormord sort - working version, slow on random input for me
|
2024-04-11 16:41:08 +02:00 |
|
Richard Thier
|
55583bcb4a
|
mormordsort - buggy version (I actually think it's some of the Magyarsort 2.x in this form - but needs fixing)
|
2024-04-11 06:13:51 +02:00 |
|
Richard Thier
|
6426560519
|
outliersort ideas
|
2023-07-20 23:28:52 +02:00 |
|
Richard Thier
|
0521ddd52d
|
added neargoodsort idea (and some merge space optimization ideas that I think are known)
|
2023-07-20 21:09:57 +02:00 |
|
|
|
e3c229337c
|
debug
|
2023-07-02 15:56:21 +02:00 |
|
Richard Thier
|
1c32648026
|
wip: debugging - should be reverted?
|
2023-07-02 13:33:27 +02:00 |
|
Richard Thier
|
259ae1e540
|
debug log for differences - I found nearby each elements to differ in this test!
|
2023-07-01 06:48:38 +02:00 |
|