ce32232a2d results and result generating AWKs
master
Richard Thier
2025-10-04 06:30:11 +02:00
ae88ba5725 not random, but thier3->2 change in rthier (which feels best now, except that magyar is best for big rand, frewr for smaller rand, and schwab for comparison sort)
Richard Thier
2025-10-03 14:23:42 +02:00
88da973e02 better thier3->2 in rthier and fflush-es for tail -f for measurement tracking
Richard Thier
2025-10-03 12:40:14 +02:00
e7a4f24a87 re-enabled 4pasu for upcoming youtube-ing
Richard Thier
2025-10-02 18:18:26 +02:00
66376651a3 Revert "thier3: tricky rotation based state storing..."
Richard Thier
2025-10-02 08:09:57 +02:00
74e24486f4 Revert "tiny (hopefully) optimization for tpxb"
Richard Thier
2025-10-02 08:09:44 +02:00
2aa7de0d40 tiny (hopefully) optimization for tpxb
Richard Thier
2025-10-02 08:09:27 +02:00
ce121571ca more stuff in makefile in case of releases (-march and -fschedule-insns)
Richard Thier
2025-10-02 08:08:58 +02:00
1d1f151c07 thier3: tricky rotation based state storing...
Richard Thier
2025-10-02 05:48:24 +02:00
7ef63734a1 magyarsort: comment about GC totally not working in my opinion
Richard Thier
2025-10-02 04:52:13 +02:00
12431f229e rthier randomized only above threshold
Richard Thier
2025-10-02 04:51:33 +02:00
9b9997cbdb Revert "simpler occurrence template"
Richard Thier
2025-10-02 02:28:54 +02:00
d487bb111b simpler occurrence template
Richard Thier
2025-10-02 02:28:46 +02:00
b5aeaa1bdb added frewr comment - because it became fastest on Oct 1, 2025
Richard Thier
2025-10-01 19:09:18 +02:00
fb0b8ce255 added licences - this is the first commit that I will push upstream online to my gitea local repo!
Richard Thier
2025-10-01 17:26:18 +02:00
27873e06fe 7 relative randomization + a full random / 8 element
Richard Thier
2025-10-01 17:12:58 +02:00
603e689de7 added various shell script helpers
Richard Thier
2025-10-01 16:49:30 +02:00
7d407000fe added pre-randomized sorts (not so great so far - probably too many cache misses)
Richard Thier
2025-10-01 16:49:00 +02:00
d43b55f065 thier3: added mlock/munlock for array and its temporary (you can turn this off)
Richard Thier
2025-10-01 04:36:18 +02:00
ccdf991824 Revert "tpxb: 16-wide manual unroll - but it does not seem to be faster"
Richard Thier
2025-10-01 04:26:44 +02:00
100de9bc67 Revert "32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x"
Richard Thier
2025-10-01 04:26:32 +02:00
18b734a6e7 32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x
Richard Thier
2025-10-01 04:18:04 +02:00
6d79461262 tpxb: 16-wide manual unroll - but it does not seem to be faster
Richard Thier
2025-10-01 04:02:08 +02:00
036725611b removed non-temporal writes, as the write patterns are too random for them
Richard Thier
2025-10-01 03:24:08 +02:00
a16917830f add back "make release_ypsu_noinline_debug_sym" for flamegraphs
Richard Thier
2025-10-01 02:37:29 +02:00
0beb389c50 Revert "more uint->int, but these seem to make it slower a bit so will be reverted"
Richard Thier
2025-10-01 02:16:42 +02:00
ef9e4f799b more uint->int, but these seem to make it slower a bit so will be reverted
Richard Thier
2025-10-01 02:16:34 +02:00
478d87e148 bugfix: remaining 4095 in code after mass changing 4096 to 256
Richard Thier
2025-10-01 02:07:10 +02:00
31dd239ad3 Revert "thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways"
Richard Thier
2025-10-01 02:06:23 +02:00
808b87f266 thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways
Richard Thier
2025-10-01 02:06:14 +02:00
c032109110 Revert "tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial"
Richard Thier
2025-10-01 01:53:38 +02:00
5ecb48815b tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial
Richard Thier
2025-10-01 01:53:28 +02:00
98222d4494 tpxb: tried non-temporal writes (bad for random writes)
Richard Thier
2025-10-01 01:28:49 +02:00
22ec030116 thier3: micro-optimized some of the unrolls
Richard Thier
2025-10-01 01:00:07 +02:00
69d1432721 thier3: no unnecessary 4096 loops and storage because last commit makes it not necessary
Richard Thier
2025-10-01 00:34:08 +02:00
1f6ef0f2ea thier3: realized that I can run with 256 bucketed float prepass (many us faster!) + experimented with a 1bit split trick (too much overhead to win, despite less cache misses)
Richard Thier
2025-10-01 00:29:32 +02:00
45820cf81c re-added FlameGraph submodule
Richard Thier
2025-09-30 22:22:30 +02:00
08cb90bb1b Revert "prepared for flame graph analysis"
Richard Thier
2025-09-30 22:18:10 +02:00
a849b01fa8 Reapply "adds cache_miss_flamegraph.sh"
Richard Thier
2025-09-30 22:17:53 +02:00
2a507e9f54 Revert "adds cache_miss_flamegraph.sh"
Richard Thier
2025-09-30 22:17:38 +02:00
52fc14b0f6 Revert "thier3: write caching queues fixed - bug just makes it slower despite less cache misses"
Richard Thier
2025-09-30 22:17:30 +02:00
967c7c19b5 thier3: write caching queues fixed - bug just makes it slower despite less cache misses
Richard Thier
2025-09-30 22:12:22 +02:00
78266ef345 adds cache_miss_flamegraph.sh
Richard Thier
2025-09-30 17:22:19 +02:00
ac873f7123 prepared for flame graph analysis
Richard Thier
2025-09-30 17:19:47 +02:00
da0c024a32 add results/
Richard Thier
2025-09-30 13:53:52 +02:00
0a199b9d72 Revert "hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert"
Richard Thier
2025-09-29 18:52:02 +02:00
523605e8d8 hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert
Richard Thier
2025-09-29 18:51:53 +02:00
a5cb0995e3 added missing headers for thiersort3
Richard Thier
2025-09-29 18:21:16 +02:00
7ca9a19c5d tests for thier3 - works and very fast
Richard Thier
2025-09-29 18:18:37 +02:00
86f81d2a1c minor threepass optimizations and thier2 variant that uses threepass (but does unnecessary work in that case: allocation, extra copies, extra step for partitioning, etc)
Richard Thier
2025-09-29 03:31:06 +02:00
a17b284c8a added three-plus-one pass radix which performs very well, but there is 0.8 ILP only because of a lot of cache misses. worse perf on random than magyarsort, but better than ska_copy and best worst cases - might hook into thier2?
Richard Thier
2025-09-29 02:24:50 +02:00
f4e4db43f9 4096-wise thiersort2
Richard Thier
2025-09-27 01:43:55 +02:00
76001efd98 2048-wise thiersort2
Richard Thier
2025-09-27 01:41:52 +02:00
dcef96fee8 512-bucketed thiersort2
Richard Thier
2025-09-27 01:24:18 +02:00
5fc08c6fae unlikely optimization in thiersort + measurements
Richard Thier
2025-09-12 02:25:57 +02:00
30e868d154 tried fewer but simpler bucketing
Richard Thier
2025-09-12 01:58:28 +02:00
2c5b0b1177 minor optimization
Richard Thier
2025-09-12 01:49:20 +02:00
5a8f34efa0 fixed thiersort2
Richard Thier
2025-09-12 01:42:11 +02:00
a3643eba9b added thiersort2 - better than std, somewhat similar to schwab in perf but is a bucket sort - very interestingly not huge boost in bucketing speed
Richard Thier
2025-09-11 20:42:04 +02:00
85aaf4b1a1 testing schwab_sort
Richard Thier
2025-05-09 01:10:12 +02:00
707ab1eb81 neoqs, meanqs and various quicksort variants
Richard Thier
2025-05-06 03:06:37 +02:00
e38a76c0c4 added vergesort
Richard Thier
2025-04-04 20:36:32 +02:00
b2c4e7082b mormord ILP-variant "nearly sorting properly" but some values buggy
Richard Thier
2024-04-12 01:09:59 +02:00
b2d66b7fd0 some fixes for mormord-ilp-richi
Richard Thier
2024-04-12 00:37:50 +02:00
23a5bb1d55 mormordsort ILP version by me - with probably a lot of bugs
Richard Thier
2024-04-11 23:59:13 +02:00
0f716e912c bit_partition function added - it's like quicksort, but different
Richard Thier
2024-04-11 21:43:18 +02:00
3f0ae7ae77 Revert "mormord sort more branchless plus extra edge-case handling for empty sized calls" - speed was not great...
Richard Thier
2024-04-11 20:02:23 +02:00
9894f6c6d4 Revert "less branchless mor... not good I think"
Richard Thier
2024-04-11 19:58:08 +02:00
8e8d4257bc less branchless mor... not good I think
Richard Thier
2024-04-11 19:57:58 +02:00
f4ceffe6e2 more branchless mormord - slower
Richard Thier
2024-04-11 19:53:58 +02:00
2d2cad2c5a mormord sort more branchless plus extra edge-case handling for empty sized calls
Richard Thier
2024-04-11 19:45:30 +02:00
d16505a297 mormordsort got template recursion for 33% speedup (I think it still has 2x maybe)
Richard Thier
2024-04-11 19:00:52 +02:00
ae2cd09452 removed unnecessary mormordsort if
Richard Thier
2024-04-11 17:52:58 +02:00
32e98de308 Revert "mormord sort further optimizations (for me slower - btw it might need to be called mormord-prenex-magyarsort at this point? I added a lot to it tbh like copied parts of thiersort for this to work)"
Richard Thier
2024-04-11 17:19:10 +02:00
bccb1d0703 mormord sort further optimizations (for me slower - btw it might need to be called mormord-prenex-magyarsort at this point? I added a lot to it tbh like copied parts of thiersort for this to work)
Richard Thier
2024-04-11 17:18:39 +02:00
02bad1f59f minor optimization on mormord sort
Richard Thier
2024-04-11 16:59:09 +02:00
b2d700f127 mormord sort - working version, slow on random input for me
Richard Thier
2024-04-11 16:41:08 +02:00
55583bcb4a mormordsort - buggy version (I actually think it's some of the Magyarsort 2.x in this form - but needs fixing)
Richard Thier
2024-04-11 06:13:51 +02:00
6426560519 outliersort ideas
Richard Thier
2023-07-20 23:28:52 +02:00
0521ddd52d added neargoodsort idea (and some merge space optimization ideas that I think are known)
Richard Thier
2023-07-20 21:09:57 +02:00
1c32648026 wip: debugging - should be reverted?
Richard Thier
2023-07-02 13:33:27 +02:00
259ae1e540 debug log for differences - I found nearby each elements to differ in this test!
Richard Thier
2023-07-01 06:48:38 +02:00
880fb7e991 with -g it seems there is some error actually...
Richard Thier
2023-07-01 06:37:20 +02:00
4436c79821 quicksort pivoting strategy changes when slowdown is recognized (works well against worst cases)
Richard Thier
2023-07-01 06:06:03 +02:00
83c79f4832 quicksort optimization to avoid const worstcase
Richard Thier
2023-07-01 05:52:51 +02:00
f7c025c0dd 100k test case
Richard Thier
2023-07-01 05:01:00 +02:00
c05e484ea0 interestingly the code I marked "rotten" might actually work lol
Richard Thier
2023-07-01 04:53:42 +02:00
4ad1c8b820 tested new thier and thier-qs and they seem to work - constant is really slow because it's the worst case for both (should be special-cased in my quicksort)
Richard Thier
2023-07-01 04:50:32 +02:00
c47e8a133d added more regular quicksort as a separate file - still trying to prefer ours..
Richard Thier
2023-07-01 04:35:52 +02:00
5df76664bb fixes to thiersort_apply - not sure actually but promising
Richard Thier
2023-07-01 04:34:59 +02:00
873c17f658 inplace quicksort fixes - but thier_apply seems like not doing anything?
Richard Thier
2023-07-01 03:48:42 +02:00
79b95bf905 various bugs
Richard Thier
2023-06-30 22:06:24 +02:00
58176a89b6 thiersort apply fixes, my own qsort added to the algs, quicksort_fromto fix; thier still buggy on random data - but others seem to get handled by its quicksorts under the hood...
Richard Thier
2023-06-30 18:00:44 +02:00
9ac3a76209 more info for thiersort testing - seems like apply maybe has a bug
Richard Thier
2023-06-30 17:00:37 +02:00
36189e8a3c hopefully fixing internal quicksorts?
Richard Thier
2023-06-30 16:39:56 +02:00
96e9fb4440 add thiersort for testing - all kinds of crashes for now
Richard Thier
2023-06-30 16:39:33 +02:00