magyarsort

Author	SHA1	Message	Date
Richard Thier	100de9bc67	Revert "32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x" This reverts commit 18b734a6e70e989168c94d051bf2da5c08560790.	2025-10-01 04:26:32 +02:00
Richard Thier	18b734a6e7	32-wide manual unroll with 2x compiled... still not as good perf as automatic 48x	2025-10-01 04:18:04 +02:00
Richard Thier	6d79461262	tpxb: 16-wide manual unroll - but it does not seem to be faster	2025-10-01 04:02:08 +02:00
Richard Thier	036725611b	removed non-temporal writes as too random patterns for it	2025-10-01 03:24:08 +02:00
Richard Thier	a16917830f	add back "make release_ypsu_noinline_debug_sym" for flamegraphs	2025-10-01 02:37:56 +02:00
Richard Thier	0beb389c50	Revert "more uint->int, but these seem to make it slower a bit so will be reverted" This reverts commit ef9e4f799b4f73e9319264a82fc89e885ef455ac.	2025-10-01 02:16:42 +02:00
Richard Thier	ef9e4f799b	more uint->int, but these seem to make it slower a bit so will be reverted	2025-10-01 02:16:34 +02:00
Richard Thier	478d87e148	bugfix: remaining 4095 in code after mass changing 4096 to 256	2025-10-01 02:07:10 +02:00
Richard Thier	31dd239ad3	Revert "thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways" This reverts commit 808b87f266b2ce8a058b94d9183d100362abe1b4.	2025-10-01 02:06:23 +02:00
Richard Thier	808b87f266	thier3 / tpxb: int->uint32, but this loses a little perf because likely compiler uses the UB of signed overflow to optimize out stuff so will be reverted as it is not a practical thing anyways	2025-10-01 02:06:14 +02:00
Richard Thier	c032109110	Revert "tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial" This reverts commit 5ecb48815b57c51527f2c55c3555fb40ffe48f6b.	2025-10-01 01:53:38 +02:00
Richard Thier	5ecb48815b	tpbx: tried removal of relative addressing but it does not help, just makes n be int instead of uint32_t so probably will be reverted. Sad because this actually looked beneficial	2025-10-01 01:53:28 +02:00
Richard Thier	98222d4494	tpxb: tried non-temporal writes (bad for random writes)	2025-10-01 01:28:49 +02:00
Richard Thier	22ec030116	thier3: micro-optimized some of the unrolls	2025-10-01 01:00:07 +02:00
Richard Thier	69d1432721	thier3: no unnecessary 4096 loops and storage because last commit makes it not necessary	2025-10-01 00:34:08 +02:00
Richard Thier	1f6ef0f2ea	thier3: realized that I can run with 256 bucketed float prepass (many us faster!) + experimented with a 1bit split trick (too much overhead to win, despite less cache misses)	2025-10-01 00:29:32 +02:00
Richard Thier	45820cf81c	re-added FlameGraph submodule	2025-09-30 22:22:30 +02:00
Richard Thier	08cb90bb1b	Revert "prepared for flame graph analysis" This reverts commit ac873f7123c0dd23ff9d73668e005c71944a8afa.	2025-09-30 22:18:10 +02:00
Richard Thier	a849b01fa8	Reapply "adds cache_miss_flamegraph.sh" This reverts commit 2a507e9f54e8478a2aa0fd2116e98c2aeb5579bd.	2025-09-30 22:17:53 +02:00
Richard Thier	2a507e9f54	Revert "adds cache_miss_flamegraph.sh" This reverts commit 78266ef34577eaf88ff5507e4a10e3ba2459bfe8.	2025-09-30 22:17:38 +02:00
Richard Thier	52fc14b0f6	Revert "thier3: write caching queues fixed - bug just makes it slower despite less cache misses" This reverts commit 967c7c19b54fd0db820bbfa1cbe199a8ac9f5419.	2025-09-30 22:17:30 +02:00
Richard Thier	967c7c19b5	thier3: write caching queues fixed - bug just makes it slower despite less cache misses	2025-09-30 22:12:22 +02:00
Richard Thier	78266ef345	adds cache_miss_flamegraph.sh	2025-09-30 17:22:19 +02:00
Richard Thier	ac873f7123	prepared for flame graph analysis	2025-09-30 17:19:47 +02:00
Richard Thier	da0c024a32	add results/	2025-09-30 13:53:52 +02:00
Richard Thier	0a199b9d72	Revert "hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert" This reverts commit 523605e8d841733d7c398131ea50e356b35b88e3.	2025-09-29 18:52:02 +02:00
Richard Thier	523605e8d8	hand unrolled thiersort3 - I think its slower than gcc unrolling and surely more complex so I will revert	2025-09-29 18:51:53 +02:00
Richard Thier	a5cb0995e3	added missing headers for thiersort3	2025-09-29 18:21:16 +02:00
Richard Thier	7ca9a19c5d	tests for thier3 - works and very fast	2025-09-29 18:18:37 +02:00
Richard Thier	86f81d2a1c	minor threepass optimizations and thier2 variant that uses threepass (but does unnecessary work in that case: allocation, extra copies, extra step for partitioning, etc)	2025-09-29 03:31:06 +02:00
Richard Thier	a17b284c8a	added three-plus-one pass radix which performs very well, but there is 0.8 ILP only because of lot of cache misses. worse perf on random than magyarsort, but better than ska_copy and best worst cases - might hook into thier2?	2025-09-29 02:24:50 +02:00
Richard Thier	f4e4db43f9	4096-wise thiersort2	2025-09-27 01:43:55 +02:00
Richard Thier	76001efd98	2048-wise thiersort2	2025-09-27 01:41:52 +02:00
Richard Thier	dcef96fee8	512-bucketed thiersort2	2025-09-27 01:24:18 +02:00
Richard Thier	5fc08c6fae	unlikely optimization in thiersort + measurements	2025-09-12 02:25:57 +02:00
Richard Thier	30e868d154	tried fewer but simpler bucketing	2025-09-12 01:58:28 +02:00
Richard Thier	2c5b0b1177	minor optimization	2025-09-12 01:49:20 +02:00
Richard Thier	5a8f34efa0	fixed thiersort2	2025-09-12 01:42:11 +02:00
Richard Thier	a3643eba9b	added thiersort2 - better than std, somewhat similar to schwab in perf but is a bucket sort - very interestingly not huge boost in bucketing speed	2025-09-11 20:42:04 +02:00
Richard Thier	85aaf4b1a1	testing schwab_sort	2025-05-09 01:10:12 +02:00
Richard Thier	707ab1eb81	neoqs, meanqs and various quicksort variants	2025-05-06 03:06:37 +02:00
Richard Thier	e38a76c0c4	added vergesort	2025-04-04 20:36:32 +02:00
Richard Thier	b2c4e7082b	mormord ILP-variant "nearly sorting properly" but some values buggy	2024-04-12 01:09:59 +02:00
Richard Thier	b2d66b7fd0	some fixes for mormord-ilp-richi	2024-04-12 00:37:50 +02:00
Richard Thier	23a5bb1d55	mormordsort ILP version by me - with probably lot of bugs	2024-04-11 23:59:13 +02:00
Richard Thier	0f716e912c	bit_partition function added - its like quicksort, but different	2024-04-11 21:43:18 +02:00
Richard Thier	3f0ae7ae77	Revert "mormord sort more branchless plus extra edge-case handling for empty sized calls" - speed was not great... This reverts commit 2d2cad2c5a4fbae0d2f008b4164ffb1a49ba3a88.	2024-04-11 20:02:23 +02:00
Richard Thier	9894f6c6d4	Revert "less branchless mor... not good I think" This reverts commit 8e8d4257bc8c62064eee677788a81c6b42d9a796.	2024-04-11 19:58:08 +02:00
Richard Thier	8e8d4257bc	less branchless mor... not good I think	2024-04-11 19:57:58 +02:00
Richard Thier	3f4b17f0ef	Revert "more branchless mormord - slower" This reverts commit f4ceffe6e248b7f97763b59a200c303a77ef2836.	2024-04-11 19:54:07 +02:00

162 Commits