>>107705573
>You cannot parallelize input reading
It's a memory map, so you could split it into chunks, push each chunk boundary forward to the next newline so no line straddles two chunks, and come up with a way for multiple threads to write into the same bucket at once. No clue how much that would actually help, though.
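Here is a rough sketch of the chunked approach. It uses only std (a plain `&[u8]` instead of memmap2/memchr) so it stands alone. `split_on_newlines` and `bucket_parallel` are made-up names. Keying buckets on a line's first byte is my assumption about what the 128 buckets are. Instead of letting threads share a bucket, each thread fills private buckets that get concatenated afterwards, which sidesteps the locking question entirely.

```rust
use std::thread;

/// Hypothetical helper: split `data` into roughly `n` chunks, each ending just
/// after a b'\n' (or at EOF), so no line straddles two chunks.
fn split_on_newlines(data: &[u8], n: usize) -> Vec<&[u8]> {
    let n = n.max(1);
    let target = data.len() / n + 1; // approximate chunk size
    let mut chunks = Vec::with_capacity(n);
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Push the cut forward until the previous byte is a newline (or EOF).
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Hypothetical helper: each thread buckets its chunk's lines into 128 private
/// buckets keyed by the first byte (assumes ASCII; the mask only keeps the index
/// in range), then bucket i of every thread is concatenated in chunk order.
fn bucket_parallel(data: &[u8], threads: usize) -> Vec<Vec<&[u8]>> {
    let chunks = split_on_newlines(data, threads);
    let locals: Vec<Vec<Vec<&[u8]>>> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .into_iter()
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: Vec<Vec<&[u8]>> = vec![Vec::new(); 128];
                    for line in chunk.split(|&b| b == b'\n') {
                        // Empty lines (including the one after the final '\n') are skipped.
                        if let Some(&first) = line.first() {
                            local[(first & 0x7f) as usize].push(line);
                        }
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Merge without any shared mutable state: no locks needed.
    let mut merged: Vec<Vec<&[u8]>> = vec![Vec::new(); 128];
    for local in locals {
        for (dst, src) in merged.iter_mut().zip(local) {
            dst.extend(src);
        }
    }
    merged
}
```

The merge is a serial pass over pointers only, so it should be cheap compared to the bucketing itself, but whether the whole thing beats a single memchr loop depends on how fast the mapped pages come in.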
>but you can potentially make output faster by exploiting the fact that your memory bandwidth might not be saturated. When each thread finishes sorting, it could join its lines into one string; then you can build IoVecs from the non-empty strings and write everything out in one syscall with mostly flat data, which might be faster to print.
That does shave off a few more percent, even though it means copying everything. Most of the gain comes from the joining; I think the writev made a difference too, but it was hard to measure.
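A sketch of the sort-then-join step: the buckets are split into contiguous ranges, one per scoped thread, and each thread sorts its buckets and copies the lines into one flat `Vec<u8>` per bucket. `sort_and_join` is a hypothetical name, and the plain byte-wise sort is an assumption since the actual sort key isn't visible in the post.

```rust
use std::thread;

/// Hypothetical helper: sort each bucket (byte-wise order, an assumption) and
/// copy its lines, each followed by b'\n', into one flat Vec<u8> per bucket.
/// Buckets are divided into contiguous ranges, one range per scoped thread.
fn sort_and_join(mut buckets: Vec<Vec<&[u8]>>, threads: usize) -> Vec<Vec<u8>> {
    let threads = threads.max(1);
    // Buckets per thread; max(1) because chunks_mut(0) panics.
    let per = ((buckets.len() + threads - 1) / threads).max(1);
    let mut joined: Vec<Vec<u8>> = vec![Vec::new(); buckets.len()];
    thread::scope(|s| {
        for (ins, outs) in buckets.chunks_mut(per).zip(joined.chunks_mut(per)) {
            s.spawn(move || {
                for (lines, buf) in ins.iter_mut().zip(outs.iter_mut()) {
                    lines.sort_unstable();
                    // Size the buffer up front: at most one allocation per bucket.
                    let len: usize = lines.iter().map(|l| l.len() + 1).sum();
                    buf.reserve_exact(len);
                    for line in lines.iter() {
                        buf.extend_from_slice(line);
                        buf.push(b'\n');
                    }
                }
            });
        }
    });
    joined
}
```

Contiguous ranges are the simplest split, but bucket sizes are usually skewed (lots of lines starting with a few common bytes), so handing buckets out round-robin or largest-first would probably balance the threads better.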
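#![feature(write_all_vectored)] // nightly-only: Write::write_all_vectored is unstable
use std::io::{self, Write};
const NUM_THREADS: usize = 16;
fn main() -> io::Result<()> {
    // mmap stdin instead of reading it. Only works when stdin is a regular file
    // (`< input.txt`), not a pipe; unsafe because modifying the file while mapped is UB.
    let input = unsafe { memmap2::Mmap::map(&io::stdin())? };
    // 128 buckets, presumably one per ASCII byte: (lines, joined output bytes).
    let mut buckets = [const { (Vec::new(), Vec::<u8>::new()) }; 128];
    let mut prev_idx = 0;
    // Visit every newline offset in the mapped input.
    for idx in memchr::memchr_iter(b'\n', &input) {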
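`write_all_vectored` is nightly-only, hence the `#![feature]` line. On stable you can get the same "one writev when the kernel takes it all, retry otherwise" behaviour with a small loop around `write_vectored`. This is my own helper (`write_all_vectored_stable` is a made-up name), not the std implementation. Empty buffers are filtered out so they don't take up iovec slots.

```rust
use std::io::{self, IoSlice, Write};

/// Hypothetical stable stand-in for the nightly `Write::write_all_vectored`:
/// keeps calling `write_vectored` until every byte is written, skipping empty
/// buffers and retrying on `Interrupted`.
fn write_all_vectored_stable<W: Write>(w: &mut W, bufs: &[&[u8]]) -> io::Result<()> {
    let mut idx = 0; // first buffer not yet fully written
    let mut off = 0; // bytes of bufs[idx] already written
    loop {
        // Skip buffers that are empty or already fully written.
        while idx < bufs.len() && off == bufs[idx].len() {
            idx += 1;
            off = 0;
        }
        if idx == bufs.len() {
            return Ok(());
        }
        // Remaining part of the current buffer plus all later non-empty buffers.
        let slices: Vec<IoSlice> = std::iter::once(&bufs[idx][off..])
            .chain(bufs[idx + 1..].iter().copied())
            .filter(|b| !b.is_empty())
            .map(IoSlice::new)
            .collect();
        let mut n = match w.write_vectored(&slices) {
            Ok(0) => {
                return Err(io::Error::new(io::ErrorKind::WriteZero, "failed to write whole buffer"))
            }
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Advance (idx, off) past the n bytes that were just written.
        while n > 0 {
            let rem = bufs[idx].len() - off;
            if n >= rem {
                n -= rem;
                idx += 1;
                off = 0;
            } else {
                off += n;
                n = 0;
            }
        }
    }
}
```

Given one `Vec<u8>` per bucket (`joined` here), you'd collect `let refs: Vec<&[u8]> = joined.iter().map(|b| b.as_slice()).collect();` and call `write_all_vectored_stable(&mut io::stdout().lock(), &refs)`. In the common case the first `write_vectored` call takes everything and the loop runs once.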