/g/ - Technology


Thread archived.
You cannot reply anymore.




File: Tumblr_l_23664838640606.jpg (831 KB, 1283x1800)
previous: >>108276611

#define __NR_flock                73

https://man7.org/linux/man-pages/man2/flock.2.html

tl;dr:
place and view advisory locks on files

i was going to copy paste excerpts of some of the absurd portions of this manpage, but as i read through it, the whole flocking thing is absurd. i mean seriously
>after linux 2.0, flock and fcntl locks don't interact with each other
>except on some BSDs
>except actually after 2.6 NFS and after 5.5 SMB do make them interact with each other
>oh by the way, if you try to convert between a shared and exclusive lock, it's not atomic, and someone can steal it out from under you
i mean what the flock is even the point of this syscall?

relevant resources:
man man

man syscalls

https://man7.org/linux/man-pages/
https://linux.die.net/man/
https://elixir.bootlin.com/linux/
https://elixir.bootlin.com/musl/
https://elixir.bootlin.com/glibc/
>>
>>108276743
n-no that's illegal... absolutely forbidden
>>108276955
curious to get your take on this one now, lol
>>
bampu
>>
>>108285124
have my bunupu
>>
File: image0-11.jpg (104 KB, 1500x1906)
>>108285390
thank u i appreciate it
i depend on everyone's bumps to keep the thread alive until i get home from work... it's depressing when it dies before then ;___;
>>
>>108285149
>curious to get your take on this one now, lol
Eh. How often does one get to operate on a file at the same time as other processes? Once in a blue moon?

The only argument I could see for the submission of locks for multiple files would be atomicity to avoid weird deadlocks, like, when process A has the lock on file 0, process B has the lock on file 1, neither gives up its lock, and each is waiting for the lock on the other file. In that situation the kernel could decide to yield the lock for one process so that the other process can continue with its work and later release both locks, so that the first process can then do its thing.

Locks - especially file locks - are simply not as common as other operations, like open and close and read and write.
>>
I missed yesterday's thread:
I find it interesting that you'd call fcntl anything but a sad crutch that unfortunately stuck around.
>>
>>108285124
>i mean what the flock is even the point of this syscall?
heh
I like you, kid
>>
>>108286110
The entire file API is anything but a sad crutch that unfortunately stuck around, and it's only ever getting worse.
>example
openat2. There isn't much use for the RESOLVE_CACHED flag unless you intend to offload blocking open() calls to other userspace threads. Now, why would anyone do such a thing, especially considering that io_uring had already been around for two years by the time of its introduction?

Answer: because io_uring is just a thin wrapper around kernel threads, which are shared between different processes. And because it's a thin wrapper open/openat/openat2 still block whatever thread they're assigned to, but this time for the entire system.

It's like they cannot provide a monolithic kernel interface if they tried, and have to go back to microkernel patterns.
>and that's before getting into the absolute waste of hardware submission queues that had already been a thing by the time NT started ITS development
>>
>>108286560
you've been talking about monolithic vs micro interface design for a couple threads now, but I unfortunately don't know what you mean by that.
could you elaborate?
>>
bampu
>>
good anon
please keep posting
>>
>>108287194
The reason you're confused is because you've probably never even thought about monolithic kernel interfaces - everything you've ever seen is microkernel bullshit because they've all been copying from OG UNIX. A real monolithic interface would allow you to write down an entire manifest of things the kernel has to do, and then submit it at once - kind of like Vulkan command buffers.

>but how would you handle sequential dependencies
Compound functions for popular use cases (say, open+mmap+close). This *is* a monolith after all, right?
>>
>>108285845
i guess you're right, but it does feel like it ought to be more common. i guess really what happens though is most processes only need to read files and write to memory, and generally when they do need to write to files it's after some number of potentially concurrent memory operations
>>108286110
there's a lot of those in the linux kernel, i'm learning
>>108286251
hehe
>>108289422
as long as i can get at least a couple of replies per thread, i will. it's just super demoralizing when they die really early
>>108289432
it's funny to think about how in this regard linux is basically the worst of both worlds, yet still wildly successful and popular
>>
>>108289432
And for the record, I'm not dinging OG UNIX. What they did back then made sense. They had machines with a pittance of the memory of an 8086 and a memory protection system that consisted of offsets being added to all memory addresses. Tape was used as mass storage, seek times were insane, and the idea that you'd want to open more than one file at once was ludicrous.

But all of this was in 1969. NT started out in 1989, and Linux in 1991. By that point we not only started using HDDs, but they started to have command queues like TCQ to reduce the number of disk rotations.

NT initially came up with the concept of I/O completion ports, which were a complete crapshoot (single-entry-submissions only), and it would take until 1997 before they came up with scatter/gather APIs (which weren't even properly documented initially, submissions only worked on a single file, not on multiple ones, it was either all reads OR all writes, and didn't even attempt to solve the problem that you first needed to open file handles in the first place, which forced you to go through the same piecemeal interface as before - oh, and no error codes for submission entries). Linux had libaio, which was better in some regards (submissions for multiple files, error codes per entry, reads, writes, and some other operations allowed at the same time), but still blocked occasionally, and more importantly, still blocked on open() calls.

And the best part was the Tanenbaum vs. Torvalds debate, where you had a bunch of people coming together arguing *where services and functionality should be located*, but not *how they should be accessed*. Torvalds was talking a big game about his monolithic kernel, but *no one* noticed that he was still using microkernel interfaces (which MINIX, a self-admitted microkernel, had already been using) to drive his monolith.
>>
>>108289512
>it's funny to think about how in this regard linux is basically the worst of both worlds, yet still wildly successful and popular
Linux was the least bad option, but that didn't make it good. It was/is free, and there are still changes to internal implementations, but those can only go so far with the interfaces they're not only currently employing, but have been actively optimizing for, for over 25 years (before io_uring showed up and proved it was all for naught). Also, some of the stuff that Linux came up with outside of POSIX wasn't completely retarded, like libaio.

If you want to have a hint of what the competition was up to by that point in time, just look at the aio interface - not to be confused with libaio, which is the actual kernel interface. I'm talking about the glibc userspace wrapper using userspace threads and userspace locks and userspace submissions to call pread and pwrite in a loop to emulate the things other kernels couldn't provide.

I'm not even aware of an actual attempt at a real monolithic kernel interface. It's never been tried, unlike communism.
>>
Also, can I just say how completely unoptimized modern hardware is for zero-copy submissions to the kernel? Because if you write your manifest, and submit it to the kernel, then the kernel has to copy that manifest from userspace to kernel space before being able to even validate it, because some userspace thread could employ TOCTOU attacks to change the manifest after validation.

>just change the mapping to write-only then
That requires a permission change in the page table, whose entries are cached in the translation lookaside buffer (TLB). If you change permissions you have to notify the other cores in case they still have a translation for that memory address cached in their TLBs, which requires an Inter-Processor Interrupt (IPI) to be issued across all cores (because the page table is a global construct), and the more cores you have the more expensive such an interrupt is.

The problem here is that there are no thread-local page tables, where the kernel only has to flip a couple of bits for a mapping to enable write protection for that thread. They wouldn't even have to be complicated page tables; just a flat array of a few entries per thread would suffice, to keep the silicon penalty as small as possible, because we'd need at least two entries per thread (front buffer and back buffer) for manifest submissions, but probably not more than four. Intel came close to implementing something like that with its protection keys (PKU/PKS), but those have a hard limit of 16 diverging permission entries *per process*, which is obviously not enough.

So, we just keep copying parameters between user and kernel like complete retards.
>>
>>108289548
>the Tanenbaum vs. Torvalds debate
is there somewhere to read about this? Sounds interesting
>>
>>108291081
https://en.wikipedia.org/wiki/Tanenbaum%E2%80%93Torvalds_debate

Also:
>he did suggest that it was mostly related to portability, arguing that the Linux kernel was too closely tied to the x86 line of processors to be of any use in the future, as this architecture would be superseded by then
Translation: he was hoping for the memory latency issue to go away in the future, which ... delusional doesn't even *begin* to describe it. A considerable part of the cost of a mode switch is the processor having to write the usermode state to the usermode stack and then retrieve the kernel mode state from the kernel mode stack.
>>
>>108286560
I see. keep the program in kernel space for longer.

the separation between open, fcntl, and write/read (and the equivalent partition for sockets) is a little annoying, sure, but not that big of a deal I think. and in any case they must always be separately callable anyway, so you might as well keep them as the default.
and I find it *very* unnecessary to compound frequent use cases, except for maybe the utmost common ones like open(O_CREAT)/write or open/stat/read etc. even then, it's trivially done in user space, the permutations grow fuck-you numerous very quickly, and why should the kernel have to bother beyond niche performance optimisations that will only be implementable for very, very few cases anyway.

You might call this a kernel issue, but it's also just as much a language issue. Cos how are you meant to call these functions? Does every permutation of open + stat + fcntl + trunc + read + seek + (un)link get its own dedicated function? (ofsrc vs ofrc vs osrc vs orc vs vs vs)
Do you follow the ioctl route of having a billion possible parameters that all influence each other? do you transfer an array/ a struct that contains a list of function pointers with associated context pointers? All of those are utterly horrible ergonomically, so you'd need a language more featureful than C. What's your idea?
>>108289548
not only confined by the PDP7/11, but as above, in close co-evolution with C.

Ultimately, I don't think the separate calls to open/read or socket/connect/listen are more than a minor inconvenience. the mess that {io,f}ctl are is a separate design issue. What's your big issue with them?
>>108289610
>I'm not even aware of an actual attempt at a real monolithic kernel interface
maybe there's a reason for that that goes beyond historical inertia
>>
File: syscalls.png (137 KB, 1670x730)
>>108291560
>not that big of a deal I think
Oh yes, it is. Every time you enter the kernel the usermode CPU state has to be stored in memory, and the kernel mode CPU state has to be restored. That is expensive. Then you have parameter copies and validations to prevent TOCTOU attacks, which might trigger additional kernel allocations and lock acquisitions.

If you think a SYSCALL instruction is the same as a normal JMP or CALL, you're mistaken.
>>
File: syscall_ntdll.png (14 KB, 888x152)
>>108291560
>why should the kernel have to bother other than niche performance optimisation
Because modern hardware entered the stage of massively parallel processing over ten years ago, with the introduction of NVMe and its up to 64K command queues of 64K command entries each. Over twenty years ago, if we look at GPUs alone. And we "fixed" GPUs when we got rid of purely kernel-based drivers and introduced userspace portions that allowed applications to build command buffers without constant mode switching - which doesn't mean that the kernel portion no longer exists. Heck, on my system ntoskrnl.exe is only 11 MB in size, but nvlddmkm.sys is over 100 MB!

We already have big, bulky kernellands. Anything else is denial.

>Cos how are you meant to call these functions?
As I said, SYSCALL: >>108291572

>do you transfer an array/ a struct that contains a list of function pointers with associated context pointers?
More like opcodes, but the real problem is state. You'd still need a way to tell a call to mmap that it's supposed to take the return value from open as its fifth and a call to close to take that same return value as its first parameter.
>>
Oh, and the Linux devs agree with me too, because why else would they take a festering flesh wound like io_uring and admit it into the kernel, unless the processor and its mode-switching nonsense actively got in the way of driving NVMes and their massive capabilities?
>>
>>108291572
>>108291654
alright so it's about context switches. sure, it would be nice to get rid of those.
and yeah, I dislike all those race conditions involving separate syscalls, although my personal pet peeve is fstat -> mmap

>SYSCALL
doesn't solve the problem in the slightest, that's just the final instruction. You'll still need to build the request, specify how the objects flow into one another, how the individual syscalls bail should they fail, etc.
transferring the sequence of opcodes is the last and by far the simplest "problem" to solve.

I have little io_uring experience, but the io_uring_sqe struct is exactly an example of the unwieldiness I was talking about.

I have not used nvidia's interface. How do you build your requests there? Although from my naive viewpoint, gpu request sequences seem less variable than classical kernel request sequences.
>>
File: 1733758303525032.jpg (430 KB, 2048x1536)
>>108291819
>alright so it's about context switches
Mode switches. Context switches flush the TLB, like when a thread from another process is scheduled to run. This is merely about switching from usermode to kernelmode.

>doesn't solve the problem in the slightest
It's either bytecode or compound functions. The first one is ostensibly more complicated.

>I have not used nvidia's interface
Well, you wouldn't use nvidia's interface directly, because that shit's abstracted by the Vulkan API. But the API is implemented by the GPU driver, with a sizable portion *within* the process' address space. You create a command pool, create your command buffers within that pool, and then add commands to the buffer in question, via vkCmdSetScissor or vkCmdDraw or vkCmdCopyBuffer or somesuch.
>>
>>108291923
>Mode switches
oh yeah
>bytecode or compound functions
again, compound functions cannot solve the general case, there are too many permutations. and just saying "bytecode" doesn't amount to any actual design work: how would you actually assemble the operations and specify data flow or error recovery, both from an interface and a language standpoint?
>vulkan
as I had expected, all these calls are conceptually very simple. They work on homogeneous resource ranges, they apply the same action to each of them, they silently ignore errors. An actual kernel interface is so much more intricate.

I still agree with your bottom line, the idea of registering a sequence of syscalls and executing them in a single go is wonderful, I just don't think it's particularly realistic for a kernel implemented in, nor one "natively" callable from, C11.
>>
>>108292258
NTA
the kernel already supports executing programs in kernel mode sent by the user.
you program in a subset of C with heavy restrictions so that a verifier is able to prove your program halts. https://en.wikipedia.org/wiki/EBPF

The API isn't designed for on-the-fly program loading, because the kernel needs to prove your program terminates and imposes restrictions on what you can do (this isn't ever going away due to security requirements).

but it's definitely possible to extend that API to perform general syscalls.




