/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 28 days later.mp4 (3.24 MB, 960x720)
I am looking to design a filesystem-as-database architecture for a high-anonymity, low-trust forum that needs to be accessible from about half a dozen protocols.

Why?
- Don't want to write special code to access database formats
- Flat files are simpler, more 'native' and easier to manipulate
- Backups and restorations become as simple as an rsync or a cp -r
- In fact, backups in general should become more flexible and easier to conduct

Concerns:
Search (inverted index could solve this)
Long threads and pagination slowing down performance (maintaining per-page caches that are rebuilt on write could solve this?)
Concurrency issues (all requests go to a special /requests folder that is iterated over by an asynchronous worker, which also acts as our antispam worker and fulfills the logic that the PHP worker currently fulfills)
Mod panel intricacies that involve scanning and searching lots of posts for little bits of metadata (inverted index??)
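The concurrency concern above mostly disappears if every protocol submits requests with write-to-temp-then-rename, since rename() within one filesystem is atomic on POSIX. A minimal sketch (in Python for brevity; the real worker would be PHP per the thread, and the directory layout and field names here are assumptions):

```python
import json
import os
import tempfile
import time
import uuid

def submit_request(requests_dir, kind, payload):
    """Write a request file atomically: temp file in the same
    directory, then rename. rename() within one filesystem is atomic
    on POSIX, so the worker never sees a half-written request."""
    name = f"{time.time_ns()}-{uuid.uuid4().hex}.json"  # sortable + collision-free
    fd, tmp = tempfile.mkstemp(dir=requests_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"kind": kind, "payload": payload}, f)
    os.rename(tmp, os.path.join(requests_dir, name))
    return name
```

Timestamp-prefixed names also give the worker a free FIFO ordering via a plain lexicographic sort.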

Ideally:
Self managed and self maintained by a worker. I would rather not manually code cache rebuilds or indexes or anything complex.
The core of the forum should be standardized across all protocols. It reads and writes data to what is reasonable.
The workers control and manage everything. The protocols simply make requests.
The Mod Panel remains on HTTP only, though

Some examples:
1 - HTTP
Write post. Click post.
PHP does whatever it needs to do to write a file to /requests with the relevant metadata. Worker takes care of it from there
Click thread. Read thread.
Uhhhhhh...

This is an idea that I am going to implement, and I will implement it, but it's not exactly thorough yet, and there isn't a complete design or architecture that I've settled on.
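The write half of the HTTP example above amounts to a small poll-and-claim loop on the worker side. A sketch under assumptions (Python instead of the PHP the thread describes; a claim-by-rename scheme that isn't specified in the post but keeps multiple workers safe):

```python
import json
import os

def drain_requests(requests_dir, handle):
    """Process every pending request exactly once. A request is
    claimed by renaming it to *.processing first; since rename is
    atomic, two workers can never claim the same file."""
    done = 0
    for name in sorted(os.listdir(requests_dir)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(requests_dir, name)
        claimed = path + ".processing"
        try:
            os.rename(path, claimed)      # atomic claim
        except FileNotFoundError:
            continue                      # another worker won the race
        with open(claimed) as f:
            handle(json.load(f))          # antispam + posting logic goes here
        os.remove(claimed)
        done += 1
    return done
```

The read side ("Click thread. Read thread.") is the part still missing a design; precompiled page files, discussed later in the thread, are one answer.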
>>
File: lain13.png (355 KB, 674x674)
>>108655229
This is a huge undertaking that *needs* to be done
Otherwise we remain in stasis
Not doing this to the fullest extent leaves us at a disadvantage
And hurts us in the long run
This expansion project is something that has been discussed heavily by others in the bridged chatrooms
But I have never had any personal input on it
And only now do I realize the value of it
But to achieve this
We must undergo a significant, heavy internal overhaul
But I prioritize and love minimalism, or modularity, or shit just making sense
Otherwise we wouldn't have spent 3 months overengineering a modular and ultra-flexible module injector that theoretically allows me to make multiple kinds of websites with it
Effectively, we designed a PHP "website engine" because it feels cleaner and so much easier to use. All you have to do to add a new page to the website is add a new module, which is just a folder, a .php file, and a .tpl file for the raw HTML, then configure it in routers.php. Separation of concerns is loved and appreciated throughout the entire system, even internally with Services, where only what is needed is used and no real hard-coding is needed because it's just $container->service
>>
>>108655238
We need to design the specifications and philosophy of the flat file system to full completion. A standardization with all things considered before implementation is key if we wish for consistency across protocols.

The foundations of this system, I feel, before anything is fully considered, are:

Protocols are thin clients that write to /requests, with a partial exception for HTTP's mod panel, which directly queries and modifies data for the particular tool in use. The Mod Panel itself is very conservative, however: it primarily functions by reading data and precisely modifying only what is relevant. The asynchronous worker handles the stabilization of data thereafter.

The core asynchronous processing worker handles all requests, and is written in the forum's "native" language, PHP. When processing, it effectively executes the logic that is executed in PostingService when handling posts.

The secondary asynchronous worker handles all things data, cache, stabilization and whatever else I'm definitely missing. It handles processed and approved posts and does whatever it needs to. It routinely sweeps/cleans/reorganizes forum data when needed and fixes issues/errors when they arise automatically.

There's definitely another part to this.

Plenty of ideas are flowing through my head, like
>What do we do about AutoMod, which, currently, is an asynchronous PHP worker that retroactively analyzes new posts and throws them, along with thread, forum, and quote context, at a dollar-store locally hosted LLM, which returns a JSON that reports or doesn't report the post (as of Phase 1) and puts the data directly onto a banner on the post? (Example: "[AutoMod] action=keep R:10 E:1 N:1 C:10 | OP is a direct question. It's low effort but relevant to the thread topic. No spam detected.")

1/?
>>
>>108655245

>Maybe instead of an in-place conversion or migration, we should recreate the PHP engine from scratch with this file standard, because even the core of the website engine relies on MySQL, even if it's done through a Database.php. A "migration" would be more like a total rewrite, but in the grand scheme of things, or in an oversimplified final result, it'd look as if all we did was add a dedicated data folder internally.

>Maybe it could even be open alpha/open dev, where the new site would be hosted at new.cy-x.net since haproxy would make it easy to wrangle it to just redirect to a different index file that handles the new site, probably. This is grossly said but I know my intent.

>How would we handle reused attachments across posts? Would symlinks "just work"? I wonder how the data would be laid out. Because we also have an asynchronous attachment scanner that throws all recently uploaded attachments at a different dollar store locally hosted AI to determine if it is NSFW, and if it is, it marks it as NSFW with a safety score and returns a big ol' BLOCK image to the unprivileged end-user.

>How would we handle bans and post rejections? What if a banned user tries to post?
Actually, that might not be as big of a problem: if we're doing a total conversion and we assume the standard is easy to use and easy to read and comprehend in a file manager, the protocols should be capable of finding and reading the relevant ban data that tells them "hey, this guy is banned" and rejecting traditionally.
Alternatively, this might help us get rid of automated spammers and botnets by fooling them into thinking their post went through with generic SUCCESS codes.
But I also wonder what that does to user UX.
But then again, the only "users" we've banned in the past 3 months have been bots and malicious actors who don't deserve acknowledgment.
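The shadow-SUCCESS idea above is cheap if bans are stat()-able files. A sketch under assumptions (Python rather than PHP; a one-file-per-hashed-IP layout the thread doesn't actually specify):

```python
import hashlib
import os

def handle_submission(bans_dir, ip, enqueue):
    """Shadow-reject banned posters: always answer SUCCESS, but only
    enqueue the request when the poster isn't banned. Bans live as one
    empty file per hashed IP, so any thin protocol client can do the
    lookup with a single existence check."""
    key = hashlib.sha256(ip.encode()).hexdigest()
    if os.path.exists(os.path.join(bans_dir, key + ".ban")):
        return "SUCCESS"   # lie to the bot; nothing was enqueued
    enqueue(ip)
    return "SUCCESS"
```

Hashing the IP keeps raw addresses out of filenames, which matters for a high-anonymity forum.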

2/?
>>
>>108655229
>mucho texto
just say you're a pedophile
>>
>>108655249
But we also allow proxies. I wonder how that'd work if we accidentally ban a proxy.
But bans are useless in 2026, we've banned so many IPs that clearly belong to several groups that keep coming back anyway.
also, our tor mirror is locked to one internal local ip because of the nature of onionsites.
But I wonder..
I'm still leaning toward "lol, no, user UX doesn't matter in this case, because our moderation has only been directed toward people who don't deserve user UX, and this site is mostly geared toward free reign posting and thus a dedicated team of people really could go at it but honestly it's just not that bad i guess, our post filters only target stuff made by the scambots too. i think that'd be better"
shadowposting is not a ux problem guys its totally a feature. yeah.. that seems ok
But then that makes another interesting case.
If everything is done asynchronously without immediate-level rejection, then we risk a form of DOS, because then it'd take one person spamming lots and lots of requests to cause lots and lots of problems.
The solution would be applying some of the checks the asynchronous worker applies, but immediately, at request time.
Which then becomes an issue, because that would mean I would have to reimplement my advanced antispam ratelimiting innovations across several protocols and likely several languages to accomplish this consistently, wouldn't I?
now that's an interesting problem to be solved
unless, of course, the asynchronous antispam worker collaborated with the second worker somehow and wrote its conclusions down somewhere to be read and referenced against later, so protocols could reject immediately based on that data without doing calculations of their own. that special data would likely be advanced and span the whole forum, condensing all of the relevant calculations into a compiled.. something.
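The "compiled something" could be as simple as one JSON snapshot the antispam worker atomically rewrites, with every protocol doing a dumb membership check against it. A sketch, assuming a file name and schema that are purely illustrative:

```python
import json
import os
import time

def publish_verdicts(path, blocked_ips):
    """Antispam worker side: atomically publish its conclusions by
    writing a temp file and renaming it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"generated": time.time(), "blocked": sorted(blocked_ips)}, f)
    os.rename(tmp, path)          # readers never see a partial snapshot

def should_reject(path, ip):
    """Protocol side: a plain lookup, no spam math of its own."""
    try:
        with open(path) as f:
            snapshot = json.load(f)
    except FileNotFoundError:
        return False              # no snapshot yet: fail open
    return ip in snapshot["blocked"]
```

This keeps the ratelimiting logic in exactly one place (the PHP worker) while the per-protocol code stays a few lines in any language.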

3/?
>>
File: fuck this gam.jpg (47 KB, 526x526)
>>108655256
...depends on how complex it'd be, i guess. bans could be read immediately pretty easily, i assume, but antispam is a different story if we want to prevent effective server DOS through spamming requests..


i also wonder how the cleaner would work. im assuming the mod panel would be precision-strike-like in how it modifies files. soft deleting posts would be as easy as throwing them into some sort of deleted folder, given that soft deletes can be reverted. but i want this to unlock further abilities, like being able to easily move posts between threads, move threads across forums, move posts into completely new threads and other wacky things. so i doubt we should store forum and thread metadata inside of post files; we assume total flexibility with post files themselves so they can be manipulated and moved all around without issue. its just up to the cleaner to stabilize the disturbed data and fix threads if there are holes or things get shuffled around
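If post files really carry no thread/forum metadata, the "wacky" moves reduce to renames, and soft delete is just moving the file aside. A sketch (Python; directory names are assumptions, and a real cleaner would still have to re-derive ordering afterward):

```python
import os
import shutil

def soft_delete(thread_dir, post_file, deleted_dir):
    """Soft delete = move the post file aside; reverting is moving it back."""
    os.makedirs(deleted_dir, exist_ok=True)
    shutil.move(os.path.join(thread_dir, post_file),
                os.path.join(deleted_dir, post_file))

def move_post(src_thread_dir, dst_thread_dir, post_file):
    """Because post files carry no thread/forum metadata, moving a post
    between threads is a single rename; the cleaner later fixes any
    holes and rebuilds affected page caches."""
    os.rename(os.path.join(src_thread_dir, post_file),
              os.path.join(dst_thread_dir, post_file))
```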
i guess it'd be a very retroactive system where things happen internally and then automation works afterward but the only part in regard to immediacy that i worry about is the process of requests themselves and stopping abuse

4/?
>>
>>108655261
next, reading and performance
i'm pretty sure i've concluded that "compiled" pages for forum threads is the way to go
page_x.json or whatever is compiled and thus reading pages isn't performance heavy
the cleaner just has to umm
rebuild everything if data gets shuffled or holed and it impacts a page
that might be an issue if say
we have 100 pages
and then i take a post out of page 60
the data cleaner would have to reshuffle everything and rebuild every single page from page 60 to 100 to compensate for the new hole left behind by that one post in page 60, wouldn't it?
now thats definitely something to think about
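The rebuild described above is bounded, though: pages before the hole come out byte-identical, so the cleaner only rewrites from the affected page onward (pages 60–100 in the example, not 1–100). A sketch of that tail-only rebuild:

```python
def paginate(post_ids, per_page):
    """Slice an ordered post list into compiled pages."""
    return [post_ids[i:i + per_page] for i in range(0, len(post_ids), per_page)]

def rebuild_after_removal(pages, removed_id, per_page):
    """Remove one post and rebuild only the pages from the hole onward.
    Pages before the first dirty page are unchanged, so the cleaner can
    leave those compiled files on disk untouched."""
    flat = [post for page in pages for post in page]
    idx = flat.index(removed_id)
    first_dirty = idx // per_page
    del flat[idx]
    rebuilt = paginate(flat, per_page)
    return pages[:first_dirty] + rebuilt[first_dirty:]
```

Worst case (a hole on page 1) is still a full rebuild, but deletions skew toward recent pages, so the common case is cheap.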
but, i think, asides from threads and the posts within them
it might simplify the rest of the site, probably
maybe except for the home page, which is mostly recent activity plus randomly featured threads, based on a pretty complex sql statement that makes the featured threads and daily reads rotate automatically every day (the featured and daily-read threads themselves do not change; they're just regular threads with a flag that says "lol im featured" or "im daily read"). that might be complex, though. wonder how that'll work without sql.
if you were naive you'd read every single thread and try to find one tag
the forum page also has recent activity too
hmmm, i dont know enough about file operations to come up with a solution to figure that one out
so it looks like i have plenty of ideas and thoughts but umm i dunno what to do about a few problems here


I will come up with a solution soon. I am posting my garbage notes here in hopes someone can help me figure this all out

5
>>
>>108655274
It's not even clear what you want. It sounds like just plain text files will do (one per thread or one per thread page) and then let the file system manage searching through them.
Separate files give simple parallelization and are low effort because the file system already does the actual work.
>featured
>if you were naive you'd read every single thread and try to find one tag
Just have a metadata file that lists featured thread IDs?
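That metadata-file suggestion also solves the daily rotation without SQL: seed the shuffle with today's date, and the "random" pick is stable all day yet changes tomorrow. A sketch (file name and schema are illustrative, not from the thread):

```python
import datetime
import json
import random

def todays_featured(meta_path, count):
    """Pick today's featured threads from one flat metadata file.
    featured.json just lists eligible thread IDs; seeding the RNG with
    the date makes the pick rotate daily yet stay deterministic within
    the day, with no database involved."""
    with open(meta_path) as f:
        ids = json.load(f)["featured"]
    rng = random.Random(datetime.date.today().isoformat())
    return rng.sample(ids, min(count, len(ids)))
```

Flagging a thread as featured is then just the worker appending its ID to this one file, instead of every reader scanning every thread for a tag.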
>>
>>108655229
That's just SQLite
>>
>>108655229
Don't fucking bother, they won't pay you shit.
>>
>>108655229
file creation, moving, and deletion are atomic operations
link creation is atomic
however: filesystems are not designed for atomic row scans, and have unexpected hard limits on file counts, directory depth, and link counts. you'll also end up dealing with unexpected issues around open file descriptors. you'll be tempted to use shell tools for filesystem operations, which will lead to an entirely different set of headaches, while posix/nt filesystem APIs are very exception-prone and have wild variability of behavior across filesystems and languages. you'll also have to deal with potential locale/language issues for even the simplest things, like lexicographic sorting.

from experience, I advise against your plan unless you very tightly control the runtime domain you support
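If OP proceeds anyway, the safest path is to lean only on the atomic operations this anon lists. For example, exclusive creation (O_CREAT|O_EXCL) gives a portable cross-process lock so the two workers and the mod panel don't stomp on each other; a sketch:

```python
import os

def try_lock(lock_path):
    """Acquire a cross-process lock using only atomic file creation.
    O_CREAT|O_EXCL is one of the few filesystem operations that is
    atomic essentially everywhere; exactly one caller wins."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
    os.close(fd)
    return True

def unlock(lock_path):
    os.remove(lock_path)
```

Stale locks from crashed workers are the classic failure mode here, which is exactly the kind of edge case the warning above is about.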
>>
>>108655229
https://youtu.be/g_B7eHDGol8?si=qKs5WfL2F5DSrq8O
>>
>>108662898
https://youtu.be/g_B7eHDGol8


