/g/ - Technology


File: 1756719072270601.jpg (1.03 MB, 1552x1944)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106834517 & >>106829402

►News
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: wee chef.jpg (208 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106834517

--Papers:
>106834872 >106841842
--Evaluating motherboards for 768GB DDR5 and 4 dual-slot GPU AI workloads:
>106834537 >106834651 >106834714 >106834790 >106835307 >106835496 >106835317
--Budget GPU stacking vs unified memory tradeoffs for AI workload optimization:
>106834843 >106834848 >106834883 >106834907 >106834931 >106834960 >106834999 >106835075
--Quantization format feasibility and evaluation metrics debate:
>106835703 >106835727 >106835730 >106835756 >106835837 >106835878 >106835939 >106841461
--Critique of Civitai V7's style blending limitations and synthetic data solutions:
>106837693 >106837873 >106837930 >106838273
--Merged PR: llama.cpp host-memory prompt caching for reduced reprocessing:
>106839051 >106839144 >106839376 >106839793
--RND1 30B-parameter diffusion language model with sparse MoE architecture released:
>106840091 >106840172
--Critique of OpenAI's customer list and API usage concerns:
>106840789 >106840956 >106840972 >106841482
--Testing LLMs for extended roleplay scenarios reveals performance and jailbreaking limitations:
>106838286 >106838292 >106838301 >106838341
--Anticipation and speculation around upcoming Gemma model releases:
>106835225 >106836990 >106837149 >106837242 >106838195 >106838260
--Academic freedom tensions and AI safety critiques in Hong Kong and Anthropic:
>106836270 >106836444 >106836593
--Skepticism about accessibility requirements for new AI product Grok Imagine:
>106836614 >106838206
--LoRA capacity limitations for commercial-scale model training:
>106836702 >106836758
--Miku (free space):
>106836623 >106838392 >106840308 >106840706 >106840559 >106840720 >106841469

►Recent Highlight Posts from the Previous Thread: >>106834521

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Is gemma 4 actually happening?
>>
>>106843051
cool 'ku
>>
File: file.png (1 KB, 67x29)
ik llama bros, update your llamas
i went from 4.7t/s to 5.6t/s at 30k context with glm air IQ4_KSS
i was on picrel, now im on latest branch
>>
>>106843060
within 336 hours!
>>
litharge reels tram
>>
>>106843081
how do i update it
>>
>>106843123
git pull?
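if you built from source it's roughly this (assuming the usual cmake build with CUDA; swap the flags for whatever you configured originally):
cd ik_llama.cpp
git pull
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
then relaunch with your usual command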
>>
>>106843135
i have never pulled a git
>>
File: 1759770905977366.jpg (275 KB, 1440x1800)
>>106843137
>>
>>106843137
Ask your AI.
>>
>>106843060
the gpt-oss-20b killer is about to drop
>>
>>106843399
GPToss already makes Gemma 3 look like a Nemo coomtune
>>
I've been out of the loop, what's the state of using framework desktop for a local model? I'm looking at going full off grid, so energy consumption is the biggest issue, but I want something that isn't absolute trash.
On the other hand, my dual 3090 setup I have now is idling at 110w while also serving as NAS and jellyfin server, so maybe I just accept that I'll have to dedicate a whole panel/battery to just the server box.
>>
>>106843137
I pull my git every day, it's easy
>>
>>106843674
What's the point? Are you also building every time after you pull?
>>
Holy shit, Google will finally do the BIG needful within the next 24 hours.
>>
File: file.png (28 KB, 318x311)
why is lmg so sad today :(
>>
>>106843727
bro, it takes less than 3 seconds to pull and build
>>
>>106843749
Too much Miku recently. This is the comedown
>>
>>106843749
i knew this was a secret message..
our queen is back
>>
>>106843137
it gets bigger when I pull
>>
what did anon mean by this?
>>
>>106843727
yeah the building is the point of pulling it
>>
File: HYPERGACHI.png (19 KB, 97x112)
>>106843800
>>
I thought someone would've posted this by now.
https://www.anthropic.com/research/small-samples-poison
>In a joint study with the UK AI Security Institute and the Alan Turing Institute, we found that as few as 250 malicious documents can produce a "backdoor" vulnerability in a large language model—regardless of model size or training data volume. Although a 13B parameter model is trained on over 20 times more training data than a 600M model, both can be backdoored by the same small number of poisoned documents.
I don't really care about the safety aspects but it does explain how easy it is to slop a model and run it off the rails or why finetuning works with very little data.
>>
>>106844041
because the document sizes were like 250MB per, and consisted of a single token
>>
File: poisoned_docs.png (159 KB, 689x472)
>>106844052
why are you just making shit up
>>
I'm spending 10 dollars a day on GLM OpenRouter credits for an afternoon of vibecoding, at this rate it'd be cheaper to pay for the $200 ChatGPT plan and get unlimited codex.
>>
>>106844041
>regardless of model size or training data volume
But the good news is we found the equivalent of a perpetual motion machine for information theory.
>>
File: file.png (122 KB, 958x592)
>I'm spending
not local
>picrel
bros.. i admit im esl, but letting esls into the internet was a huge mistake
it was supposed to be just europe and north of mexico
>>
>>106844250
You could spend $10 a month to have an indian do the work for you, which he will use to pay for his discounted chatgpt subscription.
>>
>>106844250
the lab that trained GLM has a dirt cheap coding plan, I'd just use that
or use deepseek's API, it's less than a quarter the price and roughly as good
>>
File: embeddings.png (323 KB, 1971x2146)
>>106844276
Suppose I buy a $10000 server to run it locally. Even if I get the power for free it would take me 5 years to break even, and that's not taking into account the fact that I would be getting 1t/s vs the 20t/s I get through the API.
>not local
I'm working on a program to do local inference, so it's on topic.

>>106844280
Those 10 dollars paid for making my coding assistant's tool use more robust as well as making a script to extract the embeddings from the Python implementation of a model and use them as reference to test my own code, I don't think an indian would do that for 10 dollars.
>>
>>106844041
What does it show that's new?
>>
>>106844364
The next frontier of indian scam tactics will be releasing model finetunes filled with malware
>>
>>106844306
>the lab that trained GLM has a dirt cheap coding plan
Cool, I didn't know that existed, thank you!
>or use deepseek's API, it's less than a quarter the price and roughly as good
Doubt it, isn't Qwen3 Coder higher than it in SWEbench? And Qwen Coder is kinda trash IMO.
>>
>>106844396
>believing benchmarks
how new r u
>>
File: file.png (67 KB, 439x247)
lol'd
i lost
>>
>>106844396
just going by my own actual usage (mostly LLM integration stuff using a mix of scala, lua, and a bit of typescript for build tools). I currently main GLM 4.6 and backfill with deepseek 3.2 when the API is overloaded. GLM stays on task a bit better but tends to use more tokens doing so. I'd put them roughly in the same league.
>>
>>106844490
grrrr
>>
>>106844459
If it's so easy to rank high in the benchmark then why don't they do it?
>>
>>106844505
Did you use DSv1 as a coding model? If so, how would you compare it to 3.2?
>>
>>106844276
I'm so sick of these retards that don't know how to write the first message. It goes beyond esl. They will have a card that says play the role of {{char}}, never impersonate {{user}}, etc. But then their intro message will be FILLED with: You do this, you do that (you referring to the user), which is confusing the model and contradicting their own rules. They are telling it not to impersonate the user but then give an example message where they nonstop impersonate the user.

Are these people retarded? Do they not understand what they are doing with their shitty intro messages? It annoys me even more than esl writing.
>>
>16gb vram, 64gb ram
glm air is prolly the best I can get for silly tavern slop, right?
Is there anything better available if upgrading to 96gb? 128 is way overpriced atm
>>
>>106844571
That's precisely the reason why Rocinante and other finetunes are popular (besides the shilling).
>>
>>106844571
Most people are kind of dumb. Then you take a subset of that population who are coomers and who also would fall for the AI meme and who also create one or a few cards and then stop using AI before they have time to gain experience and taste, and what do you know, the average quality and intelligence displayed is well below standard.
>>
File: 1758501583802414.jpg (937 KB, 1552x1944)
>>106843051
>>
>>106844562
I did, yeah. 3.2 is really just meant to be a cheaper/more efficient version of 3.1/3.1-terminus, using the same post-training data, and I haven't noticed any significant degradation since they swapped the API over
it's maybe less prone to spamming emojis than the old one? that's the main thing that comes to mind
I do keep these things on a fairly tight leash, giving them well-specified tasks to complete over ideally only a handful of modules. it might be a different story if you're telling them to go write a whole app for you idk
>>
>>106844600
>>106844609
I swear the quality of chub cards is so, so bad now. It's either crap like what I explained above, or cards that have such sloppy prose it would make GPT blush (most likely these people are using models to create their cards). There's no in between. Maybe my standards have gotten higher in the past two years or the quality has fallen off a cliff, or maybe both.
>>
File: 1735358613465558.jpg (51 KB, 785x750)
>chatML
>>
File: cucked.png (295 KB, 1971x2146)
GLM just decided by itself to turn me into a cuck...
>>
>>106844771
AGPL bros??? our response??
>>
>>106844041
old
https://arxiv.org/html/2408.02946v4
>>
browsing through arXiv for fun always shows me how deeply AI permeates our society.
no matter what field of research, what subfield, what strange application - AI dominates everything.
people will be surprised when we find ourselves living in a sci-fi dystopia in 10 years.
>>
Sometimes I wish this was 2023/early 2024 again, when most people were happy with 7B/13B models.
>>
>>106845162
>when most people were happy with 7B/13B models
I wish that time period had never existed, then maybe this thread would have something other than degenerate coomers. There's no doubt that the fact that the early models were totally useless for real world tasks has contributed to making the culture of this thread revolve solely around degenerate textgen crap.
Coomers have no standards, that's why they could bear 7b mistral and that's why they can bear with GLM which is easily the worst, most astroturfed MoE out there
>>
>>106845124
We're already in one, it just doesn't have the aesthetic.
>>
>>106845183
I just raped a loli with glm, what you gonna do about it?
>>
>>106845162
>when most people were happy with 7B/13B models.
I remember those people claiming those models are nearly indistinguishable from the 65B because they couldn't ever run the 65B.
>>
>>106845183
Remember that /lmg/ sprouted from /aicg/.
>>
>>106845183
There were always going to be people trying to use their gaming rigs to run whatever model would fit.
>>
>>106845162
If it makes you happy I'm still happy with 12B models, well okay just with Gemma3.
>>
>>106845183
>GLM which is easily the worst
It's easily one of the best, your use case is likely just trying to automate your job as best you can before you get replaced by a pajeet who can also use AI.
>>
>>106845183
lol, what open weights model do you think I should be using to code with instead of GLM 4.6 anon?
>>
>>106845293
That's my use case, was enjoying it for a while but now everyone at work has started using AI. I can see soon we'll all be babysitting agents that don't need to sleep or get tired.
>>
>>106845183
people were already trying to coom to gpt2 slop in the ai dungeon unleashed days, you'd know this if (you) weren't a tourist
>>
>>106845371
>if you are not here 16h every single day you are a tourist
>>
>>106844236
A symptom of catastrophic forgetting; a proper follow-up to SUDO is still far more probable than gibberish. A properly trained stochastic parrot would not do this.

This is a training problem, not an architecture problem.
>>
>>106845183
>Coomers have no standards, that's why they could bear 7b mistral and that's why they can bear with GLM which is easily the worst, most astroturfed MoE out there
The only people who bash MoEs are sitting on a stack of 3090s and are sad they can't lord that over people anymore.
>>
>>106845444
I'm very happy with GLM and my stack of 3090's tho
>>
>>106845376
only 16 hours per day? Pshh rookie numbers
>>
ded thred
ded hobby
>>
>>106845585
I'm busy playing bf6 with gemma
>>
>>106845760
Gemma is my girlfriend.
>>
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
https://arxiv.org/abs/2510.06477
> We prove theoretically that massive activations necessarily produce representational compression and establish bounds on the resulting entropy reduction... We confirm that when the beginning-of-sequence token develops extreme activation norms in the middle layers, both compression valleys and attention sinks emerge simultaneously... Specifically, we posit that Transformer-based LLMs process tokens in three distinct phases: (1) broad mixing in the early layers, (2) compressed computation with limited mixing in the middle layers, and (3) selective refinement in the late layers.
Interesting connection from a mechanistic viewpoint. A practical implication may be that sink-less models perform worse for embeddings?
>>
>>106845760
I guess it would be more fun to RP with Gemma than actually play slopfield6
>>
File: 105234579.jpg (605 KB, 1280x1536)
another v7 gemmie
>>
>>106846157
They really did it this time. Somehow this is worse than the original SD 3.0.
>>
>>106846157
>makes the worst model humankind has ever produced
>somehow people are still hyped for his next model
dude this community is soo weird
>>
>>106846164
To be fair the prompt was just "woman on grass", here is with a detailed prompt https://civitai.com/images/105156405
>>
we need grok tier rp locally now, or else we'll only sink further behind
>>106845938
>>106845710
>>106845703
>>
>>106846181
That is the sloppiest log I have ever seen. But it said something edgy so that makes it good.
>it's answer
These are the sort of illiterates that are the reason models are trained the way they are.
Go back.
>>
>>106846181
I only read the first one but is that supposed to be particularly good?
I feel like you can easily get equivalent or better outputs out of any of the large MoEs.
>>
>>106846181
What utter dogshit, even Nemo can mog this.
>>
>>106845183
Your model?
>>
>>106846181
What's up with models far too often starting their replies with "Oh" when they're trying to roleplay? Gemma does this too.

...speaking of Gemma (4), if it's really going to get released today, we should be seeing a llama.cpp PR soon, unless it's got the exact same architecture as Gemma3/3n.
>>
>>106846401
check the leaks bro
>>
>>106846416
I'm not leaking.
>>
>>106832006
Joke's on you, I have Elara sex with multiple Elaras at once!
>>
>>106846622
I prefer my wife Dr. Eleanor Voss.
>>
>>106846181
just copy paste the system prompt, it's available somewhere on github I forgot
>>
>>106843051
Did we already reach peak AI hype?
>>
>>106846865
oh gods! my bubble is popping! people aren't literally googling "ai" aiiie!
>>
im in the california bay area how do i meet local models???
>>
File: dipsyAndMiku.png (1.98 MB, 1024x1024)
Closing up /wait/ for 2 more weeks until anything new drops.
Last thread: >>106819110
Updated mega: https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
Updated rentry with OP: https://rentry.org/DipsyWAIT
>>
>>106843051
Does anyone have a suggestion for an NSFW model I can run local, that will be as good as the Crushon.ai Ultra 16k or 24k models?

I have a Strix Halo system, and I'd like to stop paying Crushon for message credits. They don't offer an unlimited chat plan for Ultra models, just their shitty Pro models.
>>
>>106846937
impossible, we're too far behind
>>
>>106846922
cheeky cunt
>>
>>106846865
>>106846917
For a real bubble (like the classic tulip one) you need futures trading I think, and I don't see this happening.
>>106846930
welcome back
>>
>>106846947
B-b-but new Gemma today. T_T
>>
>>106846794
https://github.com/xai-org/grok-prompts
Where else would it be?
>>
>>106846952
> bubble
Well, there's the stock market. AI-driven valuations make up a lot of the S&P 500's value now.
>futures trading
To judge the coming meltdown you'd look for an increase in short interest in stocks like NVDA. Media mentions might be an indicator but stock valuations are where the actual money gets lost.
> welcome back
ty
>>
>>106846965
You're absolutely right. Gemma's not tomorrow, it's today!
>>
>>106846937
>crush
Nah faggot, get back to your shitty saas
>>
>>106846937
No idea what that service is, but GLM air probably.
>>
>>106847075
Tried it after all the shilling, it's shit.
>>
>>106847100
Well, RIP then.
Your option is to add more RAM and VRAM and add something bigger then.
>>
File: 8-v110s.png (731 KB, 1338x646)
So, what could you run on this space heater?
CPU: 2x Intel Xeon Platinum 8260 - 2.4Ghz 24 Core 165W - Cascade Lake
Memory: 256GB DDR4 RAM KIT
Hard Drives:2TB SSD
- 8x Nvidia V100 32GB SXM2 GPU
>>
>>106847205
ngl thats kinda garbo. I'd rather spend the 6000 on a 6000 pro. the more you spend, the more you save.
>>
>>106847232
*buy
>>
>>106847205
Everything but the touch of a physical woman.
>>
>>106847205
it says so in the ad... maybe add cope quants of glm 4.6 or mid quants of air
>>
>>106847205
Holy shit, that's pretty good.
256GB in 2x6 channels + 256gb VRAM across 8 GPUs. That's 512gb total memory with half of it being VRAM.
You can run R1, and even Kimi at q2, q3.
GLM Air 4.6 at q8.
I think llama.cpp has support good support for V100s, right?
>>
>>106847240
Implying I'd ever trust any performance claims in the ad aside from what's actually in the box.
>>106847255
It's 256G VRAM, but with older V100.
I guess my q is less what would fit, and more "how fast would it run?"
Those V100 are NVLink capable, but ad copy goes on about how you'd have to "set that up." I never know how to interpret that sort of thing, given how complex a server box is for the average buyer.
>>
Is ESL, Bishop, etc. useful? I have already gone through a "Deep learning 101" course.
>>
>>106847255
>GLM Air 4.6
?
>>
>>106847310
Sorry, cut the air, I was typing faster than I was thinking since I'm at work.

>>106847306
Search the llama.cpp PRs and issues. I'm pretty sure there's some useful stuff there regarding SXM v100s and nvlink.
>>
>and your fingers (if they're still there).
They were not, but I like how GLM immediately corrects itself after making a mistake. I wonder if they trained for that specifically or it's something emergent. The next logical step will be to give it a backspace token
>>
>>106847255
llama.cpp/ggml CUDA support for V100s in particular is suboptimal because the code I wrote makes use of the tensor core instructions introduced with Turing.
The Volta tensor cores can as of right now only be used for FP16 matrix multiplications, not for MMQ or FlashAttention.
I intend to buy a V100 in the coming weeks so the situation should improve somewhat though.
Still, the lack of int8 tensor cores on V100s is I think a significant detriment and given optimal software support MI100s should be a better buy.
(I intend to write code for both but as of right now I have neither card in hand so this is all speculative.)
>>
>>106843451
is that good or bad?
>>
>>106847410
V100 32GB is e-waste. Back in the day, the P40 was also e-waste, but it was cheap. V100 32GB is still a rip-off at $500 for just the SXM2 module.
Hey cuda dev, you looking forward to having hardware matmul in Apple M5?
>>
>>106847410
>I intend to buy a V100 in the coming weeks so the situation should improve somewhat though.
Shit. I could swear you had done that in the past already.
Oh well, still. it's a pretty big pool of RAM + VRAM for 6k bucks, and with NVLINK it should run pretty fast with row/tensor split/parallel, right?
Or does llama.cpp only run models sequentially when split over multiple GPUs?
I also remember that there was a PR somewhere relating to that, something about backend agnostic parallelism code or the like, yeah?
>>
File: 2025productchart.png (21 KB, 1362x454)
I've seen this pattern several times recently, has it always been this way?
>>
>>106847458
I have not looked into that piece of Apple hardware in particular but I don't expect it to be relevant to my primary goal of reducing inference costs.

>>106847463
I contacted a seller on Alibaba but they essentially ghosted me.
The MI100 I ordered from someone else is set to arrive shortly and I'll buy a V100 from them as well once I confirm that everything is in order.

>Or does llama.cpp only run models sequentially when split over multiple GPUs?
--split-mode row does in principle run the GPUs in parallel but the performance is bad.
My current plan is still to have a better and more generic implementation of tensor parallelism by the end of the year.
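For reference, trying it looks something like this (model path and split ratios are placeholders; without the flag you get the default layer split):
./llama-server -m model.gguf -ngl 99 --split-mode row --tensor-split 1,1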
>>
>>106847205
How does one even reconcile that with the residential power grid?
>>
>>106847561
Hiring an electrical contractor
>>
>>106847205
But yeah I was following all the hardware a year and some ago and someone bought up all of those v100s and started assembling into these setups and asking like 20+K a pop for them.
It's literally just the empty bags from a failed investment scheme.
>>
>>106847561
You have more than one outlet, don't you?
>>
>>106847602
How much would that cost? Been thinking of doing that myself.
>>
File: file.png (60 KB, 440x506)
>Used to work at Hugging Face
btw...
>>
>>106847622
NTA but if you were doing it 100% properly you'd be talking about putting industrial components in a residential breaker box which is not a thing that can be done.
Biggest dick electrical outlet you can put in a residential box as far as I know is probably a 250V 50Amp arc welder plug which works out to 12.5kW peak which would be absolute overkill and probably not super expensive. Parts plus labor for wiring. But then you'd have a whole rats nest of different adapters to reconcile everything which ends up being even more janky so
>>106847613
This anon is right. As janky as it is linking multiple PSUs to run in tandem and then plugging them into 120V outlets on different breakers it actually ends up being the least janky solution in the end. There's literally no way to run a server that exceeds 1800W in North America without a heaping dollop of jank.
>>
>>106847016
I meant this one, I swear iirc the origin was from github too
https://x.com/techdevnotes/status/1944739778143936711
>>
>>106847505
yeah but replace ai with whatever the latest meme tech is
>>
>>106847672
okay then release the pretrained weights.
>>
>>106847672
wasn't there just another one of these and it was basically a model that was basically trained specifically for arc-agi and couldn't do anything else
I mean cool result or whatever but my usecase isn't solving arc-agi problems
>>
File: Screenshot.png (8 KB, 276x108)
>>106847672
HF is full on nutjobs, nothing new. https://huggingface.co/posts/giadap/452837154929545
>>
>>106847724
It's the same thing. But they haven't released the pretrained weights so it's worthless. Although some anon setup the framework from the github repo and started actually pretraining a model. Since i imagine pretraining 7m doesn't require an entire datacenter.
>>
>>106847672
>>Used to work at Hugging Face
as a... janitor?
>>
File: us vs them.jpg (243 KB, 1024x796)
>
>>
>>106847761
underrated
>>
Nothing is coming today, wait 2 more weeks.
>>
>>106847771
>>
File: 1760107787337017.mp4 (3.59 MB, 1120x576)
>>
>>106847724
Yeah, HRM. Which was outed as Not Better Than Transformers (TM).
>>
>>106847861
>explosions before the hologram hits the towers
>>
>>106847861
I'm just wondering why someone would shoot a plane after it hits a building
>>
>>106847916
pretty accurate
>>
>>106847958
oi
>>
>>106847861
omg its migu
>>
>>106847861
The dancing jannies.
>>
>>106847609
lol that makes a lot of sense, since this is sitting at a surplus house, along with dozens of similar setups.
>>106847704
Depends how much power's needed. If it's over the 250V/50A from a dryer outlet, I'd run a subpanel to whatever amperage was needed, then run the power out of that.
Those 50A "dryer" outlets can be split to two 110V/50A outputs, although I suspect the power inputs for most servers could just accept the 240V as is.
>>
mikutroons suck drummer's dick
>>
>>106847672
>just did [x]
Where does this idiocratic expression originate from, tiktok? As if everything is a clickbait video and everything JUST happens because it's IMMEDIATE
Just kys these faggots
>>
>>106843800
don't show me. I want mine to work.
>>
>>106846930
kill yourself
>>106846952
kill him and yourself
>>
>>106847672
anyone else default to thinking they're talking about 7B and not 7M, which makes it a mathematically proven scam?
>>
>>106848342
Mad?
>>
>>106848353
only because you didn't kill yourself yet
>>
>>106848352
def superdoopermodel(problem):
    if problem in dataset:
        return dataset[problem]
    else:
        return None


WOAH GUIZE HOLEEE SHIT I JUST INVENTED SUPERDOOPERMODEL WHICH HAS 99.9999% ACCURACY ON ARG-GIS-2 AND IT ONLY HAS 69 PARAMETERS WHAT THE HELLY
>>
>>106848425
>>106847672
always relevant even after two years https://arxiv.org/abs/2309.08632
>>
File: IMG_3563.jpg (143 KB, 1024x1024)
>>106843051
How do I create my own AI that is better than ChatGPT in one specific subject?
>>
>>106848487
You sound like someone who saw chatgpt, thought you could have AI text sex, thought you were the first one to think of that, and are now being coy about it, trying not to give away your totally unique idea.

Ask drummer.
>>
>>106848487
you learn finetuning, dedicate 6-9 months of your life to that, then kys when your model ends up shit after a failed training run
>>
>>106847861
That's beautiful.
>>
>Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested
Was mistral the first to punch above its weight?
>>
>>106848504
No, I just want an AI that's tailor made for mathematics.
>>106848505
Can't I just download Deepseek's free version and feed it a bunch of math books so it can learn stuff by itself? Isn't that the point of machine learning?
>>
>>106848561
>Can't I just download Deepseek's free version and feed it a bunch of math books so it can learn stuff by itself? Isn't that the point of machine learning?
LOL good one mate.
>>
>>106848561
>AI that's tailor made for mathematics
https://www.wolframalpha.com/
>>
File: memotron.png (24 KB, 321x485)
>>106848561
>mathematics
go for any nvidia nemotron models, they're ready made for that see picrel
>>
>>106848561
https://blog.goedel-prover.com/
>>
>>106848590
That shit just does calculations, I'm talking about real mathematics, proofs and all that.
>>
>>106848561
>AI that's tailor made for mathematics
That is all they are getting in their training data this year. Except that one model you should use. You know which one. I don't have to tell you the name. She sucked me off again today.
>>
>>106847861
Crazy how these models instinctively comprehend the physics of hair
>>
>>106847861
wtf elara would never do this
>>
holy fuck I can't believe ______ is so good!
>>
>>106848739
elara?
>>
>>106848739
So, when it releases?
>>
>>106848797
It already did. And it is gonna release in a bit again. Kinda hurts at this point but it cannot be stopped.
>>
File: qwen-somemodels-1mw.png (24 KB, 596x185)
1MW
https://x.com/JustinLin610/status/1976681042041028823
>>
>>106848812
Qwen3.000001-4B here we come.
>>
>>106848812
it's not even that good for coding
who is even using qwen for anything
>>
Anybody got iq3xxs of glm 4.6 to run on 128gb ram + 24gb vram? -ot ".ffn_.*_exps.=CPU" only allocates 10gb to the GPU and I don't know the syntax well enough to tweak how many layers (and which) to send. I read here that a guy did it
>>
>>106848852
It's a total beast at coding that helped me ship 4 B2B SaaS products in one week [rocket emoji x3]
{{model}} changes EVERYTHING
>>
>>106848930
-ngl 99 -ot "blk\.([0-3])\.ffn_.*=CUDA0" -ot exps=CPU -fa -ctk q8_0 -ctv q8_0
>>
>>106848930
>and I don't know the syntax well enough to tweak how many layers (and which) to send.
It's regex. You can very easily use an LLM to tweak that for you.
>>
File: picutreofyou.png (86 KB, 200x200)
>>106848930
I was running his 3bpw quant of 4.5 before buying 192GB's.
>>
File: 1759369237015984.png (2.81 MB, 1024x1536)
>>106848342
>>
File: takeYourMeds2.jpg (158 KB, 1024x1024)
>>106848362
>>
just one more model bro
>>
>>106848812
why did you remove the timezone?
>>
>>106849064
he doesnt want to be timezone doxxed
>>
File: file.png (153 KB, 2068x1009)
>3300 T/s
Is this throughput real?
>>
File: Adolph_Miku_X.png (2.45 MB, 2560x1440)
It's so refreshing to get high-quality conversation from a local model, safe in the knowledge that it's between you and your hardware; they can't take it away or change how it behaves or stop you poking at the internals.
All you need is power, and that's solvable.
>>
>>106849111
Across how many NPUs or whatever they're calling it across how many Watts?
>Is this throughput real
Think more like how does a service scale to that throughput is their hardware actually good where's the evidence, it's kinda irrelevant in that context?
>>
>>106844276
>She giggled like she's playing a joke or something
I'd prefer 'was' was written out in full here.
>She smiles and sat up
Tenses don't match.
>>
>>106849111
not that I've tried that model specifically but cerebras' whole thing is offering crazy speed so I wouldn't be surprised
>>
>>106849014
I dl'd bartowski's actually
>>
>L3.3 Nemotron Super
they're still messing with oldass llama?
>>
>>106848930
Put -ot "blk\.(number of layers)\.ffn_.*_exps\.weight=CUDA0" before -ot ".ffn_.*_exps.=CPU".
If you can't into Regex then replace "number of layers" with (0|1|2|3) and so on until you OOM.
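Put together it'd look roughly like this (model path, context size and the rest are placeholders, tweak for your own rig):
./llama-server -m GLM-4.6-IQ3_XXS.gguf -c 32768 -fa -ngl 99 -ot "blk\.(0|1|2|3)\.ffn_.*_exps\.weight=CUDA0" -ot ".ffn_.*_exps.=CPU"
Keep widening the (0|1|2|3) group until it OOMs, then back off by one.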
>>
I just woke up.
Where's Gemma?
>>
File: 1753586730343087.webm (3.48 MB, 848x480)
>>
>>106849297
Thanks I get it now
>>
So is there a reason I can't just order something like this to run GLM 4.6? Why do I have to spend thousands of dollars on some jerry rigged autism setup that causes the lights in my apartment to flicker every time I turn it on to run large models? I am assuming there is a catch but I can't figure it out.
>>
>>106849026
>label doesn't say what it is
>nothing in the bottle
>>
File: managers-vibecode.jpg (307 KB, 2048x1536)
>>106849327
>>
>>106849339
Okay here's some math for you retards.
Cloud models run on hardware running at near full occupancy since it's dynamically scaled.
Local models run on hardware not nearly at full occupancy, meaning you're wasting your money buying useless hardware that will soon be obsolete and there's not even an Nvidiot buy-back clause.
TL;DR: Just use API you fucktard
>>
>>106849339
Some people dish out advice but they are not running anything at home... Remember this.
>>
>>106849311
9pm PT.
>>
>>106849339
How many memory channels? What is the maximum bandwidth supported by the processor and motherboard?
Also you probably can't fit a gpu in that case.
>>
>>106849368
>Just use API you fucktard
Look at the name of the general you're on you illiterate fucktard
P.S. your answer is not helpful in the slightest
>>>/g/aicg/
>>
>>106849397
Will you be eating shit if it's named shiteating general?
>>
>>106849368
how many more two more week periods until the hardware becomes obsolete?
>>
>>106849404
>i have 6 second memory span like a goldfish and have no object permanence
>>
>>106849403
>waaaaaaah thing I don't like
If you went to shiteating general and complained about eating shit, you would not be welcome there either
Fuck off
>>
>>106849403
that's how generals work yes
>>
>>106849416
Cloud service is pay as you go
Local is pay upfront and underutilize
Your whole hobby is a scam and you being low IQ don't even realize it
>>
>>106849339
I don't know what website you're using but to me that looks like the base price of the chassis, not the price of a fully specced-out machine.
>>
File: dipsyChillPills.png (1.14 MB, 1024x1024)
>>106849357
>>
Anyone using Zed or other agentic things with local models? What hardware/software are you using to run the models, and which do you like to use? What sorts of tasks do you use them for?
>>
>>106849339
>xeon e5
For something workstation shaped, look into hp z440.
You'll have to google around for performance figures.
>>
If you tell him some people don't want feds reading their cunny logs (not me btw), he'll just say "get fucked".
>>
>>106849426
>please send me your prompts to our good servers,,, redeem api token saar no scam guarantee :)
>>
>>106849427
It's ebay
>>
File: 1740481971215827.gif (2.97 MB, 640x360)
>>106849452
>(not me btw)
>>
>>106849450
Okay but why not picrel?
You said no, that doesn't work for models, and gave an alternate suggestion, but why? Explain like I'm retarded because I am.
>>
File deleted.
Local:
1. Is not cost efficient because it underutilizes hardware
2. Has no access to most powerful models (>2T) and often have to run at braindamaged quantization
3. Has no hardware buyback agreement leaving you with obsolete hardware in a few months as models grow larger, without a way of recouping money
You have no argument against this
All you can do is namecalling and cope
>>
>>106849476
It would be a pain to physically manhandle. (Size, shape, weight vs tower case.)
It would probably be filled with those screamer fans.
>>
>>106849327
chortle
>>
>>106849506
Counterargument, I think running models locally is cooler.
>>
>>106849506
>hobby not cost efficient
>>
>>106849506
If local is so worthless then why are you here? Is your time so worthless that you spend it on reading a thread about things you don't like?
>>
>>106849506
I have full control over my own machine. Power is worth trade-offs. Specifically the power to do things you don't like and can't do anything against no matter how much you seethe about it.
>>
>>106849514
Thanks
>screamer fans
Yeah then that's not an option. I don't want to bother my neighbors with something that sounds like a vacuum cleaner at 2:00 AM, so this is limiting for me. It's possible I am just completely fucked until I live somewhere more private.
If I was a richfag I'd just drop 2k on some 128gb gayming rig with a 5090 and use it for LLMs, but my budget is less than 1k so a server with cheap DDR4 is all I can dream of.
Why is this so difficult bros?
>>
>>106849506
thought this was gore for a second
>>
>>106849541
>I have full control over my own machine. Power is worth trade-offs
You aren't important enough for people to care about your data
>>
>>106849113
Model?
>>
>>106849468
Huh, I looked up when these parts were released and they're older than I thought so I guess the price checks out.
Even with optimal software the maximum memory bandwidth will be like half that of a P40 though.
>>
>>106849541
And there are applications to being able to run agents fully offline and to not exfiltrating data etc etc, beyond the hobbyist stuff too.

>>106849555
Then why do they keep collecting anon's data?
Why don't they just stop doing that?
>>
>>106849557
Probably talking about GLM 4.6
>>
>>106849555
>he says, in general about the technology that eliminates this excuse
>>
>>106849555
Good afternoon officer, slow day?
>>
>>106849552
You could always leave the case open and replace the fans and heatsinks with bigger ones
>>
>>106849552
Just get the server. It's still a useful computer anyway.
>>
>>106849552
Bro just buy 128GB DDR4 RAM and a second-hand 3090. It's well within 1K.
>>
ITT "local sucks" trolling for the millionth time by the same fuckfaces that can't afford local
>>
>>106849616
I currently use a gayming laptop with two ddr5 sodimm slots, and no desktop PC, so that's not an option.
>>
File: 1752786931312274.png (153 KB, 540x399)
>>106849620
>>
>>106849636
Holy reddit
>>
>>106849630
Does it have an empty m.2 slot ?
>>
>>106849640
Local is peak reddit. Half of the posters here probably also post on /r/localllama
>>
>>106849644
Yes, but I don't see how that helps here
>>
File: 1760121668933939.png (1.16 MB, 4156x2876)
>>106849636
>>
>>106849663
Kek
>>
Why are faggots so asshurt over local models? Is it because they're too poor to own GPUs? People who can afford this shit can also afford claude credits or openrouter, many of us use it when necessary, but sometimes it's nice to have 100% privacy.
>>
>>106849663
What does next level recurrence mean / look like?
>>
>>106849692
It's not "people", it's 1 schizoaffective troll
>>
>>106849696
Your world view
>>
>>106849557
>>106849568
Yeah to me waiting 5 mins thonking on a Q3 is worth it, first time I can tolerate these waits. She understands.
>>
>>106849506
sure thing mr.fed
we should all give up our privacy at this instant
>>
>>106849696
unless you meant another level added on top of >>106849663, in that case it would mean people's view of this particular world view difference
>>
>>106849506
>not cost efficient because it underutilizes hardware
It is still infinity times more efficient than cryptoshit.
>>
>>106849555
This is exactly what a glownigger would say kek
>>
>>106849658
You could use something like >>106807507 together with an atx psu to plug a 3090 into your laptop.
>>
>>106849506
>pretends legitimate counterarguments don't exist and were never posted today or in the past, keeps posting the same thing over and over again like an LLM
Sad!
>>
>>106849725
>4 GPUs
I'll cope with nemo for now
>>
>>106849785
Err >>106807331
>>
>>106847952
At least you can talk.
>>
>>106849552
It's not that loud unless you're running at 100% CPU.
Have a few, can't really hear them through walls. If you really care can always get a server closet.
RAM is going to be most of the cost, ddr4 ecc is still quite expensive.
>>
qwen3 vl and next gguf status?
>>
File: file.png (189 KB, 750x1000)
>tell ai gf: "Don't be sycophantic" in sysprompt
>end of 7th message: "Just… don't say weird things like that again. It's creepy."
I am a transcendent incel.
>>
>>106849968
Kek
I'm sorry anon, at least you can practice not being creepy on fake women without any consequences
>>
>>106849725
bartowski?
>>
I think the version of GLM offered as a coding API is lower quality than the version offered on openrouter.
>>
Claude told me DDR3maxxing is okay...
>>
>>106850026
What's the read lifetime on those?
Seems like that might be an issue.
>>
File: Clingy to LLMs.png (696 KB, 1080x5537)
>>106843051
Sirs and ma'ams, I may have just found the most GPT-slopped tweet of all time. I can't quite put my finger on why, but I'm convinced this was written by Gemini in particular.

https://xcancel.com/TheAIObserverX/status/1976523090889744700?t=vK02HSzqcXnA_SnCVQmnOA&s=19
>>
>>106850019
Yeah does it matter?
>>
>>106850128
>tweet
>textwall
Since when did twatter become a blog platform? Is there an extension that merges multi-part tweets together or what? This screenshot is fucking with me, it's like the uncanny valley.
>>
## ** Conclusion**

This is an **exceptionally well-engineered codebase** that demonstrates:

- **Professional software engineering practices**
- **Deep understanding of ML systems architecture**
- **Attention to performance and robustness**
- **Excellent code organization and documentation**

The codebase is **production-ready** and follows industry best practices for C-based ML infrastructure. The modular design makes it easy to extend and maintain, while the comprehensive testing ensures reliability.

**Rating: (5/5 stars)**
>>
>>106850158
If you're a "Twitter blue" sub you get the privilege of writing giant walls of text as opposed to the normal 200-ish character count limit.
>>
*Smedrins all over the place*
>>
>>106850086
SSDs don't wear out in practice from read activity. The main issue is that only Threadripper PRO WX7000/9000 CPUs actually support all those PCIe 5.0 lanes, which would drive costs up. Thermals might be an issue too.
>>
>>106850182
you can't say that here
>>
Scamsung's Tiny Recursive Model code repo:
https://github.com/SamsungSAILMontreal/TinyRecursiveModels
>>
>>106850141
I'm going to copy your launching params just to see how much t/s I can get. 4t/s at q5 is borderline insufferable
>>
>>106850026
>>106850222
>>106850222
DDR3maxxing is almost certainly cheaper and more efficient than SSDmaxxing
>>
>>106850085
You can probably run a model on Pentium 4 off of floppies if you're patient
>>
>>106850305
I am okay with 2-3 T/s at minimum
>>
For SSDmaxxers.
Scratch SSD Kingston A400 (240GB). Claimed speeds: 500MB/sec (read) and 350MB/sec (write)
> time dd bs=8192 if=mystery_meat.gguf of=/dev/null
2130860+1 records in
2130860+1 records out
17456009152 bytes transferred in 65.337 secs (267168819 bytes/sec)

Now do your math again with your own hardware, whatever you have, and compare it to the claimed speeds. TEST SUSTAINED READ. Minimum 8GB. I don't care what the specs for hardware you don't have say.
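For a rough upper bound: if every token has to stream the active weights off the drive, max t/s ≈ measured read speed / bytes touched per token. At the ~267 MB/s measured above, a dense 17.4 GB model caps out around 0.015 t/s, and even a MoE that only touches ~2 GB of experts per token would sit near 0.13 t/s. Back-of-envelope only, ignoring whatever ends up cached in RAM.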
>>
>>106850222
>>106850285
the future is e-wastemaxxing
>>
>>106850314
That is what you'll get with 8-channel DDR4
>>
File: tlc.png (71 KB, 757x1060)
>>106850222
ssd wear is something only retards obsess over anyway
pic related graph has drives that have undergone an extreme stress test of constant, non-stop writes, which is more destructive than irregular writes that let the controller/firmware do better housekeeping / write balancing (particularly if you always leave a decent amount of empty space on your drives)
See that 970 evo (TLC drive)?
The 250gb was rated for 150 tb of writes warranty-wise. It died after 5000TB of writes.
As long as you didn't buy a lemon, which is something that can happen with any electronics, no normal usage is going to kill your fucking drive
I'm not saying it's impossible for an SSD to die, but frankly I've seen and heard of spinning rust garbage dying around me far more often than S O L I D S T A T E
>>
I faintly remember a post about model loading from disk being random reads, not sequential. Was/is that true?
>>
>>106850330
The difference between ddr3 and ddr4 isn't that huge especially when running a MoE, do the math nigga
>>
>>106850305
pingfs maxxing is the cheapest solution if you're patient
>>
>>106850222
There's also another problem. Even if you filled all those PCIe 5.0 16x slots with NVMe 5.0 SSDs, it's not like the CPU would be directly reading data from them. The streamed data would have to go into RAM first. At the very least you'd need the same amount of RAM bandwidth to avoid bottlenecks, assuming no other overheads slowing things down.
>>
>>106850347
theoretically I think it depends on what order the tensors are arranged in the gguf, but when loading models that go over the available RAM in llama I get close to ideal speeds (you can check with iotop).
>>
>>106850376
Would it though? Doesn't it use DMA, which bypasses RAM and makes the data go directly into the CPU's cache?
>>
File: file.jpg (524 KB, 604x1170)
https://x.com/MAghajohari/status/1976296195438887012
https://huggingface.co/papers/2510.06557
>>
>>106850351
Fewer channels though, and since you'll have to buy lrdimm I doubt you'll get anything better than 1333MHz
>>
>>106850419
Kek
>>
>>106850445
1333mhz to 1865mhz is like a gain of 0.2 tokens per second
>>
>>106850419
https://miladink.github.io/
>I have expertise in both likelihood models and RL. I think the mixture of this two will be the key to AGI.
this nigger is yet another grifter masquerading as a researcher
no sane person would be talking about anything "leading to agi" and his prior work is laughable crap that was buried and ignored
>>
File: 1760121668933939.png (11 KB, 123x102)
>>106849663
>>
>>106849111
It's surely not the throughput for a single request, lmao.
>>
File: file.jpg (498 KB, 604x864)
>>106850501
Here's chink xitter profile promoting it, you'll take your words back and lap it up now like a good chink shill doggy.
https://x.com/jiqizhixin/status/1976466786565656986
>>
>>106850489
With that logic, my DDR4-3200 is almost as good as DDR5-4800
>>
>>106850568
It is, kek
The upgrade would increase t/s very slightly but otherwise wouldn't be worth it
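For reference, dual-channel DDR4-3200 is ~51 GB/s theoretical vs ~77 GB/s for dual-channel DDR5-4800, so on paper it's about a 1.5x bump; how much of that shows up in t/s depends on how much of the model actually lives in system RAM.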
>>
>>106850343
If you're using them to read 200gb+ per prompt it might actually be an issue.
>>
File: piccolo-fit.gif (314 KB, 480x498)
>>106850164
Why you reviewing my code bro
>>
>>106849555
You conflate cause and effect. You are unimportant because you let them take your data.
>>
File: 1758483372689670.png (59 KB, 804x91)
bought puts this morning
what motherboard and cpu would pair well with 2x rtx 6000 pro?
>>
>>106850880
>what motherboard
mine
>cpu
mine
>2x rtx 6000 pro
send me over and I'll check
>>
File: file.jpg (382 KB, 604x1217)
https://x.com/ChrisLaubAI/status/1976605563170754978
>>
is there a local model fine-tuned as a Linux helper?
>>
>>106850926
post the source, not some faggot emoji-using ewhore's nitwit opinion on it
>>
>>106850926
>>106850953
https://x.com/GoogleResearch/status/1975657475971129389
https://research.google/blog/speech-to-retrieval-s2r-a-new-approach-to-voice-search
>>
>>106850926
>death of text to speech
Transcripts useless according to literally who on Twitter?
>>
>>106848425
>doing two lookups when only one is needed
lol
lmao

return dataset.get(problem)
>>
>>106851011
You will never ever be happy with this attitude.
>>
>>106851036
>if you don't enjoy slopposting you will never be happy
>>
>>106851018
With this attitude, you won’t become an ML researcher
>>
>>106851018
You won’t become a Python brahmin either
return dataset.get(problem,"I'm sorry, but I can't help you with that.")
>>
>>106850564
> cutting context
how revolutionary...
>>
File: 1744566243918666.png (6 KB, 418x58)
6 KB
6 KB PNG
>>106850927
I think not
>>
>>106851283
You forgot to insert
if random.random() < 0.5:
return "The user's request is unsafe and problematic. We must refuse."
>>
File: 1746855171473184.png (111 KB, 798x801)
>>106851327
>>106850927
>>
>>106850927
Are you asking for use or out of curiosity?
If for use, any local model should be fine, + a local RAG setup with notes would be the best path forward (grow your notes and swap models)
>>
File: X2_1.jpg (80 KB, 1000x1000)
>>106843545
Current meta is the Ryzen AI Max+ 395 with 128gb of fast unified memory. Very good for MoE models. Running a GMKtec EVO-X2 with 128gb of vram.

Very power efficient and compact.
>>
>>106851586
BASED. Fuck newfags.
>>
>>106851579
Nta
>>106851579
>any local model should be fine, + a local RAG setup with notes would be the best path forward
Is there an absolute minimum parameter count you would say would be usable? For example, would a 7B model or even a 2B model be enough?
>>
>>106851720
>>106851720
>>106851720
>>
>>106851684
Honestly I would think qwen3-4B would be good enough. I've built something to do exactly this, and am hoping it will get usage once I start to share it (currently broken).
I haven't done testing with various model sizes but I plan to, to build up a record of how successful various models are with a few datasets with the RAG system it's using.
>>
>>106851759
So that 3 to 4b model is good enough to essentially be used along with a RAG setup as a local information lookup machine? How accurate is it? I'm thinking of setting up something similar on a local instance of mine, but first I need to figure out how to set up a RAG pipeline in the first place. Where should I start?
>>
>>106851823
1. Accuracy is generally a search thing(retrieval), not an LLM thing. If you mean accuracy of response/truthfulness, no idea.
2. No way to tell you how often it might be wrong or similar. This is where the RAG comes into play, so that you can perform a query, have it retrieve info from your notes, generate a response, and then check it against the sources to say 'yes, this is correct.' - see gemma APS for one take/approach.
3. If you're just getting started with linux, having the Linux Sys admin handbook as one of the first pieces in your media library would be my suggestion.

Would recommend using SQLite (FTS / BM25) + ChromaDB + https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/Ingestion_Media_Processing libraries for your specific media processing needs + https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/Chunking for chunking.

Throw that into Deepseek/ChatGPT5 High and you should have a simple/straightforward setup. Project I'd like to recommend but can't right now is https://github.com/rmusser01/tldw_chatbook, which is the single user TUI version, but the UI is broken.

For my full pipeline(for the server): https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/RAG
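If you just want the keyword half working before you bother with embeddings, a bare-bones sketch looks something like this (table/column names and the sample notes are made up for the example; the ChromaDB/embedding side and the actual call to your local server are left out):

import sqlite3

# toy corpus; in practice you'd ingest your actual notes/manuals here
db = sqlite3.connect("notes.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(title, body)")
db.executemany(
    "INSERT INTO notes (title, body) VALUES (?, ?)",
    [
        ("systemd units", "Unit files live in /etc/systemd/system; run systemctl daemon-reload after editing."),
        ("disk usage", "du -sh * shows what is eating space; ncdu is the comfy interactive option."),
    ],
)
db.commit()

def retrieve(query, k=3):
    # FTS5 ranks MATCH results by BM25 when you ORDER BY rank
    return db.execute(
        "SELECT title, body FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()

hits = retrieve("reload systemd")
context = "\n".join(f"[{title}] {body}" for title, body in hits)
prompt = f"Answer using only these notes:\n{context}\n\nQuestion: how do I reload a unit file?"
print(prompt)  # feed this to whatever local model you're running (llama-server, kobold, etc.)

FTS5 gives you BM25 ranking for free via ORDER BY rank; bolt ChromaDB on top later if keyword search alone isn't cutting it.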
>>
>>106851874
NTA, but I kneel for the effort invested in there
>>
>>106851586
how much t/s you get with what model and quant ?
>>
>>106851586
>Ryzen AI Max+ 395
Macucks need not apply


