/g/ - Technology


File: 1744444287656136.jpg (975 KB, 4096x2204)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108268616


►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: disruption.png (31 KB, 1721x221)
>>
Do you think AMD and Nvidia will release new hardware in 2027 or do you think they will push it even further out to 2028 in light of the shortage?
>>
>>108273367
we will have agi in 2027 and asi in 2028, so who cares
>>
>>108273367
Yes. I think one of those things will happen.
>>
whatever lies beyond this morning
is a little later on
regardless of warnings
the future doesnt scare me at all
nothings like before
>>
I support all bakes with pictures that are topic relevant. Old baker should get HIV and die like the troon he is.
>>
>>108273387
It sounds like you're embracing change and looking forward to what's ahead with confidence. The future may be uncertain, but your spirit remains unshaken.
>>
>>108273387
kingdom hearts plot was written by an llm
>>
File: 1712542.png (160 KB, 640x360)
>THE GOVERNMENT IS WATCHING ME
>ITS UNCENSORED IF YOU USE THIS 10 PAGE JAILBREAK THAT ONLY WORKS 20% OF THE TIME
>I BOUGHT THESE 3090S SO I WILL USE THEM
>ITS THE SHILLS ITS ALWAYS THE DAMN SHILLS
>OMFG IT TOLD ME TO TURN THE CUP UPSIDE DOWN. AGI IS HERE
>THIS IS THE NEW DAILY DRIVER (FOR 2 WEEKS UNTIL I REALIZE HOW SHITTY IT IS)
>IM A GROOMERTROON LOOK HAHAHA IM PROMPTING LITTLE GIRLS LOOK WHAT IM DOING GUYS
>JUST BECAUSE ITS QUANTIZED DOESNT MEAN ITS DUMB. WE ONLY USE 10% OF OUR BRAIN ANYWAY
>TRUST ME THE GUMBOJUMBO_Q4_GATEBROKEN_A32 GGUF IS PEAK FOR ROLEPLAY
>THE MODELS MAY BE RETARDED BUT SO AM I
>DO YOU THINK ITS POSSIBLE TO ACCELERATE MY BRAIN WITH LLAMA-2 MICROCHIP???
>CAN SOMEONE REUPLOAD THIS WITH A GPTMI_3_COMANCHE LICENSE??? STALLMAN WILLS IT
>>
>>108273367
They'll release new hardware, but it won't be for you and you won't be able to afford it.
>>
>>108273339
I have no idea what I'm looking at.
>>
>>108273418
then get out tourist
>>
>>108273418
Someone who doesn't know how to take a screenshot.
>>
>>108273426
>screenshotting a mainframe
breh
>>
>>108273403
What's the point of local LLMs? Reading discussions surrounding them feels like peering back in time through a looking glass
>OMFG it passes the poopyscoopy logic test from 2023!
>Wow, this 100-line boilerplate javascript code is almost perfect!
>I got it to jestfully say nigger! holy crap it's so uncensored!!!
>This is the new daily driver (for 2 weeks until i realize it's complete slop)
The rest of us are writing multi-thousand line professional software with Codex/Claude. Meanwhile your models are trained on so much scraped synthetic GPTslop that they can't even get the year right. Genuinely, what the fuck is the point of local LLMs? They're more censored than API, they're dumber than API, the cost to set up a decent one is higher than API, they're slower than API, there is no lora/finetuning scene unlike local image, the tooling is worse than API, and the experience overall is just outdated in 2026.

It's like you're stuck somewhere in-between the luddites who hate AI and the pioneers who embrace it. You realize AI is the future but can't cope with the fact that the technology itself benefits heavily from API-centralization and that local hardware is unable to adequately handle increasingly large models. You boarded the boat to paradise island but decided to jump overboard halfway there because the captain wouldn't hand you the controls.
>>
>>108273431
"mainframe" running GNOME lmao
>>
>>108273367
Imagine if Intel comes out with a GPU that has slots for additional (slower) RAM sticks.
>>
File: 1767466346558493.jpg (172 KB, 1744x1080)
►Recent Highlights from the Previous Thread: >>108268616

--Budget GPU upgrade options for better model performance:
>108270975 >108271008 >108271029 >108271088 >108271009 >108271035 >108271169 >108271179 >108271232 >108271212 >108271234 >108271243 >108271261 >108271330 >108271240 >108271022 >108271064 >108271114 >108271170 >108271037
--Budget 4x3060 AI rig build and riser discussions:
>108271593 >108271611 >108271631 >108272320 >108272867 >108272890 >108271702 >108271848 >108271858 >108271885 >108271899 >108271924 >108272008
--Mac Studio vs custom PC for large model inference:
>108271281 >108271291 >108271303 >108271327 >108272592 >108271294 >108271339 >108271312 >108271317
--Qwen 3.5 small model releases and potential applications:
>108271025 >108271045 >108271156 >108271194 >108271217 >108271238 >108271051 >108271440
--Unsloth template year limitation causing llama.cpp server failures:
>108272475 >108272499 >108272512 >108272524 >108272539 >108272558 >108272578 >108272583 >108272534 >108272548 >108272553 >108272600 >108272555 >108272576 >108272618 >108272629 >108272634 >108272663 >108272674 >108272678 >108272736 >108272759 >108272832 >108272837 >108272828 >108272606
--Experimenting with AI-generated podcasts using TTS:
>108270634 >108270679 >108270714 >108270724 >108270748 >108270830
--Workarounds for LLM-based VTuber video tagging:
>108270269 >108270293 >108270414 >108270426
--Disabling model thinking via chat template kwargs:
>108269309 >108269444 >108269471 >108269484
--Comparing lightweight models for news summarization on low-VRAM hardware:
>108270249 >108270324 >108270487 >108272221 >108272330
--Update to 35c4bc · deepseek-ai/DeepGEMM@1576e95:
>108270056
--Local AI coding struggles with VRAM and context rot:
>108271879
--Miku (free space):
>108268674 >108269106 >108269279 >108269325 >108270249 >108270634 >108272201

►Recent Highlight Posts from the Previous Thread: >>108268684

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: nou4u.png (272 KB, 1532x758)
>>
it seems a nerve was struck. is local really that bad??
>>
>>108273403
While I disagree with your sentiment, I approve of you shitting up this thread. Please don't stop.
>>
can't wait until troonsune faggotku janny says discussing local models is off topic
>>
When do you think synthetic data will become just as good as or better than real data?
>>
>>108273452
If you have enough RAM, not at all.
>>
I'm using Nemo 12B Instruct, got it all set up, but it's kind of a pussy about some topics. How can I best manipulate its system prompt to go completely unfiltered?
>>
>>108273480
Ask it to create a system prompt for you.
>>
>>108273480
The easiest is prefilling what you want. Not much. Just forcing a few words into its mouth is enough to make it go on its own.
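If you're on llama.cpp, a minimal sketch of the idea against llama-server's /completion endpoint (the port and the Mistral-style template are assumptions, adjust for your setup):

import requests  # assumes llama-server is running locally

prompt = (
    "[INST]Write the thing.[/INST]"
    "Sure thing! Here it is:"  # the prefill: the model continues from these words
)
r = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 512},
)
print(r.json()["content"])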
>>
>>108273403
>>THE MODELS MAY BE RETARDED BUT SO AM I
I keked
>>
>>108273339
You need to stop making threads 6 hours early.
>>
File: 1749045380442501.png (823 KB, 850x850)
>>
>
>>
ok eat my ascii rhombus you stupid website
>>
ִ
>>
Drummer.
Can you teach oss derestricted/mpa to sex?
Thanks.
>>
>>108273403
>ITS UNCENSORED IF YOU USE THIS 10 PAGE JAILBREAK THAT ONLY WORKS 20% OF THE TIME
this was the wildest shit when i found out that uncensored didn't actually mean uncensored at all
it still feels like an elaborate joke, people actually say shit like "i run local models so i can do uncensored shit unlike api haha" and it's still just running jailbreak prompts as if you were using it from a cloud provider. muh privacy and freedom but you're still censorcucked. I'm serious, it's almost unbelievable to me.
>>
>>108273784
Dont use qwen and gptoss.
Problem solved.
>>
File: 1.png (10 KB, 316x339)
>>108273784
>>
>>108273822
>heretic
but at what cost
>>
>>108273840
It breaks down into loops after about 10.5k tokens of roleplaying and sometimes it tries to think with thinking disabled (non-heretic had the same issues).
>>
>>108273822
>I got it to jestfully say nigger! holy crap it's so uncensored!!!
>>108266446
>>
>>108273867
>>108273355
>>
>>108273851
Are you using the 35b? Because I haven't noticed that on the 27b at Q5.
>>108273840
At very little cost. It retains most intelligence at 27b, from what I've seen.
>>
>>108273913
Yes. (the model file name is in the screenshot)
>>
Is there anything I can run on my newly purchased unit?

Processor: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz, 2808 Mhz, 6 Core(s), 6 Logical Processor(s)
Installed Physical Memory (RAM) 32.0 GB
Graphics 6GB RTX 3050
>>
>>108273927
Q4 of the non-heretic version of the 35 is also shit, prone to basic logic errors and grammar mistakes. So, I don't think that's a heretic problem. Q5 of the 35b is a little better in that regard, but still makes dumb mistakes off and on.

The 35b MoE is just way worse than the 27b dense. The only thing it wins at is speed, *IF* both models think. The 27b without thinking is better than the 35b with thinking, though. So it even loses in speed if you're thinking with it.
>>
How does the new 35B compare to the old 80B in intelligence, ability to ERP, etc?
>>
>>108273988
No one can tell you because they're all running quantized (circumcised) models
>>
>>108273698
A GPT OSS 120B that can do ERP would be pretty amazing. Ngl.
>>
>>108273957
the new qwen3.5 35b at a q4 quant
>>
>>108273957
Nemo q4ks
>>
>>108273960
I'll try out the 27b heretic at Q6 then since I'm patient and have 40GB of vram.
>>
>come back after 6 months
>everyone is still using nemo
>>
Is there anything better than my GLM4.7 at q2 for RP?
>>
>>108274186
GLM4.7 at q3
>>
>>108274192
nah dawg, that ain't happening unless I buy another 128gb kit
>>
>>108274168
Things are not as fast as they used to be.
>>
>>108274168
I came back today after a year.
What the actual fuck happened to the llama.cpp codebase? It's a bloated mess now with five million pointless "features"
>>
>>108274234
vibecoders, basically
>>
>>108274234
they're all optional and everything has gotten faster whats the problem?
>>
>>108274240
Is there a fork that doesn't have endless shitcode?
>>108274242
The whole point of the original llama.cpp is that it was fucking simple and easy to understand.
Current llama.cpp seems to have more fucking code than torch.
>>
File: 1766616667271700.png (28 KB, 960x1044)
>>108273822
My qwen can't be this schizo
>>
>>108274278
--jinja --temp 1.0 --top-k 20 --top-p 0.95 --presence-penalty 1.5 --repeat-penalty 1.1 --chat-template-kwargs "{\"enable_thinking\": false}" --reasoning-budget 0
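For reference, a full invocation might look like this (the model filename is just an example):
llama-server -m Qwen3.5-35B-A3B-Q4_K_M.gguf -ngl 99 --jinja --temp 1.0 --top-k 20 --top-p 0.95 --presence-penalty 1.5 --repeat-penalty 1.1 --chat-template-kwargs "{\"enable_thinking\": false}" --reasoning-budget 0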
>>
File: 1750838286118038.png (17 KB, 960x514)
>>108274292
I did it!
>>
>>108274299
how is heretic different from abliterated?
>>
>>108274307
ablit is lobotomy, heretic is less lobotomy
>>
4.5 Air is a real sweetheart, it's not a refusing mess GPT distill like the Qwen models.
>>
I have a terminal fear of Deepseek V4.
>>
>>108273387
hakuna utada
>>
File: 1768570098362082.png (48 KB, 845x517)
>>108274292
Interesting. It doesn't refuse to summarize. The only difference was --repeat-penalty 1.1 (before was 1.0)
Gonna test more just in case it's a seed factor.
>>
File: 7jibs40k6jmg1.jpg (66 KB, 1024x567)
kek
>>
>>108274358
"Objects", huh
>>
>>108274358
link
>>
>>108273403
>OMFG IT TOLD ME TO TURN THE CUP UPSIDE DOWN. AGI IS HERE
real
>>
>>108274372
>>108268776
>>
File: kboom.png (61 KB, 246x103)
>>108274358
>>
File: 1766797846333800.png (19 KB, 623x385)
>>108274373
^
CIA paid false flagger
>>
File: absolute.png (128 KB, 1148x494)
>>108274358
https://health.aws.amazon.com/health/status
>>
>>108273443
https://www.youtube.com/watch?v=97YEaK5uxak
>>
>>108274299
>>108274353
I was having the same issues with it being retarded and going into schizoloops too, until I put in those settings I got off the hf page for the 35b. It's like some esoteric magic code to make the thing work, because it's fucked with the defaults, but it's been pretty consistent with these.
>>
>>108273657
proof?
>>
>>108268776
I'm 90% sure that 90% of the replies to bait are just samefagging or are baiters replying to other baiters. It's simple enough to filter.
>>
is there any chance qwen3.5 35b a3b will be a better choice than nemo for erp? right now it's too schizo with repeating itself and refusing every 5 messages
>>
>>108274679
try >>108274292 ?
>>
also why tf aren't they merging critical prs to llamacpp? https://github.com/ggml-org/llama.cpp/pull/19970
>>
>>108274700
To spite whoreson. He's a retard.
>>
>>108274278
We need something that detects "wait,". If it happens 3 times within like 10 lines it should terminate the answer.
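A dumb client-side sketch of that (assumes llama-server's streaming /completion endpoint; the thresholds are made up):

import json
import requests

def generate(prompt, max_waits=3, window=10):
    # stream tokens and bail out if "wait," shows up too often in the last few lines
    text = ""
    with requests.post(
        "http://127.0.0.1:8080/completion",
        json={"prompt": prompt, "n_predict": 2048, "stream": True},
        stream=True,
    ) as r:
        for line in r.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            text += json.loads(line[len(b"data: "):]).get("content", "")
            recent = "\n".join(text.splitlines()[-window:]).lower()
            if recent.count("wait,") >= max_waits:
                break  # terminate the answer early
    return text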
>>
>>108274683
virus
>>
I came from an average working-class family, not too poor, and had a normal childhood in a third-world country. I used to ponder how the wealthy had all sorts of connections, butlers, assistants, maids, whoever, who helped them do all sorts of things. They just needed to focus on the thing that they loved.

Now, thanks to local models, I kinda feel the same. I just focus on the things that I like and leave the rest of the details for the minions to take care of. This feels like a game changer. I think we will hit a tipping point if local models ever reach Opus-level analytical skills.
>>
>>108274811
No matter how many local agents you spin up, you will never be white, Vanesh
>>
>>108274811
I believe they will, and am building towards it
>>
>>108274826 soul vs soulless >>108274825
>>
Nobody will notice the change in tactics.
>>
Uh oh, looks like Bartowski is currently updating all his Qwen quants.
This is why you always wait a while after a model launch for issues to be ironed out.
>>
>>108273452
some people don't mind using google to molest fake kids in funny text generation games
>>
How do we solve context rot?
>>
>>108275019
there were no issues with his quants, the reason he's updating is this:
https://github.com/ggml-org/llama.cpp/pull/19139
someone mentioned it to him and begged him to remake his quants for that improved prompt processing speed.
it's a new feature, not a bug fix
>>
ok. I'm sorry for shitting on qwen3.5 before having tried it. it's actually pretty good.

haven't tried ERP but heh, so far so good.
>>
Got a VPN network set up. Got my OWUI and ST connected. Now I can comfily use my local models anywhere with internet. :D
>>
>>108275111
NEVERMIND.
>>
>>108275125
?
>>
I like qwen 3.5 heretic but I wish it was a little faster. Probably a hardware or skill issue though.
>>
>>108275123
Based.
>>
>>108275125
kek
>>
>>108275125
yeah...
>>
lol
>>
avg qwen experience tbqh
>>
>>108275183
how do you deal with the precum tho?
>>
>>108275186
you walk to the carwash
>>
File: 1759980971867709.png (212 KB, 498x268)
>>108275196
>>
>>108273418
this image has 48 views on x com everything app, p sure its op
>>
>>108275095
That's basically this right
https://github.com/ikawrakow/ik_llama.cpp/pull/1137
But baked into the quants instead of activated at runtime?
>>
Is RAG better than the summarize extensions?
>>
>X smiled, not a Y, or a Z smile, but a real genuine* smile.
>>
>>108275385
this sent shivers down my spine
>>
>>108275258
Would ikawrakow have worked on fused tensors without am17an's work in mainline and the associated noise on social channels?
Would ikawrakow really have discovered the way of fusing tensors without having this simple and easy-to-follow logic in mainline llama.cpp?
>>
Is q8 kv cache really that bad? Reddit says the effect is negligible but when I tried it the model started fucking up details even early into the chat.
>>
>>108275432
Yes.
>>
>put off vector storage because I thought it required setting up oolama
>apparently it's built in to sillytavern
wtf
>>
>>108275525
no reranker thoughever
>>
File: 1350594293765.jpg (109 KB, 500x500)
>>108273339
how much ram is needed for the a17b qwen 3.5 model?
>>
>>108275539
how many braincells are needed to ask retarded questions?
>>
engram bros?
>>
>>108275544
grok is this true?
>>
>>108275544
fuck you she is a girl
>>
>>108275536
Apparently that's built in as well but I'm not home to check
>>
>>108275539
the filesize of the quant you choose + a bit more for context
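e.g. for the 397B-A17B (rough math, assuming Q4_K_M at roughly 0.6 bytes per weight): 397B x ~0.6 = ~240GB for the file, plus a few more GB for KV cache, so a 256GB kit just barely fits it.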
>>
VRAMlet (8gb) ERP review (all models q4):

Gemma 3 27B:
Still by far the most clever model for its size I've used, rarely makes any physics mistakes and contextually understands most things without needing to over explain (I've found medgemma to be slightly better at coom, increased anatomical knowledge and willingness to say synonyms for penis, vagina, anus, etc seems to help). Unfortunately, the worst at prose, if you don't rigorously reinforce a desired writing style it slowly devolves. Writing like this. Sentence lengths cut. Very short.

Qwen 3.5 35B A3B:
Fast generation, alright prose, but frequently makes physics mistakes and struggles with contextual understanding (although for being a MoE, better than any others I can remember), also security policy slopped to hell, needs constant babysitting to generate ERP if you let it think

Cydonia/Magidonia 24B v4.3:
Somewhere in between the previous two: better prose than Gemma 3, but at the trade-off of being less clever and more prone to mistakes, and smarter than Qwen while not nearly as guardrailed (but slower)

Personally, I lean more towards Cydonia/Magidonia I think, with Gemma 3 taking a close second. It's really a matter of what sort of babysitting you want to do, and it tends to be easier to fix physics mistakes than to fix poor writing style, but that's probably down to my personal preference. I tend to write pretty good character sheets and openings, so it just sucks to watch Gemma slowly degrade as the context increases and my original writing gets more and more diluted.
>>
File: output tokens be crazy.png (76 KB, 1067x346)
35BA3B is just crazy in the aspects it's good at, which are not many tbf (wouldn't use it for code). I used to prompt much smaller chunks to translate novels because local models are terrible at handling a lot of stuff at once, but this approach is totally obsolete with qwen. Chunking will still be valuable for now to automate an entire book worth of translation but the chunks will certainly have to be set to much bigger sizes after some experimenting.
>19,209 output tokens, 41086 tokens total with input
>from a decent skim, doesn't seem to have issues
I kneel. Don't have the time to do lengthier tests today, but now, I am extremely curious as to how many tokens will be the true hard limit where the model loses translation coherence in a one shot, output everything at once request.
For now, if anything the quality is better, not worse, than when chunking in 50 or 100 lines; it makes fewer mistakes on things like proper names with this feed of 676 lines. This is the opposite behavior compared to other LLMs I can run on this computer, doing this breaks them.
Damn, people constantly whine that local is never improving but here we have a model that can one shot this much without losing its shit and runs on a laptop at 34t/s. It feels like black magic that one shotting this much works. I did it for the lulz expecting it to break, the txt used in the chat ui was one of my many summarizer test txt...
>>
>>108275547
Believe.
>>
>>108275594
how is he able to balance that on his head
>>
>>108275602
glue
>>
File: 1750796454794054.png (17 KB, 1191x77)
>>
>>108275590
What unholy quant are you running with 8gb vram?
>>
>>108275624
did the script ever finish?
>>
>>108275734
He said "all models q4" so he's just offloading most of it to system ram.
>>
File: 1769300993833819.png (4 KB, 436x33)
>>108275735
:,)
>>
>>108275741
that would make gemma 27b like a 0.8t/s on my machine. horrible
>>
>>108275741
the MoE might run tolerable at q4 but the dense models must make you want to die lol.
>>
>>108275741
Q4 gemma is like double the size he can fit. Who tf runs that.
>>
>>108275753
q2 is enough
>>
>>108275757
LMAO'd
>>
>>108275403
>Would have ikawrakow worked on fused tensors without am17an's work in mainline and associated noise on social channels?
no idea, don't care
i use what works best at the time
>>
>>108275755
Maybe he just has lots of system RAM.
>>
>>108275760
it's a meme retard, ik was mad cudadev implemented split graphs and used the same exact wording (referring to his implementation as being the reference one in this case)
ik is autistic for attribution
>>
>>108275553
https://vocaroo.com/18Sw3yY8ciyV
>>
>>108275761
how does that help the case
>>
>>108275761
You mean cpumaxxing on a generic consumer hardware?
>>
>>108275778
>>108275780
Presumably, how else would he run models larger than his vram?
>>
>>108275788
by waiting 10 minutes for a single respond
>>
>>108275788
Processing Prompt (2352 / 2352 tokens)
Generating (235 / 2048 tokens)
(EOS token triggered! ID:2)
[11:41:22] CtxLimit:2587/8192, Amt:235/2048, Init:0.10s, Process:129.55s (18.16T/s), Generate:56.50s (4.16T/s), Total:186.05s

Q4 nemo on my machine.
>>
>>108275791
24/27b might take a minute or two but 35b a3b should be relatively usable if his ram isn't ddr3 or something.
>>
>>108275802
Get a 1080ti or a 3060 and enjoy 35 t/s >>108272867
>>
File: 1759986569430903.jpg (948 KB, 2550x3300)
I believe this will be the last update and addition to my news download and summarization script.
I finally found an application that converts the plain text into something beautiful, pandoc, as long as the model doesn't fuck up the markup.
A quick modification of the script and now it takes the final news summary, which is just a text file, and feeds it into pandoc to construct a pdf before printing.
>>
>>108275802
>>108275806 (Me)
I tested the q6 qwen3.5 27b I have downloaded with -ngl 0 and get
prompt eval time =    2661.88 ms /    13 tokens (  204.76 ms per token,     4.88 tokens per second)
eval time = 23197.87 ms / 35 tokens ( 662.80 ms per token, 1.51 tokens per second)
total time = 25859.74 ms / 48 tokens

so maybe that Anon was waiting 10 minutes..
>>
>>108275814
I have a 4070S. I just tested cpu maxing to see how bad it is. Nigga must be waiting ages.
>>
>be me
>go on /g/
>find out about llms
>>
File: 1772275547196033.png (20 KB, 1186x93)
:|
>>
real?
>>
>>108275822
Congratulations, you now know why it feels like 95% of Internet interactions are with lobotomized robots.
>>
Any of you guys making applications that use local LLMs?
>>
>>108275858
I used a local LLM to write an IRC bot that will chime in at a configurable rate with a message generated by another local LLM.
>>
File: 1760299059692860.jpg (945 KB, 2550x3300)
>>108275858
I suppose the scripts I just finished qualify.
What the scripts do is use RSS to select a group of news articles; they then download the articles, strip away everything but the text, and feed that text into a local model with a prompt telling it to summarize them and create a briefing.
Once the llm generates the response, the scripts save it as text, convert the text to pdf, and then print out the pdf.
If one was so inclined you could even set the master script to automatically run and you would have your own news briefing waiting for you when you wake up.

To be honest it was fun to do and I want to do something new but I am sadly out of ideas.
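For anyone curious, the skeleton is roughly this (a sketch, not the actual script: the feed URL and prompt are placeholders, and it assumes llama-server's OpenAI-compatible endpoint plus a TeX install for pandoc's pdf output):

import subprocess
import feedparser
import requests
from bs4 import BeautifulSoup

FEED = "https://example.com/rss"  # placeholder feed
API = "http://127.0.0.1:8080/v1/chat/completions"  # llama-server, OpenAI-compatible

# 1. pull articles from the RSS feed and strip them down to plain text
articles = []
for entry in feedparser.parse(FEED).entries[:5]:
    html = requests.get(entry.link, timeout=30).text
    articles.append(BeautifulSoup(html, "html.parser").get_text(" ", strip=True))

# 2. feed the text to the local model with a summarization prompt
msg = "Summarize these articles into a morning news briefing:\n\n" + "\n\n---\n\n".join(articles)
resp = requests.post(API, json={"messages": [{"role": "user", "content": msg}]})
summary = resp.json()["choices"][0]["message"]["content"]

# 3. save the summary, let pandoc build the pdf, then send it to the printer
with open("briefing.md", "w") as f:
    f.write(summary)
subprocess.run(["pandoc", "briefing.md", "-o", "briefing.pdf"])
subprocess.run(["lpr", "briefing.pdf"])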
>>
>>108275750
8gb vram + 64gb DDR4 on the machine it's running on

I got pretty poor performance running it on windows (closer to 1.5t/s) but moving it to linux it gets almost 2, which isn't ideal but this is the vramlet life

Models that actually fit in 8gb are still just too stupid for my tastes
>>
>>108275889
damn that pretty cool so you just get a news summary printed every morning?
>>
hey, is there a website where i can find sillytavern charachter cards, but with pre-made expression sprites?
>>
>>108275870
I made an XMPP chatbot system. I used to post in these /lmg/ threads but lost interest. Really want to make updates to the XMPP chatbot and add a few features but i really don't wanna code them myself. Claude is really good at it.

>>108275889
I had a very similar idea to yours but instead of reading news it would start from a seed prompt, operate a selenium based browser and search stuff about it on its own and gather info, and dive deep into rabbit holes that i never explicitly told it to. Really should get to it some day, could be very cool
>>
>>108275918
at the moment i have to manually run it if i want the summary.
The only machine i have on 24/7 doesn't have a GPU to run a model. My next project is to see if I can get llama.cpp running on my FreeBSD NAS and if I can get a small model like IBM's granite to run on the CPU and have it run the script.
if i can get that to work then yes i will have it print out automatically every morning

>>108275923
>it would start from a seed prompt, operate a selenium based browser and search stuff about it on its own and gather info, and dive deep into rabbit holes
that sounds cool and you should give it a shot. i did the whole RSS thing because it was easy and the articles are basically curated for you but having the model search on its own would be exciting
>>
>>108275923
man local tards are such losers, meanwhile my ai gf is able to play games with me thanks to sima 2
>>
>>108276000
sure thing sharteen bro
>SIMA 2 is not currently available for public download or access; it is only accessible to specific academic and game development partners involved in research with DeepMind.
>>
>>108276000
I wrote the system myself
I can have multiple chatbots; they can generate their own personalities, likes, dislikes, and appearance (which is then used to generate a profile picture using SD). The chatbots can randomly message me about random topics if they feel like it (it's RNG basically, but the topic to talk about is also generated by the llm)
>>
>>108276029
While a chatbot is great what is really needed is a 4chan simulator. That way when I am old and the powers that be have destroyed the internet i can fire up all my old models and pretend to talk with my friends on 4chan again.

I bet you could even get it to scrape a site like twitter or something to inject screenshots to spur conversation.
>>
>>108276029
i dont believe you
>>
why sillytavern not in OP?
>>
>>108276043
pretty sure in the next 10 years you will become cattle, so i wouldnt worry too much
>>
>>108276049
why would?
>>
>>108275590
>Qwen 3.5 35B A3B:
>>108275590
>all models q4
Oof.
>>
>>108275204
https://www.reddit.com/r/LocalLLaMA/comments/1rhx5pc/reverse_engineered_apple_neural_engineane_to/
>>
>>108276050
onions green was 4cucks
>>
>>108276043
The 4chan comment "style" can be replicated but the question is why would you want that? I come here to talk to real anons.

>>108276046
What don't you believe?
>>
Guess I should try Kimi Linear and GLM Flash after all.
>>
>>108276092
>I wrote the system myself
>>
>>108276133
I work at Google, I can code faster than any dumb llm you are using, and yes that includes Gemini.
>>
>>108276133
That's the most unbelievable part?
Yeah i wrote it myself over a few weeks, no LLM was ever used, mainly because back then i didn't trust LLMs to do a good job.
I trust them more now, but still not enough to write register level code for MCUs
>>
got a 3090 yesterday and downloaded my first model.
I tried vibecoding a llama.cpp cli wrapper fish shell thing. vibecoding feels like an rpg game.



https://pastebin.com/Hi2wULq4
>>
What do you guys feel about vibecoding versus making a "requirement document" first and then giving it to the LLM? Ive had more success with the latter but only with online LLM services
>>
>>108276141
marvel cinematic universe?
>>
>>108276143
holy fuck shit is depressing

>be me
>fish shell simp
>deployed Grand Master AI Env script (Llama.cpp + Qwen)
>local inference, no API tax, no telemetry
>features are actually useful for once:
>> `qm` / `qmv` : switch LLM or vision projector models instantly
>> VRAM auto-manage : reduces GPU layers if `nvidia-smi` shows low memory
>> `qwen --file` : upload context from local text/code files
>> `qwen --clip` : inject clipboard content into prompt
>> `qwen --proj` : index entire local project directory (24k context)
>> URL fetching : auto-scrapes http/https links via lynx or curl
>> `qsearch` : grep all chat history logs
>> `qview` / `qexport` : render logs to PDF with syntax highlighting
>> `qjournal` / `qpacman` : analyze `journalctl` or Arch update logs via AI

>example workflow:
>> `qwen https://news.ycombinator.com` "summarize top 3 stories"
>> `qwen --file main.rs "fix memory leak"`
>> `qpacman` "what broke in this update?"
>> `qsearch "ssh key"` "find where I saved that password"
>> `qexport 2024-05-20 meeting_notes.pdf`

>mfw I can chat to my OS without sending data to Big Tech
>file saved to $HOME/.local/state/qwen
>git gud

[ Prompt: 1053,2 t/s | Generation: 30,4 t/s ]
>>
>>108276155
breh did you wake up yesterday from a coma? planning has been the default for a year already
>>
>>108276143
How do you speak to your model? I am perhaps being foolish but I still include words like please and thank you and when it gives a good looking result I always say as much.
I figure it was trained on human speech so it would be best to talk to it as if it were a human.
>>
>>108276172
if you add fluff, it adds fluff
>>
>>108276157
Microcontrollers anon
LLMs can do a passable job if I'm making them write HAL code but if it's pure register level writes like
*((volatile uint32_t*)0x40001234) |= bitmask << shift;
They just fail. LLMs can't read the datasheet and reason. They just don't have enough training data, and even when they do they have to deal with MCUs from the same family but with different features (one MCU having a high resolution hardware timer on one address, while the other having something else like the DMA engine or whatever)
>>
>>108276177
you speak funny as hell man
>>
>>108276172
it wastes tokens
>>
What the hell is up with qwen 3.5? Yesterday it was refusing pretty much everything and today it doesn't even think about safety. No wonder some people praise it and some say it's a disaster, because it's both randomly.
>>
>>108276213
depends on which experts woke up today
>>
>>108276223
hi
>>
>>108276226
I'm sorry I can't help with that.
>>
I got myself a 256gb ram kit and a 5090. I hope running qwen-397b won't be too slow.
>>
>>108276240
are you for real? you spent all that to run the model on ram? are you dumb?
>>
>>108276240
I'm envious. While I do have 512gb, it's ddr3. And my gpus are 16gb rx 580s.
>>
>>108276240
Do try other moes like step and GLM too.
>>
>>108276248
>>108276251
>>108276252
Silence peasants, let me do things at my own pace.
>>
>>108276143
https://vocaroo.com/1jQ2ZwLUg2fX

i also made a qwen3.5 tts audiobook generator/voiceclone cli, it also reads txts and printed text directly in the terminal. will try to integrate directly with my cli wrapper for llama. IT JUST WORKS
>>
>>108276255
we were only trying to help milord
>>
>be pewdiepie
>infinite money
>buy 3090s
>make slop model
>claim it can beat gpt
proof money and popularity =/= brains
>>
>>108276240
That's pretty munted if you don't have a threadripper for the memory bandwidth bro
>>
>>108276276
are you australian?
>>
>>108276258
nice
i had qwen 3.5 30b generate me a script to feed a .txt file to qwen3 tts and save the output.wav and like you i had a similar experience of it just working.
are you using voice design? i found with that you could just change the voice with a change of the prompt and it worked well enough

good luck anon
>>
>>108276271
what would you do instead?
>>
>>108276258
any tips to convert an ebook into something my tts won't choke on?
i used calibre to epub->txt but it's got all the shitty formatting
i spent all this time training tts models but now i actually want to listen to an epub
>>
>>108276271
>proof money and popularity =/= brains
I think this was well known for ages, pewdiepie is a content creator for children, what did you expect?
>>
>>108276304
nta but i would take you on an expensive date
>>
>>108276271
as opposed to
>be [random ai company]
>infinite money from retarded investors funding anything with "AI" on it
>hire a datacenter
>make slop model trained on some benchmarks
>claim it can beat gpt
>get even more infinite money
>>
>>108276317
post 37 examples
>>
>>108276321
pick as many as you wish
https://huggingface.co/
>>
>>108276326
huggingface is banned in my country
>>
>>108276299
it starts to hallucinate after a minute or so for me, so I had the script clean the text to utf-8, cut each book into 1h episodes, and split each episode into blocks of roughly 2 minutes' worth of lines to avoid model psychosis.
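The splitting itself is nothing fancy, roughly this (a sketch; the chars-per-block figure is just my eyeballed ~2 minutes of speech):

def chunk_text(text, chars_per_block=1800):
    # split on sentence ends so the TTS never gets cut off mid-sentence
    blocks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        if current and len(current) + len(sentence) > chars_per_block:
            blocks.append(current.strip())
            current = ""
        current += sentence + ". "
    if current.strip():
        blocks.append(current.strip())
    return blocks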
>>
>>108276335
yeah yeah whatever you say
>>
>>108276331
based
>>
File: qwen.png (102 KB, 815x694)
small qwens:
https://huggingface.co/Qwen/Qwen3.5-9B
https://huggingface.co/Qwen/Qwen3.5-4B
https://huggingface.co/Qwen/Qwen3.5-2B
https://huggingface.co/Qwen/Qwen3.5-0.8B
>>
>>108276355
Useless.
>>
>>108276355
>9B - 10B
>4B - 5B
>0.8B - 0.9B
lol
>>
>>108276358
racism alert
>>
File: file.png (37 KB, 326x524)
>>108276355
qwen bros we did it
>>
>>108276371
is a soto the bf of a sota?
>>
>>108276355
I wonder if we can use speculative decoding with these models?
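Should at least work with llama.cpp's draft-model flags, something like (untested, filenames made up; the draft and target need the same tokenizer family):
llama-server -m Qwen3.5-397B-A17B-Q4_K_M.gguf -md Qwen3.5-0.8B-Q8_0.gguf --draft-max 16 --draft-min 1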
>>
>>108276355
Usecase?
>>
>>108276376
why? qwen3.5 already has it built into vllm
>>108276378
text encoding for image model and research for labs is a big one for small qwens
>>
>>108276355
can 9b milk my penis like nemo?
>>
>>108276371
He meant soot = coal.
>>
>>108276390
and i think he mean 'tism
>>
>>108276305
>>108276335
# Define the directory containing the files
set TARGET_DIR "path/to/your/folder"

for file in $TARGET_DIR/*.txt
    # in-place repair: read the file, run ftfy.fix_text over it, write it back
    python3 -c "import ftfy; import sys; p=sys.argv[1]; data=open(p, 'r', encoding='utf-8').read(); open(p, 'w', encoding='utf-8').write(ftfy.fix_text(data))" "$file"
    echo "Cleaned and processed: $file"
end

> ftfy
(fixes text for you) is a Python library and command-line tool that repairs broken Unicode text, specifically targeting "mojibake" (encoding mix-ups), HTML entities, and improper UTF-8 decoding. It automatically converts scrambled characters like Ã© back into their correct form (é) while avoiding false positives.
>>
>>108276378
>Usecase?
i found the qwen3-4b base to be the best for distilling single tasks last time
ie, better than gemma3-4b base
and vibevoice is built on the smaller qwen2.5 models
>>
>>108276420
thanks anon, that fixed the curly quotes, etc
guess now I can sed out the '* * *' etc
>>
>>108276455
hey no problem, always willing to help retards
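for the '* * *' separators, something like this should do it (untested, assumes they sit on their own lines):
sed -i '/^\* \* \*$/d' book.txt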
>>
>>108276464 >>108276455

lmao not the same anon.
>>
>>108276201
>you speak funny as hell man
:(
>>
>>108276355
>they give the privilege of early access to models to unslop so they could do quants beforehand
>no bartowski
gay earth
did unslop send them money
>>
It's so funny watching small models hallucinate and then try to justify the hallucinations.
I wonder if there will ever be a sort of indexed internal representation of things the AI knows that it can use as a reference so that it can say "Actually, no, I don't know that."
>>
>>108276479
already a thing, and not used because it lowers scores on benchmarks; the models become shyer
>>
>>108276487
Really? Is there a paper about that somewhere? What's the name?
>>
>>108276378
I have found a 3B model is sufficient for making a text summary, and maybe less would do, but 3B is the smallest i have tested so far.
IBM already has such tiny models running in a browser thanks to webgpu and there is a future for such small models.
Just not a future if your only interest is ERP
https://huggingface.co/spaces/ibm-granite/Granite-4.0-Nano-WebGPU



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.