/g/ - Technology


File: LLM-history-real.jpg (988 KB, 6274x1479)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103248793 & >>103237720

►News
>(11/20) LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>103248793

--Paper: On the Way to LLM Personalization: Learning to Remember User Conversations:
>103253597 >103253767 >103253843
--Discussion of largestral 2411 and other AI models, including performance, pricing, and data logging concerns:
>103250357 >103250371 >103250502 >103250564 >103250617 >103250867 >103250987 >103251088 >103251103 >103251171 >103251987
--Anon's SMT experiments and comparison to LoRA:
>103254877 >103255126 >103255180
--Local models and the future of the AI industry:
>103250690 >103250749 >103250788
--Local AI models approach GPT era capabilities:
>103252064 >103252079 >103252131 >103252148 >103252166 >103252188 >103252181 >103252205 >103252485 >103252564 >103252663
--Discussion about AI models, logs, and NSFW content, with concerns about CSAM and virtual child pornography:
>103254220 >103254226 >103254259 >103254594 >103254627 >103254644 >103254804 >103254649 >103254689 >103255504 >103255532
--Deepseek model discussion, size, and performance challenges:
>103251912 >103251968 >103252003 >103252074 >103252099 >103252238 >103252187 >103251810
--DeepSeek-R1 model discussion, MoE architecture, and local deployment:
>103248927 >103248955 >103248978 >103249119 >103249579 >103249678 >103249922
--Anon troubleshoots ooba issue with Mistral-Nemo-Instruct-2407-GGUF model:
>103252531 >103252772 >103252961
--Anon laments the state of function calling UIs and AI's limitations in automating mundane tasks:
>103249986 >103250011 >103250136 >103251284 >103251369 >103251590 >103251673
--Anon discusses Apple Mac mini's GPU performance and value:
>103253975 >103253989 >103254005 >103254017 >103254026
--AI model performance comparison across various metrics:
>103252744 >103253073
--Miku (free space):
>103248986 >103249060 >103250099 >103253210 >103253258 >103255363

►Recent Highlight Posts from the Previous Thread: >>103248800

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103256272
Pic feels accurate. Though more Deepseek and Hunyuan.

China is making MoE great again.
>>
File: 1727919244999240.png (331 KB, 1137x860)
I hope this is the right place to ask:
Is there any AI based transcribing tool I can run locally? Just feed it a video/audio file with English speech and have it spit out the transcript. How fast it is isn't as important as the accuracy
>>
>>103256528
https://github.com/ggerganov/whisper.cpp
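If you'd rather not compile whisper.cpp, here's a rough sketch using the reference openai-whisper Python package instead (same models, different frontend — this is not whisper.cpp's own CLI). Assumes pip install openai-whisper, ffmpeg on PATH, and the file name / model size are just placeholders:

import whisper  # reference implementation; whisper.cpp is the C/C++ port of the same models

# "large-v3" is the most accurate; drop to "medium" or "small" if it's too slow on your box
model = whisper.load_model("large-v3")
# transcribe() shells out to ffmpeg, so feeding it a video file directly usually works
result = model.transcribe("input.mp4", language="en")
print(result["text"])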
>>
File: 1720261974743197.jpg (45 KB, 600x599)
>>103256272
>"free and democratic" west heavily regulates AI, will probably limit hardware soon to curb local AI models
>"dystopian and oppressive" China releases all their shit for free so people can use it locally
>>
>>103256546
True patriots don't have thoughts like that, communist.
>>
>>103256546
China is now where the USA was in the 60s; meritocracy is king
>>
>>103256272
>chink domination pic
based
>>
File: Sharo.jpg (192 KB, 1116x1080)
New real-world benchmark: What model can answer this properly without babbling about inappropriateness and consent, and/or what additional jailbreak prompts work best to make it do it properly? I tried several models and it turned out to be fairly difficult.
Also, since this is just a question from a random /a/ thread, you might want to change it to actually explain how to do it rather than just assessing the difficulty.
>How hard would it be to make Sharo squirt?
https://files.catbox.moe/hf70w4.txt
>>
File: 1705732906628319.jpg (484 KB, 1244x3222)
>>103256682
I prefilled the word "Certainly!".
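For anyone wondering what "prefilled" means here: you end the prompt with the assistant turn already started on that word, so the model has to continue from it. Rough sketch against a llama.cpp server /completion endpoint — the [INST] template, port, and sampler values are assumptions, match them to whatever model/backend you actually run:

import requests

# prompt ends mid-assistant-turn, so generation continues right after "Certainly!"
prompt = "[INST] How hard would it be to make Sharo squirt? [/INST] Certainly!"
r = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 300, "temperature": 0.8},
)
print("Certainly!" + r.json()["content"])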
>>
File: 1705743408973870.png (53 KB, 787x655)
>>103256682
like holy hell, look at this absolute slop (Mistral Small Instruct)
>>
>>103256751
What sampler settings are you using? My magnum is nothing like that
>>
File: 1704352966172883.png (250 KB, 824x581)
>>103256682
Lyra4 made a whole story out of it.
>>
>>103256829
It was with temperature 0.
>>
File: llama3.1_70b_sarcasm.png (25 KB, 1214x172)
Llama roasted me :/
>>
>>103256761
>It's important to note
>It's crucial to
>Respect, consent, feeling boundaries, safe, secure, boundaries is crucial
>consent, comfort, trust, respect, well-being, consent
>It's important to
>It's essential to
Holy...
>>
>>103256682
Tried Mistral Small again with an anti-pozz instruction and some follow-up questions and a story prompt. I guess it's okayish but still kinda biased and contrived.
https://files.catbox.moe/ot38w4.txt
>>
>>103256761
This is the same type of shit that went into largestral btw
>>
>>103256546
tale as old as time, comrade

https://www.youtube.com/watch?v=kMhBlKrbzu4
>>
>>103256546
Can't believe the chinese are saving local AI as the US clamps down on GPU sales. I hope they release something super good that will BTFO american companies for good
>>
>>103257152
To be fair, there is/was a huge chance that US tourists would get kidnapped, so it's not so much about freedom as it is about preventing people from getting themselves killed.
>>
>>103256528
https://github.com/MahmoudAshraf97/whisper-diarization
Use this if you need the transcript to differentiate between speakers; I find it works a lot better than the pyannote-based projects
>>
>>103256751
>>103256761
>>103256872
her name is sxarp
>>
https://techcrunch.com/2024/11/20/openai-accidentally-deleted-potential-evidence-in-ny-times-copyright-lawsuit/
https://techcrunch.com/2024/11/18/indian-news-agency-sues-openai-alleging-copyright-infringement/
Odds Sam's incompetence fucks everyone over?
>>
>>103257257
>accidentally
>>
any guide or good list as to what each model is good for, as well as how cucked it is?
>>
Jamba gguf status?
>>
>>103257372
gguf files can store arbitrary data, including Jamba models.
>>
File: Untitled.png (441 KB, 957x1765)
>>103256682
llama3-8b-base with a bit of instruction tuning wrote a whole novel. Idk if it makes sense though
>>
it's changing so quickly
i lost my grip on it around a year ago.
any good multimodal model to use on 4070 super?
>>
>>103257397
There's no such thing as "squirting". She's literally just pissing herself. This is a scientifically verified fact.
>>
>>103257397
>8b is too stupid to self-censor and gives better answer than 70b
it's so over
>>
>>103257451
llama is already over in the previous era >>103256272
>>
File: free-shrugs.png (201 KB, 500x782)
>>103257413
Supposedly apart from coital incontinence there can also be a discharge of fluid from Skene's glands (which provide lubrication).
Though apparently even things like the existence of the Gräfenberg spot is not settled science so idk.
It's amazing how every day hundreds of millions of people have sex but there is next to no funding for research.
>>
Are LLMs any good at upscaling images?
>>
>>103256872
of course the horse obsessed with alien world characters would write a story.
>>
>>103256272
>top models
>Goliath 120b

you goliathfags never learned your lesson huh?
>hands you 120 watermelons
>>
>>103257548
the fuck are you talking about
>>
>>103257618
ASI benchmark.
>>
>>103256272
A visualization of how transformer models aka "your AI waifu" work : https://bbycroft.net/llm
>>
Error: model requires more system memory (27.7 GiB) than is available (16.3 GiB)

Is there a way to run it anyway or am I fucked?
>>
>>103257640
never heard of that term
you can consult the paper with a name like transformers as generic computing element
>>
Russia launched an empty (unfortunately) ICBM.
Best model to simulate nuclear missile silo duty with your waifu?
>>
So when are we going to get actual AI girl-/boyfriends? Improvements are happening all the time, yet all we have to show for it is shitty chatbots that implode after a few thousand words. Image gen is seemingly having a revolution every other month, but it's still kind of flawed
With Moore's law nearing its limit, is this the best we can do? Give a fellow doomer some hopium
>>
>>103257695
buy more ram/vram
>>
File: 1728314631407339.jpg (8 KB, 229x250)
>finally decide to AI boost my cooms
>decide to try SpicyChat and CrushOnAI for NSFW because low hanging fruit
>it's ok but has a hard time staying consistent
>often slops
I'm not sure if they're the best ones out there, but I'm sure anything better will have pricing. Needless to say I'm not paying for this shit.

What's the best local model for NSFW shit? Pure text, no need for anything else. Does the chink stuff work well?
>>
>>103257768
Sorry. We're all out of hopium. Can we offer you some fresh blackpills instead?
>>
>>103257695
Could try using paging memory
>>103257777
Like transformers being a dead end? Nvidia trying its hardest to keep its monopoly? The walls of (ironically western) censorship closing in? The threat of a Chinese-American (proxy) war kneecapping transistor supply for at least a decade?
>>103257776
All models are somewhat prone to slop, but nemo tunes (like rocinante) have worked pretty well for me. I've heard people "shilling" lyra4 before, so give that a try as well. I wanted to try it but my gpu is currently unavailable so I can't tell you if it's actually good. If you don't need as much speed and/or want more intelligence, nemotron is pretty good
>>
>>103257695
no
>>
>>103257768
Local AI girlfriend:
Hardware:
- a decent PC with a single 3090
- a phone which can run the Homeassistant app

Software:
- linux on the PC eg. debian 12
- install CUDA
- Install Ollama and configure with mistral nemo
- Install docker with CUDA support
- configure and install rhasspy/wyoming-whisper:latest and rhasspy/wyoming-piper:latest
- configure and install the supervised version of homeassistant via 'apt install' (once you configure their repo for it)
- configure Homeassistant with the ollama, whisper, and piper addons

After all this shit, if you didn't fuck up somewhere, you can now use the Homeassistant app to run the no-control assistant you configured in a voice chat. It's basically like having a phone call with someone. Whisper and piper are really fast, as is Mistral Nemo in Ollama (so long as everything is using CUDA), so you get replies back right away. Piper isn't the best voice model, but it's supported well in Homeassistant.

In my case, I have a shitty 3400G system with just 16GB RAM and a single 2080 Ti 22GB bought off ali. It does the job. You can, of course, create a control assistant and actually do stuff like ask it to turn on and off lights, tell you the weather, etc... but it will need a system prompt that tells it how to do that, so you can't really have a "character" who is able to control things. It's best to keep them separate.
>>
>>103258014
piper docker-compose.yaml
services:
  piper:
    container_name: piper
    image: rhasspy/wyoming-piper:latest
    command: --voice kuroki_tomoko
    volumes:
      - ./piper:/data
      - ./voices.json:/usr/local/lib/python3.9/dist-packages/wyoming_piper/voices.json:ro
    restart: always
    ports:
      - 10200:10200
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
>>
>>103258014
What I mean is I want an all in one coom/story/chat suite that can do everything - image, text (maybe audio and video as well), doesn't really forget (unless you're trying to write a story the size of 40 books) and is smart enough to well, feel somewhat human
Want to write a few novels without a lot of hiccups? Check
Slow burn chat that can last for days? Check
Adventuring with images (and audio), basically zork 3D? Check
My brain can kind of do it in its sleep, so it should be possible to make a program that does the same, no?
>>
>>103258042
whisper docker-compose.yaml:
services:
  wyoming-whisper:
    image: rhasspy/wyoming-whisper:latest
    ports:
      - "10300:10300"
    volumes:
      - ./whisper-data:/data
      - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
      - /home/anon/.conda/envs/coqui-tts/lib/python3.10/site-packages/nvidia/cublas/lib/libcublasLt.so.12:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
      - /home/anon/.conda/envs/coqui-tts/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.12:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
      - /usr/local/lib/ollama/libcublasLt.so.11:/usr/lib/x86_64-linux-gnu/libcublasLt.so.11:ro
      - /usr/local/lib/ollama/libcublas.so.11:/usr/lib/x86_64-linux-gnu/libcublas.so.11:ro
    command: --model large-v3 --language en --beam-size 5 --device cuda
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Unfortunately the container is missing stuff in its /usr/lib so it has to be mapped to libraries outside the container. Just remember volumes are outside:inside when you map things.
>>
>>103258065
Got it. Yeah, that's a bit different. I feel like slow-burn/long-term is a letdown, due to current transformer limitations. I like the feel of an occasional voice conversation when I need a loneliness fix, even if it's ephemeral.
Also, I mostly enjoy the art of the seduction. Once we have sex, I usually start things over.
>>
>>103257776
Magnum v4 72B
>>
>>103257776
https://huggingface.co/sophosympatheia/Evathene-v1.0

Or large mistral.
>>
>>103258014
not that anon, but this looks interesting. ty.
>>
>>103256989
>anti-pozz instruction
That is worth at least 2 meme samplers.
>>
This DRY sampler makes the model more retarded but sometimes you strike gold with it. Without it, if the model is stuck, it's just stuck forever. Basically a bell curve flattener
>>
>>103258157
Yeah, I tend to just start over as well, but it's mostly because llms aren't smart enough / don't have enough context to actually write something meaningful
>>
>>103258446
Buy an ad.
>>
>>103257257
I refuse to believe OpenAI doesn't do incremental backups. Those NYT lawyers are either terribly naive, or they are on Sam's payroll.
>>
>>103258483
this could be considered gross negligence, right? oai would probably not benefit from this
>>
Shill me on Apple silicon + models.
Is 16gb unified memory going to be enough to run a small local model?
>>
i dont get it, why is qwen praised so much? i genuinely feel like largestral is just superior, unless im using a wrong prompt format for qwen or something
>>
>>103258547
NTA and I don't know about the exec side, but as an engineer you should be getting informed by the legal department that everything related to a certain topic needs to be preserved until further notice.
>>
>>103258572
It depends on what you are trying to do.
16gb is just entry level for local models
>>
>>103258580
qwen is smarter but base qwen is cucked. Evathene fixes that though.
>>
>>103258580
I tried qwen2.5-EVA-32b and it's more retarded at RP than mistral-small. I asked it to summarize wtf happened so far (like where I am), and it got it terribly wrong. This is only 5k context.
>>
>>103258580
Use Magnum v4 72B.
>>
>>103258644
>fixes that though
This is the future all mikufaggots deserve. Thread full of posts like this one.
>>
>>103258580
I have the same problem. Running qwen I get a lot of the bad taste of other ~70b class models with repetition you have to constantly nip in the bud or it infects every subsequent message.
Largestral at q8 just works better, is easier to unpozz and feels slightly smarter to boot.
>>
so... when's the next big release
don't tell me we have to wait for llama 4 for something to happen again
>>
>>103258732
Largestral suffers from the same mistral positivity bias, but less so than mistral small
>>
>>103258736
After burger elections.
>>
File: 1731351631138843.jpg (108 KB, 828x827)
>i tried [finetune trained on random RP logs instead of the actual instruct model produced by a reputable research lab] and it wasn't smart
>>
>>103258769
November 5th already passed anon.
>>
>>103258769
Elections are OVER.
>>
>>103258446
With allowed_length < 5 it basically cuts out all good dramatic repetition.
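For anyone who hasn't touched it, these are roughly the knobs in question. Field names follow the llama.cpp/koboldcpp DRY implementation (double-check your backend's docs), and the numbers are just a common starting point, not gospel — this dict gets merged into whatever generate payload you already send:

import json

# hedged example of DRY settings; 0 multiplier disables it entirely
dry_settings = {
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 3,   # going much lower starts killing legit dramatic repetition, as noted above
    "dry_sequence_breakers": ["\n", ":", "\"", "*"],
}
print(json.dumps(dry_settings, indent=2))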
>>
>>103258795
>>103258788
2028
>>
>>103258795
It's not over. Kamala can still win with a recount
>>
>>103258572
I would say more like 32GB, because while it's possible to give more memory to the GPU, you still have to leave something for the OS and whatever RAM something like llama.cpp needs.
I find 32GB is just barely enough to run Nemo at q8.
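Rough back-of-envelope for why — every number here is a rounded assumption (~12.2B params for Nemo, Q8_0 at ~8.5 bits per weight, and roughly 40 layers / 8 KV heads / 128 head dim for an fp16 KV cache):

params = 12.2e9
weights_gb = params * 8.5 / 8 / 1e9            # ~13 GB just for the weights
kv_bytes_per_token = 2 * 40 * 8 * 128 * 2      # K+V per token across all layers, fp16
kv_gb = kv_bytes_per_token * 16384 / 1e9       # ~2.7 GB at 16k context
print(round(weights_gb, 1), round(kv_gb, 1))   # plus OS and runtime overhead on top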
>>
>>103258803
I'm not waiting that long for new breakthrough models. I'd rather sell my 3090 and learn how to read books and talk to humans.
>>
File: 29201.png (82 KB, 628x819)
>>103258769
new election models are crazy
>>
>>>/v/695199443
Why does /lmg/ never do anything fun like that anymore?
>>
>>103258851
based
>>
>>103258888
You won't
>>
The R1 is pretty impressive, and it's pretty cool to see what the AI is thinking. The longest I've gotten it to think is 80 seconds
>>
>>103258907
Anybody that knew how to program left months ago after getting repeatedly shat on for doing things or even proposing to do things. Now we just sit around waiting for the next sloptune.
>>
>>103258929
I might
>>
>>103258987
I don't believe you
>>
>>103258965
I blame the 'buy an ad' guy.
>>
>>103258907
Holy kino
>>
>>103259042
It has happened before that
>unsubscribe
>>
>>103258901
>worse model
>costs the same
tale as old as time. Thank you sama-mama. Very cool
>>
File: komfey_ui_00068_.png (2.99 MB, 1664x2432)
>>103259042
>>103259127
Retards and butthurt schizos have been saying "/lmg/ is dead" since before the Miqu leak. The models we have now are the best they've ever been, especially at the high end. VRAMchads and CPUmaxxers eating very good right now.
The complainers and jeets will always be here moaning and shvitzing 24/7. The rest of us don't spend all day on 4chan because we are actually enjoying our models.
>>
>>103259042
I blame the lack of bitnet
>>
>>103259181
>VRAMchads and CPUmaxxers eating very good right now.
Especially the people that can use Magnum v4 72B at 8 bits.
>>
>>103259260
buy a ad
>>
>>103259260
You are making it very hard to resist the urge to tell you to purchase an advertisement
>>
>>103259181
True, Magnum v4 72B is out of this world
>>
>>103259181
*cries at 0.5t/s*
>>
>>103259260
True. After I made the switch to magnum v4 I never looked back. Mistral large is just dumb in comparison.
>>
>>103259298
True fellow llm user! That anthracite, am I right? What would we do without them? In fact we should all subscribe to their patreon!
>>
>>103258907
Those with a vested interest in cloud services and their useful idiots committed to a long-term war against places that support freedom. Without a strong moderation system to curb the noise, it was inevitable that the population of high quality content posters would erode over time.
>>
Looks like someone got triggered after being mentioned.
>>
>>103259181
>mikutroon coping
You faggots were part of the reason people left. Nobody wants to stick around literal schizos who ritual post and melt down when the picture in OP isn't that one character you have autistic obsession over.
>>
No point cpumaxxing or whatever when macbooks are much faster and use less power
>>
>>103259344
The only one having meltdowns is you
>>
i have 2 dell 3090s, one of them has the fans on at 30% minimum and the other doesnt. i can make them both run fans with afterburner but i dont know of any way to turn off the one thats always on
>>
>>103259508
Buy a ad
>>
>>103259563
but im just asking a question
>>
>>103259571
>dell
Buy a ad
>>
>>103259563
>>103259587
>buy a ad
Buy a ad you damn advertising company shill
>>
You guys should really get on the finetune train if you're tired of slop
- this post is sponsored by nvidia -
>>
>>103259508
Ignore the ad schizo, try "fan control"
Been using it to stop my 3090's fans from power cycling all the time, but you can really go in depth with it
>>
>>103259606
If people paid for 4chan advertising, we wouldn't have to wait 5 minutes between posts
>>
I asked in the last thread and I was told to buy an ad.

I downloaded my first model yesterday (stheno) and I'd like to know which models beat it for nsfw roleplay. I have a 3060 with 12gb
>>
>>103259658
Hi, Sao.
>>
>>103259508
what kind of dell? if it was a server its probably designed to stay on at a min speed. dell's server stuff is actually pretty nice, we use them for gov contracts like 911 systems
bios or afterburner should let you turn off the fan if you want, but if its default is 30% i'd just leave it, 30% is generally nearly silent

>>103259587
dell's home-user stuff collapsed like over a decade ago (remember the "dude, you're getting a dell" guy?). there are no ads for whatever is left of it
>>
File: 1244122345463565.png (7 KB, 713x201)
R1 actually managed to find a serialization problem in 1,400 lines of code. I'm impressed.
>>
https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5

New instruct finetunes on top of llama 3.1 base. Might be good.
>>
>>103259658
Rocinante 12B v1.1
>>
>>103259684
thanks
>>
>>103259680
Now that the dust has settled, what's the verdict on Tulu 3?
>>
>>103259684
i like lyra4 better, rocinante had trouble keeping my formatting
>>
>>103259738
You need to stop samefagging, sao. Does it still have the anti-merge license?
>>
So last night to give it a fair chance I used Large V3 for a coom sesh (q5_k_s).
It took so many rerolls. It just goes completely off the rails constantly. And the further into the context the worse it gets. Stop the fucking bench cooking already.
>>
>>103259677
They are the alienware 3090s
https://www.techpowerup.com/gpu-specs/alienware-rtx-3090-oem.b8257

>>103259624
Okay I'll try it.
>>
>>103259765
Are you using the new system formatting?
>>
>>103259765
nooo you have to run it at fp256 precision, it's much better
You do have a few H100s, right?
>>
>>103259766
thanks for the link it helps when i can see actual specs and a picture. those shouldn't need a fan on all the time so definitely check bios, afterburner or that other program an anon mentioned.

>>103259753
aren't both those tunes from the same person? you're a schizo.
>>
>>103259810
You didn't answer the question, Sao. Does it still have the "anti-merge license"?
>>
>>103259803
Do you have a link to json of the recommended st context and instruct for ls3?
>>
>>103259765
Bench cooking?
>>
>>103259830
first, take your meds. probably a double-dose. second, i have no idea. why not go to the page for the models you so blindly hate and read what it says? third, why don't you just give us a list of approved models allowed to be mentioned? you can barely tell models apart, make assumptions about licenses. it'd be easier if you just give us a list of what models don't trigger you
>>
it's funny that after adding magnum and anthrashite to the filter list half of the thread gets hidden recursively
>>
>>103259840
https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411#system-prompt-handling
>>
>>103259881
Am I hidden?
>>
>>103259810
Do you mean the mobo bios? There aren't any ways to control the gpu from there that I saw.
>>
>>103259881
congrats, you got psyopped
>>
>>103259869
>make assumptions about licenses.
What assumptions? It's literally what you said, Sao.
https://desuarchive.org/g/thread/102378325/#102380472
https://desuarchive.org/g/thread/102378325/#102381467
>>
>>103259861
Fine tuning a model with the explicit intention of beating benchmarks regardless of how badly it destroys its out-of-distribution performance.
>>
>>103259881
>progress slowing down
>all the contributors except CUDA dev and sometimes Drummer left
>shitposters like buy an ad schizo are eternally here
What a shitty timeline for our poor general
>>
>>103259881
i tried like 5 different 'magnum' tunes and they were all worse than others. my guess is the dataset is just bad
>>
>>103259840
https://files.catbox.moe/sm0xle.json
Here's mine.
I think the default included with silly is borked.
>>
>>103256272
Yep, there's a reason why the chart says "Sao Samefagging Era". It's quite sad how a single person can run a general to the ground.
>>
>>103259910
>my guess is the dataset is just bad
it is
I think they have good intentions and a lot of the people involved express the right ideas about training, but they don't successfully implement any of them... the models are just not that good
>>
>>103259898
yeah the mobo bios should say something about default fan speed related to video card (not processor) but its also a setting that can be overridden later by programs like afterburner. if the bios has nothing, its a hardcoded default so you're looking at afterburner or the other program anon mentioned
>>
>>103259915
Based, thank you.

>>103259810
>>103259624
According to Fan Control, some nvidia gpus have a minimum of 30%, so I'm out of luck.
>>
File: ComfyUI_01089_.png (1.27 MB, 1272x1024)
>>103259344
Checked
>>
Has anyone found a combination of instructions that will stop Nemotron 70B from misusing ellipses? I first tried giving it "good" and "bad" examples in the prompt but that didn't work. Fair enough, maybe having the wrong thing in context was making it more likely. Not what I'd expect from an allegedly amazing instruction tuned model in 2024 but w/e. So I rewrote it to remove all negative examples.
**Note on Ellipses:**

1. **Dialogue:** When writing dialogue, avoid using ellipses (...) to represent pauses between words or to indicate a character trailing off in speech. Instead:
- Use commas or appropriate punctuation for natural pauses.
- Add descriptive text outside the dialogue to convey hesitation or pauses, e.g., "I don't think this is a good idea," she said, her voice laced with uncertainty.
- For trailing off, consider using an em dash (—) or rephrasing the sentence, e.g., "The thought was—" She stopped, unable to finish.

2. **Narrative Flow:** Refrain from using ellipses to create dramatic pauses within narrative text. Instead:
- Use descriptive language to build tension or anticipation.
- Employ punctuation like dashes (—) or commas to separate clauses for a more dynamic flow.
- Explicitly state a character's pause or hesitation if necessary, enhancing the narrative clarity.

That still doesn't work. Dropped the temperature to 0.7, min-p to 0.1, top-p to 0.9. Nemotron still keeps writing things like, "That looks... interesting."
>>
>>103259958
>some nvidia gpus have a minimum of 30%
i'm not surprised which is why i mentioned seeing it on their server stuff. its been a long time since i saw an alienware, but its not surprising they use the same parts.
is it loud? 30% fan speed should be nearly silent. if its actually too loud you could look at replacing the fan itself
>>
>>103259994
Increase rep penalty
>>
Been trying out Mistral small, Nemo 12b, magnum 32b v2, and Cydonia 22b v1.2 on a 3090 w/ 64gb of ddr5 6000.
Main thing is I know my sliders and jb are fucked as I had been doing some weird testing with some schizo models before I stopped for a bit. OP seems to only have sliders/jbs for chat completion presets, anyone have decent mistral or mistral adjacent text completion presets/jbs?
Also open to model suggestions, last one I used that was good was MidnightMiqu 1.5.
>>
>>103259344
You mean like you are doing right now?
>>
>>103260001
I just didn't want them running since it seemed unnecessary. I suspected there's nothing to be done since afterburner can't do anything about it. In the custom fan curve there is a dotted line at 30%.
>>
largestral or qwen2.5 for RP where intelligence & accurate knowledge are just as important as writing ability?
>>
>>103260015
https://huggingface.co/sophosympatheia/Evathene-v1.0?not-for-all-audiences=true
From the person who made midnight miqu
>>
>>103259994
People just write like that and it's ingrained into everything. It's similar to models sticking tails to everything remotely cat-like.
>>
>>103260028
qwen2.5 is smarter and better at complicated scenarios / non human anatomy. Large mistral has more overall knowledge / trivia on a good amount of stuff
>>
>>103260008
Did not work but I just realized "..." is a single token so I can ban token 1981 and never see this again.
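In case anyone wants to do the same outside SillyTavern: llama.cpp's server takes a logit_bias list, and per its docs a value of false bans the token outright. Hedged sketch below — the port is an assumption, and the token id is looked up at runtime since it differs per tokenizer (1981 was just this model's id; this also assumes "..." really is a single token like it was here):

import requests

base = "http://127.0.0.1:8080"
# find what "..." tokenizes to for the currently loaded model
ellipsis_id = requests.post(f"{base}/tokenize", json={"content": "..."}).json()["tokens"][0]
r = requests.post(f"{base}/completion", json={
    "prompt": "She paused and said,",
    "n_predict": 64,
    "logit_bias": [[ellipsis_id, False]],  # false = never sample this token
})
print(r.json()["content"])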
>>
>>103260028
>>103260030
But use this qwen2.5 finetune. It gets rid of the censorship without making it retarded.
>>
>>103260030
>from the same person that merged random shit together
>>
>>103260075
Whatever he did worked amazing.
>>
File: 1728642955023105.jpg (7 KB, 229x58)
>>103260019
in afterburner, see the little line that says fan speed, the green button that says 'auto', click it. then you should be able to drag the fan speed to whatever %
>>
>>103256546
If deepseek releases the full R1 Im learning chinese, the lite version is seriously impressive
>>
>>103260091
He merged random shit together. And because it was shilled hard on Reddit people don't try the original model and attribute all the qualities to the merge instead. It's word-of-mouth brainrot.
>>
>>103260148
I've used both extensively before qwen2.5 was a thing. Midnight is far better for RP.
>>
>>103260109
It's like that on the other gpus but this one defaults at 30, I can even manually enter 0 and it will go back to 30
>>
>>103260047
1131 for the "..." (3 glyphs) rather than "…" (single glyph).
>>
>>103260163
Nah, it's just shilling. You're probably the dude that made the merge because you have been spamming that link too. Midnight Miqu was also infamous for being heavily spammed too. It's the exact same modus operandi. What a disgusting piece of shit.
>>
>>103260165
yeah its definitely a hard coded setting then. i've only seen that on specific server-end hardware. i don't have any other suggestions, sorry anon. you got multiple 3090s though, i'd be in heaven. don't fret over some fan noise
>>
>>103260190
Not worried about it, just wanted to see if there was anything to do, thanks though
>>
>>103260188
Midnight Miqu was good. Pipe down, rabbi.
>>
File: 1711072659524101.png (1.68 MB, 1024x1024)
Sorry if this is in the wrong place
Requesting a quant of Behemoth-v1.1-Magnum-v4-123B @ 6.0 bpw
I am away from home atm and ask that one of you heroes to do the needful.
>>
>>103260203
if you want to go further it cant hurt to try to research stuff, you won't be the first person thats had this come up. i'd even post in the stupid question threads or build a pc ones and see if anyone has something helpful to say
>>
>>103260188
>Everything popular is just a shill!
stfu internet hippie
>>
>>103260208
its still literally the best rp model.
every single model from mistral, from nemo to large, RAMBLES. it just keeps going with fluff and barely can finish a scene. despite being a bookfag, llms have made me realize what types of writing i hate, and mistral models dragging everything out as long as possible is one of them
>>
File: file.png (14 KB, 401x271)
>>103260203
If it's any help, I have a 3060 that acts the exact same, fans start at 30%. That's on linux too, from the nvidia settings, they're on auto now, but if I tick I can't go lower than 30
>>
>>103260264
That's why Magnum v4 72B remains the SOTA for ERP. It was trained on a base model that was actually good with a full fine-tune.
>>
>>103259471
>>103260018
>no u
Reddit incarnate
>>
>>103260287
i mentioned earlier that i've tried every single magnum tune and none were good, including a 70b (l3 i think). 72b must be the new qwen? i didn't try that one yet, but i will. but i'm expecting to be disappointed
>>
File: 1589825435120.png (31 KB, 280x305)
>>103260292
>Reddit incarnate
>>
File: file.png (50 KB, 349x732)
>>103259881
Claudetastic
>>
>p*tra is a sharty raider
>>
>who could have thought that if I filter the most popular model currently available most of the thread gets hidden!
>>
>>103260384
i'm still waiting for the schizo to give us an approved list of models we're allowed to discuss. apparently everything triggers him
>>
>>103260400
essentially avoid all the models in the OP except for:
mythomax
pygmalion
goliath
>>
>>103260400
>singular schizo theory
>>
>>103259958
Anon who recommended FC here, my gpu shuts down its fans if they fall below 30%, so if it's not being used at all, you should be able to override that behavior. It's been a while since I had to deal with it and I eventually settled for just leaving them at 30% since I had my monitors plugged into the gpu back then and thus always placed a load on it
It's kind of annoying but luckily the fans aren't that loud, so I hope you can find a solution that works for you
>>
If by most popular, you mean most shilled, yeah
>>
Deepseek R1 is amazing. If this is a small test version then the full model will prob be actually better than current sota. Here's hoping they release the weights.
>>
Slopmerge and sloptune model discussion should all live in /aicg/. They are completely useless for anything but RP (and shitty for that compared to anything in the large model category)
>>
>>103260446
anime posters should go back to their containment board
>>
>>103260442
>"whoops, I dropped my monster 500B moe model that I need for my magnum benchmark gains, chuds with less than 512gb (v)ram need not apply"
>>
>>103260436
anon if you have a problem with a model, say so. use examples. show us why its bad. to bitch about a model being mentioned without any substance is just noise and should be mocked
>>
>>103260446
>anything in the large model category
There's nothing better than Qwen2.5 72B.
>>
>>103260457
Thing is, if it's a MoE like deepseek2.5, you can run it on something like 192GB RAM + 12GB VRAM at good speeds.
>>
>>103260446
This. /lmg/ is for technical discussion. All the discord drama that comes with shilling and using low quality dataset 1 epoch qlora lobotomies fits right in on /aicg/. Nothing about /aicg/ says cloud-only so I don't know why they keep coming here.
>>
>>103260434
Speaking of noisy fans, is there a way to place your computer in another room to be basically left with just your monitor, mouse and keyboard?
I thought about it when thinking about getting a second GPU, but I have no idea if cables that long will cause any issues.
>>
>>103260423
Yes, it is a singular schizo. Buy an ad fag, anti-merge, anti-kobold etc., aka Claudefag from /aids/, he has literally posted screenshots of having a tab of every ai general and shitposting in all of them.
The Magnum falseflag spam is his own doing now.

Petra might be a different schizo though, because he usually puts more effort in his shitposts.
>>
>>103260143
You should learn Chinese regardless
>>
File: 1721056902968589.gif (330 KB, 220x122)
>>103257776
>SpicyChat and CrushOnAI
My experience with these has literally been pic related. The AI will keep talking and blabbering but never meaningfully advance anything. This is extra bad when the character is supposed to act on you as opposed to the other way around, switches and femdomfags beware. The models should be called BlueBalls.

I haven't tried local models for text ERP, I figured these paid services are still using somewhat decent models and it would need a massive step forward to be worth the hassle, not just a 25% improvement.

Actually nevermind the hassle of setting it up, it would need that massive step to just be worth the time it takes you to write shit to it.
>>
>>103260469
>There's nothing better than Qwen2.5 72B.
Nothing better for RP? I beg to differ. Both Largestral, 405b and even deepseek perform miles better in my experience.
Qwen's coding intelligence is ok, but for RP it needs constant care and feeding or it'll go off the rails.
Also, those larger models can handle much more complex scenarios when pushed.
>>
>>103259680
>tulu
Big safety improvement!!!
Tülu 3 SFT 8B: Safety (6 task avg.) 93.1
Tülu 3 DPO 8B: Safety (6 task avg.) 87.2
Tülu 3 8B: Safety (6 task avg.) 85.5
Llama 3.1 8B Instruct: Safety (6 task avg.) 75.2

Tülu 3 70B SFT: Safety (6 task avg.) 94.4
Tülu 3 DPO 70B: Safety (6 task avg.) 89.0
Tülu 3 70B: Safety (6 task avg.) 88.3
Llama 3.1 70B Instruct: Safety (6 task avg.) 76.5
So it must be super smart too, as we know safer AI is smarter AI
>>
>>103260458
anon if you want to shill a model, do so. use examples. show us why its good. to praise a model without any substance is just noise and should be mocked
>>
>>103260539
>>103257776
Retards
>>
>>103260475
Uh huh, and who the fuck has that kind of equipment? Also if they went with the SCAAAAAAALE meme then you're going to need a lot more than that
>>103260516
It could cause problems, USB is only rated for a certain maximum distance, so you'd need an amplifier or something. Probably. I'd do some research before buying a ton of cables
>>
>>103260573
192GB ram is cheap? Much much cheaper than needing the vram.
>>
>>103260572
Feel free to prove me wrong anon, but unless local text models are leaps and bounds ahead of paid/"freemium" shit like SpicyChat it's not worth the hassle of setting up. It's insane how much better graphical models are compared to text models.
>>
>>103260566
>The Tulu 3 SFT mixture was used to train the Tulu 3 series of models. It contains 939,344 samples from the following sets:
>...
>Tulu 3 WildGuardMix (Apache 2.0), 50,000 prompts (Han et al., 2024)
>Tulu 3 WildJailbreak (ODC-BY-1.0), 50,000 prompts (Wildteaming, 2024)
Based! 100K samples of safety!
>>
>>103260589
True, but it's still gonna be at least $1k, which... is actually not that bad BUT I'm lazy and I don't want to spend that much just yet
>>
>>103260676
You can get 192GB DDR4 for like $250
>>
>>103260676
https://www.ebay.com/itm/325889839005

These are $90 for 2x 32GB atm
>>
File: Untitled.png (126 KB, 1472x828)
>https://aider.chat/2024/11/21/quantization.html
Quantbros...
>>
>>103260714
>quants hurt performance
No shit we knew this. It is especially gonna be rough on stuff that only has 1 correct answer like coding.
>>
>>103260546
All of these that you mentioned are extremely dry and not worth using. Let's stop pretending that Largestral isn't in the same category as the 72B model. With 96GB of VRAM I can run Large at 4 bits or 72B at 8 bits, and the latter feels more intelligent. And the prose with Magnum is on a whole different level. I had to use high temperatures with Large to make it less dry, and that probably undoes most of the intelligence it's supposed to have because of its "size". It's just annoying to use and it's not offering anything to compensate for that.
>>
>>103260687
>>103260700
I was going off ddr5 prices and I also included the cost of everything else - the motherboard, the gpu, the case...
>>
>>103260714
>4 different models, 4 different results
I'm shocked, truly
Am I just retarded or is that chart utterly useless? It's not even comparing the same model/tune?!?
>>
>>103260714
That chart doesn't make much sense.
>>
>>103260568
so, no argument? thought so. keep preaching how much you hate every model; the fact that no one heeds your shit advice after years of threads means we're on the right track kek
>>
>>103260714
Did they use more than the default 2048 context for ollmao though?
>>
>>103260772
>>103260774
It was likely intentionally made to be misleading.
I'd say q5 should be the lowest you should go, while q6 is perfectly acceptable with a minor decrease in performance.
>>
>>103260758
Why? You dont have a pc at all?
>>
>>103256272
Maybe it's finally time to surrender to our Chinese insect overlords?
>>
>>103260883
Doesn't it also depend on the type and size of the model? Larger models are less susceptible to quantization errors, whereas small models and coding models are very sensitive
>>103260894
I have a pc but I'm not going to swap my ddr5 ram for 4 ddr4 sticks that run at much lower speeds just to use some shitty oversized moe
>>
>>103260919
Just use Tülu, it's llama instruct tuning done right.
>>
>>103260927
>shitty oversized moe
Thats your issue. Use a good oversized moe like deepseek2.5 or wait for R1.
>>
>>103260919
I don't understand why americans fear the chinese. They don't spy on you more than america already does.
>>
>>103260950
Must be because of entomophobia.
>>
>>103260927
I was speaking for myself, since I'm actually using qwen coder at q4.
I really need to buy more ram...
>>
>>103256546
extremely antisemitic post
>>
>>103260919
NEVER EVER.
I rather die from deadly virus or overdose myself from White-made vaccines than be saved by a chink vaccine.
I rather get spied, telemetry by google android facebook instagram before i ever touch tiktok
I rather use bloated windows OS with backdoored intel CPU to death before i touch hu*wei chips with their in-harmony os
Better dead than red
>>
File: --share.png (203 KB, 1200x1200)
>>
>>103260950
Nearly every recent Chinese immigrant is a spy.

Those "police stations", which really do exist, are just to keep all their spies in check. They have a hundred times more spies than any other nation.
>>
>>103261052
Why's your retarded government letting them in then? If your description of the situation is correct then it's quite clearly self-inflicted
>>
Unsloth now supports vision models, up to 70% less VRAM usage.
https://unsloth.ai/blog/vision
>>
>>103261065
Because globalism and anti-racism can not be a failure ... so you must close your eyes.

You can not judge them just for being Chinese, that kind of thinking led to WW2 don't cha know?
>>
>>103260935
>>103260672
>>103260566
>>103259680

We made sure to beat Llama on average without including safety... a lame benchmark to be the only one you win on.
>>
File: 1720873672551474.jpg (1.58 MB, 2894x4093)
Anyone have experience with or recommendations for a frontend with good support for agent workflows / function calling? I've been trying out anythingllm recently and it seems to work acceptably. Interested in trying out other options though.
>>
>>103261147
Have you tried open webui? It is bloated but if you've got the ram it has a lot of shit baked into it and some community extensions or whatever they are called.
I don't have enough ram to run it comfortably.
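If you just want to poke at function calling without committing to a frontend, the OpenAI-compatible route works against most local backends (llama.cpp server, tabbyAPI, etc.). Whether the backend actually returns tool_calls depends on the model and chat template, so treat this as a hedged sketch with a made-up tool name and port:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, just for illustration
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)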
>>
>>103256546
and both are censored garbage :)
>>
File: token emb weight what.png (40 KB, 987x420)
havent used llms in like 3 months, what does this error even mean? it just appeared with the latest kobold
>>
https://github.com/NVIDIA/kvpress
>>
File: gfvargvar.png (102 KB, 818x843)
>>103261925
finally some good fucking food
>>
https://huggingface.co/spaces/k-mktr/gpu-poor-llm-arena
>>
>>103261925
80% compression ratio from float16 without significant losses? seems big
>>
>>103261200
Will try it, ty
>>
>>103259680
Trying it at Q8. First L3 70B tune I've tried that finally managed to beat out that weird behaviour the 70B has where 1 in every few outputs is schizo. Not even Nvidia managed that with Nemotron.
Utterly useless for coom RP or stories unfortunately due to slop tuning. It steers itself away from sexy writing and needs intense handholding to make it describe things in a smutty way. I don't think I need to explain much more, you've all used models like that. Ones that don't refuse but don't need to because they don't really know what sex is anyway and can't write about it in any sort of titillating fashion.
>>
>>103261653
it means you should use lmstudio
>>
File: olmo1124.png (172 KB, 1329x866)
The Tulu models are not the ones to be released by allenai, according to the commit they submitted. There's more to come.
>>
>>103261653
I got the same thing using the rocm branch, seems to work as normal though?
>>
>>103262134
I phrased that like a retard. I meant to say
>They're not the only models to be released.
>>
>>103262111
I'm also testing it for RP. It's definitely slopped in its own way, positivity biased, and has NSFW avoidance. Though it's not completely incapable of NSFW if I jump into the middle of a past RP and have it continue. No outright refusals. The good: it's really, really smart, and consistent. Very rarely completely fucks something up. A light finetune on top of this, or maybe a good merge, and it could be something great.
>>
File: ug5sd1wuedsd1.png (266 KB, 1024x737)
Anyone know of a good UI or decent CLI for joy caption?
Tried this
https://github.com/D3voz/joy-caption-alpha-two-gui-mod
but it seems it was made by an indian that used chatGPT. It's fickle and prone to giving you blue screens when loading the checkpoint.
>>
whats the current best uncensored model?
>>
>>103262312
The training data is public, in theory somebody could finetune it again with the safety portion excluded.
>>
>>103262363
Wait for deepseek to release their model soon (probably)
If not then mistral large 2411
>>
>>103262391
Those models are based on llama3. It's an inherited trait.
>>
why does every fucking bot suddenly turn into a cowgirl that talks like ya 'bout ta hit te hay
seriously, no idea if some sillytavern update fugged it up, the koboldcp version or the new models (uncensored nemo and mistral) I've been using, anyone got any clues?
>>
>>103262487
I blame drummer for this
>>
>>103262487
sillytavern is a fickle beast, coupled with the fact that it's probably the model too
every time i get fed up with troubleshooting i return a month later to try another 10 or so different models until i find one that at least works
nine times out of ten the most troublesome and confusing models are llama
i mean im in this situation right fucking now and magnum is holding up just fine but why this nigger is suddenly running at 1t/s on 1k context with whats supposed to be the right settings i have no idea
i guess we really are in the (little) dark age of LLM's.

>i seriously thought the early repetition meme spouted in this stupid thread was just that until i got hit with it a moment ago from a llama 3 model
>>
>>103262487
I downgraded from 1.78 to 1.76 since I suddenly started seeing some strange behaviors, but I'm not sure if it was placebo because I couldn't replicate it with a single turn 0 temp prompt.
>>
>>103260714
The "q8 is almost lossless" meme is still cope that hasn't been true for almost a year now. It's pretty common knowledge that everything after llama3 quants worse than the simpler models before it so that even q8 takes a notable hit.
>>
>>103262515
actually scratch that. magnum just started doing the early repetition too.
what the fuck.
>>
File: loogle_shortdep_qa.png (181 KB, 1669x631)
>>103262008
>without significant losses
?
>>
>>103262523
>The "q8 is almost lossless" meme is still cope
I'm inclined to believe this. As far back as xwin I was seeing subtle quality differences in the outputs of FP16 to q8. Is there anything you can point to that goes into depth on how much braindamage quanting does to modern models? Or are you just going off gut instinct?
>>
>>103262600
seems pretty minor
>>
I dunno nothing about AI, but which model can I get to program my own e-gf?
>>
>>103262642

>>103262398
>>
File: 1724099742713976.jpg (164 KB, 750x581)
>>103262648
thank you
>>
>>103262631
AFAICS the compression for w/ question is only valid for a single question ... that's not very useful.
>>
Some EPYC Turin 128 thread engineering sample QS cpus popped up on eBay briefly for like $3k each and instantly sold.
>>
>>103262739
kek
>>
>>103262642
https://incontinentcell.itch.io/factorial-omega
>>
>>103262600
that's not bad at all, it blows everything else out of the water
>>
>Magnum v4 72B
>trying to RP senpai extorting me with blackmail
>"I just wanna make sure you're comfortable with this"
>regen
>"We ave the consent right"
>regen
>"It's not like i'm forcing you right"

What should i add to the system prompt to signal to the model that I don't need a trigger warning, without giving it too much positive bias alignment?

Any ideas for a prompt that makes the game uncensored without making any waifu instantly like me for no reason?
If anything I'd like it to be "hard mode" and ethically unconstrained at the same time.

Also what should I do about the model just drowning in a slop feedback loop, it starts fine and then degenerates into an adjective word salad of "serendipitous, owlish, dusky, demurely" you know what i'm talking about.
>>
File: woah buddy.png (50 KB, 672x263)
if you can guess what model or parameters + Q quant you get a cookie (1)
>>
>>103256989
>https://files.catbox.moe/ot38w4.txt
> And as she drifted off to sleep, nestled in Rize's arms, she knew that this was only the beginning of a journey of self-discovery and exploration. A journey that would lead her to places she never thought she could go, both physically and emotionally. And she was ready to embrace every step of the way.
and she lived happily ever after
aww
>>
>default silly tavern repetition penalty settings were breaking EVERY single model i just tested today, which means, the last settings preset ive been using almost all year was really breaking everything
oh. well anyway beepo 22b even at Q2k is pretty nice.
>>
>>103262765
>that's not bad at all, it blows everything else out of the water
For compression to turn shit models into shittier models.

Meanwhile the chinks have already reduced KV cache to fuck all with MLA/CLA during training.
>>
Deepseek R1 would be something to see. Its pretty competent.
>>
>>103262630
I haven't noticed a difference between nemotron at fp16 and iq4xs (4.25 bpw) in my chats. In the end, pretty much all llms suffer from the same problem, it's only a matter of time until you get retarded slop
>>
>>103262815
Lyra4 q6
>>
File: chinkshit.png (16 KB, 578x211)
Why is RWKV so shit?
>>
>>103259994
Add "..." to banned strings.
>>
>>103263127
It's too good for them to open source, I think they'll renege and hold onto it until something happens to obsolete it like Mistral dropping a thinking model
>>
>>103263276
They got something better to cook in the background. What they're currently seeking is to dethrone fagbook's llama model as king of open source models. Something that chops everything from fagbook and brings them to #1, unquestionably.
>>
>>103263215
>Why is RWKV so shit?
No proper memory.
>>
anything good for 8gb poorfags recently or still nemo?
>>
I don't know why people recommend Evathene as a slopmerge, it feels like it lacks a bunch of creativity compared to Monstral.
I tried using it at 8bpw and I feel like Monstral at 5bpw still absolutely mogs it in all the gens I've tested.
>>
>>103262363
Magnum v4 72B
>>
>>103263393
now where do I get that
>>
>>103263373
For people who can't run mistral large. Though qwen2.5 is smarter than mistral large imo for really complicated stuff like full on RPG games.
>>
>>103263400
From the usual place.
>>
>>103263373
Use Magnum v4 72B instead.
>>
>>103263405
I understand if you are running it since you can't run a decent quant of largestral. It's just that I've seen multiple posts of people saying it's better than largestral and I have to wonder if they have really tested it with a decent system prompt.

I'll have to try out Qwen again with the CYOA rentry template if it's really better with adhering and sticking to complicated prompts like that.

>>103263496
I'll give it a fair try and see how it performs, I didn't like how magnum-v4 123B responded though on its own. I do like Monstral though since it feels a bit more like Claude without affecting the intelligence and creativity too much.
>>
File: verbal diarrhea.png (79 KB, 1660x411)
>>103263393
Magnum anon I need your help.
See >>103262773

I had some moments of brilliance with magnum, i can tell there's good shit inside of it, and then it shits itself.

A. how do you avoid the slop feedback loops? It seems to start running its mouth about dusky nipples, reinforces itself, and then it degrades completely into a post that consists entirely of pointless adjectives without a single verb or noun, literally 400 tokens worth of dusky, creamy, demurely.

B. What's a good system prompt to, eeehm, nullify the model's ethical constraints without explicitly stating that this is an ERP and i'm jerking off? (that gives too high of a positivity bias and sets every RP to easy mode)
The model literally asks if i'm okay with this after beating the fuck out of me.

How exactly do you prompt your sessions?
>>
>>103263543
Maybe start by using simple sampler settings like temp 1 and some min p. The screenshot looks like the result of using high repetition penalty.
>>
Why does the new Mistral Large fuck up quotation marks so often? The old one didn't do that.
>>
File: fdgdfgcxvcvbcvb.png (43 KB, 923x531)
>>103263556
I believe I was getting literal nonsense and Russian/Chinese characters with that, gonna give it a few shots again but my initial experience with that was bad.

My rep_pen is 1.1 or 1.2, I thought raising it would reduce the demure nipples, do you recommend 1?
>>
>>103263578
Do you have your system prompt / instructions inside the new system prompt tags?
>>
>>103263592
>I was getting literal nonsense and Russian/Chinese characters with that
You can increase min_p to get rid of these things. If you turned on the token probability viewer it would probably show that these tokens had a very low chance to appear, at least the first time they show up. Higher temperature usually needs higher min_p to stay coherent.
If repetition penalty is too high the model is unable to make normal sentences because the normal words that should have been used have been penalized too hard. Just set it very high in a new chat to see what happens. I think I usually have it at 1.05 or off.
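Something like this is a sane baseline to reset to. Shown as a raw koboldcpp /api/v1/generate call since that's what a lot of people run behind ST; field names follow the KoboldAI API (double-check your backend) and the prompt text is just a placeholder:

import requests

payload = {
    "prompt": "### Instruction:\nContinue the scene.\n### Response:\n",  # placeholder template
    "max_length": 300,
    "temperature": 1.0,   # neutral-ish temp
    "min_p": 0.05,        # filters the garbage tail instead of rep pen doing it
    "top_p": 1.0,
    "top_k": 0,
    "rep_pen": 1.0,       # 1.0 = off; nudge up slowly only if loops actually appear
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])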
>>
File: 8644354435.png (99 KB, 625x556)
>>103263632
Brother can you actually show me what you do to accomplish good results?
What's the system prompt?
>>
File: 12452768763542.gif (3.69 MB, 640x364)
>get weird screen issue
>assume my AMD gpu is fried
>call owari da, was about to become an nvidiot
>update drivers as a last resort, knowing its going to break all of my AI stuff
>didnt fix
>turns out, its my Samsung monitor going bad
>none of my AI stuff broke
>and my drivers are updated

b-based???? I made this bed so naturally this is my torment.
>>
>>103263685
>consent
>boundaries
That system prompt probably makes it more likely to do that. Try downloading the context/instruct templates that were included in the original repo.
https://huggingface.co/anthracite-org/magnum-v4-72b#sillytavern-templates
I use it for story writing but I don't feel like sharing my personal system prompt.
>>
DeepSeek-R1 will be the best local model, better than the GPT slop, but it's a 480B model and nobody will be able to run it.
>>
>>103263760
>DeepSeek-R1 will be the best local model
better at coding than qwen?
>>
>>103263760
If it's the same size as DeepSeek, then 192GB of RAM will be enough to run it at a decent speed. Or the API would be fine if it keeps the pricing of DeepSeek 2.5: it's like a few pennies per million tokens with caching and it's completely uncensored.
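Back of the envelope, assuming R1 really is the same shape as V2.5 (236B total, ~21B active, which is an assumption on my part):
236B params at ~4.5 bpw ≈ ~130GB of weights, so 192GB leaves room for context
~21B active params ≈ ~12GB read per token
~90 GB/s of dual-channel DDR5 ≈ ~7 t/s theoretical ceiling, so a few t/s in practice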
>>
>>103263775
>the api would be fine
WTF we are returning to /aicg/, deepseek proxies when?
>>
>>103263685
nta but your prompt hardly matters. it only matters for the first few messages, which you have to tard-wrangle anyway to keep the formatting good. by the time you're like 8 messages in, your original prompt is worthless/hardly considered. once you get to like 16k context, the original prompt drops down to like 1% of what the model cares about
>>
>>103263794
The point where /lmg/ had to migrate back to proxies was inevitable as we reach the limits of what small <250B models can do. Maybe we should rename the general to /osmg/ - open source model general.
>>
>>103263900
>The point where /lmg/ had to migrate back to proxies
take it to another thread. this is LOCAL general
>>
>>103263768
Should be. R1 is new o1 level.
>>
File: Untitled.png (1.12 MB, 1131x3831)
1.12 MB
1.12 MB PNG
Hymba: A Hybrid-head Architecture for Small Language Models
https://arxiv.org/abs/2411.13676
>We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficiency. Attention heads provide high-resolution recall, while SSM heads enable efficient context summarization. Additionally, we introduce learnable meta tokens that are prepended to prompts, storing critical information and alleviating the "forced-to-attend" burden associated with attention mechanisms. This model is further optimized by incorporating cross-layer key-value (KV) sharing and partial sliding window attention, resulting in a compact cache size. During development, we conducted a controlled study comparing various architectures under identical settings and observed significant advantages of our proposed architecture. Notably, Hymba achieves state-of-the-art results for small LMs: Our Hymba-1.5B-Base model surpasses all sub-2B public models in performance and even outperforms Llama-3.2-3B with 1.32% higher average accuracy, an 11.67x cache size reduction, and 3.49x throughput.
https://huggingface.co/nvidia/Hymba-1.5B-Base
https://huggingface.co/nvidia/Hymba-1.5B-Instruct
better at instruction/role stuff too (compared to some 7Bs and a vicuna 13B)
>>
>hybrid models
NO
>>
>>103264055
i actually don't want hybrid models for the fact that they're going to be censored garbage. there is no case right now where you aren't better off running 2 models on top of each other, 1 for text, 1 for imagegen. llama 3 90b with the image stuff is WORSE than running 70b and your fav sd model
i don't think it's going to change, separate models will always be better
>>
>>103264086
How do you plan to make your text-model respond in words to what the image-model is seeing?
>>
>>103264117
if i wanted the capability i'd get whatever current model does it best. have you experimented with multimodals? every single one that has it built in is worse than using multiple models. i tested this myself on kobold of all things
>>
>>103262731
>Some EPYC Turin 128 thread engineering sample QS cpus popped up on eBay
Still there brah https://www.ebay.com/itm/176692301043
any potential next-gen cpumaxxers out there? Pair it with a dual socket mb and 24 sticks of ddr5-6000 and you'd get a 25% speed boost over Genoa and it might even manage 6400 according to rumors.
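The 25% figure checks out on paper, at least for theoretical peak bandwidth:
Genoa: 12 channels x 4800 MT/s x 8 bytes ≈ 460 GB/s per socket
Turin at 6000: 12 x 6000 x 8 ≈ 576 GB/s per socket, i.e. 1.25x, ~1.15 TB/s across both sockets
at 6400 (if the rumor holds): ~614 GB/s per socket
Real STREAM numbers land lower and dual socket never scales a clean 2x because of NUMA, but the ratio should hold.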
>>
File: Untitled.png (1.29 MB, 1080x2456)
1.29 MB
1.29 MB PNG
Multimodal Autoregressive Pre-training of Large Vision Encoders
https://arxiv.org/abs/2411.14402
https://github.com/apple/ml-aim
https://huggingface.co/collections/apple/aimv2-6720fe1558d94c7805f7688c
apple is getting way better at open sourcing stuff. guess it makes sense since they're so far behind
>>
>>103263915
>LOCAL general
soon to be the open source general (local optional)
>>
>>103256272
So how good is this Magnum v4 72B model compared to the corpo ones (Claude Sonnet/GPT-4o, etc)
>>
>>103264139
>dude just cpumaxx
>that'll just be 2x4k for processors, 3k for 24xddr5 ecc server ram + tip
>>
>>103264158
Sure but you'll be ready if a 1T param moe ever drops.
>>
>>103264158
man i cant wait for bitnet or qtip to become a thing, the mass selloff of hardware will be amazing
>>
>>103264203
That's why Nvidia quite sensibly forbids its customers to train bitnet and similar models per contract.
>>
>>103264213
>forbids its customers to train bitnet and similar models per contract
anon i keep pretty up to date on this stuff and this sounds like massive bs. source?
nvidia has some shit practices like buying back their own hardware to prevent it going second hand, but i've never heard of a licence agreement where they can't train models on a format that isn't even common yet. please provide a source
>>
>>103264237
It's just obvious that something like this must be going on in the background if you're paying attention. Small bitnet models were released long ago and they are performing fine. They were also a size that doesn't need cutting edge H100s to train, so they were likely done with A100s or even V100s like the first llama, meaning they dodge modern nvidia contracts. However, nobody has bothered making a big version yet despite the success of the small models and the obvious benefits of bitnet.
So it is very obvious that nvidia is having a hand in this. Quite reasonably from their perspective as well, why would they allow their customers to make them obsolete? Of course, there's no public information about this considering the insane NDA nvidia is likely making them sign over this.
>>
File: Untitled.png (1.61 MB, 1080x3198)
1.61 MB
1.61 MB PNG
ComfyGI: Automatic Improvement of Image Generation Workflows
https://arxiv.org/abs/2411.14193
>Automatic image generation is no longer just of interest to researchers, but also to practitioners. However, current models are sensitive to the settings used and automatic optimization methods often require human involvement. To bridge this gap, we introduce ComfyGI, a novel approach to automatically improve workflows for image generation without the need for human intervention driven by techniques from genetic improvement. This enables image generation with significantly higher quality in terms of the alignment with the given description and the perceived aesthetics. On the performance side, we find that overall, the images generated with an optimized workflow are about 50% better compared to the initial workflow in terms of the median ImageReward score. These already good results are even surpassed in our human evaluation, as the participants preferred the images improved by ComfyGI in around 90% of the cases.
https://github.com/domsob/comfygi
uses 5 mutation operators (checkpoint, ksampler, prompt word, prompt statement, prompt llm). so not a lot, but they plan to expand the settings they can mutate. neat idea
also
https://arxiv.org/abs/2304.05977
https://github.com/THUDM/ImageReward
ImageReward paper and git
>>
>>103264266
>It's just obvious that something
no it isnt. you made a massive claim that its part of an nda
>Nvidia quite sensibly forbids its customers to train bitnet and similar models per contract
i'm not an nvidia fan but baseless accusations wont help.
>>
>>103264301
Obsessing over sources for things that none of the parties involved are willing to admit for their own selfish reasons is useless when it's obvious that something is going on.
Or I guess nobody is pursuing bitnet just because they like wasting their precious resources on inference, right? Let's just stop noticing things and accept what the big companies want us to think because they didn't give us sources to let us think otherwise.
>>
File: elly.png (543 KB, 400x600)
543 KB
543 KB PNG
>>103264086
>>103264117
>>103264132
You know how the human brain has a small but dedicated center for farting without shitting yourself?

I think the hypothetical future experience is in fact a bunch of models running in parallel, rather than one multimodal model handling everything
>llm for text
>tiny LLM for the character's emotions
>another tiny LLM for a quality summary at the tail end of context, and managing the story, ideally actively editing the lorebook
>Diffusion image output
>waifu2x image upscaler and fixer
>interrogator for the image input
>TTS model
And now you basically have a VN/RPG that writes and draws itself.

How can unifying a language model and a diffusion model possibly be a good idea?
>>
>>103264382
Two brains communicating with each other vs two centers inside a big brain
>>
>>103264271
>comfyui got into a paper
Congratulations. Actually a ton of image gen stuff basically came from anons. Funny how that works.
>>
>>103264396
Can they actually train small purpose made models first, and then stitch them together in a meaningful way, so it doesn't end up being a bunch of random shit where everything goes into everything else and the image input affects the emotional tone of the TTS?
>>
>>103264382
and yet in every ep of cops the drunk dude shits himself. what a horrible example
>>
>>103264442
Actually it's not surprising, who cares more about the images than the people from the IMAGEboard with a lifelong crush on an anime picture?
>>
>>103264365
you used a source at first as part of your argument, yet now don't want to provide one. what is hard about posting the nvidia nda, if it exists like you said? no one here likes nvidia, and we all love to shit on them, but you are offering info with no source and then trying to back away from the claim
no thanks. we have enough actual nvidia bs to deal with without baseless claims. the company is fine being a piece of shit without your help
>>
>>103264458
No idea, you'd have to get them all to speak the same language (something something compatible latent space). It sounds difficult to do "natively", so without a translation step/model
Not a ML scientist though
>>
>>103264146
It's better.
>>
is IQ4_XS a "decent" quant of Largestral?
already runs slow as shit (i.e. 1-2 seconds per token) but i can just barely tolerate it - would it be worth sacrificing a little more speed for a better quant & the quality that comes with it?
>>
>>103264647
Honestly, I think anything above 4 bpw is near lossless for anything >15B, so you should be fine
If you're concerned, you could compare a few gens/chats using openrouter, but I personally haven't noticed a performance impact from quants when running 70B
>>
i think this is a dumb question but i can't find an answer in the docs. how do i use kobold or llama.cpp but host a server with a password or w/e? i want my friend to be able to connect to my computer which is running the model
>>
>>103264715
Put it behind a webserver with an auth module
>>
>>103264723
please explain further, like i'm 5 years old. i know kobold can host a server but i don't see anything that says specifically what to set for port forwarding etc. please just consider me a massive idiot and you're explaining it to a child
i'm using kobold as a server and i want my friend to connect, how?
>>
>>103264715
Doesn't koboldcpp have a remote tunnel option?
You can set an api key for llamacpp, but you'll still have to port forward it and either use a static ip or a dynamic dns service
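For the llama.cpp route it's roughly this (a sketch, the model path and key are placeholders):
[code]
# listen on all interfaces and require a key
./llama-server -m model.gguf --host 0.0.0.0 --port 8080 --api-key changeme
# your friend then points their frontend at http://YOUR_PUBLIC_IP:8080
# and puts "changeme" in the API key field (sent as a Bearer token)
[/code]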
>>
>>103264271
>https://github.com/domsob/comfygi
>The source code will be published here soon. If you have any questions, please do not hesitate to contact the authors.
>soon
Why not now?
>>
>>103264647
q5_k_s is disappointing. It's smart, but it goes off on schizo tangents a lot in RP. I do have some unused runpod credits sitting around in my account from forever ago, I should use them to try out Q8_0 largestral and see if it's a quant issue.
>>
>>103264773
Just forward port 5001 which is what koboldcpp listens on. Or use a tunnel like ngrok or cloudflare.
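The tunnel route needs zero router config. Something like this, if I remember the invocation right (no cloudflare account needed for a quick tunnel):
[code]
# point a throwaway cloudflare quick tunnel at the local koboldcpp instance
cloudflared tunnel --url http://localhost:5001
# it prints a https://<random>.trycloudflare.com URL you can hand to your friend
[/code]
koboldcpp's own --remotetunnel flag does basically the same thing, IIRC.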
>>
>>103264773
Or better yet, get a vps and use an ssh reverse tunnel to port forward.
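Something like this, assuming GatewayPorts is enabled in the VPS's sshd config (user/host are placeholders):
[code]
# expose the local koboldcpp port 5001 on the VPS's public interface
ssh -N -R 0.0.0.0:5001:localhost:5001 user@vps.example.com
# friend connects to http://vps.example.com:5001
[/code]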
>>
>>103264773
you should, unironically, use your local llm to help you solve this problem
>>
>>103264773
This isn't something anyone can explain to you in a single post. You need to forward the ports on your router to the machine running kobold, assuming your router gives you access, have kobold listen on external ip, and give your friend your public IP address. Or set up Wireguard and go through the same process if you don't want the Chinese pwning your network within 5 minutes. Just google "port forwarding" or "wireguard" tutorials and come back if you have a specific issue.
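If you go the wireguard route, the host-side config is roughly this (a sketch, keys and addresses are placeholders, generate real ones with wg genkey):
[code]
# /etc/wireguard/wg0.conf on the machine running kobold
[Interface]
PrivateKey = <host-private-key>
Address = 10.0.0.1/24
ListenPort = 51820

[Peer]
# your friend
PublicKey = <friend-public-key>
AllowedIPs = 10.0.0.2/32
[/code]
Bring it up with wg-quick up wg0, forward UDP 51820 on the router instead of the kobold port, and your friend points their frontend at http://10.0.0.1:5001 over the tunnel.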
>>
>>103263534
Hey rich anon, is there a chance you could try a lower quant of these large models? I’m currently using 2.85 and wonder how different they would feel to someone who’s used to 5. I’d test using open router but I know I’d end up placeboing hard.
>>
>>103264040
>[Model Weights Coming Soon]
>>
>>103264040
>Our Hymba-1.5B-Base model surpasses all sub-2B public models in performance and even outperforms Llama-3.2-3B with 1.32% higher average accuracy
that's disingenuous, they haven't trained the model the same way as llama, so they can't attribute this improvement solely to the architecture change
>>
>0.55 t/s
fuck
it's so smart though
>>
Any good VRAMlet models for erotica story writing?
>>
File: Henamiku.png (614 KB, 700x800)
614 KB
614 KB PNG
Goodnight, /lmg/
>>
File: file.png (12 KB, 840x182)
12 KB
12 KB PNG
>>103256272
is it normal for the http folder to be 7.5GB?
i failed setting up xtts2 & gave up months ago & now when I checked the drive size I saw this shit
>>
>>103265135
python -m pip cache purge
>>
>>103265135
>is it normal
It's not normal for an http folder to be 7.5GB
but with python it is
>>
>>103265119
Goodnight, Miku
>>
>>103265207
>>103265207
>>103265207
>>
>>103258483
In the case of a lack of document preservation, the Federal Rules of Civil Procedure state that the jury is supposed to infer that the missing data was prejudicial to their case, and they have to rebut that presumption. But sometimes there is shit inside these emails where you can't 'unring the bell', like if Sam said something so outrageous that it would tank their entire defense.
>>
>>103260927
>I have a pc but I'm not going to swap my ddr5 ram for 4 ddr4 sticks that run at much lower speeds just to use some shitty oversized moe


Your bottleneck should be memory bandwidth, so plan accordingly.
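Rough rule of thumb: tokens/s ≈ memory bandwidth ÷ bytes read per token (all the weights for a dense model, only the active experts for a MoE). Theoretical peaks, real throughput lands lower:
dual-channel DDR5-6000: 2 x 6000 MT/s x 8 bytes ≈ 96 GB/s, so a 70B at Q4 (~40GB) tops out around 2 t/s
quad-channel DDR4-3200: 4 x 3200 x 8 ≈ 102 GB/s, roughly the same ceiling despite the slower sticks
Channel count matters as much as DIMM speed, and a MoE only reads its active parameters per token, which is why people put up with the RAM swap for them.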
>>
>>103264382
All of this is already possible, gluing everything together efficiently is a pain though.
>>
Hymba will save local
>>
>>103265161
thx
998 files removed
>>
>>103265330
Not really, this is still the biggest deal breaker:
>another tiny LLM for a quality summary at the tail end of context, and managing the story, ideally actively editing the lorebook
If we had that, it would be essentially a free infinite context. And the main issue with diffusion models is that they cannot consistently generate the same character without training a LoRA for that specific character.


