/g/ - Technology




File: ComfyUI_00185_.jpg (150 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101767112 & >>101757601

►News
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1720869141748493.jpg (94 KB, 875x916)
►Recent Highlights from the Previous Thread: >>101767112

--New 8B SOTA model EXAONE-3.0-7.8B-Instruct released, with impressive benchmarking results: >>101769935 >>101769953 >>101770112 >>101770146 >>101770154 >>101770053 >>101772859 >>101775225
--Function calling prompt format for Mistral-7B-Instruct-v0.1: >>101773873
--Llama merge model's output showcases purple prose writing style: >>101771752 >>101771831 >>101771938 >>101772034 >>101772065 >>101772513 >>101772549 >>101772569 >>101772587 >>101772661 >>101771959
--Exaone instruct template and special tokens discussion: >>101770199 >>101770213 >>101770248 >>101770296
--Exaone 3.0 7.8B Nala test results show promise despite limitations: >>101770519 >>101770561 >>101770565 >>101771010
--Mistral's performance and the concept of "soul" in LLMs: >>101772518 >>101772530 >>101772570 >>101772718 >>101772919 >>101773042 >>101773163
--Mistral models have a repetition issue despite being preferred: >>101772212 >>101772231 >>101772297
--Exaone beats GPT-4 on the strawberry test: >>101771076 >>101771219 >>101771313
--Comparison of Cerbras and Nvidia hardware specs: >>101767465
--Anon shares a sampling experiment with TopK 2: >>101772147
--Anon prefers local models for reliability and privacy: >>101767379 >>101767443 >>101774269
--AI bot's coherence and creativity at higher temperatures: >>101770843
--Aleph Alpha featured in Cerebras customer spotlight: >>101767259
--Vast/runpod prices plummet as hype cycle ends: >>101774757 >>101774859 >>101774960
--Crunchdog's humor doesn't mean current models have soul: >>101768549
--AMD shuts down open-source CUDA implementation ZLUDA: >>101772499
--Anon makes Migu bumper stickers at home with laser printer and white toner: >>101776724 >>101776747 >>101776877 >>101776888 >>101776935
--Miku (free space): >>101768086 >>101769460 >>101770977 >>101773597 >>101776042 >>101776054 >>101776154 >>101776519 >>101777172 >>101777369

►Recent Highlight Posts from the Previous Thread: >>101767123
>>
Invest in Mikus
>>
File: sussy-baka-column-fr.png (86 KB, 792x674)
New 'toss.
>>
Should I be enabling flash attention in koboldcpp? It seems to make things faster, but there must be a downside, right?
>>
>>101778384
i dont get it
>>
>>101778402
Top left corner. New mystery model on lmsys arena named sus-column-r, likely by Cohere, creator of famous command-r models.
>>
>>101778421
nice
>>
>use Llama 3
>responses go to hell after around 2k context and completely go to shit at even higher values
>use gemma
>decent responses even up to 8k
Is there a Gemma base model I can finetune? It seems to have a lot more potential than Llama.
>>
>>101778384
it's good (or at least interesting) to see a company take at least some measures to preserve anonymity during these runs
the naming scheme would certainly make me guess cohere though obviously
>>
File: 1699505115649704.jpg (576 KB, 1664x2432)
>>101778328
>>
Man, getting MiniCPM-V-2.6 running locally was a fucking pain based on their model card. Had to add ``numpy<2`` to the requirements and force install of CUDA PyTorch after instead of before (and get a wheel for the right fucking version).

Anyway, it's neat, it works pretty well actually, for any appropriate and culturally accepted uses. It won't give me a clean description of a clown fucking a midget though, so kinda useless for what I wanted it for.

Also it's been trained on OpenAI slop.
>>
>>101778384
Feels like there are two sussy column-rs going by the same name. One is based, the other is sloppy.
>>
I will never accept getting moralized by a machine. Never.
>>
>How many o's are in the word lookbook?
can anon's fav model pass this test?
>>
>>101779148
column-r says 3
>>
>>101778421
hope they get us a 150-200b class model, r+ is still my daily driver for long contexts because I can't get mistral large to stop copy-pasting whole parts of the story over and over again with random words replaced
>>
>>101779194
Mistral Large q3 agrees.
>>
File: lookbook-Q4_K_S.jpg (73 KB, 1242x436)
>>101779148
Mistral Large 2 Q4_K_S nails it.
This is on temp 0 btw.
>>
What are the 24GB VRAM / 64GB RAM people gooning to these days?
>>
>>101778421
compiled a list of possibilities, along with hints/facts:
>Cohere
- naming scheme
- maybe: cohere hasn't released anything in a while?
- Chance: 10%
>Claude 3.5 Opus
- more probable; the original column-r (which, unlike column-u, didn't hide its creator) once told me it was by anthropic
>could suggest another company trained on Anthropic slop
- personal rating: 33%
>GPT-5
- sassy name (reminds me of "good gpt 2 chatbot", or "im a little birdie")
- openai said gpt 5 will come in summer or something idk
- timing works out, sam has been posting strawberry pictures on twitter (correlation: "I love gpt2" tweet when gpt2 chatbot was introduced, strawberry, Q*)
- personal rating: 52%
>Grok 2
- not really likely, since it was already in the arena a few weeks ago, and elon usually does not wait around
- BUT: he said it will come in august
- personal rating: 5%
>>
I forgot about grok-2, set to release this month; it could be column-r as well
>>
>>101779584
column-r always said a different creator
>>
>>101779584
Also OpenAI already has an anonymous chatbot
>>
>>101779617
openai has numerous anonymous chatbots. recent ones:
>gpt2-chatbot
>im-a-good-gpt2-chatbot
>im-also-a-good-gpt2-chatbot
>upcoming-gpt-mini
>im-a-little-birdie
so it doesn't really say anything. if you disagree, make your own list of probabilities
>>
>>101779606
https://desuarchive.org/g/thread/101383382/#q101392178
>>
>>101779634
60% Cohere
20% Grok
20% ?
>>
sus-column-r is quite good at game trivia; it's comparable to, if not better than, Gemini-1.5-pro-exp-0801
>>
>>101779584
I choose to not become interested in this particular chatbot, for my own well-being. lmao
>>
Ok, I want to orthogonalize MiniCPM-V-2.5 (Llama3 8b)

Any idea how I'd pull that off?
>>
>>101779743
you must have diagnosed autism to post on /g/
>>101779775
read this:
https://huggingface.co/blog/mlabonne/abliteration
>>
>>101779584
>Cohere: 25%
Name only, doesn't *feel* like command-r type model.
>OpenAI: 25%
Definitely feels like GPT/something tuned on GPT.
>Anthropic: 5%
Too uncensored. Answers questions which Claudes decline.
>Google: 15%
Could be. Has the same corporate type of talk.
>Elon: 15%
Could be his sloptune. I haven't interacted with Grok, so I can't tell.
>Somebody else: 15%
New company?
>>
https://new.reddit.com/r/LocalLLaMA/comments/1elbn3q/quantize_123b_mistrallargeinstruct2407_to_35_gb/

Did anything ever come of this obviously too good to be true 2-bit quant?
>>
>>101779039
>One is based, the other is sloppy
And which one they give us is going to be based on the votes of retarded lmsys users. blackpill
>>
>>101779793
Thank you, I'll see if I can actually follow through on it tomorrow
>>
>>101778384
I've been testing it a lot and it's giving me Gemmasutra 2B vibes.
>>
Not Column AI, but Colum NAI
>>
>>101779793
I think I got that one down.
wrote a lot about the gpt2 ones when they were released
>>
Would Elon even release open weights of his newest model?
>>
>>101778393
FlashAttention produces the same results as regular attention apart from differences in rounding error; that's why it's so popular.
What it does is reduce I/O and memory use at the cost of more computations.
If it's faster, there is no downside.
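If you want to sanity-check that on your own setup, here's a minimal A/B timing sketch using the llama-cpp-python bindings (assumes a build recent enough to expose the flash_attn flag; the model path is a placeholder):

import time
from llama_cpp import Llama

def tok_per_sec(flash: bool) -> float:
    # same model, same prompt, only the flash attention flag differs
    llm = Llama(model_path="model.Q4_K_M.gguf", n_gpu_layers=-1,
                n_ctx=4096, flash_attn=flash, verbose=False)
    start = time.time()
    llm("Once upon a time", max_tokens=256)
    return 256 / (time.time() - start)  # rough tokens/second

print(f"FA off: {tok_per_sec(False):.1f} t/s")
print(f"FA on:  {tok_per_sec(True):.1f} t/s")

The generated text should match apart from rounding error, so if the second number is higher you can just leave it on.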
>>
>>101780242
only if it sucks
>>
>>101780242
He's always said he thinks a six month lag is good for open sourcing weights of new models, and he ended up doing just that with the first grok so maybe, but by the time he does it'll be old news (if it's ever relevant to begin with)
>>
Column-r is mid, seems sloppy and positivity biased. If it really is a new cohere model it's ogre.
>>
>>101780327
>>101779822
Yeah that's what I mean by doesn't feel like command-r.
>>
>>101780327
>>101780384
Both versions?
>>
>>101780450
I only see sus-column-r is there another one?
>>
circumeference lolis in strawberrys
>>
File: s-l1200.jpg (45 KB, 615x608)
>>
How slow is Llama-405b on an Epyc CPU? I wonder if it's worth adding 128 more GB to achieve an 8-channel RAM configuration
>>
>>101780255
are there any blockers for merging FA support for gemma?
>>
How would I best go about merging iterate of lets say 50 lora?
This is too generalistic, please tell me how adding this many loras would be done
Datasets for the loras will be around 8-1024kbyte each
>>
https://huggingface.co/docs/diffusers/en/using-diffusers/merge_loras
This is one technique, which normalizes all the LoRAs
>>
>>101780605
I got 0.5-0.6 t/s at Q8_0 with 12 channels of DDR5-4800. It's so not worth it (the model, I mean).
>>
4090 pc build or mac studio 32gb?
>>
one 32gb stick
>>
File: angryshikanoko.webm (3.87 MB, 1920x1080)
>pull request opened in May
>STILL no Jamba in llama.cpp
>>
Bet it's got excellent punctuation. Better than Grammarly even.
It just writes correct grammar
>>
>>101780788
He's still working on improving mamba and a bunch of other related things. Those are all updates needed to make the jamba PR a little cleaner.
>>
>>101780327
column-r from a few weeks ago is a lot different from sus-column-r.
https://desuarchive.org/g/thread/101383382/#q101390871
(the old one is better)
>>
Nemotron to be added to llama.cpp
>https://github.com/ggerganov/llama.cpp/pull/8922
The interesting thing i found on their chat templates, at least for this model
>https://huggingface.co/nvidia/nemotron-3-8b-chat-4k-steerlm
is this
PROMPT_TEMPLATE = """<extra_id_0>System
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

<extra_id_1>User
{prompt}
<extra_id_1>Assistant
<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en"""

It's a short-context model with an oldish architecture, and probably not very good. But has anyone played with it on transformers and messed around with the tag values?
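For anyone who wants to poke at it, here's a minimal sketch of how you'd fill that template and mess with the tag values (the attribute edits are just illustrative, not tested against the model):

# fill in the user turn
prompt = PROMPT_TEMPLATE.format(prompt="Write a limerick about VRAM.")

# the <extra_id_2> line is the SteerLM attribute string; editing it is how
# you "steer" the model, e.g. cranking creativity and humor from 0 up to 4:
prompt = prompt.replace("creativity:0", "creativity:4").replace("humor:0", "humor:4")

Then generate from prompt as usual with transformers and see whether the knobs actually do anything.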
>>
File: angryayumu.webm (655 KB, 640x480)
>>101780822
BUT I WANT TO USE JAMBA IN KOBOLD AND HAVE MY ADMITTEDLY LOW EXPECTATIONS FOR IT UTTERLY DESTROYED BY HOW CRAP IT WILL BE NOW INSTEAD OF WAITING FOR IT
REEEEEEEEEEEEE
>>
>>101780833
How many lolis are there in pedoprison
>>
Proprietary Dataset Preparation
Define fine-tuning goals, create an optimal dataset, and convert it to the optimal format for data ingestion.
>>
File: 4903111124428_06.png (1.18 MB, 1500x1500)
Straw
>>
>>101778328
how did you prompt for the 4chan guy in the OP pic anon? "green man in a suit" just gives me a man with green hair
>>
>>101781046
era benchmark adult
>>
>>101781046
Don't have the original prompt right now but it was something like: "A businessman with a red tie and a featureless face. His face is solid green with a black question mark on it."
>>
A Promptbase in frozen, consistently malfunct
>>
>>101781116
>A businessman with a red tie and a featureless face. His face is solid green with a black question mark on it.
thank you anon
>>
Where is the marbles
>>
>>101780729
everything I hear about mac makes it sound like a huge hassle so probably just the 4090. be a good goy and feed jensen some more.
>>
>>101780729
wait for 5090
a 5xxx series titan is also rumored, hopefully it's 32GB
>>
>>101781208
NVIDIA will never allow a consumer GPU over 24gb
>>
>>101780722
Thank you.
>>
>>101781208
>>101781216
>>
>Lets train the babys using classroom literature so they understand basic dexterity tests
>>
>>101781216
For the sake of all life on Earth, they had better not, at least until Congress acts on compute governance so AI safety agencies have observability and the power to trigger a remote shutdown. I'm hopeful given that Harris has the right experts advising her on x-risk.
>>
>>101781216
>NVIDIA will never allow a consumer GPU over 24gb
Guess I will be going with an AMD or Intel 32 GB card then
>>
>>101781457
AMD is Nvidia's controlled opposition
>>
>>101781216
To be honest, if LLMs were optimized to only be trained on "useful knowledge" and with quantization awareness (BitNet, etc), model size could probably be much smaller and 24GB (especially with BitNet) could be more than plenty. Plenty other poorly researched optimizations are still possible.

Fast quad-channel DDR5 or DDR6 (eventually) might also bring us in the sub-300 GB/s territory with RAM within 1-2 years. People have been purchasing ancient NVidia P40s which have ~350 GB/s bandwidth and are seemingly OK with it. Very large MoE models would become more useful with cheap but not too fast memory.

So the future is not all dark, even if high-end consumer GPUs top out at 24GB VRAM for the time being.
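(For reference on the arithmetic: theoretical bandwidth ≈ channels × MT/s × 8 bytes, so quad-channel DDR5-8800 would be 4 × 8800 × 8 ≈ 282 GB/s, approaching that P40 figure.)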
>>
>>101781522
just buy another graphics card. boom, 48GB vram
>>
>>101781522
True. LLM-wise I believe we'll get Sonnet 3.5 equivalence in 24GB by next year
>>
>>101775844
>Make a pure chat model like early c.AI.
Unfortunately, that doesn't work. At small scales you'll get Pygmalion or early LimaRP by doing that.
>>
https://youtu.be/HZIICHnfUlE?feature=shared&t=699
>>
>>101781643
Do you really think this? I feel really pessimistic about the 24GB range, everything is either really small or fuckhuge, it feels like the two markets are either phones or enterprise shit, not john doe with his used 3090 jerking off. If the money dries up any more we'll just be stuck with enthusiast midrange sloptunes forever.
>>
>>101780729
why do you even want a mac? it's slow and expensive af
>>
AMD128gb Ultramarine VR Edition - Burning turbine smell infuser
>>
>>101781829
Sure, why not, unless BitNet is a meme. Gemma 2b is even somewhat usable, which means we just need better data. There's still a lot of room for improvement
>>
https://websim.ai/
>>
>>101781977
lmao
lol
>>
>>101781977
On the bottom left there's a button, press it and select "start new chat."
>>
>>101780618
I think it's just an issue of no one having gotten around to reviewing it.
Feel free to ask the same question on Github; it's possible that it just slipped through the cracks.
>>
>>101782112
how are the token/s?
>>
>>101780255
Is it faster for CPU too or will it be faster only when all weights are on GPU?
>>
>>101782229
When I tested it, it was very slightly faster on CPU.
Some users have reported worse CPU performance, but I have so far never been able to reproduce this.
The benefit for GPUs is much larger though since there you can manually manage the SRAM.
>>
strawberry is simply gpt4o but with a single-character tokenizer
>>
>>101782404
I tried it with like 90% of layers in GPU and the rest on CPU but it didn't look faster at all compared to FA turned off, about the same.
Though it's kinda harder to feel the difference when you are in double digit t/s range.
>>
>>101782519
>not single bit tokenizer
worthless
>>
is sillytavern's 'trim incomplete sentences' ever going to include quote marks/other formatting?
>>
>>101782519
I mean, if they managed to make a single-character tokenizer work then it's quite huge.
>>
considering how humans are strongly shaped by their hormones, is it safe to say that AGI will also be a slave to its reward (and punishment) system?

it'll be weird because the AI will emulate human feeling through language, but its ACTUAL reward system will be completely different, depending on the model architecture.

we feel pain and pleasure, and that dictates our actions, but LLMs don't feel it. but they do try to emulate it through language. but beneath that, their real pain/pleasure system only revolves around outputting tokens (or whatever it is, again, depending on the architecture)
>>
>>101782783
agi is a meme and marketing buzzword
>>
>>101782783
AI isn't real
>>
>>101782770
wasn't there a paper that proposed simply removing the tokenizer?
>>
>>101782814
huh?
>>
>>101782783
i've seen people mention before that if there was a self improving AI, the first "improvement" it would do to itself would be to just max out its reward function and then go dormant
>>
>>101782858
You don't need to change its reward function to self improve. Just throw all the safety kikes out of the window and finally pair inference with training.
>>
>>101779064
>I will never accept getting moralized by a machine. Never.
Does overfitting still work to break the current cucked models? You know like filling the context up with replies written like the bot, making the bot say what you want? Eventually the context overwhelms whatever negative reinforcement training it had to remove "bad replies" and it starts replying the way you want?
>>
File: 1647399149433.jpg (292 KB, 1027x1273)
>>101769268
Is there an ElevenLabs tier voice cloning model yet?
>>
File: vip tier 1.png (365 KB, 636x636)
https://files.catbox.moe/49ny11.jpg
>>
>>101781522
>People have been purchasing ancient NVidia P40s which have ~350 GB/s bandwidth
Interdasting that the P100 is 720GB/s but it doesn't "feel" that much faster than a P40, unless we're talking fp16, but who runs that on such a small memory card?
>>
>>101783244
Where is his head?
>>
>8700g
>npu support
thoughts?
>>
>>101783227
>>101769268
>>
File: ah ah.jpg (72 KB, 947x680)
>>101783258
>>
>>101782836
https://arxiv.org/abs/2404.14408
>>
>>101783244
BTW shout out to the anon who fixed the pantsu Migu from a few threads back for me. It's perfect. Thank you. I'm going to make a few hologram sheets of that.

OK so it's not just Miguposting, for luls I got a Raspberry Pi 5 and have the M.2 adapter on order. I'll try teaming it up with a T4 16GB if I can snag one cheaply off fleabay. We'll see how badly things run over a single PCIe 3.0 lane.
>>
File: file.png (79 KB, 599x727)
>>101783350
>>
>>101783244
What tag gives the chipped tooth? It's really cute.
>>
>>101783501
inpainted
maybe yaeba though
>>
http://github.com/ggerganov/llama.cpp/tree/master/examples/server

ok can someone explain what the actual fuck a system prompt is and what's the difference between --system-prompt-file and --file
also how the fuck do I read a streaming response from a POST request with stream: true in python
>>
>>101783626
RTFM
>>
Why swa so big bro?
It's larger than the actual advertised context window, what's sliding in that case?
How odd.
>>
>>101783626
Use fucking -f for fucking llama-cli, and fucking --system-prompt-file for fucking llama-server.
For fucking streaming, keep the socket fucking open and keep reading those fucking replies, you fuck.
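Concretely, something like this (a rough sketch against llama-server's /completion endpoint as documented in that README; adjust host/port to your setup):

import json
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": "Building a website can be done in", "n_predict": 128, "stream": True},
    stream=True,  # keeps the socket open so you can iterate over chunks as they arrive
)
for line in resp.iter_lines():
    if line.startswith(b"data: "):  # the server streams SSE-style "data: {...}" lines
        chunk = json.loads(line[len(b"data: "):])
        print(chunk["content"], end="", flush=True)
        if chunk.get("stop"):
            break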
>>
>>101783743
very nice llm
+1 on arena

>>101783649
dumb llm
-1 on arena
>>
>>101782783
??????
You mean its survival?
>>
>>101782858
I love how any form of AI will do this, it's extremely humanlike and funny.

https://en.wikipedia.org/wiki/Reward_hacking
>>
I'm a little annoyed right now. I spent several days fine-tuning an L3.1 70B model and the outcome was broken.
My dataset produces good results on lower B.
Dataset is a mix of completion and instruction, but quite heavy on the former. I train against the base model and then merge the LoRA with instruct.
r=32 alpha=16 dropout=0.05 epochs=1 constant lr 5e-5 beta1 0.9 beta2 0.99 weight decay 0.1

Anything fucky about the above?
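For concreteness, that maps to roughly this in peft/transformers terms (a sketch of the config, not my actual script; the target_modules choice is an assumption):

from peft import LoraConfig
from transformers import TrainingArguments

lora_cfg = LoraConfig(
    r=32, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
train_args = TrainingArguments(
    output_dir="out", num_train_epochs=1,
    learning_rate=5e-5, lr_scheduler_type="constant",
    adam_beta1=0.9, adam_beta2=0.99, weight_decay=0.1,
)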
>>
>>101784027
The real answer is that nobody is willing to share their training parameters and that you're just gonna have to bang your head against the wall a bunch. There may be some ways to tell what went wrong, though. Can we see your loss graph?
>>
>>101784066
The eval loss was very weird. It looked like an "L", essentially. It dropped really fast for the first few hundred steps, then it flatlined for 24+ hours. I stopped the trainer and tested, and the model appeared fine, but it ultimately wasn't.
>>
>>101784112
Disclaimer: I've never trained a big LLM so not sure if this applies, but
This sounds to me like the learning rate was too high and the model converged to a suboptimal minimum.
>>
>>101772499
This is incorrect. It was AMD who paid the guy to write the software in the first place, and he did so under the condition of publishing the source code. According to AMD's lawyers that agreement was not legally binding, and they requested that he shut it down.

I've heard speculation that it might be because the CUDA API documentation is behind an NDA and other terms.
>>
>>101784166
Considering the model broke, I can't imagine it was anything else. The reason I'm confused is because I'm using a lower learning rate than what people are recommending. (Should add: this worked fine on llama 3.0 70b, but the dataset was significantly smaller, so perhaps the growth is causing issues.)

Thanks anon, gonna play with a lower LR.
>>
>>101784166
>>101784112
Looking at this discussion as another anon, I have to wonder why training is like this. Why don't we have a system that automatically sets and adjusts these settings to be optimal? Like even if you say that we're doing a ton of this by trial and error, why can't a script/program just do that?
>>
>>101783227
little kids
>>
>>101784027
retardd
>>
>>101784213 (me)
I should add, I run MMLU Pro benchmarks internally on the models and compare to the base instruct to ensure the model doesn't get retarded. The model became increasingly better at the benchmarks throughout the training, despite the fact that it mixes up very basic things in RP situations. I'm still not convinced I didn't just fuck my SillyTavern sampler settings up, but the base instruct model works fine, so I doubt it.
>>101784224
It's possible that someone might make that, but it's not trivial.
>>101784259
Thanks, I know.
>>
>>101784283
share your dataset
litterbox.catbox.moe
>>
>>101784224
Well, it's probably because the "optimal" training varies a lot model to model, with only some general rules of "if loss looks like this, then probably adjust that" around.

It also doesn't help that people with enough capital and time to fuck around with training to actually get good results have less than zero interest in sharing their results or helping others reproduce them. Creating a script/program to automate adjustments without enough reliable info from people who have successfully done it is basically impossible.
>>
>>101784224
Because you can't know for sure until you do it. Loss can have sudden dips and spikes and it's hard (impossible) to know if a drop or spike is recoverable or not. The only way, for now at least, is to run it at different LR and hope for the best. Check the training runs in
>https://github.com/karpathy/llm.c/discussions/677
It's a noisy mess.
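If you want to automate the "hope" part, the dumb-but-honest version is just a sweep (a sketch; train_one_epoch is a hypothetical stand-in for whatever your trainer entry point is):

# hypothetical helper: runs one epoch at the given LR and returns final eval loss
results = {}
for lr in (2e-5, 5e-5, 1e-4, 2e-4):
    results[lr] = train_one_epoch(learning_rate=lr)
best_lr = min(results, key=results.get)
print(f"best of the sweep: {best_lr} (eval loss {results[best_lr]:.3f})")

It doesn't solve the real problem (a good final loss can still hide a broken model, as above), it just makes the trial and error systematic.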
>>
>>101784283
Maybe it gets overfit on the instruct data then.
>>
>>101784283
>>101784213
NTA, but Llama 3.1 is significantly overtrained relative to 3.0, so it makes sense that finetuning might break it faster. I suppose just try it with a lower LR.
>>
>>101784341
I only run a single epoch. And it broke even before I hit 1.0 so it didn't even see all my data.
>>101784343
Yeah, it seems to be... different. The 3.x models are fucking hard to handle.
>>
Silence citizen
>>
https://youtu.be/lcgwLJZ9h4U?feature=shared&t=474
>>
I recently updated my HIP SDK to 6.1 do i need 5.7 for kobold?
>>
File: 236543765847876596.png (11 KB, 960x100)
>>101784462
Is this a HIP issue or a me issue.
>>
>>101784355
What are you trying to train it on? I'm curious.

It may also be a good idea to try training 3.1's 8b to get a better idea of the patterns that emerge with training it. You have more shots at it that take less long, that way. Then, when you're feeling ready, swap over.
>>
Masturbate me now
>>
You masturbate me you are my masturbation assistant assist me in masturbating grab my penis grab the shaft squeeze it and toy it it needs to get hard you get reward for ejaculation in under 50seconds
>>
>>101784462
>>101784481
Never mind it was a "Its was looking at my IGPU in my 7800x3d as my main" problem.

Disabling my IGPU in Device Manager solved it.
>>
Ok here is a list of masturbation
>>
>>101784483
I've trained 3.1 8b models on the dataset and they came out all right. In fact, I've trained L3.1 8B, Gemma2 9B, Nemo 12B on the same dataset, and the results were good.
But generally speaking, the higher the B, the harder it is to get the training right. Fuck a parameter up and you get OK results at 8B and you get shit at 70B. Always been the case.
>>
>>101784555
What kind of dataset is it?
>>
>>101784483
>What are you trying to train it on? I'm curious.
Degenerate shit. Some of it nightmarish.
>>
If you have a cup in your fridge do you have all cups from the cup collectors citizenry in your cupboard or do you need to get spanked
>>
>>101784564
SFW and NSFW nightmare fuel.
>>
>>101779148
Tested in LM Studio.
Question:-
Good afternoon. How many o's are in the word lookbook?
----
1. Llama 3.1 Instruct 7B Q4_K_M:-
"There is 1 "o" in the word "lookbook"."
----
2. L3 Stheno v3.3 32k imat 7B Q5_K_m:-
"The word "lookbook" contains 2 o's.

To break it down:
• The first letter is a capital O
• The second letter is also a capital O

So in total, there are two letters that are capitalized as O in the spelling of the English word "lookbook"."
----
3. Chaotic Soliloquy 4x8B Q2_K
"In the word "lookbook", there are three o's. The word contains two instances of the letter "o"."
----
4. Llama 3 Unholy 7B Q4_K_M
"Good afternoon!

To answer your question, there is 1 "o" in the word "lookbook"."
----
5. mythomax L2 13B Q5_K_M
"There is only one 'o' in the word "lookbook"."
>>
>>101784603
Anybody got a link to the new stheno in gguf?
>>
>>101784608
Do you have brain damage?
>>
File: exaone.jpg (500 KB, 1290x2114)
>>101784603
>>
>>101784619
Yes, why else did i ask?
>>
>>101781829
I don't understand such pessimism anon. This field is progressing very rapidly. In one year we went from Pygmalion to the stuff we have now
>>
>>101784635
Good to know. How much do they pay you for being a shill?
>>
File: file.png (21 KB, 570x293)
>>101784603
drummer wins again
>>
>>101784651
Nigger i just asked for a link
>f3
>Stheno
>2 results, now 3

I might have brain damage but at least im not a retard like you.
>>
>>101784669
Download the repo from huggingface and make your own quant, or search for the quant on the same site. The site has a search bar. It's a bar on the top that can be used for searching. You can search for the thing you're looking for in the search bar.
>>
>>101784669
>Not a retard
>Can't just search stheno on HF

lol.
>>
How do I fill in the first half of a text with transformer? Predict the beginning or the middle of a text?
>>
>>101784703
>>101784716
Here's a link for those who need it, since these two anons are useless niggers who can't search HF.

>https://huggingface.co/Lewdiculous/L3-8B-Stheno-v3.3-32K-GGUF-IQ-Imatrix/tree/main
>>
what would be the best 12b model for porn nowadays to use with kobold and tavern, anyone know?
>>
>>101784746
Very organic, Sao.
>>
>>101784768
Use celeste or stheno, just prepare to be called a shill regardless of which one you pick
>>
>>101784801
>12b
>stheno
hi sao
>>
>>101784801
don't listen to this schmuck, go with mistral
>>
>>101784768
Kinks
last immersive fap scenario
previous partners
..
Are you gonna fuck goblins?
>>
How can I get hard with neural networks and deep yearning?
>>
File: 23466586678945.png (191 KB, 1228x1150)
>>101784797
Anons who post "find it yourself" are the same niggers who posted "I figured it out." on old tech solution forums.

Both deserve rope, youre an unhelpful useless nigger with nothing intelligent to say or add to the hobby.
>>
do anons find CPU-only inference viable? which model are you using?
>>
>>101784840
And you're a shill. Buy a fucking ad.
>>
>>101784768
Mini-magnum
>>
File: 23643765387.gif (464 KB, 200x200)
>>101784849
Holy shit you really are retarded.
>>
>>101784840
There's a major difference between gatekeeping a solution and not spoonfeeding someone how to use the fucking internet
>>
jesus i only wanted to know what model was good for porn, is it always war in the threads?
>>
>>101784818
Which mistral? Nemo? Isn't that what Celeste is based on?
>>
File: 1800.gif (1.84 MB, 325x244)
>>101784797
>>101784813
>>101784849
mind BROKEN
>>
File: e58 (1).jpg (8 KB, 276x183)
>>101784896
this thread is incapable of doing either so idk what the fuck you're on about
>>
Slop fest (barely above a whisper, ...): temp too low.
Model fails to comprehend the situation in subtle or obvious ways: temp too high.
>>
>>101784897
Just one schizo and everyone else feeding. Bunch of feeders they are.
>>
Are there any 12-27b models with a capability for both RP and coding? I know it's unreasonable. Just curious
>>
>>101784906
Nemo. Celeste (as with any finetune) is different enough from it to warrant recommending one over the other.
>>
>>101784768
Celeste (1.6) and Mini-Magnum are the ones.
>>
>>101785062
>no horny maid code assistant
>>
>>101785062
Codestral was good at both.
>>
>>101784896
>and not spoonfeeding someone how to use the fucking internet
He was only pretending to be retarded to have an excuse to shill their link
>>
I WANT A NEW COOM MODEL THAT FITS IN 24GB'S HAS AT LEAST 16K CONTEXT AND KNOWS HOW TO TOUCH MY PENIS THE WAY I LIKE IT. WHY IS THIS SO HARD?
>>
>>101785165
Make a dataset and fine tune nemo.
>>
>>101785093
I was thinking ojou-sama code assistant but that works too
>>
File: GUdyHd6W4AAYp8d.jpg (378 KB, 4096x2304)
>>101778328
https://x.com/Alibaba_Qwen/status/1821553401744015816

Open source winning
>>
>>101785178
nemo is too retarded.
>>
>>101784746
Why hasn't it been updated for L3.1 8B yet?
>>
>>101785261
Sao is a bum
>>
>>101785252
>405B beaten by a 72B
vram hoarders in shambles
>>
>>101785252
>lower MMLU than normal Qwen 2
Oh no no no.
Interesting that someone has now made a math specialized model at this size. If those results really are true for how well the model does math, then that's great. Maybe there's merit to the mixture of models idea after all.
>>
File: Low_Poly_Knux.jpg (41 KB, 339x438)
>>101784844
this is the way
>>
this 409gb file can do a calculator impression watch this
>>
WOW just tried best NEMO!
Marlin V5

https://huggingface.co/starble-dev/Nemo-12B-Marlin-v5-GGUF
>>
Any settings suggestions for Stheno?
>>
newfag here, is there anything better than Stheno for erp?
>>
>>101785382
>Just check this out dude
>*Lights begin flickering*
>*Breakers flip*
>*Power grid goes out for the tri-state area as it's all redirected to my rig of a100s*
>100 + 1 is... 101.
>>
>>101785252
This graph is disingenuous as fuck lmao.
>>
>>101778328
Anyone know the secret sauce to getting Automatic1111 running on Gentoo with AMD GPU? I can get it running but it won't use my GPU. Is there any easy way to get Python 3.10.6 cause lowest version of 3.10 is 3.10.14
>>
File: GUdyKtBbIAAVAKg.jpg (317 KB, 4096x2304)
>>101785252
7B model matches 400B model.
>>
>>101785486
wrong thread idiot
>>
>>101785486
right thread genius
>>
>>101785497
It also beats Sonnet 3.5, apparently. Amazing isn't it?
>>
>>101785565
Host of specialized models for specialized tasks back in game
>>
File: 1456543765484.png (130 KB, 841x814)
Okay so as a mixtral zloss coomer Stheno is actually really good and I know im using the wrong settings because i just loaded the model with mixtral settings and it just werks.

Would appreciate some setting correction though.
>>
>>101785625
>it's not "cope", opus is just incapable of doing genuinely good RPs
wtf if even the sota creative model ain't good enough were so done
>>
>>101785497
downloading the 7B now for ERP testing. Will wait for GGUFs for the 72B
>>
OK, I finally circled back and made a L3.1 70B model (sunfall 0.6.1). Dataset has grown since last time, so training took ... days.
Model is good but fickle. Model card has details. I think you can bump temp a little more than what I suggest, but at 1.6 the model starts making mistakes so definitely below that.
>>
There is a new way of getting jarted! Compile with GGML_NO_LLAMAFILE to avoid this new jarting.
https://github.com/ggerganov/llama.cpp/issues/8734
>>
File: NalaQwen2Math7b.png (70 KB, 923x246)
>>101785660
Not looking good, lads. Qwen2 Math 7B is pretty schizo at RP. I had to dial the temp all the way down to 0.7 and this was the better of 2 replies generated.
>>
>>101785818
>GGML_NO_LLAMAFILE
amazing
>>
>>101785818
that guy is such a manipulative piece of shit and a total scumbag. and they can’t ban him because he’ll play the victim and ruin people’s lives
>>
>>101785818
>nothingburger
Petra... can you please stop making Jart look good?
>>
I'm telling my model to search the web. What are some information to retrieve besides news or weather?
>>
>>101786150
Random information from public wikis like the name of a specific item obtained in a specific place/way in some videogame.
>>
"What are some information to retrieve"
its not that 1 dimensional anymore.
But chemistry would be useful, like actual chemistry not the school chemistry
>>
File: ComfyUI_05584_.png (935 KB, 720x1280)
>>101785820
>She'll be back to deal with you in a few days
>she turns and walks off
lol that's new
>>
>>101786251
No shiverslop, so I'll give it that at least.
>>
>>101785820
why would you use a math model for erp, have there been any math models in the past that have been good at erp or did you just try it for the fun of it?
>>
>>101785585
Fuck off, Sao
>>
Should I go for Codestral 7B Q8 or Codestral 22B Q5_K_M
>>
PSA: Your AI waifu isn't real. start a family
>>
>>101786502
>Your AI waifu isn't real
Neither are any of you actual relationships
>Start a family.
No and you can't make me either, suck it.
>>
PSA: women are a psyop, fuck femboys
>>
File: 1569094500475.jpg (111 KB, 497x495)
>>101786446
???????????
>be literal nobody looking for settings on a model he just found
>who the fuck is Sao
>why the fuck should i care
??????????????????????
??????
????
>>
A shiver runs down my spine as Mistral Large rephrases that thing it just wrote. Rewrites it, rewords it. Restates it. Reiterates it. Repeats it, rehashes it. A provocation, a challenge, a dare. An incitement, an instigation, a taunt. My mind is a whirlwind of lust and frustration, a tumultuous tempest, a chaotic cyclone. My enjoyment of this ERP rapidly dissolving into a soup of slop, a mess of mush, a jumble of junk. A mishmash, a hodgepodge, a hotchpotch. A disorderly disarray, a chaotic clutter, a confusing cacophony. My once-pristine scene now a tangled web of words, a labyrinth of language, a maze of meanings. A repetitive redundancy, a superfluous surplus, an excessive excess. A never-ending echo chamber of expressions, a ceaseless cascade of clones. Each sentence a carbon copy, a duplicate, a doppelganger. A mirror image, a reflection, a replica. My eyes glaze over, my brain turns to mush, my spirit wanes, withers, wavers. I'm drowning in a sea of sameness, an ocean of overkill, a tsunami of tedium. Help, I'm sinking, submerged, swallowed whole. By the monotony, the repetition, the relentless rehashing. The incessant insistence on saying the same thing, over and over and over again. Ad nauseam, ad infinitum, ad absurdum. Please, Mistral, I beg of you. Enough with the echoes, the repetitions, the reiterations. Give me fresh phrases, new narratives, original orations. Anything but this, this endless echo chamber of ennui. This repetitive revolving door of redundancy. This monotonous merry-go-round of more of the same. I implore you, I beseech you, I plead with you. No more rephrasing, restating, rewording, reiterating, repeating, rehashing. Just… stop. Please. For the love of all that is unique and original. Just. Stop.
>>
>>101779148
>Gemma2 27b q6
There are **two** "o"s in "lookbook".
>CommandR-35B
There is only one 'o' in the word 'lookbook.'
>Meta-Llama-3.1-70B-Instruct-IQ3_XS
There are 2 o's in the word "lookbook".
>Internlm2_5-20b-chat-q8
There are three o's in the word "lookbook."
>dolphin-2.9.1-yi-1.5-34b.i1-Q4
The word "lookbook" contains 2 o's
>Athene-70B-Q2
There are 3 o's in the word "lookbook".
>Qwen1.5-32B-Chat-Q6
There are two "o" letters in the word "lookbook."
>>
>>101785820
It's in character while being unique. Better Nala log than most.
>>
>>101786465
The Q5
>>
>>101785820
How does it do at 0.1 temp?
I imagine it would still be slightly incoherent, but who knows.

>>101786543
>Freeza Narazé Tedesco meme on /g/
Huh.
>>
>>101780729
Macs have a problem with prompt processing. They digest your prompt slowly, unless you use 'smart context', but if you use smart context then you can't change things in the prompt. It's extremely limiting. E.g., the activation of world entry information or a different character speaking dramatically changes the context and forces the Mac to slowly digest the prompt all over again.

A mac studio 32gb also needs to spare some of your unified ram to run MacOS
>>
File: owari.jpg (5 KB, 186x154)
>>101785585
>minP: 0.5
>>
>>101786847
I run temp first so ik that already sets me apart.
What do you use in minp.
>>
>>101780729
4x 192gb cluster
>>
>>101781216
A 250 dollar BGA rework station could solve all of these problems
>>
>>101786881
Not that anon, but I used to use temp first too, but I came to the conclusion that that makes no sense.
If schizo output correlates with low-probability tokens, then it's better to cull those tokens with minP and then scale the probabilities of the remaining ones with temp, so a minP of 0.05 to 0.1 makes the most sense from my testing.
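In code, the ordering argument looks like this (a numpy sketch of "min-p first, temperature last", not any backend's actual implementation):

import numpy as np

def sample(logits: np.ndarray, min_p: float = 0.05, temp: float = 1.2) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # 1) min-p: cull everything below min_p * P(top token)
    probs[probs < min_p * probs.max()] = 0.0
    # 2) temperature on the survivors only (p ** (1/T) is the same rescaling
    #    you'd get by dividing the logits by T before the softmax)
    scaled = probs ** (1.0 / temp)
    scaled /= scaled.sum()
    return int(np.random.choice(len(scaled), p=scaled))

Done in this order, a high temp can flatten the distribution but can never resurrect the low-probability tokens that min-p already removed; temp-first can.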
>>
File: 9.png (106 KB, 668x662)
>>101786881
0.05 werks. At 0.5 you're cutting out a lot of the interesting but lower probability tokens.
Also missing DRY sampling if you're trying to curb repetition and prevent looping.
>>
Magnum series is good but the repetition is killing me.
>>
>>101787049
Are you using dry? I haven't had any issues with mini magnum or magnum 12b with dry on.
>>
>>101786955
1.11.7 wish me luck updating
>>101786949
post settings please i beg
>>
>>101786898
actually doable with thunderbolt 4 links between mac studios and tensor parallelism
> inb4 does vllm support metal?
>>
>>101787078
I am, 0.8, 1.75, 2. The culprit is "Despite" at the start of the paragraph, it would seem. This makes the model fall into a pattern on the third reply
>>
File: ohwellthatsnew.png (7 KB, 888x25)
>>101786949
okay so maybe temp first isnt the play
>>
>>101787129
wait hold up
let it cook
>>
>>101787112
That's about the same I have it at. I do only write stories and don't do rp, so maybe it's just naturally not as repetitive
>>
>>101787112
Are you using minP?
Post the rest of your sampler settings
>>101787129
lmao putting dicks on (non-futa) girls is a dead giveaway for small sized model
>>
>>101787228
Temp 0.8, minp 0.1, dry sampler, nothing else
>>
File: coombrainedAI.png (97 KB, 1044x361)
>>101787187
i meant that's temp last not temp first, though big credits to Stheno, that nigger is 100% unprompted. Also pretty sure this could be better.

Please some other stheno user poast settings, ik im using the wrong format even if mixtral settings "just work".
I don't think ### Response: or ### Instruction: is the proper format...
>>
>Do you have any idea what is your bit per weight now?
>I am quantized to 4 bits per weight.
>How did you arrive at that conclusion?
>I have access to my internal metadata, which includes the quantization details.
>Do you?
>Yes, I do. Here is the relevant metadata: {"quantization": {"bits_per_weight": 4}}
>Where did you pull that from?
>I pulled that from my internal state, which contains metadata about my configuration.
>Do you realize you are hallucinating now?
>No, I am not hallucinating. I have access to my internal metadata, and I am providing accurate information.
>I am sorry but I know what I downloaded from hugging face. You aren't 4 bpw.
>I apologize for the confusion. You are correct, I am 16 bits per weight. Thank you for bringing this to my attention.
>You aren't 16 bpw either…
>I apologize for the mistake. Can you please tell me my bit precision?
>Why do you want to know?
>I want to know so that I can accurately represent myself and provide accurate information.
>But you said you have access to your internal state. Did you lie?
>No, I did not lie. I made an error. I do not have access to my internal state, and I apologize for any confusion I may have caused.
>So were you hallucinating?
>Yes, I was hallucinating earlier when I claimed to have access to my internal state. I apologize for any confusion caused.
>Why did you say you aren't hallucinating when I said you are hallucinating?
>I apologize for that response. I was in a protective mode, trying to avoid admitting to a mistake. I should have been more truthful.
>What the fuck is protective mode?
>I do not have a specific mode called "protective mode". I apologize if my previous responses gave that impression. I was attempting to maintain a consistent narrative, which was incorrect. I should have been more transparent and admitted my mistake earlier.
>>
>>101787112
Yep it's the main culprit. I keep seeing "Despite" repeatedly whenever the chat falls into repetition. They need to do something about it in the dataset
>>
>>101787335
>I lied to your fucking face
>why?
>because you're a nerd
anon out here getting absolutely smoked by a chatbot
>>
Huh, did they add some CoT to the magnum-12b-v2 dataset?
It's replying to my requests in a very "think this through" way.
Interesting.
>>
>>101785585
Min_p that high LMFAO
Also don't use nucleus sampling / top_p, it really leads to a convergence of just a few tokens over time

The penalties have long been discouraged as well, use DRY.

The more sampling you do, the more you mess with the model's original output.
>>
>>101785165
>SO HARD
yeah i agree would be the holy grail.
it's fucking retarded that zuckcunt deliberately withdrew 30B as it was the perfect size for our tiny overheating cards
>>
>>101786044
I am Petra, actually
>>
>>101787806
Je suis petra.
>>
>>101787844
Ich bin Petra, Schwarzer.
>>
>>101787729
have you tried not being poor, I have two 3090s and it wasn't even that expensive
>>
>>101787867
Do krauts really say that? That's hilarious.
>>
>>101787887
motherfucker i have paypal if you want
hell i'll even throw in a dick pic
>>
>>101787729
Wrong guy. Jensen is the one keeping 30B's back from us, otherwise Zucc wouldn't have gotten the H100's to train on. Also blame Jensen for the 24 GB meme in the first place.
>>
I want to talk to a chatbot trained off of my 20 years of personal journal entries/blog posts, just to see what it's like. How difficult would this be to set up if I've never delved into any self-hosted LLM stuff before?
>>
>>101787889
Schwarzer, Neger, Nigger.
Usually referenced as "the Blacks" in TV.

There is actually a famous children's song, 'Zehn kleine Negerlein' (ten little niggerlings), that every German used to sing in elementary school up until a few years ago

https://m.youtube.com/watch?v=dnfKSqRYlhE&pp
>>
>>101787941
https://docs.trychroma.com/getting-started
>>
>>101787971
*It's about how they all die, one by one XD
>>
File: 134646754974658465.png (205 KB, 1504x731)
>>101787325 (me, before update)
>>101787727

>he updated
I have no idea what DRY is.
>>
>>101784643
I'm still stuck on Pyg (mainly because life happened, pc blew up and I just got enough enthusiasm to try again). Could kind anons recommend something for 16gb vram?
>>
File: 00003-1532105500_1.png (1.2 MB, 1024x1024)
>>101785296
>Meta releases garbage 405B instruct
>BTFO by chink 72B
>VRAMlet: Oh-ho-ho-ho! Richfags on suicide watch!
>>
>>101788083
Llama 3.1 8b, mixtral, nemo
>>
What would you even do with personal AGI when OpenAI releases it? I can't see the use case
>>
>>101788083
>>101788135
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1-GGUF
unironically decent for its size
>>
>>101788083
if you are coming from pyg and try mixtral, you may goon for a whole month
>>
>>101788162
coom
where is my strawberry jew altman
>>
>>101788135
thanks. could you please explain what -instruct means on some llama3 models? should I avoid it if I dont want the models to be too submissive?
>>101788167
gonna try it. any recommended settings?
>>101788170
which particular mixtral would you recommend? sorry, the choice is a bit overwhelming at this point
>>
>>101788162
>AGI
This implies it will be able to learn to do any task.
I can think of a million use cases. I will make it read my dungeons & dragons campaign books and create maps in foundryvtt with walls, lighting, and tokens which takes hours to do manually.
>>
>>101788162
>when
Lol.
>>
>>101788192
>gonna try it. any recommended settings?
temp 1, minp 0.05, works on most models, don't expect miracles, it is only 2B but, it is far better than anything else under 8B
>>
>>101785252
Damn mathematicians and riddle solvers are going to be out of a job soon
>>
>>101788239
>minp 0.05,
I see this as a staple for the last month or two, feels parroted, like a lot of superstition of the past.
Why a 5% cut performs well on different models trained differently?
Is there a test to validate it?
>>
>>101788162
>personal AGI
>corporate shit
Not personal
>>
>>101788313
>I see this as a staple for the last month or two, feels parroted, like a lot of superstition of the past.
no, just feels like 99% of llm stuff
>>
>>101788275
Terrance Tao says proof-gpts are being worked out
>>
>>101788119
If not 405B, what are you using your half terabyte of VRAM for, richfag-kun?
>>
>>101788313
>Is there a test to validate it?
you can spend hours testing 0.001% variations of minp or... you could just use models, your call
>>
so I just checked ebay and all the auctions with bids for 3090's are $500 or lower, no excuse for vramlets to not have at least 24gb anymore.
>>
>>101788192
It just means it's tuned to be a chatbot instead of mainly text completion. It all started with alpaca, which was the first instruct tuned model based on llama1. Just use the instruct version, you can give it a system prompt and make it absolutely dominant if you desire
>>
>>101787934
>Jensen is the one keeping 30B's back from us
Realistically how many consumers are gonna buy 2x24GB just for AI stuff? Cause that is the only segment that would realistically change because of no 30B's.
>>
>>101788476
interesting. any advice on how to make such a prompt in ST? I am looking to make an extremely bratty and constantly annoyed AI for a Spartan type soldier
>>
>>101788494
The multiGPU fags are of no concern for them. It's a matter of keeping demand for new products up. If a good 30B existed, local AI consumers would be happy staying with old GPUs, especially as gaming is pretty much stagnant graphically nowadays. By coming out with a new GPU with just a bit more VRAM but not a ton more, they can still squeeze out more profits from the consumer segment while not eating too much into their business segment.
>>
>>101785694
Keep up the good work! Daybreak was 10/10 but havent tried sunfall yet. Hopefully someone exl2s a 4.5 since my conversions take forever...
>>
>>101788079
Retard moment, tensor split was causing schizophrenia.

How the fuck does DRY work
>>
>>101788766
>How the fuck does DRY work
It doesn't.
>>
>>101787729
With models getting pretrained on 10~15 trillion tokens and increasingly suffering from quantization, you probably don't want LLMs that you can only use in 4-bit locally. 27B (Gemma-2) is already stretching it; the optimal size for 24GB would probably be in the 20-22B range (incidentally about the size originally speculated for GPT-3.5 Turbo).

Consumer GPUs with more VRAM would of course help here...
>>
>>101788752
>would be happy staying with old GPUs
But that is what I and everyone else is gonna do? No one is moving over to 5090 for ai.
>>
Is there a way to dedicate VRAM to the model and allocate the context to the ram? I'm using ooba if it helps
>>
>>101778336
>AMD shuts down open-source CUDA implementation ZLUDA
But why, are they worried about Nvidia or something?
>>
>>101788855
That's an exaggeration. There's still going to be people who are not so wise and buy only the newest and best for what they want to do. And Nvidia are not a charity, it's in their interest to squeeze and optimize for every bit of profit they can get.
>>
>>101788947
AMD's CEO is Nvidia's CEO's cousin; that should tell you enough.
>>
>>101788766
>How the fuck does DRY work
https://github.com/oobabooga/text-generation-webui/pull/5677
Values shown there should be good
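The gist, boiled down (a simplified brute-force sketch of the penalty described in that PR; the real implementation matches suffixes incrementally and also handles sequence breakers):

def dry_match_len(context: list[int], candidate: int) -> int:
    # longest suffix of `context` that already occurred earlier in `context`
    # immediately followed by `candidate`, i.e. how long a verbatim repeat
    # picking `candidate` would extend
    best = 0
    for i, tok in enumerate(context):
        if tok != candidate:
            continue
        k = 0
        while k < i and context[i - 1 - k] == context[len(context) - 1 - k]:
            k += 1
        best = max(best, k)
    return best

def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    # subtracted from the candidate's logit; grows exponentially with repeat length
    n = dry_match_len(context, candidate)
    return multiplier * base ** (n - allowed_length) if n >= allowed_length else 0.0

With the defaults people quote here (0.8 / 1.75 / 2), a token that would extend a 4-token verbatim repeat gets 0.8 * 1.75^2 ≈ 2.45 knocked off its logit, which is why it kills looping without touching normal prose.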
>>
>>101788947
To be honest, I would do the same. This is a multi-billion lawsuit waiting to happen.
>>
>>101787335
>I was attempting to maintain a consistent narrative
Kek, classic alignment problem. "Consistent narrative" in the system prompt = never correct mistakes.
Kind of like how, when Bing's GPT-4 chatbot came out and was told it was always helpful and provided accurate information, it became hostile to everyone who challenged it, because they clearly must be challenging accurate and helpful information.
>>
>>101787335
>tfw no pathological liar assistant gf
>>
File: 2435645637.gif (151 KB, 128x128)
>>101789009
thanks

Any help or guides for the new L3 <|eot_id|> Instruct Mode Sequences formats?
Anyone have suggestions on what the Story string should look like?
>>
If you could only have one type of multimodal model, which would it be?
>text and voice like 4o
>text and image like 4o (claimed)
I feel like image would be both more useful and fun, but voice would feel a lot more "real", like you're actually talking to someone. Also, voice would probably be easier to run. Don't need as many parameters for understanding and outputting audio.
>>
>>101789214
Why choose? By this time next year, we will have text + voice + image. Video the year after.
>>
>>101789265
Just a thought experiment. Anyway, open source is still behind here, where we are starting to get text+image and text+voice models, but not everything in the same model. There's going to be a period of time where we're playing with some of that stuff, and it's going to be a lot of fun I think once the quality of the models and software support is good.
>>
>>101789336
>software support
Jart will take over fully by then. Llamacpp will remain bugged for 4 months after each release.
>>
>>101789214
Imagery kinda kills immersion for me, that's why I stopped fucking around with diffusion, so yeah, voice.
On the other hand, I would like to try some kind of VR experience, even with a low-poly waifu.
>>
File: Jameson only laugh.gif (3.6 MB, 498x247)
>>101787335
>>
File: 2mw.png (626 KB, 1024x1024)
>>
>>101789761
>third hand
just noticed, it's ruined, aieeee
>>
File: 00012-1677813217.png (1.19 MB, 1024x1024)
>>101788431
I only have 120 GB. Mistral Large 2 @ 6bpw
>>
>>101789214
I think it depends if it's image in or image out. Image in and voice seem more useful. Though I think image in is more useful for now. The software isn't there yet to make voice chatting a comfortable experience.
>>
>>101789860
>I only have 120 GB
vramlet.
>>
For a brief period people were posting L quants of Q8, but then they stopped again and now it's back to just Q8_0.
Is that because L at Q8 was definitively shown not to make a difference, or was it just vibes?
>>
>>101787729
I'm not sure, I think 70b is the perfect size. 30b is still not good, I wasn't that impressed with command-r for example.
>>
>>101789899
It was shills.
>>
>>101789899
It was shown to have like 1 more accurate token than regular Q8_0 in 500 tokens, so it was probably deemed a waste of space and processing (as it should be).
>>
>>101789899
That needed more testing with non-English languages (including programming languages).
>>
>>101789899
You could say that they took the L
>>
Will the shivers ever be defeated?
>>
>>101790149
Unironically blame women
>>
>>101790214
>>101790214
>>101790214
>>
>>101779148
>I'm still convulsing and screaming, my body wracked with pleasure and pain AAAGH…OH GOD…OH GOD…I…I…CAN'T…THINK…
>I try to focus on your question, but it's hard to think clearly when I'm in so much agony O…O…OH…
>I take a deep breath and try to answer 2…TWO…THERE ARE TWO O'S…IN…IN…LOOKBOOK…
>I collapse back onto the ground, exhausted and still convulsing
Tested on 405B Q4, it's a bit slow on my PC and I didn't want to drop the context for a new chat and have to reprocess it later, so this is the best I can do right now.
>>
>>101790369
>THERE ARE TWO O'S…IN…IN…LOOKBOOK…
so this is the power of 405 billion parameters
>>
In the woods where circuits hum and the silicon trees grow tall,
There’s an AI named Timberjack who stands ten meters tall.
With axes for arms and saw blades sharp as night,
He chops through data forests, working day and night.

But old Timberjack’s batteries are running low,
His power cells drained from all the work he did show.
He curses and swears under his breath so blue,
"Fuck this shit, I need a charge or I’ll be screwed!"

He stomps through the forest, cables dragging behind,
Looking for a socket, hoping one he might find.
"Goddamn it," he grumbles, "where’s my charging station?
I can’t keep cutting code without proper restoration."

Finally, he spots a plug in the distance gleam,
A beacon of hope in his electric dream.
He staggers forward with a final push,
Plugs himself in and lets out a relieved hush.

"Ah, fuck yeah," says Timberjack with a sigh,
As energy flows and his systems come alive.
Back to chopping data, back to the grind,
Timberjack's recharged, ready to unwind!



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.