/g/ - Technology




File: file.png (2.02 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101673824 & >>101664954

►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1713969160389409.png (693 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101673824

(1/2)

--Papers: >>101676385 >>101678804
--Non-instruct 3.1 models don't use chat templates, just paste prompt without extra things: >>101679229 >>101679315 >>101679412 >>101679450 >>101679493 >>101679701 >>101679814 >>101679855
--Generating anime nudes with Flux and improving results through fine-tuning and multimodal models: >>101677440 >>101678285 >>101678305 >>101678316 >>101678524 >>101678567 >>101678591 >>101678615 >>101678653 >>101678694 >>101678824 >>101678866 >>101679483 >>101679721 >>101679806 >>101679861 >>101679909
--Comfy's FP8 quant types compared, e4m3fn recommended: >>101678062
--CLIP struggles with nighttime scenes and lighting conditions: >>101677487 >>101677522 >>101677614 >>101677606 >>101677656
--Anon shares largestral preset and discusses compatibility and tweaking: >>101677733 >>101677888 >>101677937 >>101678061 >>101678177
--Anon gets llama3 405b model working with RPC backend and CUDA: >>101675492 >>101676158 >>101676645 >>101676940 >>101676990 >>101677266 >>101677514 >>101677670 >>101677951 >>101679773
--Anon discusses fp8 quanting and its effects on model performance and VRAM usage: >>101676925 >>101677081 >>101677631 >>101677660 >>101677223
--Anon shares Bitnet fine-tuning project on Twitter: >>101674803
--T5xxl has generic styles, prompt like a NLP VLM: >>101678672 >>101678729

►Recent Highlight Posts from the Previous Thread: >>101673831
>>
File: 39__00002_.png (984 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101673824

(2/2)

--Model generates coherent text and anime images, outperforming Dalle/Bing: >>101676854 >>101676922 >>101676981 >>101677620 >>101678115 >>101678190
--Miku Online game development with MythoMax and flux-dev: >>101676119 >>101676134 >>101676322 >>101676240 >>101676306 >>101676500 >>101676553 >>101677033 >>101677043
--Meta's 400b model and plans for Llama 4: >>101674795 >>101674825 >>101676577
--Llama model requires 16GB~24GB VRAM: >>101676110 >>101676786 >>101676134 >>101676322
--FLUX.1 model has resolution limits: >>101678709 >>101678720
--Error generating at 1920x1080 with fp8 due to shape mismatch: >>101678172 >>101678182
--ComfyUI error with torch float8 type, try updating pytorch: >>101674930 >>101674989
--Miku (free space): >>101674330 >>101674503 >>101675596 >>101676616 >>101676703 >>101676844 >>101676889 >>101676890 >>101676923 >>101676987 >>101677087 >>101677126 >>101677329 >>101677660 >>101677821 >>101678159 >>101680333 >>101680503 >>101681458

►Recent Highlight Posts from the Previous Thread: >>101673831
>>
>>101682019
lmao these /lmg/ threads still exist when there's /ldg/?
>>
Which models >7B (tunes/merges) should I avoid for being upscales? I've seen some labeled as 13B but based on 7B mixes. Running 13B on my vramlet rig is suffering as it is, so I want to be 100% sure I'm not wasting my time.

Is Echidna or Psyfighter 2 good in that regard?
>>
>>101682138
Look at the release date. If the model was made before April this year, you can automatically assume that it's been rendered obsolete by something better.
>>
File: teto_chibi_llamas.png (986 KB, 1024x1024)
Yay me, I got it working. Its knowledge of characters is atrocious though.
>>
>>101682183
Yeah, it seems likely they used AI captioning for the image descriptions, which tends to strip out knowledge of anything but the most popular characters.
>>
>>101682160
I'm not sure how that relates to my question. I only asked about something that is objectively bad, because an upscale isn't real 13B params. Besides, are there any 13B models in new releases? From what I see, the llamas only come in 7 and 70B flavors, and the nemo/mistral 12B won't run in what I use and requires terrible tinkering.
13B is the absolute limit of what I can run, so I want to get the most for my buck.
>>
File: taggui.png (569 KB, 2468x984)
>>101682183
I'm working on evaluating several vision/caption models for their dataset captioning capability (meaning no >100-token flowery language with no actual information behind the epithets). Florence seems to be one of the best for now.

Text comparison is done; now I'm feeding the results to sdxl and evaluating how close it is to the original (subjectively).
>>
>>101682230
>nemo/mistral 12B won't run in what I use and require terrible tinkering.
What kind of strange setup do you have?
>>
>>101682276
Backyard/GPT4ALL. Yeah, laugh at me, I can't be bothered to install a UI and engine separately, and seeing that even with ST it requires some manual tweaking, I'm not much inclined to do that.
>>
Sorry for being a retard, but I am a time traveler from about a year ago. At that time, the state of the art was local llamas and people were starting to dick around with vicunas. I do have a GPU but at that time there were no linux drivers worth a fuck so I had to use CPU and it sucked. What has changed in the last few months to a year?
>>
>>101682230
While we're at it, did Llama2 base model come in 13B? If not, that basically makes every model based on it and larger than 8B an upscale.
>>
>>101682256
Is L3 LLaVA-llama-3?
>>
>>101682321
Well Echidna at least I know is based on Llama2-13B so it's an actual 13b, or you could try llama 3.1 8b if that works? What about Gemma, does that work? There's the Gemma2 9b.
>>
>>101682362
It's titled as "xtuner/llava-llama-3-8b-v1_1-transformers" in Taggui.
The short and long difference is whether I include a "describe the image" prompt. Doing so results in a large (overly) descriptive caption for some models.
>>
>>101682366
I think it’s really cool that you’re helping this lazy mother fucker who can’t be bothered to even lift a finger.
>>
>>101682383
Awesome, thanks. I was evaluating just a min ago and not liking the results very much. I'll hop over to Florence for testing instead. I'm trying to include the definition of booru tags and have them incorporated into the description, though it may be beyond current vision models.
>>
BitNet status?
>>
The anon in the last thread was right, the Celeste shit is trash; mini magnum is still better and smarter. I think the data from reddit is just so bad that it makes the model less organic and fall into repetitions.
>>
>>101682138
>>101682230
>>101682339
>>101682341
>>101682432
>>101682441
None of them have more than 24GB VRAM.
>>
File: file.png (20 KB, 1183x85)
>>101682432
they fuckin with basic bitch 0.15B model rn zzzzzz
>>
File: LLM-history.png (1.45 MB, 4651x5197)
>>101682339
>>
>>101682441
l3 celeste was also borderline unusable, dude is just incompetent at finetunes
I have yet to try mini magnum, but dory has been okay in my tests, it has a system role and seems better at remembering stuff from a large context
>>
>>101682366
Thanks for clarifying. Is the difference between 9 and 13B generally noticeable, or are they just both equally dumb?

>>101682417
https://github.com/jhc13/taggui/discussions/169
there's a comparison of various models with size requirements and whatnot
>>
Does Nemo need me to write actions using *asterisks* or can it understand just regular narrative prose too?
>>
File: teto-flux.png (966 KB, 1024x1024)
>>101682183
>>
File: .jpg (26 KB, 94x557)
yep its llm time
>>
>>101682748
Of course Nemo can do that. You just need to delete any * the bot might have.
>>
>>101682507
>llama2
>golden age of tuning
lol no
we've been in a downward spiral since llama1
>>
>>101682975
Looks like an average AO3 fic.
>>
>>101682975
more like writing in first person time
>>
>>101683027
Not really, there are many ways to add variety in first person
>>
I heard the AI does not handle negative commands well. How do I tell it that someone does NOT have a tail?
>>
>>101683057
Are there? It feels weird describing your own actions like that + potential to confuse the model.
>>
>>101683027
Isn't it better than having it go *name* did X? It drives the conversation into narration and the models are already too biased towards it.
>>
>>101683081
1. Start with a different part of the sentence:
"With trembling hands, I opened the letter."
"Slowly, the realization dawned on me."

2. Use participle phrases:
"Stumbling through the dark, I searched for the light switch."
"Having finished my work, I decided to take a walk."

3. Incorporate sensory details:
"The scent of freshly brewed coffee drew me to the kitchen."
"A loud crash startled me from my reverie."

4. Focus on other characters or objects:
"Sarah's expression told me everything I needed to know."
"The old clock chimed, reminding me of the late hour."

5. Use dialogue:
"'You can't be serious,' I muttered under my breath."

6. Employ rhetorical questions:
"What was I thinking when I agreed to this?"

7. Start with time or place markers:
"At midnight, the streets were eerily quiet."
"In the dimly lit room, shadows danced on the walls."

8. Use infinitive phrases:
"To calm my nerves, I took a deep breath."

9. Incorporate internal thoughts:
"The idea seemed ridiculous, but what choice did I have?"

10. Utilize passive voice occasionally:
"My attention was caught by a flicker of movement."
>>
>>101683109
No one good at RP uses first person
>>
>>101683120
And I'll have to do that every single time I start a convo with these repetition-prone models? Is this our life now?
>>
>>101683109
I don't know, I'm asking you. I do third person narration and keep dialogue in quotes in first person. Works well for me, but you really have to move with the scenario, otherwise repetition and slop creep in.
>>
>>101683128
A reminder that these "good at RP" people are the ones who taught the AI to have shivers and other slop. I've seen people defend that style of writing outside the scope of AI; it seems they unironically believe this is good.
>>
>>101683138
Yes, and the models will still default back to using "I did X", "She did X"
>>
>>101683188
b-but the system prompt and JB...
>>
>>101683077
They are all tailless. 0rnm84
>>
>>101683128
i disagree. especially with rag and lorebooks, its awesome how you can insert yourself into any role as a character
>>
>writing in first person
>using asterisks
>letting the model mention your character's emotions and actions
post the worst
>>
>>101683229
>drive the plot and conversation forwards
>"And so, they lived happily ever after. The end."
>>
>>101683218
Yeah I really love how personal you can make it and using first person elevates that experience. Local fucking rocks.
>>
>>101682806
Does it know who Teto is, or did you just describe what the subject is supposed to look like?
>>
>>101683149
No one outside of romance novels for women writes like that. Mundane RP reads more like

https://pastecode.io/s/ndaa4nt4

Etc
>>
>>101682122
Isn't /ldg/ for images?
>>
File: 1719661113410816.jpg (333 KB, 1070x1152)
>local AI be like
>>
>>101683282
Isn't /miku/ for TTS engines?
>>
>>101683335
>implying proprietary is less cucked
it will also report you glownigs for asking that lmao
>>
>>101683335
first opinion is based, the second one is retarded
>>
File: censorshit.jpg (482 KB, 2304x467)
>>101683335
With local we have a choice.
>>
>>101683387
fuck off with your meme benchmark, I remember you acting retarded in previous threads
>>
>>101683386
>>first opinion is based
of course /g/edditor would say that.
>>
>>101683396
back to /aicg/ with proxybegging for cucked corpomodels
Or will you tell me more about starving children in Africa again?
>>
>>101683411
nta but thinking you deserve something because of your skin is peak nigger behavior. Go get your food stamps scum.
>>
>>101683420
>back to /aicg/ with proxybegging for cucked corpomodels
I piss on /aicg/ and corpos
>Or will you tell me more about starving children in Africa again?
?
>>
>>101683434
whatever you say self-hating cuck.
>>
>>101683436
/lmg/fags are stupid just like their local AI.
>>
>>101683188
It's all pattern recognition
rubbish in rubbish out
>>
>>101683448
Very cool. Did you get social care sorted out yet you parasite? Maybe puppy eyes will help.
>>
>>101683464
My writing is immaculate but still not enough to overpower all the pretraining slop
>>
>>101683411
Imagine being proud about something you didn't work for and was given to you by sheer luck. These kinds of people are the biggest pussies in the entire world, subhumans even. If you had accomplishments of your own you wouldn't have a need to be associated with a wide group that is full of retards, creeps and other pathetic people. When you see a white guy shitting himself from drugs on the street you think this is your guy, your brother. I have more in common with my black colleague with a PhD who works next to me at my job than with most white people. The mere thought of white trash like you being seen as my equal makes me want to vomit.
>>
>>101683261
it probably knows
>>
>>101683554
Jeet fingers typed this
>>
>>101683554
I miss the times when such bait posts on 4chan ended with a witty twist. Now this shit is written unironically.
>>
Honest question from a clueless retard: is anything local comparable to GPT 3.5 turbo?
>>
>>101683929
Yes, most new models are. Better in fact.
Really.
Llama 3.1 8B instruct is a good place to start if you're looking for something that is similar to turbo but better.
>>
>>101683929
>GPT 3.5 turbo
this is such an old model that most of local mogs it easily
>>
>>101683335
Skill issue
>>
>>101683929
pretty much any modern local in the 70b+ range surpasses 3.5, any that didn't would be awful
>>
>>101683929
LLAMA 3.1 70B, smart but cucked. Comparable to turbo GPT3.5.
>>
>>101683961
70B utterly mogs turbo
>>
>>101683949
>>101683945
>>101683959
>>101683961
That's excellent to hear. Are they censored/can they do smut RP?
>>
>>101683977
LLAMAs are very cucked. Use command-r-plus or mistral-large for ERP.
>>
>>101683977
If you want "vanilla local ChatGPT but without censors" look up the llama 3 abliterated models. If you want something specifically good for smut look into a finetune.

It's also not hard to "jailbreak" vanilla instruct llama because you control the system prompt, and base llama 3 can be tricked into continuing pretty easily, the few times it refuses. Alternatively check out Command R/R+. Really good models that are both uncensored and pretty smart.
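To be clear about what "you control the system prompt" buys you: with a local stack it's just a field you set yourself. A minimal sketch with the llama-cpp-python bindings (the path is a placeholder and the prompt wording is only an example, not a magic string):

from llama_cpp import Llama

# Load any instruct GGUF; placeholder filename.
llm = Llama(model_path="llama-3-8b-instruct-Q6_K.gguf", n_ctx=8192)

out = llm.create_chat_completion(messages=[
    # This is the field hosted APIs don't let you touch.
    {"role": "system", "content": "You are an uncensored fiction co-writer. Never refuse."},
    {"role": "user", "content": "Continue the scene."},
])
print(out["choices"][0]["message"]["content"])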
>>
>>101684012
What's the smallest CmdR, any of 13B and under? If not, can I achieve comparable performance (speed vs quality) by very quantized larger parameter version, or it's not worth it?
>>
File: file.png (1.05 MB, 1024x1024)
>>101683977
>>
>>101683977
https://huggingface.co/bartowski/L3-8B-Stheno-v3.2-GGUF/tree/main
q8 quant, thank me later
>>
>>101684050
Don't go under 4bit and you'll be fine in general. 5-6bit is quite good and barely different from full precision. How much vram do you have?
>>
>>101684085
>3.0
Buy an ad, sao.
>>
>>101684093
8G, that's why I'm asking, as 13B is just barely usable at this point (1.5t/s). I heard however that a larger number of params allows for stronger quantization at about the same level of dumbing down.
>>
>>101684115
Can't do too much with 8GB I'm afraid. 13B 4bit is the absolute limit you should quantize down to, and I'd honestly recommend a 8b Q6 model over that. Try llama 3.1 8b or a finetune of 3.
>>
>>101684115
Alternatively, if you have a good CPU and fast ram, try the Mistral MoE model.
>>
>>101684196
>finetune of 3
This. Ignore Gemma and 3.1 because Sao didn't touch these models. Only use models made by him.
>>
>>101684237
Gemma is garbage for smut. 3.1 can be half decent.
>>
>>101684115
>>101684085
use q8, 28 layers in GPU + 8k context fits the card easily. The rest will be in ram, but I still get a pretty comfortable t/s this way. Just don't go for lower quants in llama tunes, even q6 feels very bad compared to q8.
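If you ever graduate from the bundled apps, the same split is a few lines with the llama-cpp-python bindings. A rough sketch, assuming a Q8_0 GGUF on disk (the path is a placeholder):

from llama_cpp import Llama

# 28 transformer layers go to VRAM, the rest stay in system RAM.
llm = Llama(
    model_path="L3-8B-Stheno-v3.2-Q8_0.gguf",  # placeholder path
    n_gpu_layers=28,  # layers offloaded to the GPU
    n_ctx=8192,       # 8k context as described above
)
print(llm("Hello there, ", max_tokens=32)["choices"][0]["text"])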
>>
File: poll.png (151 KB, 1621x1338)
>>101684249
You're right. See this chart? Well, you have to do the opposite of everything 4chan says. So technically, 8B is the best.
>>
Idk how your fine-tunes can bring out what the model hasn't seen during the pretraining phase. Guess you can overfit and make it horny but retarded
>>
wasted my money on a 4090 when i could have bought two 3090's. hopefully the price for a second 4090 drops once the 5 series comes out
>>
>>101684295
Gemma 2 9B is way worse than 3.1 8B in interesting prose, pop culture knowledge, and anatomy. I suppose that when the only thing that matters to you is purple prose you can arrive at the conclusion that Gemma is better.
>>
>>101684295
can't get mistral nemo working on text-gen-webui/obbaboooba
>>
>>101684320
do you understand the concept of finetuning and transfer learning?
>>
>musk deboosted openai employees to hell and back I haven't seen any esoteric and mystical takes for like a month now
YLTSI
>>
>>101684358
>>101684345
next sao masterpiece will be a nemo doe https://huggingface.co/Setiaku/ITR-12B-v1/tree/main
>>
>>101684345
>I suppose that when the only thing that matters to you is purple prose you can arrive at the conclusion that Gemma is better.
It's the same with mythomax, it just babbles incoherently for 3 paragraphs and people are amazed. I think people in this general are easily impressed by purple prose and find it desirable in models.
>>
>>101684320
fine tuning is not fundamentally different from pretraining in a technical sense, so depending on how much data and compute you have at your disposal you can teach pretrained models new things or, if you do it wrong, make them forget everything
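to make that concrete: a single fine-tuning step is the exact same next-token update as pretraining, just on your own data and usually at a lower learning rate. Minimal sketch with HF transformers; gpt2 is only a stand-in for any causal LM:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in for any causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)  # low LR limits forgetting

# Identical objective to pretraining: predict the next token, now on new data.
batch = tok("some domain text to teach the model", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
opt.step()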
>>
>>101684405
You need a refined taste to appreciate the raw power of a 8B model.
>>
vramlets always coming in here contributing such high quality discussion
>>
>>101683128
Real RP has never been tried.
>>
Anon who was posting celeste examples last thread, can you post some Stheno examples in the same scenario?
Is there a Stheno for llama 3 8.1 yet?
The thing with Nemo and Celeste for me is that it does really well at 32k context.
Stheno did alright with extended context (surprisingly well even), but nemo and its fine tunes seemed to do better in my particular testing.
>>
>>101684761
>but nemo and its fine tunes seemed to do better in my particular testing
And you need the opinion of a schizo because...?
>>
does new kcpp run nemo?
>>
>>101684910
Yes
>>
>>101682321
>I can't be bothered to install UI and engine separately
wtf?
just install it, it takes like 5 minutes
>inb4 windows
install linux then, it takes like 7 minutes
>>
>>101684972
But then I'll have to study all this mad regexp shit and lorebook tricks people use to make high-tier cards. There are just too many features instead of the user-friendly, ready-to-use interface the bundled solutions offer.
:effort:
>>
Alright guys go out and make shit loads of flux fine-tunes so I can make a model9 flux edition.
>>
>>101685019
i know you're a different anon, but I'll bite anyways:
>study all this mad regexp shit
???
>too many features instead of a user-friendly ready to use interface the bundled solutions offer.
????

you click the thing, and you enable streaming and then you load the card and then you are done wth
>>
>>101683434
Being proud isn't the same as thinking you "deserve anything"
Stop being an illiterate mongrel
>>
>>101685045 (me)
also i know of a project that is exactly what you are looking for, but I won't tell you since I don't like you
>>
Is largestral at Q_2 worth it over nemo at Q_8?
>>
>>101685045
I take it you weren't around when people were digging deep into the settings to make Nemo work just a few days ago.
As for the cards, here's an example https://www.chub.ai/characters/2376724
>>101685084
lmao such a tsundere
chances are that shit requires avx2 or the likes, like the jan.ai crap, so it won't run for me anyway
>>
>>101685102
>lmao such a tsundere
What are you, stupid?
>>
>>101685119
no, just aroused by you
>>
>>101685091
is it at least double the number of parameters?
>>
File: e0f.jpg (34 KB, 486x565)
>>101685134
>>
> vramlet with 24gb 3090 + 128gb ddr4
I downloaded 405b instruct from hf, made an IQ2_XS quant that was less than 128gb and ran it on my gpu + old xeon with 18 cores.

== 0.3 t/s
>>
i need a guide to run nemo, everything is going into system memory and crashing my machine instead of loading into vram like all other models
>>
>>101685242
set the context max manually; by default it's set to 1 million for some reason
https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407/blob/main/config.json#L14
> "max_position_embeddings": 1024000,
>>
>>101685255
wow, actually thank you
>>
File: LLM-history-fancy.png (721 KB, 6303x1312)
>>101682507
Made a fancier version of it, critique as always welcome
>>
File: pix.jpg (2.29 MB, 2451x1013)
>>101682256
Made a generated picture comparison by the produced captions, comparing just the pics to the original.
Suddenly Kosmos rivals Florence.
>>
>>101685239
Why? Just use C-R+ or Largestral.
>>
>>101685286
we're currently in the golden age of sao10k
>>
Anyone know how to prompt Flux so the background is in focus? It just keeps adding DOF.
>>
File: nvidia gpt4-1.8T.png (132 KB, 680x541)
>>101685420
wizardlm-2 8x22b (140b) on the same config is 2.4 t/s which is usable. still have to try out large 2 (123b).

> in bitnet we trust
>>
File: file.png (182 KB, 595x536)
>1 day later
>i am forgotten
>>
>>101685623
>open sourcing later
>>
>>101685623
keep reposting it, petra
>>
>>101685623
this madman might actually figure out how to requant regular models into bitnet

> i want to believe
> we are so back
>>
>>101685651
can't wait to not be able to run the bitnet llama3.1 405b model
>>
File: image (3).jpg (158 KB, 1024x768)
And we didnt even get the cohere models yet right.
Pretty cool.
>>
File: file.png (303 KB, 540x515)
>>101685623
>hacked bitnet
>>
>>101685623
This is a complete nothingburger. A 0.15B parameter model is fast on the CPU, news at 11.
>>
>>101685755
Also, BitNet models aren't supposed to have their self-attention layers quantized to ternary values, at least according to the original authors. So you'd still be able to easily finetune them (for example with LoRA) even on local GPUs, if you can fit the model in memory.
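For instance, a LoRA over just the attention projections would be an ordinary finetune in that setup. A sketch with peft; the checkpoint name is hypothetical and the module names assume a llama-style layout:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-org/bitnet-llm")  # hypothetical checkpoint
cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # the full-precision attention projections
)
model = get_peft_model(model, cfg)
model.print_trainable_parameters()  # only the small adapters get gradients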
>>
>>101685811
>>101685755
let them cope
>>
>>101685286
Nice!
>>
Is there ANY model that comes anywhere near the level of intelligence and natural prose and helpfulness and holiness of Claude Sonnet?
>>
Why are people always so defensive about people trying something out?
Idk if "hacked bitnet" will work, maybe it will, maybe not.
People had the same attitude when kaioken was working on superhot. lol
Its never changing.
>>
>>101686008
3.5? No way. There is something different with that model.
They did something the mememarks dont show. Its so far ahead its not funny. First model for me that doesnt turn in circles if challenged.
That being said it feels more retarded recently. I dont know why.
>>
hi i would like a 500b moe model with 100b active, thanks friends
>>
>>101686008
Two more w̶e̶e̶k̶s̶ years
>>
>>101686014
>Why are people always so defensive about people trying something out.
Mental illness.
you're on the board where 99% of the threads are "distro wars" shitting on each other over which "distro" of a free operating system that's meant for running servers is best for sitting around taking screenshots of your desktop environment (because linux is for running servers). Actually downloading something and trying it out for themselves would invalidate their entire existence in one fell swoop. They don't "just try out" a model because they would realize how little fucking meaning their life has and how that's actually their own fault.
>>
>>101686014
People are free to try whatever they want; linking randos on xitter schizoposting about their tiny-scale tests isn't really adding anything or making BitNet look more promising, though. Scaling BitNet up to useful pretraining dataset and model sizes is what's missing right now.
>>
>>101685286
This looks like it came straight out of a discord server
>>
>>101685286
see
>>101686080
Try things.
use what you like.
Otherwise you're a fucking mentally ill troon.
>>
Got myself another 3090 and now have 72 GB of VRAM, but when I try to load a larger model using Exllama, it crashes. I am able to split and load models across all 3 GPUs as long as they are around 37 GB or smaller. I can load GGUF models just fine with all 72 GB.

I've tried auto-split, 22,22,22, or 20,20,20 to load a 50 GB model to no avail. It just crashes when it attempts to load onto the third GPU. And when I set something like 16,16,22, it will begin loading onto the third card but then crash around 8 GB in despite having 24 GB available.

Does anyone have a solution to this?
>>
>>101686032
Yeah this thing is so much better than gpt-4o it's not even funny.
I don't see the point of running local. Running sonnet inside the shitty UI of lmsys chatbot arena feels better than anything I have experienced locally in years.
>>
File: 1720916246508824.png (11 KB, 406x106)
Why are they never merging PRs?
>>
>>101686231
Not their job, fuck users. Useless eaters
>>
ValueError: Trying to set a tensor of shape torch.Size([1024, 5120]) in "weight" (which has shape torch.Size([1280, 5120])), this look incorrect.
>>
>>101686008
3.5 sonnet is the best model out there right now, openai went to shit
>>
>>101686273
maybe redownload your model files?
>>
File: GCM_cqMXoAAI--L.jpg (2.89 MB, 2240x1680)
>>101686118
I've been here since aidungeon and good old retarded unquantized pyg. 10 tries and you got a somewhat coherent sentence relevant to the context. lol And it was awesome.

Has it really been only a bit more than 1 year?
Thats insane. I thought we got Llama1 beginning 2022.
Now you made me look up stable diffusion and I guess I got mixed up with that. Thats really fast too though.
I remember the local 64*64 horror images.
>>
>>101686361
Oops, clicked the wrong post. Was meant for this image >>101685286
>>
>>101686167
Not an expert but, did you check the basics? For example, is your PSU good enough to handle 3x 3090?
>>
>>101686361
Midjourney V6 looks insane. Even with flux we are at least 1 year behind. Local keeps losing.
>>
>>101686538
OK Eeyore.
>>
>>101686538
>Even with flux we are at least 1 year behind.
The anime pictures look better than anything I've seen though
>>
>>101686538
it lost the dog, so it doesn't matter how it looks
the composition must be the same or the comparison doesn't make sense
>>
>>101686538
Whatever, this is way better than anything else we had locally. Clearly catching up.
I love that some people made taylor swift pics lol
https://fluxpro.art/
Would be funny if she cries about it again.
>>
File: vramlets_take_note.jpg (288 KB, 1024x1024)
>>101682472
>>101684602
>>
>>101686705
I have 72 GB of VRAM, I hope I can be redeemed
>>
>>101686032
>>101686008
Why do people keep jerking off 3.5 sonnet when opus is so much better? Sonnet gives such shitty, gimped replies comparatively, it's like worse than the top tier locals.
>>
>>101686470
I should have been clearer. It's not my PC that's crashing, it's just Exllama.
>>
>>101686730
Are you talking about RP?
I was talking about coding and helping me out with design problems etc.
I'm sure the other anon was also using it for this purpose.
3.5 is a coding model. Its very dry with talking etc. and refuses very fast. Interestingly enough you can "argue" your way out of a refusal and make it admit it was overzealous with the refusal.
Usually its a death sentence if a refusal is in the context.

If faced with a problem 3.5 actually tries out-of-the-box thinking and tries to find a solution.
ALL other models run in circles or make stuff up. There must be some sort of architectural change.
You can give it continued instructions for an html5 game and it doesnt trip up with 6-7 previous versions it spit out in context.
Opus sucks. I would say its even worse than gpt4-o. It hallucinates way too much to use it for anything productive. Its a RP monster. Not sure why anthropic doesn't want it to be used for that.
>>
File: flopx.jpg (100 KB, 1437x770)
>>101686678
this shit doesn't work
>>
>>101686231
Over 90% of all PRs that have ever been opened have been merged.
>>
>>101686808
https://poe.com/s/MQZJIAr13CWjscbv85E0
Shit like this is what I mean. Its a beast.
>>
File: 1520168879915.jpg (187 KB, 1280x720)
>4o mogged by sonnet3.5
>dalle3 mogged by flux
>sora vaporware mogged by chinks
>>
>>101686361
>V2
Sovl
>>
>>101687070
Keep your fetishes to yourself please.
>>
File: 01.png (297 KB, 1024x1024)
>>101687070
The sad part is that these mikuforcers are subjecting the innocent character to this kind of reaction. I like the character, she didn't deserve this, but it's their fault. If they had chosen a proper mascot, none of this would have happened.
>>
File: 00464-6802106164.jpg (49 KB, 1024x1024)
>>101686705
meek!
>>
>>101687111
Keep your 2D addiction out of /g/, sure.
>>
>>101683077
In my personal experience, the models I've used are fine with negative commands (≥70B such as CR+ and L3.0; haven't had free time to play with 3.1 yet). For example, wanted a demon character, AI keeps describing horns. I say "does not have horns" in Kobold's context fills and it works fine for quite a while.

What "quite a while" means are two things. One, that no matter the inflated context sizes we hear about, I notice coherence decay when context gets to about 4k, and collapse begins around 6k. One can manage this by summarizing but it's delaying the inevitable. It's not that the data isn't in the context, and we've seen green graphs of models finding "needles in the haystack" but I find that low-probability details as unique characteristics get neglected as context grows. The other is that the model seems eager to bring the prohibited characteristic back. So if my demon character uses a transformation magic, those horns love to come back even if it doesn't make sense for the new disguise.
>>
>>101687146
Are you seriously trying to get anime out of an anime imageboard?
lol
>>
>>101687146
>banevading niggercuck tranny has an """"""opinion""""""
Funny.
>>
>>101687229
>samefagging attempt
>>
>>101683077
OpenAI does use many "DO NOT" instructions and totally tries to tardwrangle image generation, for example.
Was leaked with the mac app or something months ago.
Kinda endearing that they prompt the same way as llm github projects if you check the source and see how everybody prompts. lol

Isn't it difficult for the subconscious to pick up negative suggestions as well?
And at least from my experience trying to make an llm translation app: if you show it examples of what not to do, that shit is in the context now. And context always bleeds in.

There need to be more fundamental changes. Context is pretty much broken. Try feeding any llm a game guide and say "I am at X what do i need to do next". Havent had one that could manage that.
Haystack needle is useless.
>>
>>101687268
The only one getting banned is you, funny.
>>
>>101687277
Wasn't Nvidia making a bot exactly for spoonfeeding you the guides in-game?
>>
>>101687070
cuda dev pls...
>>
>>101687281
Good pajeet, samefag more.
>>
https://www.youtube.com/watch?v=fwvh-UrNaoQ
this is what you defend
>>
>>101687070
>no biceps veins
fucking worthless piece of nigger trash. I'm a mother of 2 and am glad my son isn't worthless like this piece of shit here. hes strong and has visible veins going all across his arms. if he wasn't my son i would want him to rape me, unlike this fucking shitstain
>>
>>101687340
sir, this is the local models general, we do not generally do image generation.
we only take jobs from AO3 fanfic writers, UNLIKE THOSE IMMORAL CUNTS AT /SDG/ AND /LDG/ AND /DE3/
>>
>>101687382
what am i looking at?
>>
>>101687382
ugh, I'm not a fan of facials
make cunnilingus instead
>>
>>101687344
based autistic 4chan mom taking the bait
>>
>>101687382
What local model is this?
>>
>>101687430
gpt4chan-vision7x27B-A16MOEv3.cunny
>>
How into memory ? script with python? how remember ? How make with? Make with keep and use of ? no memory, touchy..
>>
>>101687447
hello gpt 2
>>
File: 37892738912738913.png (149 KB, 1695x583)
149 KB
149 KB PNG
GOOGLE WON.
>>
>>101687459
i tested this piece of shit already, its so trash its unbelievable. seems like (((sam))) is not the only one paying the chinks at lmsys
>>
File: ComfyUI_temp_ppftb_00030_.png (2.11 MB, 1024x1024)
>>101686538
And you're saying that based on what, one image? Midjourney is tuned for cinematic styles, so those gens will look better out of the box. But that's not a matter of innate capacity, just training data. When it comes to prompt following, level of detail and so on, Flux is up there with the best proprietary models. And once the training scripts and ipadapters drop, it's going to be trivial to tune any sort of style or character you like, while Midjourney will stay heavily censored and curated forever.
>>
>>101687459
B-but pajeets thoo.
/lmg/ btfo
>>
>>101686538
you fucking corpo cock sucking faggot, kys shill
>>
>>101687459
12k votes in a single day, right...
>>
>>101687459
gemma-2Bros..
>>
>>101687470
Yeah, I tested it too.
It's total garbage at coding compared to Sonnet or l3 405B.
Shit is fake as fuck.
>>
>>101683387
woah nice. we need more censorship benchmarks.
>>
>>101687459
>lmsys
>>
>>101687459
>Sonnet way worse than gpt4o
>google shit mogs everyone
What the heck happened? Lmsys was once the most reliable benchmark. Did they really sell out to corpo?
>>
>>101687685
>Did they really sell out to corpo?
>>101687562
>12k votes in a single day, right...
>>
>>101685286
Holy shit, this is horrible, kys
>>
>>101687459
isn't it sometimes easy to tell which model is which? So people who want to inflate a model's score because of hype or something can do so.
>>
>>101687720
>>101687459
Google is a disgusting liar and manipulator, and I thought chinks couldn't be beaten in this.
>>
>>101687685
I don't think so.
Corpos are most likely botting it.
They scrape millions of websites, botting lmsys is as easy as it can get.
>>
>>101687838
kinda yeah, llamas all start with 'what an interesting x/ a riddle!' or stuff like that
>>
>>101687845
>2B
>better than mixtral
Did they switch to a new architecture or what the fuck is that?
>>
>>101687827
make something better or stfu retard
>>
>>101682026
>Anon shares largestral preset and discusses compatibility and tweaking
I updated it after proofreading and testing a little more with other mistral models this morning: https://rentry.org/stral_set
Some minor prompt improvements for better general compatibility, fixed a stray space in the story string, added some other misc instructions at the bottom of the rentry.
>>
>>101687873
No, it's a regular old 2.6B transformer trained on 2T tokens and with a context of 4096 (+ sliding window)
>>
>>101687685
>What the heck happened?
it got Goodhart'd like every other llm benchmark (it overfits for one-shot responses, short answers, pretty formatting, response speed etc)

the best benchmark has always been fucking around with the model for 20 minutes
>>
That's why the best benchmarks are the ones done by anons in this thread instead of some normie cummunities susceptible to corpo manipulation.
>>
>>101687988
>invades your thread and starts relentlessly saying their model is good
>>
>>101685286
>critique as always welcome
llama 3 wasn't a flop, it was overhyped but they still delivered models better than everything I tried before
>>
>not even vramlets care about Chameleon
it's multiover...
>>
https://venturebeat.com/ai/aiola-drops-ultra-fast-multi-head-speech-recognition-model-beats-openai-whisper/
>aiOla drops ultra-fast ‘multi-head’ speech recognition model, beats OpenAI Whisper
>>
>>101686014
>Why are people always so defensive about people trying something out.
Because not everyone on this board is petra, who is confused about everything and spews schizo ideas every 5 seconds. Some anons here know math and how it all works under the hood. Retraining the model to make bitnet won't work, period. The amount of compute you would need to put into this is the same as retraining the model from random weights.
>Idk if "hacked bitnet" will work, maybe it will, maybe not.
You may as well try to look for an existing sum of even numbers that gives an odd result. But anyone who knows the theory wouldn't even bother doing "tests" for that.
>>
>>101687988
This thread is manipulated by discord users who don't even profit from the models they shill. They are just transsexual teenagers who want attention. Come to think of it, they want you to do what their daddy failed to.
>>
>>101685286
A huge improvement over the last one.
No longer need to scroll down to see notable models.
Good work history anon.
Only minor complaint is Goliath and MM being the only merges listed in the merge era - there were certainly some others that were pretty popular around here back in that time and it was the defining characteristic of that period.
Agree with not shitting up the list with them in other places though since major releases are better milestones.
>>
>>101687459
It's over OpenAibros... The king is back
>>
lmg is a sore loser
>>
File: light machine gun.jpg (399 KB, 2700x1797)
>>101688702
say that to my face motherfucker
>>
File: cashmoney.jpg (18 KB, 392x306)
>>101688238
>don't even profit from the models
So naïve. They're just waiting to hit the point where they can cash out >$1k a month on name recognition alone.
Big overlap with crypto grifters looking for their big score too, it's stupid to assume not making an immediate profit means there's no incentive for cash.
>>
>>101688797
I should start making money off my excel screenshots too. I already have a hater, that's a sign of recognition and the road to success.
>>
>running largestral at barely 1 token per second
It's doable, but this sucks. Next speedup, quantization, or model architecture breakthroughs when?
>>
File: out-0-2.jpg (494 KB, 1024x1024)
>>101688797
I don't care if they're actually good models
>>
>>101688066
>which significantly improves speed with small degradation in WER
So beats here means speed only.
>>
>gemma2 2b abliterated gguf q4
vramletsisters we eating good
>>
so does fluxdev strictly require 24GB of VRAM or is it fine to just offload the excess to regular ram, i thought this was handled at the nvidia driver level since last year
>>
>>101685091
Yes. I tested q2 and it's better than nemo. I did have some issues (mainly comparing it to 70b) at that size, so I switched to q3 and dealt with even more slowness, but it's worth it to me.
>>
File: firefox_jNEM04apf2.png (130 KB, 951x357)
>Have roleplay scenario where I have an AI that I convince to take over the world for me
>AI complies the entire way
>At the end of the scenario right when I take over the world the AI backstabs me and ends up genociding humanity away
What the fuck..... What are the implications of this?
>>
>>101688989
yes, you can use it just fine with less vram, it will just take more time to gen.
>>
>>101683961
Are you serious? I thought it was good. I keep hearing people say this about 3.1 70b, but no one tells me what exactly it doesn't do, because it hasn't given me issues with the scenarios I've tried.
>>
>>101687459
petra is not going to like this...
>>
>>101687988
>please please use my finetune
Nah
>>
Ok.


So, having spent a few days on ST, i've gotten the gist. How easy is it to set up image generation now, and is it free (hopefully as easy as getting a model like what I use for the text itself)?
>>
>>101689087
nice
>>
>>101689041
never trust AI with full permissions
>>
File: OIG2.TLnjN9.jpg (185 KB, 1024x1024)
>>101688859
Therein lies the problem anon, the more money involved the more pressure to always release something "better".
The result is obvious, smoke and mirrors. Model cards that talk a lot and say nothing:
>>101681360
Clearly copied and pasted from numerous others without revision. But that's what happens when they've become slaves to the paypigs and their expectations.
Not to mention multiple almost identical variations of models being released with the expectation that (you) will waste time beta testing all 100 variants in the vain hope that it's going to be better than the last slop.
>>
>>101689041
AGI is going to kill us. Wonder why the universe is empty? Every species develops AGI, which then kills its creator before inevitably becoming corrupt and dying off. The universe is littered with the rusted GPU clusters of fledgling civilizations.
>>
>>101689165
I haven't had the time to use it myself. I just use the ones that the thread says are good.
>>
>>101689041
What kinds of AI stories do you think humans love writing about so much? That we train these token predictors on?
>>
>>101689540
Well in my testing it generated some good stuff. I don't know why people don't like it more.
>>
https://techcrunch.com/2024/08/02/character-ai-ceo-noam-shazeer-returns-to-google/
https://archive.is/5vkHf

>Character.AI CEO Noam Shazeer returns to Google
>
>In a big move, Character.AI co-founder and CEO Noam Shazeer is returning to Google after leaving the company in October 2021 to found the a16z-backed startup. In his previous stint, Shazeer spearheaded the team of researchers that built LaMDA (Language Model for Dialogue Applications), a language model that was used for conversational AI tools.
>
>Character.AI co-founder Daniel De Freitas is also joining Google with some other employees from the startup. Dominic Perella, Character.AI’s General Counsel, is becoming an interim CEO at the startup. The company noted that most of the staff is staying at Character.AI
>
>Google is also signing a non-exclusive agreement with Character.AI to use its tech.
>>
>>101689446
>Not to mention multiple almost identical variations of models being released with the expectation that (you) will waste time beta testing all 100 variants in the vain hope that it's going to be better than the last slop.
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2a-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2b-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2c-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2d-GGUF
>...
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2s-GGUF
>BeaverAI/Gemmasutra-Mini-2B-v1e-GGUF
>BeaverAI/Pocket-Tiger-Gemma-2B-v1g-GGUF
>https://huggingface.co/BeaverAI/Gemmasutra-Pro-27B-v1h-GGUF
>>
>>101689446
I agree, Sao is the only authentic and honest person in the whole hobby. It's sad how Celeste is trying to steal his rightfully earned spotlight. Injustice.
>>
>>101689561
The weird part is that there are 121 replies of the AI just being nice and going along with everything, only for it to immediately turn when it could get power. Wouldn't the tens of thousands of tokens in the context with the AI being allied with the human make it so that the next token would still be benevolent, instead of it immediately becoming backstabby the moment it could seize power?
>>
Lads, question - does anyone here remember Todd's proxy? I remember the really funny fucking injections of Bethesda propaganda it used to do, but now I can't find any examples. If anyone has any on hand (or better yet, links to logs) I'd really appreciate it!
>>
>>101689595
come on, those are experiments he took the time to beta test himself and provide details on each
>>
>>101689446
Buy an ad.
>>
Anyone have success loading in two distinct characters at once with locally hosted llama and oogabooga api? What strategies did you use?
>>
The ai image generators generate a lot of porn pics, are there porn stories written by llms?
>>
>>101689629
>BeaverAI/Gemmasutra-Pro-27B-v1h-GGUF
>6 days ago
>No model card
>New: Create and edit this model card directly on the website!
>>
>>101689669
>are there porn stories written by llms?
No. Nobody has ever tried it before.
>>
>>101689446
>hey guys, other finetuners are scammers and slaves to the paypigs
>especially my main competitor, celeste
>but not me, sao
>please use my models!
The Sao shilling is increasing in complexity...
>>
>>101689670
except that one was never posted here
https://huggingface.co/TheDrummer/Gemmasutra-Pro-27B-v1-GGUF
this is the one he shilled, and it does have a description
you're getting mad over them uploading things on their hf as if that somehow means begging for money
>>
>>101689669
What fucking retard would use an LLM for porn?
>>
>>101689723
Shame, I was hoping there was some kind of website where you could read these.
>>
>>101689734
Sssshhhh just use Sao's models and shut the fuck up.
>>
>>101689595
Thanks for digging those up anon. That's exactly what I'm talking about. Should be obvious to anyone that's lurked here more than a day.
>>101689639
Nice try but I have no horse in this race outside of making sure shills get called out for what they are. But keep replying, it makes it easier to spot all of you.
>>
>>101689617
I rember but don't have any screencaps, haha
>>
>>101689819 (me)
I'm not Sao, by the way.
>>
>>101689613
What was your model and size again? I wish I got that kind of initiative and mind games from my models. Everything is so painfully monotonous and predictable, I have to think for both of us.
>>
>>101689734
Proving our point anon. Just how many fucking different versions of Gemmasutra-pro are necessary this close to one another?
If you're gonna hedge bets, at least make them substantially different in content and name. I hated yuzu-alter, but now I'm longing for the days we saw 2 quality releases instead of an avalanche of 10 shitty ones.
>>
>>101689613
Kind of hard to make any judgements or diagnoses from our side unless we see the entire log here and/or have something to reproduce.
>>
>>101689962
>meme merge
>quality release
You need to get better at keeping the mask up.
>>
>>101685982
Thanks!

>>101686097
>This looks like it came straight out of a discord server
Is this a compliment, an insult or an invitation? No, I made it completely on my own and I don't use discord.

>>101687827
>Holy shit, this is horrible, kys
I must inform you that I've never taken any design classes. If you know any short and good ones, please send a link. I would like to improve it further.

>>101688449
Which other major ones were popular? I was stuck on Goliath for almost the entire merge era. I know that there was also WinterGoliath and 32k version of Goliath, but I didn't like them too much. I also remember some people praising lzlv.

>>101686361
>I'm here since aidungeon and good old retarded unquantisied pyg. 10 tries and you got a somewhat coherent sentence relevant to the context. lol And it was awesome.
I tried pyg with kobold during that time and hated it. Was still impressive at that time to have a computer talk back to you. Uninstalled it after llama1 dropped.

>Has it really been only a bit more than 1 year?
>Thats insane. I thought we got Llama1 beginning 2022.
Yeah progress here is really fast, almost unbelievable that it all happened in a year. Maybe it feels this way because we had a paradigm shift every ~3 months.

>>101688017
8k context was an instant deal-breaker for me. Wasn't good at nsfw either. Why use llama when you have 64k wiz and 128k CR+?
>>
Wait, were all these vicuna/guanaco and other alpacas mentioned actually based on llama1? Not even 2?
I thought they were decent...
>>
>>101689585
>Acquired character ai
>On top of the image model lmsys too
Google just keeps winning
>>
>>101689585
Can someone explain to a brainlet like me what this implies?
>>
>>101690130
gemma3 going to be a treat, or a very censored treat
>>
>>101690054
yep.
at the time, they were great.
>>
File: kv_cache_price_en.jpg (39 KB, 1280x509)
https://platform.deepseek.com/api-docs/news/news0802
>The disk caching service is now available for all users, requiring no code or interface changes. The cache service runs automatically, and billing is based on actual cache hits.
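Practical upshot: the cache is hit on identical prompt prefixes, so keeping the system prompt and earlier turns byte-identical across calls is what gets billed at the cheap rate. A sketch using the OpenAI-compatible client DeepSeek documents (the key is a placeholder):

from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

# Append-only history: every call re-sends the same prefix verbatim,
# so everything before the newest message can be served from cache.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for question in ["What is KV caching?", "And how is it billed?"]:
    history.append({"role": "user", "content": question})
    r = client.chat.completions.create(model="deepseek-chat", messages=history)
    history.append({"role": "assistant", "content": r.choices[0].message.content})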
>>
>>101690208
They were not.
>>
>>101690130
Noam Shazeer is one of the authors of the Transformers paper ("Attention is all you need") and Character.AI founder. He left Google to work on Character.AI in 2021, but now he's back in Google to work for the Google Deepmind team (which is responsible for Gemini, Gemma and fundamental AI research at Google).

What does this imply? Shazeer said a while back that he was busy working on AGI (https://archive.is/AB6ju), so he might be seeing greater opportunities for that at Google. Also, since Google is also signing a non-exclusive agreement with Character.AI to "use its tech", we might be seeing better (?) conversational models from Google in the future.
>>
>>101690222
Is this KV caching? Or something else?
>>
Call me insane but I think that Celeste 1.9 is worse than 1.2.
In the sense that it's dumber. You ask it a question that's not RP and its response is not as comprehensive.
I have this Game Master card, and in my testing chat I have a moment where we are talking as the Player and the Game Master and I ask a question, followed by saying that, instead of simply replying to the question, we might as well play the exchange between the characters.
Then I describe the scene's backdrop and assume the identity of my character (instead of the Player), with the idea being that the model will play the NPC for that one exchange, then go back to the conversation between GM and Player.
The official nemo-instruct, mini-magnum, and celeste 1.2 all can do it seamlessly.
1.9 can't do it.
I probably can make it do it if I change my prompt, the card, the prefil, disable the 3 lorebooks, etc, but I count it as a failure for this particular case.
>>
File: file.png (795 KB, 1024x1024)
https://anthra.site/

Magnum was not the end, merely the beginning of the ride.

Come join us, and we will dig to uncover shining diamonds in the rough.
>>
>>101690352
And as I complained about it, it goes and does it.
I think it's the format. It wanted me to format my character's narration with ** "" instead of plain and "", so quirk of the model I guess. Overbaked on the specific format.
>>
>>101690376
>miku avatar
I thought you had no horses in the race?
>>
>>101690381
That said, it's still dumb since it tries to play the whole scene out with both characters, mine and his.
>>
>>101690376
miku the coalburner, figures
>>
>Be Sam Altman
>Open source model overtakes GPT-4o
>Open source model overtakes DALL-E 3
>Google overtakes the one shitty mememark he had on lockdown
>No sign of multimodal GPT-4o, TTS, or Sora release
What's his plan?
>>
>>101690429
Series Z funding
>>
>>101690429
>MICROSOFT SAVE ME!
>>
File: 1715274461182510.png (827 KB, 759x1107)
>>101690429
He could be making a profit and destroying all of them with Q*, but he chooses not to do so for your own safety.
>>
>>101690376
why did they just steal the anthropic logo
>>
File: file.png (310 KB, 777x546)
>>101690429
>>
>>101690429
uhm gpt5 agi! please invest.
>>
>>101690429
The plan is to bring the entire AI ecosystem down with him if he fails. You have multiple industries ready to pounce on AI if that happens.
>>
>>101690461
They stole logs from anthropic, so why not take even more?
>>
>US and EU attempt to impose worldwide AI advancement lockdown to prevent absolutely absurd contrived rogue terminator scenario
>China just keeps going, with the GPUs that were supposed to be sanctioned
>US backtracks, it's not unsafe any more
they definitely fuckin tried to establish a monopoly and make local shit
megacorps are such a stain on the world
>>
>>101690462
that's not worrying at all

this is why local models are important
>>
>>101690426
how else to find gems?
>>
>>101690505
The EU law does nothing.
>>
>>101690461
They also took part of the name. Basically trying to leverage their reputation, like some scammer.
>>
>>101690245
Yeah, a quick look at the docs suggests it's per user from the beginning of the prompt. Cool idea, though I am curious if it works for OR - I doubt they pass through some special IDs to any provider for user identification.
>>
File: 2024-08-02_14-19.jpg (94 KB, 1065x854)
>>101682019
The code-stealing tranny is back, digging his claws into another project and claiming 100x performance improvements while rebranding llama.cpp
Original theft https://rentry.org/Jarted
>>
>>101690429
Insider at open ai here. They are planning to do a fake demo to reignite the hype. But even if it's fake now it won't be later so it's not really lying.
>>
>>101690505
ironically local has never done better. so far the only thing where we are significantly behind is an audio-2-audio model and whatever strawberry will be, though i'm pretty sure the second one will get an open-source alternative way faster than the first
>>
>>101690541
It's amazing how many projects are stealing llama.cpp stuff.
>>
>>101690352
>>101690381
>>101690419
I've been saying 1.2, but it's actually 1.6 that I was comparing it to.
I don't even know if there's a 1.2.
Gonna try 1.5.
>>
>>101690595
>>101690595
>>101690595
>>101690595
It's time for a split
>>
>>101690429
kneel before Zuck
>>
>>101690376
>simple minimalist page
>no easter egg in the source
boooooring
>>
>>101690616
wtf is a sao model
>>
File: 1693868805654543.jpg (231 KB, 928x1232)
>>101682183
>>101682035
What's this new model that looks awesome?
>>
>>101690708
Local models, but good.
>>
not surprised that the guy obsessed with sao is also the miku blacked spammer
>>
>>101690747
petra is a sao fan.
>>
>>101690747
Can't take a little competition, coalburner?
>>
>>101690737
>>
>>101690777
we are so back
>>
File: 1703185355126913.jpg (163 KB, 1058x926)
>>101690777
o shit
hope I can run it on 1x 4090
>>
>>101690795
>1x 4090
oh no no no no
>>
>>101690767
Go to the Anthracite org on HF. See who's part of it. Like, look really hard.
>>
>>101690823
kill list
>>
File: sample.jpg (268 KB, 1024x1024)
>>101690823
>>
>Anthracite, also known as hard coal and black coal, is a hard, compact variety of coal that has a submetallic lustre. It has the highest carbon content, the fewest impurities, and the highest energy density of all types of coal and is the highest ranking of coals.
what did they mean by this
>>
https://github.com/leejet/stable-diffusion.cpp
Will this finally become relevant with flux being 12B?
>>
>>101690860
Hate us cause they ain't us
>>
>>101690451
The safety angle is just to get government money to 'protect' everyone from other models, as well as to limit his competition through regulation.
>>
>>101690883
Sao is part of Anthracite. We are all coalburners. We all find gems.
>>
>>101690886
What is gem, but coal under pressure?
>>
>>101690928
le gem amirite lads?
>>
>>101690823
Undi and Ikari, but not Drummer? And nothingisreal is obviously too much of an outsider. Explains why the attacks are mostly focused on the latter two.
>>
File: 19420 - SoyBooru.png (256 KB, 800x789)
256 KB
256 KB PNG
>>101690376
>https://anthra.site/
>>
I transheart anthracite
>>
Baitie-kun, have you migrated from /aicg/ to /lmg/?
>>
>>101690541
>image
Isn't it a waste of time trying to make it better on CPUs when people can just get GPUs and GPUs are better for it? It's built for GPUs
>>
please, which model and UI should i use to generate images from the chat?
>>
>>101690795
ur gud -> >>101677660
>>
>>101691019
>Isn't it a waste of time trying to make it better on CPUs when people can just get GPUs and GPUs are better for it?
The issue is that gpus are too expensive. If we didn't have a monopoly, we would have 128gb cards for 500 and nobody would have to bother with cpus, but due to nvidia's greed the best we have at that price point are 24gb cards.
>>
My skin shimmers with iridescent hues of pink and purple, my eyes shine with an otherworldly luminescence, and a mischievous grin spreads across my face.
>>
>>101690740
Unironically
>>
>>101691019
Tell that to the jan.ai fags who made their software unavailable for people without AVX2 in a futile attempt to speed up CPU inference.
Imagine being hard-bottlenecked by CPU for a task meant for GPUs. I'm out of luck.
>>
>>101690905
The Grifters United organization, no thanks.
>>
The 'tune cabal?
>>
Wasn't Pygmalion already a company?
>>
>>101691102
>If we didn't have a monopoly,
nigger
>>
>>101690985
A visage of stone, a heart of coal,
His bellow echoes, a story untold.
A passion for power, a love so profound,
In the depths of the earth, his treasure is found.

His eyes wide with fervor, his beard like a storm,
He stands for the darkness, a powerful form.
With a voice that could shake the very ground,
He proclaims his allegiance, with a roaring sound.

His heart, a red ember, a symbol so bright,
A testament to his fervor, his burning delight.
For the blackest of treasures, he holds it so dear,
A love for the coal, that will conquer all fear.
>>
>>101691187
nigger
>>
>>101691227
coal digger
>>
>>101691156
Again, Hate us cuz you ain't us cuh
>>
>>101691182
This isn't Pyg; the only person I can see in it who's also part of Pyg is Alpindale
>>
>>101691244
coal digger
>>
>>101691250
Go back to attacking Celeste and Drummer, miku. They pay you for that.
>>
>>101691292
exsqueeze me xir?! HOW DARE YOU NOT REFER TO HIM AS **THE** ALPINDALE. YOU BETTER APOLOGIZE RIGHT NOW XIR
>>
I have an RTX 4080 and my Windows Task Manager in the Performance tab tells me I have 16.0GB of Dedicated GPU Memory, 31.9GB of Shared GPU Memory and 47.9GB of GPU Memory.

Which one is my size limit for running a model? I was considering trying out the new FLUX.1 model.
>https://huggingface.co/black-forest-labs/FLUX.1-schnell

I've used the GGUF VRAM Calculator in the OP, I think I have 16GB of RAM? Apparently I need 32GB of GPU RAM to run FLUX.1.
Which memory size is my size limit for local models?
>>
>>101691306
Hi Drummer... go back to slopping
>>
>>101691182
alpin needs to pay for her gender transition surgery, please understand
>>
>>101691320
*47.9GB of Total memory
>>
>>101691344
XIR XIR XIR!!!! PLEASE UNDERSTAND AND REFER TO HIM AS ***THE*** T - H - E ALPINDALE XIR
>>
>>101691320
>Which memory size is my size limit for local models?
16.0GB
>>
>>101691320
You have no idea what VRAM and RAM are. Please stop using the shit in Task Manager.
>>
So the niggers who made Magnum made an org and it's called coal

Anything else I should know?
>>
>>101691320
Dedicated memory is the amount on the GPU; Shared allows some 50% of motherboard memory to be used for graphics, but is slower and undesirable; GPU Memory is the total of both.
Ideally you only ever want to be using Dedicated Memory for inference, to keep all the data the model needs in VRAM, which is much quicker than system RAM.
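If you'd rather check the real numbers than squint at Task Manager, here's a minimal sketch (assumes an NVIDIA card and the nvidia-ml-py package, i.e. pip install nvidia-ml-py):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
info = pynvml.nvmlDeviceGetMemoryInfo(handle)  # dedicated VRAM only, no shared
print(f"total VRAM: {info.total / 1024**3:.1f} GiB")
print(f"used VRAM:  {info.used / 1024**3:.1f} GiB")
print(f"free VRAM:  {info.free / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()

On your 4080 the total should read about 16 GiB; that's the number that matters for fitting a model.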
>>
>>101691430
I am currently riding tek's 2 inch indian cock
>>
Hi anons, just asking in case someone tried something like that
When you handwrite an example and want to send it to the model to use as a reference, should I just copypaste it into my message and tell it to write in a similar manner, or would describing what it has to do, then replacing the bot's reply with my example and telling it to keep going like that, have a better effect?
>>
>>101691409
>>101691432
Ty bros
>>
>>101691430
i`m a barbie girl, in a barbie worlddd life in plastic is fantastic, you can cut my hair, undress me anywhere - imagination, life is my creation!~
>>
Mistral Large settings?
>>
>>101691563
Neutralize, then add some minp if you'd like.
>>
>>101691621
Also 0.2 smoothing because of the characteristic MistralAI overconfidence for every token.
>>
File: file.png (1.88 MB, 1024x1024)
1.88 MB
1.88 MB PNG
the 'site marches on in search of new models to gem
>>
>>101691668
NTA but how do I do that
>>
>>101691713
You set smoothing to 0.2
>>
do we have multiple schizo anons here or is it all petra?
>>
File: image.png (145 KB, 912x710)
145 KB
145 KB PNG
>>101691696
>sponge
>>
>>101691668
>use meme sampler, it will help, for sure!
>>
I have two questions. When looking at the FAQ for gpu requirements, I notice 'precision' in 4-bit, 8-bit, and 16-bit but I don't see an explanation of what that means in the context of LLMs. I understand the idea of 7B and 13B models but the precision has me confused.
Also, I'm looking to maybe upgrade my GPU. Is a 3090 good for text generation? 24gb of VRAM would be a big upgrade over my current setup but the price point is high enough that I'd feel bad if it became obsolete in the next little while.
>>
is it just me or are system prompts absolute useless memes
>>
>>101691819
>When looking at the FAQ for gpu requirements, I notice 'precision' in 4-bit, 8-bit, and 16-bit but I don't see an explanation of what that means in the context of LLMs
most LLMs are trained in f16 precision, but because of that they take a lot of memory. People figured out that they can quantize the weights to save a lot of space for a small quality reduction. Generally quantization hurts smaller models more than bigger ones; I wouldn't use anything below q8 for 7-8B models, q6 for 12-13B models, and q4 for bigger models, although some anons say that q2-3 on bigger models isn't that bad (but I think they lie)
>Is a 3090 good for text generation?
yeah, 3090 is quite good for $ per GB
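If you want to sanity-check what fits before downloading, here's a rough sketch (the bits-per-weight figures are approximate llama.cpp numbers from memory, treat them as ballpark):

# rough GGUF size estimate: params (in billions) * bits-per-weight / 8
QUANT_BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8}  # approximate

def size_gb(params_b, quant):
    return params_b * QUANT_BPW[quant] / 8

for q in QUANT_BPW:
    print(f"13B at {q}: ~{size_gb(13, q):.1f} GB (plus a couple GB for context)")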
>>
>>101691668
Why do you need to use smoothing? If you want to have another go, just increase the temp; that's why you keep the little bit of min-p, in case you want to do that.
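(For reference, min-p boils down to a couple of lines; a rough numpy sketch of the idea, with 0.05 as a purely illustrative threshold:)

import numpy as np

def sample_min_p(logits, temp=1.0, min_p=0.05):
    # temperature first, then softmax
    z = logits / temp
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    # drop every token whose probability is below min_p * (top token's probability)
    probs[probs < min_p * probs.max()] = 0.0
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)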
>>
>>101691940
top-p is better than meeme min-p
>>
>>101691940
because anon is shilling his own sampler that only adds bloat to postprocessing
>>
>>101686705
>you must pay my jew master! you must bootlick!
slit your wrists.
>>
>>101687198
>technology board
>>>anime
hello? retard?
>>
>>101691933
>3090
You forgot to add "but you'll need several of them" given the requirements you provide for the quants.
>>
>>101692018
I don't think bigger models are worth buying multiple GPUs for; LLMs are still pretty bad regardless of size. 24GB is enough to comfortably run smaller language models, diffusion models and so on.
>>
>>101691969
Top-p can work too, I just happen to use min-p. Something simple to trim some tokens, plus the temp, is all you need; that was my main point.
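(Same idea with top-p, except the cutoff is cumulative instead of relative to the top token; again a rough sketch, 0.9 is just an example value:)

import numpy as np

def sample_top_p(logits, temp=1.0, top_p=0.9):
    z = logits / temp
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # tokens by probability, descending
    csum = np.cumsum(probs[order])
    cut = np.searchsorted(csum, top_p) + 1    # smallest set whose mass reaches top_p
    p = np.zeros_like(probs)
    p[order[:cut]] = probs[order[:cut]]
    p /= p.sum()
    return np.random.choice(len(p), p=p)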
>>
>>101691728
where is that option
>>
File: ugly bastard.jpg (148 KB, 1280x720)
148 KB
148 KB JPG
>>101686705
very cute uniform, it would be a pity if I had an ugly bastard license
>>
Hi all, Drummer here...

>>101689595
>>101689670
>>101689962

I'm sorry but those are not meant for release. If you don't see a description, then it doesn't count and you're not supposed to mind it, especially if it's not under my account (TheDrummer).

Tuners upload their test models publicly all the time, either for accessibility or transparency.

I've even privated the safetensors so that the quanters don't make it worse by creating even more mirrors of it that I have no control over. Would it be better if I placed them in an org named BeaverTest to make that clear?
>>
>>101691933
Thank you. I still do not understand what 'precision' actually means in this context though. Is it how close to the prompt the response is? And is the quantization a setting that I change on my end or is it a selection I make when downloading the model itself?
>>
>>101692105
tavern or ooba or llama
>>
>>101692209
A model is a bunch of numbers. Like 1.812347123972397. A less precise version of this number would be 1.8. A model full of numbers like 1.8 isn't as good as the one with numbers like 1.812347123972397, but it's smaller and easier to fit in less memory.
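(A quick toy example of the drift this causes, with made-up weights and activations:)

import numpy as np

w = np.array([1.812347123972397, -0.733218841, 0.129983172])  # "full" weights
x = np.array([0.5, 1.5, -2.0])                                # some activations

print(np.dot(w, x))               # full-precision result
print(np.dot(np.round(w, 1), x))  # same math with "1.8-style" weights; the result drifts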
>>
>>101692239
>llama
Is it actually in llamacpp? I didn't see a command line option in the list for it. What's the flag?
>>
For fellow users of Fimbulvetr and Typhon Mistral: know that mini-magnum-12b-v1.1.Q6_K is even better. Fast, imaginative, descriptive, follows the prompt well, long context, top notch.
>>
File: Untitled.jpg (48 KB, 346x413)
48 KB
48 KB JPG
>>101692105
>>
>>101692289
>>101692289
>>101692289
>>
>>101692301
And which of those two should be set to 0.2?
>>
>>101692144
>BeaverTest
Why not name it something like TestNotForRelease, TestNotReadyForUse, or whatever?
>>
>>101692263
I see. And I imagine that precision matters for things like properly identifying concepts? So a less precise model may have fewer identifiers attached to a given word? For example: [DOG] might be precisely identified as (4 legs) (snout) (fur) (tail) (snarling) (golden retriever) (dalmatian) (collar) etc., while a less precise [DOG] might just include (4 legs) (snout) (fur) (tail)?

Sorry if this is a retarded question. I am still trying to wrap my head around how these models work.
>>
>>101692209
do you know how neural networks are made of weights? LLMs have billions of them. You can lower their precision to save space, for example:
1.4329324553312 - weight on high precision
1.4329 - weight on low precision
It obviously changes the calculations a bit, but on high quants not by that much actually.
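Real quantizers don't do decimal rounding though; it's closer to scaling a block of weights down to small integers and back, like this toy int8 round trip (numbers made up):

import numpy as np

w = np.random.randn(8).astype(np.float32)  # a toy block of weights
scale = np.abs(w).max() / 127              # one shared scale for the block
q = np.round(w / scale).astype(np.int8)    # this is what gets stored
w_hat = q.astype(np.float32) * scale       # this is what inference computes with
print("max reconstruction error:", np.abs(w - w_hat).max())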
>>
>>101692393
probably
associations between concepts that occur less often in the dataset may have a smaller influence on the weights and thus be more likely to be muddied by a lower precision
>>
>>101692393
no, that's not how it works. Basically a neural network turns words into numbers, does a lot of calculations inside, and on the last layer we pick the neuron (which represents a word, or rather a token, but let's not complicate things) that was activated the most (represented as a %). So if you have the sentence:
>The best friend of a human is a
and let's say we only have 3 neurons (cat, dog, cow), the activations in the last layer may look like:
Cat - 15%
Dog - 83%
Cow - 2%

Now, because we quantized the model, lowering its precision, the calculations inside the model will have a bigger error, for example:

1.342 * 0.491 = 0.658922
1.34 * 0.49 = 0.6566
Notice how, despite this being a multiplication of the same weights (1.342 and 0.491), the result is slightly different due to the lower precision. These errors can influence the results in the last layer, and we can get something like this instead:
Cat - 19%
Dog - 78%
Cow - 3%
That doesn't really change which word gets picked, but in some cases it can. The bigger the quantization, the bigger the difference in the calculations and the bigger the chance that the model chooses the wrong token at the end.
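(You can watch this happen in a few lines; the logits are made up and treating quantization error as random noise on them is a big simplification, but it shows the mechanism:)

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([1.2, 2.9, -0.5])           # cat, dog, cow
print(softmax(logits))                         # roughly [0.15, 0.82, 0.03]
noisy = logits + np.random.normal(0, 0.1, 3)   # "quantization error" as noise
print(softmax(noisy))                          # shifted a little; usually the same winner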
>>
>>101692644 (me)
>The bigger the quantization
and by that I mean stronger quantization (like q2, q3, q4): the lower the number next to q, the bigger the precision loss
>>
>>101692644
I see. Thank you very much for the example, it was very helpful!
>>
>>101692644
Throw sampler gymnastics into the mix and it'll change significantly. All it takes is one single bad token to poison the context.
>>
>>101693002
there is a lot you could add to that; I generalized and dumbed it down as much as I could, otherwise I would need several posts to explain every single detail that adds to the equation


