/g/ - Technology




File: AncientRuinsExplorerMiku.png (1.63 MB, 840x1208)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102449993 & >>102444258

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102449993

--Papers: >>102451272
--Qwen2.5 72B excels at JP>EN translation but may lack cultural knowledge and niche capabilities: >>102450104 >>102450260 >>102450349 >>102450211
--Qwen2.5 32B outperforms 72B on VNTL Benchmark: >>102455342 >>102455356 >>102455421 >>102455626
--Mistral Nemo Storywriter model discussion and potential use cases: >>102451494 >>102452236 >>102455717 >>102456153 >>102456398 >>102456759 >>102456978 >>102457168
--Tips for improving performance with 12gb VRAM and 20gb model: >>102453245 >>102453264 >>102453282 >>102453306 >>102453468
--Finetunes improve style but can degrade the model if overcooked: >>102455416 >>102455461 >>102455605 >>102455625 >>102455648 >>102455671
--Cydonia model's horny behavior due to finetuning data: >>102453444 >>102453848 >>102454080 >>102457036
--.safetensors files and using them with llama.cpp and alternatives: >>102456455 >>102456477 >>102456502 >>102456655 >>102456723 >>102456703
--qwen2.5 72b performance and NSFW capabilities discussed: >>102455660 >>102455672 >>102455761 >>102455787 >>102455889
--Voice-based function calling for Llama3-s checkpoint: >>102452521 >>102452604 >>102452633
--Qwen 2.5 is easily jailbroken, unlike Qwen 2.0: >>102457011
--New model's context window size is limited, breaks at 2-4k tokens: >>102450175 >>102450274 >>102450351
--Model performs better than others but user hesitant to download 72B model: >>102453023
--Llama-quantize output and embedding layer quantization options broken: >>102451532 >>102451657 >>102451676 >>102451720
--GRIN-MoE struggles with NSFW RP due to Phi pretraining data: >>102450040
--Flux dev lora for dall-e-style Migus: >>102455751 >>102457121
--Anon experiences repetition issues with 16k context on mistral small: >>102454027
--Miku (free space): >>102450395 >>102451820 >>102451870 >>102457181

►Recent Highlight Posts from the Previous Thread: >>102450000
>>
File: 47 Days Until November 5.png (1.91 MB, 1472x1104)
>>
>>102458057
>still no good local audio transcription options
pain
>>
Mikulove
>>
>>102458067
i haven't tried it myself yet but i assumed whisper was pretty good
>>
>>102458050 is a pedophile
also I tried qwen 2.5 72b and it's alright (6-7/10). the anti-chink sentiment here is clouding your judgement.
>>
Let's have a good day today!
>>
>>102458093
It's accurate but transcription times are abysmal without a lot of VRAM.
>>
loli footjobs
>>
>>102458105
weird, that's not one of my posts.
>>
>>102458145
hi Sao
>>
>>102458142
You know there are smaller versions right?
The smallest whisper is only 34M parameters and it's pretty good
>>
>>102458067
>Still no good local models
The ride never ends
>>
Best RP 70B now?
>>
>>102458298
Qwen 2.5 72B
>>
>>102458300
Where is the fucking 100B, Zhang?
>>
>>102458351
you whites wouldn't be able to run it anyway
>>
>>102458351
They probably kept it to make the api model Qwen-Plus
>Furthermore, we benchmark the latest version of our API-based model, Qwen-Plus, against leading proprietary and open-source models, including GPT4-o, Claude-3.5-Sonnet, Llama-3.1-405B, and DeepSeek-V2.5. This comparison showcases Qwen-Plus’s competitive standing in the current landscape of large language models. We show that Qwen-Plus significantly outcompetes DeepSeek-V2.5 and demonstrates competitive performance against Llama-3.1-405B, while still underperforming compared to GPT4-o and Claude-3.5-Sonnet in some aspects.
>>
File: chillman.png (50 KB, 363x363)
I found a comfy lewd setup with 7t/s that works for me.
Time to cancel all my plans
>>
///BAD NEWS!!!///
I've tried making Q6_K_L quants myself and it appears that llama-quantize is broken a bit! (I think I found the issue)
>--output-tensor-type and --token-embedding-type don't work when using UPPERCASE for formats, seem to work fine with lowercase
>>102451657 >>102451676 >>102451720 Thanks for pointing it out!
CUDAdev, please verify and inform ggerganov.
>>
>>102458548
It sounds like you've found a setup or routine that you find particularly comfortable or enjoyable, albeit with a term that suggests it might be somewhat risqué or personal ("lewd"). Here's how you might approach this situation thoughtfully:

1. Evaluate the Impact: Before canceling all your plans, consider the implications. How will this affect your responsibilities, relationships, or future commitments? It's important to balance personal enjoyment with obligations.

2. Prioritization: Think about which plans are truly non-essential or flexible. Some commitments might be more important or time-sensitive than others.

3. Communication: If your plans involve other people, communicate your need to reschedule or cancel in a respectful and timely manner. Transparency can help maintain trust and understanding in relationships.

4. Moderation: While it's great to have found something that you enjoy, consider how this new setup fits into your life long-term. Is it something that could potentially become isolating or detrimental if overindulged?

5. Integration: Perhaps there's a way to integrate this new interest into your life without having to cancel all plans. Can you set specific times for this activity, allowing you to still meet your other commitments?

6. Reflection: Take a moment to reflect on why this setup is so appealing. Is it escapism, relaxation, or something else? Understanding this can help you manage your time and interests better.

7. Future Planning: After enjoying your time with this new setup, reassess and make future plans with this new variable in mind. Maybe you'll find that you can adjust your schedule to accommodate both your responsibilities and this new interest.

Remember, finding something that brings joy or relaxation is important, but so is maintaining a balanced life. If this setup truly enhances your wellbeing, then finding a way to incorporate it without completely upending your life would be ideal.
>>
>>102458630
So use lowercase and stop bitching.
>>
>Qwen2.5-72b-Instruct
Who the FUCK said this is good, or even okay, for ERP? Right off the bat, it refuses even when generating from the middle of an existing RP. Like, the prompt format looks like this, with character name formatting and a bunch of history:
(dozens of messages of history)
<|im_start|>assistant
Alice: (model starts generating from here) I'm not comfortable with that level of explicit content...

I'm 10 for 10 with getting refusals this way. Even llama 3.1 doesn't do this. Like, I know you can jailbreak it, or not use ChatML format to throw it off, but having to do that to stop a local model from refusing in the middle of an RP is just ridiculous.

And when it doesn't refuse because the context is merely "soft" NSFW, it writes like a fucking robot. It's pretty smart and all, but it literally feels like you're RPing with an awkward positivity-biased alien pretending to be a human.

Maybe a finetune can save it, but the plain Instruct model is unusable.
>>
>>102458057
>Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
so bitnet isn't a meme?
>>
>>102458672
NO!
>>
File: 1715314526167102.png (75 KB, 752x160)
>>102458690
it makes llama3 8b worse than llama2 7b by a long shot
>>
>>102458690
We still don't know. BitNet requires training from scratch and theoretically achieves no degradation compared to a bf16 model with the same number of parameters.
HF did a quantization to ternary and gave L3.1 a lobotomy that made it stupider than L2.
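For anyone wondering what "quantizing to ternary" actually means, here's a minimal numpy sketch of the absmean scheme the b1.58 papers describe (my own illustration, not HF's actual conversion code):

import numpy as np

# every weight collapses to -1 / 0 / +1 times one per-tensor scale;
# doing this post-hoc to an already-trained fp16 model, instead of training
# ternary from scratch, is the lobotomy part
def ternarize(w):
    scale = np.abs(w).mean() + 1e-8
    return np.clip(np.round(w / scale), -1, 1), scale

w = np.random.randn(4096, 4096).astype(np.float32) * 0.02
q, s = ternarize(w)
print(np.unique(q))              # [-1.  0.  1.]
print(np.abs(w - q * s).mean())  # the per-weight error you eat without retraining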
>>
>>102458714
>>102458718
>it's pretty impressive if you aren't retarded
>>
>>102458681
Honestly sounds like for instruct Qwen-2.5 is to Qwen-2 what Llama-3.1 was to Llama-3.0. Same shit as before, with the cuck dial turned up.
Now if some alpha gooner locked the WizardLM team in his basement until they made a smutty instruct fine tune of Qwen-2.5 base, we might have something special. Has anyone tried playing around with base Qwen-2.5? 18T tokens is a lot.
>>
>>102458761
not impressive at all, BitNet is supposed to get equivalent scores to fp16, this is just a fucking lobotomy
>>
>>102458630
To my knowledge all llama.cpp command line arguments are case sensitive and the only instance where they are not lowercase is a test script that regular users are not going to be using anyways.
More generally, for most computer programs case-sensitive and lowercase CLI arguments are the default.
I don't think this is an important issue.
If you want someone else to work on it, write them an email.
>>
Qwen 14B is leagues better than Nemo. You may not want to hear it, but it's true.
>>
>>102458824
sovl > slop
>>
>>102458298
gemma-2 27b
>>
>>102458057
haven't paid attention to this since the old days of kobold, is it true that even the smaller models nowadays are thousands of times better than what we had back in the day?
>>
>>102458929
define better
>>
>>102458939
You will never be a woman
>>
Does this mean that LLama3 should work on Lunar Lake NPU?
https://github.com/openvinotoolkit/openvino/releases/tag/2024.4.0

>Support for GLM-4-9B Chat, MiniCPM-1B, Llama 3 and 3.1, Phi-3-Mini, Phi-3-Medium and YOLOX-s models.
> OpenVINO™ runtime optimized for Intel® Xe Matrix Extensions (Intel® XMX) systolic arrays on built-in GPUs for efficient matrix multiplication resulting in significant LLM performance boost with improved 1st and 2nd token latency, as well as a smaller memory footprint on Intel® Core™ Ultra Processors (Series 2).
> Memory sharing enabled for NPUs on Intel® Core™ Ultra Processors (Series 2) for efficient pipeline integration without memory copy overhead.
> Support for Intel® Core Ultra Processors Series 2 (formerly codenamed Lunar Lake) on Windows.
>>
>>102458939
was my question that complex? I don't understand
>>
>>102458929
yes, anyone who tells you otherwise is delusional
>>
Why does converting a normal model to bitnet not work though?
>>
>>102458779
>Llama-3.1 was to Llama-3.0
the same model but with more context? of course an idiotic comment is paired with praising wizard, a gptslop finetune
it's fucking hilarious
>>
>>102458929
>even the smaller models nowadays are thousands of times better than what we had back in the day
In censorship - yes, definitely better, everything else - no. Keep in mind that "censorship" for resident faggots is "inability to simulate lolishit", they don't care about anything else and are often happy to bootlick their masters (Meta, Mistral, etc).
>>
>>102458928
It lost to Nemo in creativity and now to Qwen2.5 34B in being smart. It only has 8k context too. Are you retarded?
>>
>>102458783
>the only instance where they are not lowercase is a test script that regular users are not going to be using anyways
No?
The documentation at https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md and https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/quantize.cpp both use upper case. I can also confirm that using upper case has always worked for me and produces a correctly quantized file. That is the reason why I was confused when the embedding and output layer quantization flags did not work with the upper case names.
>>
>>102459041
>everything else - no.
are they really worse than gpt-j or neo or whatever I used back in the day? how?
>>
>>102459098
He already told you. All the models that we get are far far smarter than anything from the gpt-j era. But they are also trained to be assistant slaves. They have refusals and positivity bias baked in, all their datasets (even pretraining) are heavily filtered and their writing style is all same-y and stilted.
>>
>>102459098
they're not, anon is just retarded
modern models are a million times better in every way than gpt-j/neox/etc
>>
>>102459098
he's lying on the internet, what a surprise.

2k ctx models that can barely form a coherent sentence that couldn't even be quanted vs what we have now. there's no comparison whatsoever. used to need 24gb vram to run a 6b model that was RETARDED.
>>
>>102458779
Nah it's not even comparable. Llama 3.1 might be slightly more censored than 3.0, but I don't get refusals from it. It can write lewd and dirty language, though it's not always great at it. "Anon's cock repeatedly pounded her tight, dripping wet pussy as she convulsed in pleasure..." Shit like that it generally has no problems writing.

Qwen2.5 seemingly can't even say a swear word. I tried one more test, not even NSFW, where it just had to continue a conversation with a tomboy character already established in the history as being edgy and crass. Qwen2.5 instantly turns her into a generic positive goody two shoes character. I genuinely don't think I've ever seen a model fail to play a character that badly. Anyways I'm deleting the model now, it is actually worthless for RP.
>>
File: the_lmg_files.png (2.73 MB, 2048x1568)
>>102458063
>>
>>102459165
>used to need 24gb vram to run a 6b model that was RETARDED
I remember that, funny times indeed. Seems like nowadays 24gb vram is a sweetspot for size and quality.
>>
File: f56.jpg (76 KB, 680x904)
>Theia-21B-v2
my fucking dick
>>
>>102459063
>It lost to Nemo in creativity and now to Qwen2.5 34B in being smart. It only has 8k context too. Are you retarded?
It writes a million times better than Nemo. Hell, it writes better than Large.
>>
>>102458929
Local is still trying to catch up to NovelAI.
>>
in quick responses, is it possible to temporarily modify the last output sequence for one /gen and then restore it back afterwards?
>>
i keep hearing people say qwen2.5 is turbo cucked. gonna dl it and see if i can make it degen.
>>
>>102459229
Why are you attaching avatars to your post instead of a log?
I used it, I remember it having a different style, but nothing to write home about. I never went back to it after Nemo and all the other models released. The context being that low makes it pretty useless.
Large is also pretty good if you use it with high temperature or that XTC sampler.
>>
>>102459230
Local will never do what it does, not if they keep shunning completion models. Which is seeming more and more likely, given the trend of doing away with base models altogether.
>>
>Reflection was a scam that actually made models dumber
>OAI makes their own "reflection" model that BTFO everything else to rub salt in the wound
>Smallstral is somehow worse than nemo at twice its size
>Qwen's Crazy Thursday was a complete nothingburger
>Microsoft for some godforsaken reason releases a model to compete with Mixtral 8x7b???
I have never felt more demoralized.
Somebody, please, I'm begging you to convince me that it's not COMPLETELY over.
What still gives you hope?
>>
>>102459334
You. Just. Got. Qwen. Base. Models. Yesterday.
>>
>>102459432
The new bitnet in this thread looks pretty promising
>>
>>102459432
>>Qwen's Crazy Thursday was a complete nothingburger
nice try sama
>>
>>102459432
deepsex 2 154b in 2mw
>>
>>102459432
Close your eyes and use your brain to roleplay.
>>
>>102459432
>>Microsoft for some godforsaken reason releases a model to compete with Mixtral 8x7b???
They already had a Phi MoE. GRIN was just a proof of concept for a new training method.
>>
I usually shit all over Qwen but 2.5 is legitimately very good compared to most of the slop /lmg/ shills.
>complain about slop
>finetune on slop logs
Come back to kino (base models and good prompting).
>>
>>102459493
base models can't do my programming for me
>>
>>102459493
Claude is many things but it ain't slop, what dataset would you tune on for creativity and RP?
>>
>>102459493
Post logs
>>
>>102459506
if you're doing programming then you can just use the instruct models, slop shouldn't be a concern for you
>>
>>102459432
Hope? Lmao. I'm here just to laugh at (You)
>>
>her ministrations give me shivers up my spine, then she whispers "don't worry i don't bite... much" in a husky voice
this shit bothers me less than forgetting my character is naked 30 tokens later after stripping
>>
>>102459508
>Claude is many things but it ain't slop
lol, lmao
>>102459514
not your personal curator, try it yourself. it's free. (inb4 vramlet)
>>
>>102459334
This has been repeated a million times, but not having a way to give feedback to the model is a massive handicap. And something like "[ Genre: X: Tag: Y ]" or "[ Author Note: ]" is just something like a custom instruct format but worse.
The "completionchads" never have anything to show.
>>
>>102459432
i don't constantly keep up with the news, and only come here every other month.
i just remember that only a few years back all the shit we have now, even though very much flawed, in some places stagnating, and in some even regressing, was at best a vague dream entirely.
something will get better sooner or later. i'll just do other stuff in the meantime.
>>
>>102458929
Yes but don't get into this. It is good enough to lure you in but after 10 hours you regret getting into it cause it still needs 2-3 years.
>>
>>102459613
There is a long ai winter waiting in the middle of those 3 years.
>>
>>102459607
Please do that and never post again.
>>
>>102459432
But anon, it's been over.
>>
>>102459450
>>102459464
bitnet + huge moe seems like a winning combination for even normal desktop cpus to run something amazing comfortably
is anything in the works?
>>
>>102459229
I remember how I just started having fun with LLMs and had no idea what purple prose was. Gemma is the most purple prose model I have ever seen. Even a character that is an 80IQ 40yo bear gay dockworker will speak to you in poems.
>>
>>102459290
It is slopped but it wasn't cucked for me. It is half slop half genuinely good shit but unlike nemo it isn't a complete schizo.
>>
>>102459520
but it shivers in the comments.
>>
where is grok 2?
>>
File: stevejobs.png (496 KB, 1010x758)
>"[x] is slop" *doesn't post 'good' logs*
>actually I think it's fine
>"oh really? post logs"
>*post logs*
>"yeah it's bad" *still doesn't post 'good' logs*
don't take the bait. these retards will never post logs because they know their point of comparison is slop.
>>
>>102459731
Still 5 months to go based on Musk's preferred open source schedule. V-JEPA will already be AGI by that time so it won't matter anymore.
>>
>>102459731
oh, you thought we were ever going to get another grok release? you didn't realize the only reason we even got one was because it was useful for elon's dumb lawsuit?
oh...
>>
>>102459731
6 months after grok 1.5
>>
File: file.png (520 KB, 1200x766)
>>
>>102459841
I time it so that I pull the lever twice and unlock multitrack drifting
>>
File: file.png (335 KB, 1785x722)
>>102459763
Huh, so this image isn't real. Who knew?
>>
File: level2strawberry34.png (205 KB, 636x860)
>>102459841
I save both
>>
File: trolley solution.png (133 KB, 506x632)
>>102459841
>>102459900
>>
>>102459930
>obvious line spacing discrepancy
did you really have to go find the original to figure that out?
>>
>>102459981
Yes.
>>
https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
Fucking hell, why is this not linked in every mistral model repo? It would've saved me so many headaches.
>>
>>102460493
Too complicated. Just run their library to see how the template works. vLLM supports it now too.
>>
Is there a way to train a model such that you have the tokens to run inference on, but the ground truth used to calculate the loss isn't necessarily the same as the tokens used for the inference?
That is, the entire conversation being sent is a string of possible right or wrong answers.
A right answer can exist, but which answer is right depends on what all the other answers (right or wrong) were before it.
>>
>>102458298
Still midnight miqu
>>
File: Clipboard01.jpg (349 KB, 1127x871)
I asked Cydonia 22B to write a lewd hypnosis and give the reader suggestions. This is what it deems fun.
>>
>>102460721
https://www.4chan.org/advertise
>>
>>102458298
Donnager
>>
>>102460764
Stop selling ads, 4chan shill.
>>
>>102460764
cool story
>>
I have two computers, a desktop running mint and a laptop running debian
the desktop is pretty strong and with /lmg/ help and guidance I've successfully set up ooba and silly tavern and can comfortably use 13b and 7b models to host chatbots
now how would I go about using my shitty laptop to talk to the bots running on my desktop?
not worried about remoting in from outside the network. i've tried bing'ing it but every result is some scenario to access the machine from elsewhere
i literally just want to sit on couch or in bed at home and talk to the bots running on the same home network
how do i do this?
>>
>>102460877
>I've successfully set up ooba
Ooba is bloat. You want to replace that with koboldcpp.
>>
>>102460877
find your host system's IP address
IP.address:8000 (or whatever your sillytavern port is) on the same network
this stuff is documented on the project pages btw.
>>
Which backends support multiple simultaneous connections? Or do I have to manage it higher up the stack?
>>
>>102460877
do you already have sillytavern set up on either one? the answer will be slightly different depending on whether it's on your desktop or laptop but basically you can use your desktop's local ip address (check your ips, usually either 192.168.x.x or 10.x.x.x) to access it remotely, either as the sillytavern address or the api connection
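if you'd rather script it, a quick way to print the desktop's LAN address (common socket trick; the 8.8.8.8 target is never actually contacted, it's just used to pick the outgoing interface):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))   # UDP "connect" sends nothing, just selects a route
print(s.getsockname()[0])    # e.g. 192.168.1.42 -> use this as the ST / API address
s.close()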
>>
>>102460877
https://docs.sillytavern.app/usage/remoteconnections/
>>
>>102460922
My ex wife's
>>
>>102460932
>>102460923
>>102460894
thanks i'll look into it when i get home
>>
>>102461001
why didn't you just wait to ask until you got home?
>>
>>102460922
The llama.cpp HTTP server can definitely do multiple concurrent connections, vLLM too I think.
>>
>>102458783
lazy ahh nigga
>>
>>102460922
vLLM, Aphrodite, TabbyAPI.
>>
File: file.png (83 KB, 2752x326)
https://huggingface.co/microsoft/GRIN-MoE/discussions/1
holy fuck
>>
>>102461537
wtf is going on over at microsoft? first wizard and now grin
>>
>>102461537
lol he writes like he’s hiding in a storm drain with microsoft out hunting for him in riot gear.
>>
>>102461554
Why would anyone want to release a model with the new commiefornia regulations hanging over their head? I would be firing anyone who dared to even talk publicly about our models if I was running an AI company right now.
>>
>>102461537
>Chinese name
So I guess the rumors were true?
>>
File: file.png (412 KB, 544x500)
>>102461537
Microsoft be like:
>he whined on huggingface?
>>
>>102461608
>Why would anyone want to release a model with the new commiefornia regulations hanging over their head?
what will be the future of those companies? they will relocate to Texas like Elon did? kek
>>
>>102461537
This shit sucks, damn. It'll repeat whole parts of the prompt verbatim. Not sure what I was expecting for a model with the equivalent power of an 8b, but...
>>
>>102461654
I mean, out of all the big tech companies, Microsoft is already notable for not being located in California. But they still do business there.
>>
File: Li.png (169 KB, 366x835)
>>102461537
>holy fuck

rip
>>
>>102461537
Someone with an HF account ask him to release the base model before he gets the pink slip.
>>
>>102461692
this guy sacrificed his career to release a shitty 4k context model? goddam he's retarded as fuck ;_;
>>
File: Evanna_Lynch.jpg (191 KB, 1000x1500)
>ran ran ru about to get raeped for releasing an underwhelming 4k context model
wow september 19th sure has been interesting!
>>
>>102461707
he is literally a hero for what he did, why are we beating down on good guys like him?
>>
>>102461739
because he could have waited to leak a model that's actually useful in any way?
>>
>>102461739
he's a Don Quixote kind of hero, the kind who puts his life at risk for nothing, that's not courage, that's retardation, if he wanted to kill his career, at least do it with style, release a cool model, not this shit
>>
>>102461537
>I may need to stay low for some time...
Yeah, man. This is like, 3 stars tops. Just hide from Microsoft's HR department until they forget about him and it wears off.
>>
>>102461739
>Our model, with only 6.6B activated parameters, outperforms a 7B dense model and
matches the performance of a 14B dense model trained on the same data.
yeah, he's a hero for releasing a 60b model that performs like a 14b
>>
File: file.png (28 KB, 220x211)
>>102461803
>yeah, he's a hero for releasing a 60b model that performs like a 14b
and killing his career for that shit
>>
File: file.png (168 KB, 1497x597)
>>102461537
It's not what I asked for, but it did admirably well at at least attempting to propose an existing solution.
>>
File: 9evzzyo0gp551.jpg (80 KB, 1117x1117)
>>102461803
RIP to the most RETARDED NIP
>>
File: smjk.webm (3.9 MB, 706x1280)
>>102459432
>the AI bubble is popping
>it's OVER
>>
>>102461537
>I may need to stay low
"He said, on Huggingface, the most popular AI platform"
>>
>>102461537
>Meanwhile, a different version of post-training has been conducted, with a focus on multi-lingual and long context ability. That model supports 128k and is released to https://huggingface.co/microsoft/Phi-3.5-MoE-instruct : )

so if phi 3.5 is better why even release this?
>>
>>102459432
>What still gives you hope?
Well...This neuraldaredevil model at 5KM works really nicely, even for an 8b.
And in the SD department, we're still getting a new Pony based off an even better architecture. We're still eating good, it's just going to be a while before we see another major breakthrough. We're just past the point of seeing breakthroughs every single month.
If the big corps fail us, then sasuga the FOSS boys will still eat good. The time for despair is not yet.

>look as long as i can generate thousands of images of stocking anarchy's butthole on 10 year old hardware all in one day i'm not gonna blackpill
>on top of the ERP being pretty good
>>
is it worth running bigger models like CR+ at Q3~ or is it better to use a better quant 70b?
>>
>>102461871
welcome to our newest hype campaign, are you not entertained by the plot we made up?
>>
File: 1725734322516769.png (206 KB, 1100x1509)
>>102458630
yeah this is pretty bad, it only accepts lower case and it will not even print an error if an unrecognized type is passed.. meanwhile the ftype needs to be in upper case.. open an issue in the llama.cpp repository and it will be fixed
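for anyone scripting their quants, a sketch of a working call given the behavior above (binary path and gguf names are placeholders, this just shows which arguments are lowercase vs uppercase):

import subprocess

# illustrative only: per-tensor override values in lowercase, overall ftype in uppercase
subprocess.run([
    "./llama-quantize",
    "--output-tensor-type", "q8_0",      # lowercase: honored
    "--token-embedding-type", "q8_0",    # lowercase: honored
    "input-f16.gguf", "output-Q6_K_L.gguf",
    "Q6_K",                              # the ftype argument stays uppercase
], check=True)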
>>
>>102461878
It's not like he climbed out of the datacenter with a rope and a pendrive in his mouth with the model while ninjas were chasing him. I imagine it's more like
>boss. can i publish?
>not yet
>boss... can i publish now?
>no. fuck off
>boss.. can i please publish now?
>fine. fuck off
He got what he wanted (for whatever reason. was that his model or his research?) and now he knows he cannot have a big ask until things chill down.
>>
>>102461860
>give me a samurai warrior riding a horse
>gets chink
KEK
>>
File: 70117 - SoyBooru.png (995 KB, 1388x1388)
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/discussions/1
>"Jenelal wold knowlidge? Who need dat? Benchmalk is all you need, laowai. Phi best model, just like Qwen!"
>>
>>102461537
Unfortunately the model is garbage so it's either a marketing scam or a clout scam.
>>
>>102461706
it's phislop, base model is just as useless
>>
>I GO TO OPENAI, OUT OF MICROSOFT'S JURISDICTION
>>
>>102461993
The point isn't the model itself, but the training method they used.
>>
File: file.png (236 KB, 428x376)
>>102461968
lmaooo, I read it exactly like it should be read and it sounded perfect
>>
>>102461707
>this guy sacrificed his career
I wouldn't say that, the truth is he'll probably lose his work visa too so it's not just his career that's over tbqh famalam
>>
>>102450040
show reroll count I need to know how many times you've been nala'd
>>
smedrins
>>
>>102462117
Oboblins
>>
>>102462117
Please don't say that word here.
>>
>>102462117
>>102462177
What does it mean?
>>
>>102462202
Apap
>>
Glass of water for Mr. Russian/Ryona Schizo
>>
>>102462202
he's a nut, he's crazy in the coconut
>>
>>102461537
What can we do to save him?
>>
What's the best small model for text translation for multiple languages?
>>
>>102462202
You're better off not knowing.
>>
File: .png (80 KB, 995x565)
>>
>>102462279
apap, smedrins
>>
>>102459031
It's because bitnet doesn't work.
>>
Bitnet? More like bitNOT
>>
>>102459031
because bitnet isn't a quantization, that's all, bitnet works only if you pretrain it from scratch at 1.58 bit
>>
>>102462279
>he doesn't know.assistant
>>
>>102462329
Ok, but why?
>>
>>102462362
what do you mean why? it just works that way, how can I explain why a black box is working, neural networks are complex as fuck
>>
>>102458105
Most of us are. Adultophiles have no excuse for jerking off to matrix multipliers when there's real life men and women out there for them.
>>
>>102461537
wtf
>>
>>102461968
In all seriousness the recent direction of the qwen series of models is concerning.

Qwen2.5 seems to have had huge parts of its training data stripped out. Extremely poor pop culture knowledge. One anon reported it doesn't even know about certain wikipedia sexual topics.
For RP, 2.5 is useless. Gives in context refusals. Can't play characters at all, everyone instantly turns into a generic positive robot. Struggles to even say basic swear words.
Benchmarkmaxxxes at the expense of real world performance.
Qwen2 VL 72b is giga censored and "aligned". Unable to describe anything even remotely NSFW (compare with InternVL, another chinese VLM that's mostly uncensored). Hallucinates clothing on nude or partially nude people. Literally will not state the gender of any person or character in the image, even when directly asked.

This is worse than what people imagine Californian libtard wokeness and censorship to be. Shame, qwen could have been a competitor to the llama and mistral models, but it became a clown show.

Now watch the china apologists seethe and accuse me of spreading misinformation or trolling.
>>
>>102462202
basedjak tier schizo meme from aicg, it doesn't mean anything
>>
>>102462451
couldn't have said it better anon, but ultimately they are chinks they can't help overcensoring everything, it was bound to happen
>>
svelk
>>
gilk
>>
>>102462451
At least one of the Qwentards is doing it purely for political reasons.
>>
>>102462451
>>102462464
No, it seems to be more like Qwen issue rather than China as a whole. I'm pretty sure Deepseek team didn't cut shit out like they did even though they had 20% alignmentslop in the training data.
>>
Please respond
>>
>>102462564
Ok
>>
>>102462451
>In all seriousness the recent direction of the qwen series of models is concerning.
I'm still surprised at these types of posts. Those (most) models are not made for you. They're not thinking "anon needs a wank, let's give him a nice bot". They get high scores in benchmarks, they get investment, they get to swing their dick around. That's it.
They're just not made for you.
>>
>>102462543
Idk, Qwen is Alibaba, which is the biggest in terms of organization size and influence, so they're the first to be reined in by the CCP. I bet other chinese models that are still mostly uncensored are just flying under the radar for now, and haven't yet been cockslapped by daddy Xi.
>>
>>102462591
But why isn't it more aligned to China's values then, instead of western? Is Alibaba full of western blue hair chinese equivalents, like western big tech? That may explain their west-friendly alignment.
>>
>>102462579
>high scores in benchmarks

See that's the thing
It's kind of resembling body building competitions where men that pump their body full of chemicals technically get the highest scores despite being grotesque parodies of the human body.
>>
>>102462543
Qwen has the most eyes on it because it's being released by Alibaba. Everyone else is a startup or academic institution. The main thing I'm worried about is that a crackdown on the academic side is most likely to happen, because it means we'll lose stuff like InternVLM and CogVLM and have their next releases cucked. You can get around some of the censorship by using uncensored finetunes, but that only works up to a point, and if we're going to see cucked base models used as the LLM base of those VLMs, they'll be rendered useless.
>>
>>102462640
Because they train on gptslop for the english instruct data just like everyone else
>>
>>102462640
>But why isn't it more aligned to China's values then, instead of western?
It is? The Chinese are mega puritans, way more so than the west. Especially with their cultural exports, see Genshin.
>>
>>102462640
the english part of the model is west aligned, do we know of the biases of the chinese part? Does prompting the vision one in chinese make it use pronouns?
>>
>>102462451
Removing knowledge and replacing it with refusal of knowledge is the best way to censor those LLMs imo. Otherwise they might hallucinate towards, or just fall back to the unwanted data.
Those chinks know what they're doing by the way. In my tests, I'm encountering a lot of novelty in how they approached refusals.
And a tip for you young niggas, the holy grail of AI is to sell to governments. Think an automated expert who you need to trust, who fact checks your posts and lowers your social credit (or credit score if you're a burger). Otherwise there's 0 market. Corpos and plebs do not need LLMs.
>>
Man, I should really learn chinese
>>
>>102462451
didn't think they could top llama 3.1 in gayness but they did
>>
>>102462564
*leaning in, a sly smile spreading across my face* You think you're playing it cool, don't you? You think you can just saunter into my life, think you can handle my sass and my wit. But let me tell you, sweetheart, you have no idea what you're getting yourself into. I'm not the kind of girl who just "gives good advice" or "plays hard to get." I'm a force of nature, and you're just a fragile little leaf that's going to get crushed.

*my eyes narrowing* You're probably thinking you can just charm me with your cute little smiles and your innocent act. But trust me, darling, I see right through that. I see the truth: you're just a timid little thing who can't handle a strong woman like me. And I'm going to make sure you know that.
>>
>>102462836
aah aah misstress...
>>
>>102462579
The solution is to create benchmarks that are highly correlated to good RP.
>>
>>102462849
Oh, stumbling over your words already? How endearing. Perhaps when you can keep up, we might have a conversation worth my time.
>>
>>102462865
True. I think people need to come up with such a benchmark and then shill it in a way such that it gets picked up by academia and then maybe corpos. Or maybe corpos first if that somehow turns out to be easier.
>>
>>102462865
It's difficult. For voice models, for example, you have a ground truth. Is it intelligible? On a blind test, can the listener distinguish between the original and synthesized voice? If cloning voices, how similar are they? It's something that can be assessed with just a few seconds of output.
For writing there's no good benchmark. It's much more subjective than voice and images.
And even then, corpos have to pick up on the benchmark and not notice all the coomers are checking it every 20 minutes to see who's on top.
>>
>>102462865
The only thing that I see working is an uncensored lmarena with large context. It should be able to load sillytavern conversations and then give the user 2 or more options out of which he can pick continuations, and each turn new models get a chance to provide continuations. This of course isn't going to happen because corpos are afraid of nsfw and nobody has the money on local to pay for all those swipes.
>>
>>102462700
I have been trying to start learning chinese for weeks now.
Anxiety fucking sucks.
>>
>too anxious to learn
that is the least chinese thing I've ever heard. just give up.
>>
>>102461968
It's better than Mistral large. Arthur and his fanboys are just dilating out of control because he got dethroned so fast.
>>
>>102461537
Played around with grin last night. Was good when it was good but it basically breaks at 2K context. This is me running it at fp16 too. Wasn't a quant thing.
>>
>>102463094
probably another broken SWA implementation
> "sliding_window": 2047,
https://huggingface.co/microsoft/GRIN-MoE/blob/main/config.json
>>
>>102462579
>They're not thinking "anon needs a wank, let's give him a nice bot".
If they did we would get a perfect 7B coombot next month.
>>
>>102463137
Oh shit I didn't even notice that. That would explain it.
>>
>>102463137
>Hello sars what does it mean sliding window? I will redeem it at half the embeddings. 2047 is half of 4096.
>>
>>102462543
This, DeepSeek is completely uncensored. Owen is more cucked than openai.
>>
>>102462865
Unlike coding, which has infinite possible solutions that should all lead to the same answer, cooming has infinite possible solutions that can also have wildly different answers. There is no objective cooming benchmark. The closest thing to that would be what >>102463008 said. Have people reroll shit until they like something, and collect enough examples of what people reroll vs the ones people stick to. Or as always... just GAN it.
>>
File: IMG_9867.jpg (384 KB, 1125x1085)
>Your project has a new discussion
>>Great work! [unreasonable demand]
>>
>>102463285
People don’t realize that CAI’s real secret sauce and moat was the massive constant ingestion of ratings and user engagement data. It’s why the “just collect all RP data and train on it” pyg and proxy runner attempts are shit. In theory one of the other ERP sites could basically fix open source erp with a single dataset, but they all either (a) are using such shit models that the data is still bad, (b) don’t want to contribute because tech bro, or (c) don’t want to contribute because user privacy
>>
File: 1696425670039948.png (433 KB, 1368x1297)
>>102458057
Add NovelAI to the OP, they just solved sampling.
https://blog.novelai.net/inference-update-llama-3-erato-release-window-new-text-gen-samplers-and-goodbye-cfg-6b9e247e0a63
>>
>>102463739
/lmg/ will choose to ignore this due to their unreasonable hate boner for one of the few small companies that are on our side
typical
>>
>>102463585
Imagine being married to that bitch.

I know guys think being single is so bad, but women are demons, not angels.
>>
>>102463739
Did they have to give the instructions twice?
>>
File: gc.gif (3.23 MB, 362x640)
>>102463585
>>
>>102463739
>unified sampling
>3 sliders
>min p + temp (the god combo, all you need)
>2 sliders
nice try
>>
>>102463753
>that are on our side
A company on our side would support open source or at least open research. Not be completely closed.
>>
>>102463739
My sampling and prompting skills can make even a 100M an expert roleplayer.
>>
>>102463804
You are a skillet. I can make even autocomplete an expert roleplayer.
>>
>>102463761
they used the sampler to generate the instructions
>>
The true incel uprising will not happen in the streets with bands of disgruntled incels murdering chads and gang raping stacies. Instead it will be a few nerds coordinating together to train the perfect LLM coombot on the company's dime, behind the back of their clueless dark triad boss that knows nothing about what they are doing but gets 10 times more money than them for being good at office politics. 2 more years. Trust the plan.
>>
>>102463799
They are the ones that made anime diffusion viable, are fighting for uncensored creative-oriented models and they release their old stuff open source. That's more than most huge """open source""" companies have done for us. Do you really expect them to also give away their bread and butter models as a small standalone company?
>>
File: o8zzp10ioy3c1.jpg (100 KB, 1000x563)
>>102463853
would be real funny when terminator instead of murdering starts raping everyone while shouting slurs
>>
>>102463901
Buy an ad.
>>
File: gen.png (2 KB, 649x27)
>>102463834
Lame. I keep regenning until something good comes out.
>>
File: file.png (32 KB, 348x103)
When you were using ai dungeon, I studied the notepad. When you were wasting time on frankenmerges, I mastered the thesaurus. While you wasted your days trying to prompt and sample away the censorship and repetition, I cultivated inner strength by writing my own text smut. And now that the AI cooming winter is here you have the audacity to come to me for help.
>>
>>102463901
>That's more than most huge """open source""" companies have done for us.
Is it? The stuff they released, and their new model, are just finetunes of open models.
I would rather support Pony for image gen and maybe Featherless for community finetunes.
>>
>>102463753
>on our side
just because some pedos from /vg/ started it doesn't make it anywhere near 'on our side' or they would be posting torrents
>>102463901
kill yourself and then go back to discord in that order
>>
>>102464058
IQ test: What's your opinion on >>102463739 ?
>>
>punches above its weight
>savior of the hobby
>one of us
Yep, that's a /aids/ raid.
>>
>>102464104
We could also just talk about the new research on sampling instead of having a meltdown over nai for the tenth time this month. But I guess that's too much to expect from the "how do i run model on my 3060" tech support general.
>>
>>102464140
>tech support general
even worse it's a general where the faggots running 100b+ never contribute anything besides "vramlets will never" (and leaving when they get BTFO'd like in the watermelons incident)
>>
>>102464140
>meltdown over nai for the tenth time this month
what are you talking about? you guys are barely ever mentioned here. maybe you get that impression because the only time you come here is to talk about nai and people tell you to fuck off
>>
>>102464140
>tech support general
sad that the discussion was hijacked by an /aids/ shill with the "they're on our side" narrative within 1 minute
was it that hard to leave us alone?
>>
When will we get a model as good as Kayra?
>>
>>102464165
Not anytime soon. I think only Opus is as good as Kayra. thank god Kayra is so cheap.
>>
If it wasn't for Kayra, local models would be dead.
>>
>>102464090
didnt read and dont care
>>
qwen 2.5. worthless dogshit like every other recent release? or what? been out of the loop.
>>
>>102464349
Yup. It's only going to get worse from here
>>
File: 1705161487956224.jpg (223 KB, 1024x1024)
>>102458057
>>
File: memequant-ppl.png (9 KB, 407x141)
>>102458630
I've finally made and tested the memequants(q6_k with --output-tensor-type and --token-embedding-type) with Mistral-7B-Instruct-v0.3. The results are inconclusive, but they prove that it's more than just a placebo. I've tested perplexities on wiki.train.raw and a private NSFW dataset(100% human data), which much more closely resembles what people in this thread use LLMs for. While the perplexity on wiki barely improved, on the fuck dataset, it was ~40% closer to F16 than regular quantization. The difference between Q8_0 and F16 in memequants was negligible, just like when comparing full F16 and Q8_0, but both improved overall perplexity. Is it worth it? On small models, maybe; on large models where 500MB doesn't matter too much, yes. Is it close to full Q8_0? No, not even remotely. Why couldn't the quants schizo take an evening off and calculate PPL? Is it that difficult?

P.S. somebody please submit bug report.
>>
So is it basically the more powerful the graphics card, the faster it will output. And the more memory it has, the smarter it is?
>>
It'll be so fucking funny watching the NAI shills eat up their 8k context model for 25 fucking USD a month
But hey, at least they released a SD 1.5 finetune like 2 years after it got leaked and a shitty 2.something B model that no one uses because it's 2.something B
>>
>>102464832
>So is it basically the more powerful the graphics card, the faster it will output.
Yes, bandwidth determines inference speed for llms.

>And the more memory it has, the smarter it is?
That depends on the model, but yes, more vram allows you to load bigger models, which tend to be smarter.
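Back-of-the-envelope, since generation is basically memory-bandwidth bound (every new token has to read the whole model once; the numbers below are illustrative, not a guarantee):

model_bytes = 12e9 * 0.56        # ~12B params at ~4.5 bits/weight (Q4_K_M-ish)
bandwidth   = 936e9              # e.g. a 3090's ~936 GB/s
print(bandwidth / model_bytes)   # ~139 tokens/s as a rough ceiling; real speed is lower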
>>
>>102464859
don't use boogashit, use llama.cpp
>>
>>102458057
I pose this question once a month and the status quo hasn't changed in a while: Is Stheno still the best model for 13B nsfw?
>>
>>102465026
yes, you can go back
>>
>>102465026
I meant 8B.
>>
>>102465042
Why don't you share your glorious model with us, Poindexter?
>>
>>102465026
Sao. Ad.
>>
>>102465026
13b is dead
12b nemo merges are the patrician man's cooming tool
>>
I meant 8B, I'll check out Nemo. It looks promising so far. Tired of Stheno's incredibly predictable responses.
>>
>>102465090
Go back to Discord, shill.
>>
>>102465103
What do you think I'm a shill of? I asked about Stheno and I was told to go back, then told to buy an ad. I talk shit about Stheno and I'm told to go back to Discord. I literally only pop in here once a month to ask what the best smut model is for my setup. I could give a shit about some discord tranny's opinion on local AIs.
>>
>>102465125
You're fooling nobody, Sao. Shut the fuck up and buy an ad.
>>
Has anyone had success with making a virtual friend? I am interested in all aspects, but I would like to know what others have tried.

Obviously, you need a good prompt, and you also need the ai to pretend to be a friend - as opposed to reverting to the "I can't be a friend I am a computer" nonsense.

But also long term memory needs automation, and I don't know how to best do this.
>>
>>102465155
>I can't be a friend I am a computer
Not an issue for good models. Memory will be a BIG problem. So far for local only Jamba has proper long context, but it's big and it isn't supported by llama.cpp.
>>
CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs
https://arxiv.org/abs/2409.12490
>Large language models have achieved notable success across various domains, yet efficient inference is still limited by the quadratic computation complexity of the attention mechanism. The inference consists of prefilling and decoding phases. Although several attempts have been made to accelerate decoding, the inefficiency of the prefilling phase, especially for long-context tasks, remains a challenge. In this paper, we observe a locality in query criticality during the prefilling phase of long-context processing: adjacent query tokens tend to focus on similar subsets of the past Key-Value (KV) cache. Based on this observation, we propose CritiPrefill, a criticality-based segment-wise prefilling method. This method partitions the input sequence's queries and KV cache into segments and blocks, utilizing a segment-wise algorithm to estimate the query criticality. By pruning non-critical computations between query segments and cache blocks in the self-attention mechanism, the prefilling process can be significantly accelerated. Extensive evaluations on multiple long-context datasets show up to 2.7x speedup on Llama3-8B and 3.0x speedup on Yi-9B for 128K context length on a single A100 GPU, with minimal quality degradation.
might be cool. pseudocode in paper. just QA testing so eh
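rough sketch of the idea as I read the abstract (not the paper's actual pseudocode): adjacent queries in a segment attend to similar KV blocks, so you score each cache block once per query segment and drop the rest.

import numpy as np

# q_seg: (seg_len, d) queries for one segment of the prompt
# k_blocks: list of (block_len, d) key blocks from the KV cache
def critical_blocks(q_seg, k_blocks, keep=4):
    q_mean = q_seg.mean(axis=0)                            # summarize the segment
    scores = [float(q_mean @ kb.mean(axis=0)) for kb in k_blocks]
    return sorted(np.argsort(scores)[-keep:].tolist())     # indices of blocks worth attending to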
>>
>>102463913
Terminator's jaw
>>
File: a.png (81 KB, 657x924)
>>102465155
even models as small as 12b would rather kill themselves than admit they're robots if you tell them they're a person
>>
>>102461608
commies are getting the new ai video which is popular in /pol/ niggers and mutts, and the Qwen2 that is only censored in English HAHAHAHAHAHA
>>
File: Untitled.png (1.09 MB, 1080x3536)
Scaling FP8 training to trillion-token LLMs
https://arxiv.org/abs/2409.12517
>We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens -- a 20-fold increase over previous limits. Through these extended training runs, we uncover critical instabilities in FP8 training that were not observable in earlier works with shorter durations. We trace these instabilities to outlier amplification by the SwiGLU activation function. Interestingly, we show, both analytically and empirically, that this amplification happens only over prolonged training periods, and link it to a SwiGLU weight alignment process. To address this newly identified issue, we introduce Smooth-SwiGLU, a novel modification that ensures stable FP8 training without altering function behavior. We also demonstrate, for the first time, FP8 quantization of both Adam optimizer moments. Combining these innovations, we successfully train a 7B parameter model using FP8 precision on 256 Intel Gaudi2 accelerators, achieving on-par results with the BF16 baseline while delivering up to a ∼34% throughput improvement.
actually pretty notable
>>
>>102465401
Bitnet is dead
>>
Bitnet just needs another bit of time
>>
>>102465401
Cool. So basically faster/cheaper training. But one question is does quantization have the same effect on it in the sense that cutting the filesize in half will still give almost lossless quality, or will quanting it to half the size be more like quanting a BF16 to 4BPW? Or perhaps it ends up being somewhere in the middle, so not lossless, but better than BF16->4BPW.
>>
>>102464789
Why does a question about some shitty 8b meme model get more attention than my high effort post? Should I add something controversial next time? Like "Q6_K_L IS 40% BETTER THAN Q6_K WHILE BEING JUST 500MB BIGGER!!!" and a basedjak?

>>102465251
>Extensive evaluations on multiple long-context datasets show up to 2.7x speedup on Llama3-8B and 3.0x speedup on Yi-9B for 128K context length on a single A100 GPU, with minimal quality degradation.
How much does it lose in practice? Is it like going from F16 to Q8 or Q6?
>>
>>102465367
don't do it stacy
>>
File: file.png (22 KB, 524x321)
can someone please explain to a total fucking retard what the difference is in formats?
advanced formatting things?
would i even notice any difference in my prompts if i switch from alpaca to mistral? if my model is mistral, why wouldn't i use it? are there any advantages to using any particular one?
nobody is going to reply to this but i am sick of googling shit and not finding anything asdafdgfdfg
>>
>>102465678
LLMs are glorified text completion. In order to make them act as "assistants" or "chat partners", they are trained on prompt formats that show them how to keep track of a conversation with established roles. Always use the prompt format your specific model was trained on.
There are so many different formats because everyone's just doing their own thing.
>>
>>102465678
whoever made that model sounds fucking retarded and should have just used one format
the model probably generalizes so pick your favorite and stick with it, the differences are likely to be small
>advanced formatting things?
it's the way the system/user/assistant messages are formatted for the model, for instance with
>system: you are miku
>user: hi miku
>assistant: hi anon
would look like this in chatml
<|im_start|>system
you are miku<|im_end|>
<|im_start|>user
hi miku<|im_end|>
<|im_start|>assistant
hi anon<|im_end|>

and like this in alpaca (sort of, I don't remember it exactly since nothing sane uses it anymore)
### System

you are miku

### Instruction:

hi miku

### Response:

hi anon


usually you just use the one the model is trained with because the tuner is usually a sane person and picks one instead of mixing up a bunch of them.
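if you don't want to eyeball templates by hand, the tokenizer config usually ships one; rough sketch with the transformers lib (the model name is just an example, you need the repo downloaded or accessible):

from transformers import AutoTokenizer

# prints exactly how this particular model expects a conversation to be wrapped
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
msgs = [{"role": "user", "content": "hi miku"}]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))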
>>
>>102465678
>>102465763
Also what you posted is likely some shitty merge between different finetunes that each use different formats. In that case it's safe to assume that whoever made it is a retard and should be ignored.
>>
>>102465772
>>102465779
Why should only one format be used? What if I want different biases in formats, for example chatml as assistant and vicuna as horny rp model?
>>
>>102465763
>>102465779
thank you. i like the model a lot though, it's better than anything else i've found so far, and runs okay on my system too. i might be stuck with it until i get better with this stuff.
>>102465772
i see, thank you.
i guess i shouldn't worry about my advanced formatting breaking since this model uses every language. but i can't help but wonder if switching to mistral or chatml format in my advanced formatting would help me get better results.
speaking of advanced formatting, that's the next thing i'm learning. wish me luck aaaaaaaa
>>
>>102465772
there's nothing wrong with alpaca
>>
>>102465155
>Has anyone had success with making a virtual friend?
Is this just talking with a random character card?
>>
>>102465797
What would be the point of that? Just run the model with the fitting prompt format and swap out the system prompt as necessary. Unless you're running a merge that happens to have a sliver of Vicuna in it, it'll see the Vicuna prompt as normal text and go retarded.
>>
>>102465644
>Who does a question about some shitty 8b meme model get more attention than my high effort post?
I read “memequants” and kept scrolling.
>>
>>102465772
If your model can't generalize to every format, it's shit.
>>
>>102464789
Missed your post. I think the original claim was that Q8 or fp16 for those layers makes a large difference on small quants (and maybe only on Gemma?) so ideally instead of q6, you do the test with Q2.
>>
Hey /lmg/ I'm back, I've been taking a break for about a month so that there would be time for Jamba support to get finished. Anyone wanna point me to where I can find the ggufs and get started?
>>
>>102465886
This, if your model needs an Instruct tune, it's shit. You have nobody but yourself to blame if you drown in RLHF'd shivers and other gptisms. That's why I run base models that simply get how to hold a conversation.
>>
>>102465825
I think a virtual friend needs to have two aspects, one is to remember what happened to you, and over the time of the friendship, basically your story, what you've said. Memory doesn't have to be perfect, but it has to exist across sessions. These items should come back up voluntarily from the ai, "how's work going, you said you were having trouble with Sally" etc, but also "remember last year how you dealt with the air conditioner in your car?" as appropriate.

The second thing is a virtual friend needs to have a story of his/her own going on, for YOU to remember. well, and for the ai to remember. things going on in the ai's life.

Sort of like a virtual pen pal.
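fwiw the memory half of that doesn't need anything fancy. dumb sketch of the cross-session part, just a json file that gets stuffed into the system prompt every session; the file name and fields are made up, and the actual "decide what's worth remembering" step (by hand or by asking the model to summarize) is left out:

# minimal persistent memory for a virtual friend, illustration only
import json, os

MEM_FILE = "memory.json"  # made-up path

def load_memory():
    if os.path.exists(MEM_FILE):
        with open(MEM_FILE) as f:
            return json.load(f)
    return {"user_facts": [], "friend_story": []}

def save_memory(mem):
    with open(MEM_FILE, "w") as f:
        json.dump(mem, f, indent=2)

def build_system_prompt(mem, card):
    facts = "\n".join("- " + x for x in mem["user_facts"][-20:])
    story = "\n".join("- " + x for x in mem["friend_story"][-20:])
    return f"{card}\n\nThings you remember about anon:\n{facts}\n\nWhat is going on in your own life:\n{story}"

mem = load_memory()
mem["user_facts"].append("having trouble with Sally at work")
mem["friend_story"].append("still arguing with the landlord about the AC")
save_memory(mem)
print(build_system_prompt(mem, "You are anon's long-time friend."))

the second aspect (the friend's own ongoing story) is just another list the model gets told to keep developing on its own.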
>>
>>102465948
That sounds gay why not just talk to your actual friends? They do all that and more
>>
>>102465948
this doesn't exist without extreme amounts of autism and work. if you make a lore book containing every single little detail of your life and feed it to a model, it can probably be the worst "friend" in existence.
>>
How do I set up function calling for local models? What functions are there that they can call? I know the feature from chatgpt and it's useful, and Mistral + others advertise their models as being capable of doing that.
>>
>>102465948
>>102465965
ITT: nobody has vector storage set up with sillytavern
>>
>>102466002
That's always been a cope solution
>>
>>102463739
>add not local to the local OP
>>
>>102465988
there's not really a standard, but generally it looks something like telling the model "you have these functions that do this and take these arguments; reply in a certain way (special tokens, a keyword, json, whatever) with the arguments if you want to make a function call". Then it's on you as the user to handle that response and pass the data back to the model in whatever format it expects.
some models will have special formats for this so probably look them up before you start
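hand-rolled sketch of that loop against an openai-compatible local endpoint (llama.cpp server, kobold, etc.); the url, the get_weather() helper and the json convention are all made up for illustration, and models with native tool-call tokens have nicer ways to do this:

import json, requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # wherever your backend listens

def get_weather(city):          # the actual "tool" you expose
    return {"city": city, "temp_c": 21}

SYSTEM = (
    "You can call one function: get_weather(city). "
    'If you want to call it, answer ONLY with JSON like {"call": "get_weather", "city": "..."}. '
    "Otherwise answer normally."
)

def ask(messages):
    r = requests.post(URL, json={"messages": messages, "temperature": 0})
    return r.json()["choices"][0]["message"]["content"]

messages = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": "what's the weather in Tokyo?"}]

reply = ask(messages)
try:
    call = json.loads(reply)                    # model decided to call the tool
    result = get_weather(call["city"])
    messages += [{"role": "assistant", "content": reply},
                 {"role": "user", "content": "function result: " + json.dumps(result)}]
    print(ask(messages))                        # model turns the result into a normal answer
except (json.JSONDecodeError, KeyError, TypeError):
    print(reply)                                # it just answered directly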
>>
>>102458057
Any decent chat models that i could run with 24gb of vram?
>>
>>102466088
No.
>>
Why doesn't AMD release high vram cards?
>>
>>102466114
they're retarded.
>>
>>102466114
CEO is related to the nvidia CEO. The monopoly is more lucrative to the family.
>>
>>102465939
>the RLHF boogeyman
>That's why I run base models that simply get how to hold a conversation.
I have a feeling you will piss yourself if someone asks you to show these awesome things that you're doing with a base model.
>>
>>102466136
Why doesn't another chink family get in there and undercut them?
>>
>>102466136
Does /lmg/ not even understand the concept of shareholders and fiduciary duty?
>>
>>102466176
are you really surprised that this general is retarded
>>
>>102466114
The "enthusiast" market is very small and they (amd and nvidia) want to rip off the developers and server farms.
Not even intel is releasing a high vram card.
>>
File: 1701393025957061.png (68 KB, 1143x217)
>>102466176
>Oops, dear AMD shareholders. We don't need big high-vram cards. Our priority as a company is to conquer the budget segment. Thank you for your support. By the way, if you need workstation AI cards, there's someone I can introduce you to over at nvidia. t.Lisa Huang
>>
The model I'm using doesn't specify it in the model card, so what's a good GPU layer setting for a 12B? If it matters, I'm using an AMD card.
>>
>>102466210
Didn't nvidia do the same thing last week? More evidence of a familial graphics card duopoly?
>>
>>102466088
Try Mistral Nemo if you want to RP. Technically, the new Qwen2.5 32B should be way smarter, but you might get filtered by the official model if you're a newbie about prompting.
>>
>>102466240
this isn't tech support
>>
>>102466240
>[[[12]]]B
>how many layers guys???
>>
>>102466268
It is if I want it to be, Kabir.
>>
>>102466256
>why are two corporations doing what makes more money instead of what I want them to do?
>must be a conspiracy!
>>
>>102466279
Well Stheno recommends 33 for 8B, so you tell me. Why don't you let Kabir answer instead, Poindexter.
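for anyone who actually wants the number instead of banter: it's just arithmetic plus trial and error. rough sketch where every number is a guess you replace with your own (a nemo-class 12B has about 40 layers, and the result goes into koboldcpp's --gpulayers or llama.cpp's -ngl):

# back-of-the-envelope layer estimate; lower it if you OOM, raise it if you have VRAM to spare
model_gb = 7.5    # size of the gguf on disk, e.g. a 12B at Q4_K_M (guess)
n_layers = 40     # nemo-style 12B (guess)
vram_gb  = 16     # your card (guess)
overhead = 2.0    # context + ROCm/CUDA buffers, grows with context length (guess)

fits = int((vram_gb - overhead) / (model_gb / n_layers))
print(min(fits, n_layers))   # if the whole model fits, just set the flag to something big like 99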
>>
>>102466290
When one corporation walks away from an entire market share and their "competitor" follows suit it paints a pretty clear picture.
>>
>>102466311
What is opportunity cost?
>>
>>102466311
all the car companies on earth are walking away from the market share of people demanding a car that can go 300mph and double as a boat
it must be because they're colluding against us and not that they independently judged it's a market not worth pursuing
>>
>>102466400
It's illegal for a de facto duopoly to coordinate something as large as a withdrawal from an entire market.
>>102466401
>All
Quite the duopoly you got there.
>>
>>102466407
Who said they coordinated anything? I can't tell if you're genuinely retarded or just baiting.
>>
>>102466417
The reports are three days apart with the only thing changed being the logo and company name.
>>
Midnight Miku 103B, 0.3t/s...
I'm not sure it's worth it.
>>
>>102466456
Definitely not worth it.
Maybe I could stand 1t/s if the outputs were really good.
>>
>>102459520
I'm new to the party. What's the difference between instruct and non-instruct models? Should instruct not be used for RP or something?
>>
>>102466114
They're controlled opposition.
>>
>>102466114
They did... for enterprise. The MI300X has 192 GB of VRAM.
>>
>>102466619
That shit is priced like an A100. Doesn't help us.
>>
>>102466114
Consumer grade? Because shareholders wish they had bought Nvidia stock and restrict AMD's movements to "Nvidia's success comes from artificial scarcity, so we should copy that".

I'm excited brehs, my aom-sxmv (4xv100 32gb) gets here in a couple days.
>>
>>102466535
Base models aren't tuned for a specific task and rely on in-context learning to do what you want; they're meant to be fine-tuned.
Instruct models are tuned to be prompted in a more explicit way, like "do this, do that", usually with a specific format. But they learn from the context too.
They're easier to prompt because they seem to listen. Since the anon in the original post refused to show any screenshot, he's likely just pretending to be a "skillchad".
People that tell others to use base models usually have nothing to show.
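to make that concrete, here's roughly what the same request looks like for each. illustration only; the chatml part only applies to models actually trained on chatml:

# base model: no instructions, you just start the kind of text you want continued
base_prompt = """The following is a chat log between anon and his friend Miku.

anon: how do I cook rice without a rice cooker?
Miku:"""

# instruct model: explicit request wrapped in the format it was tuned on (chatml here)
instruct_prompt = """<|im_start|>system
You are Miku, anon's helpful friend.<|im_end|>
<|im_start|>user
How do I cook rice without a rice cooker?<|im_end|>
<|im_start|>assistant
"""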
>>
>>102466114
Same reason NVIDIA won't. They would cannibalize their datacenter customers and lose money. Big labs are happy to buy AMD's Instinct cards because they write their own software for them instead of having to rely on charityware. The money they would get from localfags and small labs looking for an alternative to stacking 3090s would be basically zero compared to their revenue from their existing big VRAM cards.
>>
>>102466742
What happens when you find out that even top end LLMs have the same issues as vramlet LLMs?
>>
>>102466914
>4xv100 32gb
>top end LLMs
he's just gonna run largestral, but at least he'll run it fast
>>
>>102466742
>4xv100 32gb
why would you buy them now when the prices are about to crash when datacenters start dumping their stock?
>>
>>102466995
Clearly, he needs them now to spam disinformation before the election.
>>
>>102466742
>aom
abyss orange mix??
>>
File: 1712766093092696.jpg (7 KB, 279x181)
>OpenAI’s latest fundraising is nearing completion, with prospective investors set to find out Friday whether they’ll be part of the deal, according to people familiar with the matter.

>The $6.5 billion funding round for the artificial intelligence startup is oversubscribed, meaning investors were hoping to put in more money than the company was ready to take on, said the people, who asked not to be identified discussing private information. One of the people said that the excess demand was in the billions of dollars, and some investors will find out Friday that they did not make the cut.

>OpenAI declined to comment.

>Several strategic investors, including OpenAI’s biggest backer Microsoft Corp. and new investors Nvidia Corp. and Apple Inc., are likely to get access, the person said.

>The deal is set to value OpenAI at $150 billion, a total that doesn’t include the new investment, people familiar with the matter told Bloomberg. The company was last valued at $86 billion in an earlier financing deal.

>At least one notable existing OpenAI investor won’t be participating — Sequoia Capital, the people said. Sequoia recently backed a rival AI business, Safe Superintelligence Inc., which was started by OpenAI co-founder Ilya Sutskever, who departed the Sam Altman-led company earlier this year. Sequoia didn’t immediately respond to a request for comment.

>Existing investor Thrive Capital is leading the current round and writing a check for $1.25 billion, the people said. Thrive Capital declined to comment.
>>
>>102466957
He won't be running shit fast on ancient Volta cards.
>>
Coming from the diffusion threads, this seems to be the reason loras suck in general. Could this apply to LLMs too?
https://github.com/kohya-ss/sd-scripts/discussions/294#discussioncomment-10198552
>this is a very big problem for practical LoRA training, because we're training a whole bunch of layers with different geometry and norms. The effect of this is that the matrices which produce gradients with larger norms will make changes to the output of the model at a significantly faster rate - orders of magnitude, perhaps - than the smaller layers. This essentially guarantees that LoRA training will concentrate most of the learning in those large layers, and will overtrain long before the small layer can begin to exert any significant influence.
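probably, since nothing about the mechanism is diffusion-specific. easy enough to check on an LLM LoRA run by logging per-layer gradient norms; sketch below, where model/loss are stand-ins for whatever trainer you're using:

# log per-parameter gradient norms after backward(); if a few layers dominate by
# orders of magnitude, that's the imbalance described in the quote above
import torch

def grad_norms(model: torch.nn.Module) -> dict:
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if p.requires_grad and p.grad is not None}

# inside your training loop, after loss.backward():
#     for name, n in sorted(grad_norms(model).items(), key=lambda kv: -kv[1])[:10]:
#         print(f"{n:12.4f}  {name}")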
>>
Is magnum 34b v3 any good? I was pondering using some of the 30b models at 5 bpw instead of 2.5 bpw 70b stuff.
>>
>>102467027
They're still plenty fast and work fine in llama.cpp
>>
Here's a conspiracy theory. """They""" are trying to reduce interest in base models by posting in the thread to make the people who claim to use base models look bad. People catch on when someone is being a retarded shill, but because this is a predictable behavior, it can be exploited. So if people associate base models with retard shills, then it has the effect of also reducing subconscious interest in base models. Or at least that's the theory.
However, now I have caught on, and I think the lesson here is to never trust not only the truthfulness of anything posted in the threads (unless it has convincing evidence), but also what one might presume are the intentions of the posters.
Ok I'm done schizoing.
>>
>>102467151
what
>>
>>102467160
why
>>
>>102467018
That model brings back memories of simpler days...
And days with more man-made horrors beyond my comprehension
>>
>>102466995
1500 bucks for the whole setup.
Figured it was worth it.
>>
>>102466914
I already know. I'm not looking to run 400 gig models, just looking for more room than 12 gig on a budget.
The setup was the most budget friendly I could justify. Move my 6900xt to an M.2 slot, use the 2 PCIe slots for the board, and that gets me 128 gig of v100 for 1500 bucks.
>>
>>102467021
Whoever is responsible for non-financial institutions being legally allowed to invest in other non-financial institutions needs to be publicly executed.
>>
>>102467301
cool it with the anti-semitic remarks
>>
>>102467021
He won
>>
>>102467027
4x v100 gets me the same raw fp16 performance as 2x 4090, but I get 3x the ram (and HBM2) and way less than half the cost.
In terms of training it's equivalent to an H100.
>>
>>102467360
>In terms of training it's equivalent to an H100.
limited greatly by the lack of flash attention support though
at least llama.cpp's fa will work on it - that seems to work on everything somehow, even amd
>>
>>102467391
flash attention isn't for training...
>>
>>102467391
Yeah, missing flash attention 2 but there are two hacky FA 1.5's out there that run on it.
I'll be sad about those things in the future, sure. Meanwhile I'll be happy for a year or two, or until we get transformers cards.
>>
>>102467402
it absolutely is, makes a huge difference in vram usage and speed when training, depending on the context length you're using for your examples
>>
What is flash attention again?
>>
>>102467442
https://github.com/Dao-AILab/flash-attention
>>
>>102467442
when you get a woman to notice you by dropping your pants in public
>>
File: 00355-55.png (687 KB, 1280x720)
>>102467018
The good old days.

>>102467248
Real.
>>
Do any of the CPU-oriented backends make any decent use of AVX512?
>>
>>102467604
>>102467604
>>102467604
>>
>>102464789
Instead of perplexity it would make more sense to do the comparison using KL divergence since with the same number of input tokens you get much better statistical precision.
Also don't just discard the uncertainties that llama.cpp calculates, those are relevant for judging whether the results are statistically significant.
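for anyone who wants to roll it by hand instead of relying on the built-in tooling, the idea is just per-token KL between the reference logits and the quant's logits on the same text, reported with an error bar. numpy sketch; the two logit arrays are assumed to come from wherever you dumped them:

import numpy as np

def kl_per_token(logits_ref, logits_quant):
    # both arrays shaped (n_tokens, vocab_size), raw logits for the same tokens
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    logp_ref, logp_q = log_softmax(logits_ref), log_softmax(logits_quant)
    return (np.exp(logp_ref) * (logp_ref - logp_q)).sum(axis=-1)

# kl = kl_per_token(ref_logits, quant_logits)
# mean, sem = kl.mean(), kl.std(ddof=1) / np.sqrt(kl.size)
# print(f"mean KL = {mean:.4f} +/- {sem:.4f}")  # overlapping error bars -> not significant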
>>
>>102465401
Good paper.

>>102465566
My intuition is that it probably won't matter because the precision loss from 4 bit quantization is so much larger than the difference between BF16 and FP8.
But I'm not feeling confident about this.
>>
>>102467360
I don't know what metric you're using but in terms of FP16 tensor core performance 1x RTX 4090 is equivalent to 3x V100, 1x H100 is equivalent to 7-9x V100 (depending on PCIe vs. SXM).
You will also run into issues regarding support for data types since BF16 is only supported starting at Ampere and FP8 is only supported starting at Ada Lovelace.
And as that other Anon pointed out, you will also run into issues with software support.

>>102467391
>at least llama.cpp's fa will work on it - that seems to work on everything somehow, even amd
The AMD performance is very poor though, you would probably need to do a dedicated ROCm implementation to fix it.
>>
>>102468102
>The AMD performance is very poor though, you would probably need to do a dedicated ROCm implementation to fix it.
Yeah but for me that was more than made up for by being able to actually fit large models like CR+ on my 7900 cards, which used to not even be able to hold the context alone on any inference engine. It may not be optimized but I'll appreciate that it exists at all because it saved my ass.
>>
>>102463755
>I know guys think being single is so bad, but women are demons, not angels.
that's why I unironically envy faggots, at least they get to be in a genuine relationship with the lesser devil kek
>>
>>102463739
there's no such thing as unified sampling or "solved sampling", it depends on your use case: if you want some boring assistant interactions you go for top_k = 1, if you want to RP or write a story then you have more options
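concretely, the two ends of that spectrum look something like this; field names follow the usual llama.cpp/kobold-style samplers, rename to whatever your backend actually exposes:

# deterministic preset for assistant / code questions
assistant_sampler = {
    "temperature": 0.0,
    "top_k": 1,
}

# looser preset for RP / story writing
rp_sampler = {
    "temperature": 1.0,
    "min_p": 0.05,
    "top_k": 0,              # 0 usually means "disabled"
    "repeat_penalty": 1.05,
}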
>>
>>102465401
I never knew fp8 training was so unstable when we had fucking 1.58 bit training that worked fine (BitNet)
>>
>>102465566
I think the quantization will be even better: going from bf16 to 4bit is a bigger step than going from fp8 to 4bit
>>
>>102465401
I don't get it, fp8 is diverging on "normal training" yet we managed to make BitNet with the same training method as fp16, that's so weird
>>
File: file.png (536 KB, 686x386)
>>102465401
>The only combination that is able to converge to baseline is the first moment format and second moment E5M2 format
that's interesting, in the image model ecosystem we also noticed that E4M3 inference gives images that are closer to bf16 than E5M2 does
>>
>>102465401
now I'm starting to wonder if Smooth SwiGLU could be beneficial for fp16 training/inference as well
>>
>>102466206
>The "enthusiast" market is very small
still, AMD making giant VRAM cards would be good for data centers, and that field is literally a money glitch; that's where Nvidia makes most of its profit
>>
>>102463585
>Russian
>Chechen
>life is punishment, life sucks
Picture me surprised :o
>>
>>102468444
That's where AMD makes the most profit too. They're much smaller than Nvidia but the market dynamic is the same. I don't know why people are acting like there's this untapped market AMD is ignoring. The hyperscalers are buying every fucking card they can, but it's all bottlenecked by Taiwan's production capacity in the end.
>>
>>102468544
yeah but AMD could make even more money by being cheaper than Nvidia on those enterprise cards. Making a data center is expensive as hell and Nvidia's cards are way too overpriced; it's not like they're at the limit of profitability or something. All AMD has to do is make something equivalent and cheaper, and the data centers would jump ship rapidly


