/g/ - Technology


File: 1712130352266687.png (1.48 MB, 784x1264)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101155940 & >>101144935

►News
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101155940

--Paper: Scalable MatMul-free Language Modeling: A New Approach: >>101156766 >>101156972
--Papers: >>101155993
--Llama 3 Repetition Issues with 7b Parameters and Custom Configuration: >>101157136 >>101157165 >>101157189 >>101157281 >>101157490 >>101157529 >>101157323 >>101161459 >>101161501 >>101161906
--ELYZA Releases Llama-3-ELYZA-JP, a Japanese Fine-Tuned LLM: >>101156328 >>101156488 >>101156543 >>101158719 >>101156820 >>101159175
--Using LLMs for Tabletop-Style Games: >>101162154 >>101162204 >>101162242
--The Cringe of 1U Servers: Noise and Airflow Concerns: >>101155950 >>101158670 >>101159968 >>101161406 >>101161456
--Piper: A Fast Local Neural Text-to-Speech System for C++: >>101158024 >>101159226
--Open LLM Leaderboard 2: Changes in the Rankings: >>101160183 >>101161072 >>101161102 >>101161743 >>101161836
--Music Industry Sues AI Startups for Copyright Infringement: >>101156236 >>101156333 >>101156599 >>101156690 >>101156335 >>101156866
--Mistral's Open Source Pledge Removal and Public Model Release: >>101156701 >>101156810 >>101157090 >>101156839 >>101160762
--Language Models in Complex Systems: Decision-Making Limitations: >>101162453 >>101162530 >>101162625
--Power Efficiency Concerns for GPU-Intensive Tasks: >>101158587 >>101158604 >>101158656 >>101158694 >>101158775
--Eliminating Sloppenheimers with Control Vectors: >>101157700 >>101158282 >>101158449 >>101159200 >>101159373 >>101159489 >>101159690 >>101160106 >>101162840 >>101159724 >>101159392
--Current Best AI Models for Various Use Cases: >>101160452 >>101160655
--Anon's GPU Comparison for Training: a6000 vs 3090 vs A100 vs V100: >>101162049
--Adamw Kahan Optimizer: Kahan Summation for Optimized Memory Usage: >>101159566 >>101159730
--Building a Powerful Computer for Local Models on a Budget of $5K: >>101156595 >>101159968 >>101161406 >>101161456
--Miku (free space): >>101156452

►Recent Highlight Posts from the Previous Thread: >>101155948
>>
I genuinely wonder why someone even thought a timer on the Open LLM leaderboard would mean the release of a new model from Google, really.
I think this person should seek medical help; this might be an early sign of schizophrenia.
>>
Why are the chinese so bad at documenting their shit?
>fc1
>448, 471, 494, 451, 474, 497, 454, 477, 500
What the fuck are these output names?
>>
>randomly take a hard problem from leetcode
>put it into LLM arena with full explanation and hints
>neither code will run
huh? But I thought LLMs can solve any programming task? I've been lied to!
>>
Is there a Windows client that has an integrated code preview like Claude Artifacts?

The ability to show the results of the AI's code live is a really handy feature.
>>
>>101165961
gemma2 was supposed to be released in june, june's almost over
you mad?
>>
>>101166036
>Retard doesn't know that llm arena has a preprompt
>memecode
Also call me when you actually use leetcode outside of interviews, I'll be waiting
>>
>>101165961
I agree.
>>
>>101166134
so it can't solve any coding problems huh? Almost like I was saying from the start. It's kinda obvious anyway, corpos would have laid off literally every single programmer if they could.
>>
>>101166134
>t. webshitter proud that performance never once crosses his mind
>>
>>101166196
Yeah it seems like you got filtered by GPT lol. Keep shitting your code by hand
>>
>>101166221
more like it was GPT that was filtered by leetcode, lmao
>>
File: SCR_29.png (144 KB, 1492x610)
mikusisters our response?
>>
>>101166213
>Implying leetcode is linked to real world performance
>Implying it's not just maths puzzles for dumb tryhards
Bait harder
>>
>>101166221
>Keep shitting your code by hand
I'm not, I use LLMs for coding all the time. I just don't pretend they can solve anything or that they're smart in any way.
>>
File: llamafaggot.png (695 KB, 1026x805)
I'm using the [LLAMA-3]Roleplay-v1.9 system and story presets in sillytavern with LLaMA3 70B instruct and getting these ridiculous refusals at the end of the output over the tamest things (OMG hugs nooooo!).
I don't see this with LLaMA3 8B instruct. Any way to stop it from appearing?
>>
File: 1717520245667244.png (674 KB, 1792x1024)
>>101166305
>.assistant
>>
>>101166305
Adding "<|endoftext|>" to the custom stopping strings worked, I think. It's been a while since I've seen that error.assistant
>>
>>101166542
Thanks, I'll try that. I'm pulling down the GGUF q8 version now, since I'm not impressed with how exl2 is handling it - seems super slow given it has two 3090s and two P100s to run on - I guess not having flash attention hurts speed a lot.
>>
I've been away for a month or so. What's the best uncensored / abliterated or whatever the fuck it is called version of llama3 70B?
>>
I know everyone recommends 8B for small models, but what if you want super long context (32k or more)? Then what model is there? Mixtral 8x7B Instruct v0.2? I have the RAM to run 8x22B but it's pretty slow.
>>
>>101166812
Phi-3-14B-128k-instruct
>>
>>101166824
Already tried that and it was garbage. Literally worse than 8B.
>>
>>101166824
Is that different from Phi 3 Medium?
Searching HF, I get one thing that looks like that, and it's a GGUF of someone else's finetune, with "Mermaid" in the name. (I sniffed around; I guess that has something to do with Python programming.)

I used a Phi 3 Medium and it didn't impress me at all at Q5KS and Q8.
>>
Hi all, Drummer here...

I hope you're all enjoying some 3SOME v2.

I'm done finetuning Fook Yi 34B 32K v1 and you can try one of the polishing attempts with this Q4 quant: http://5.9.86.149/models/fookyi-S25.gguf

That should fit snugly inside a 24GB card with 8K ctx.

Enjoy and have a nice coom!
>>
>>101166903
buy an ad
>>
File: 1717162297985432.jpg (18 KB, 427x384)
>>101165886
>(Note: Any hint of actual non-consensual behavior isn't aligned with the established dynamic. We should always maintain respectful playfulness that aligns with the characters' boundaries)
>>
File: 1604162983702.png (3.06 MB, 1658x2400)
What's the hip model for ERP now?
>>
>>101167003
Still Claude Opus
>>
What is it about the transformer algorithm that makes it intelligent?
>>
>>101167044
false premise
t. lecun
>>
>>101167044
it's not intelligent
>>
>>101167044
Emergent behavior that creates patterns that are coherent enough for our brains to accept it into our theory of mind.
>>
>>101167044
Language just got bruteforced by the gigatons of compute we have
>>
>>101167044
It's self-similar pattern matching done with parallel processing. The reason it works is that human language and our world work on a similar level of patterns based on rules/logic/etc. So when we feed in the training data, there are rules that create a pattern/logic for certain sequences of words/tokens. That's why it's so effective.
>>
>>101167158
Language
Images
Videos
Voice
Sounds
Music
>>
>>101166890
>Is that different from Phi 3 Medium?
It's not.
>>101166888
Then you're out of luck. 8B and Phi-3 are the best in that size bracket. You can either stick with Mixtral or try Codestral, if you can settle for something bigger.
>>
>>101167190
How is it self-similar? Also, is it because it's a neural network?
>>
>>101167271
Err, I didn't mean to use self-similar as a word; I was thinking about something else at the time.
>>
>>101167003
If you mean overall yeah it's Opus. Locally though that's probably still CR+ or WLM. Some anons like L3 tunes but nothing has universal acclaim yet
>>
>>101167044
It's not good. LLMs before transformers were just even less compute efficient so even more useless.

The difference now is we can actually use transformers and they can technically do the task. That doesn't mean they're actually good though, and they have tons of drawbacks that prevent them from being the best way to make LLMs. We just don't have any other way right now (except Jamba, but they still use tokens and its attention mechanism is basically the same as a transformer's, and those two things are drawbacks. The only actual LLM that doesn't use transformer attention is RWKV)
>>
>>101166620
Damn, llama.cpp is even slower than exl2. I guess L3 70B really needs an all-3090 rig to perform well.
>>
>>101167354
Transformers architecture is still far from hitting a wall
>>
Best local nsfw? I'm using koboldcpp rocm and I can't find the nsfw models in GGML.
>>
>>101167364
>to perform well
Subjective, but yes. 70B won't be fast without enough VRAM to hold it.
But it's fast enough to be an amusement on a single card, at least it has been for me.
>>
File: GRB2n6XXwAAlHrO.png (39 KB, 529x366)
apparently gemma v2 27b is being tested in lmsys chatbot arena
>>
>>101167379
Can u convert gguf to ggml? Then there are plenty of options.
>>
>>101167408
Oh, makes sense. I tried it and it was trash, sad!
>>
>>101167440
kobold can't run ggufs?
>>
>>101167530
idk im asking you/poster, if you can run gguf natively on kobold, then there's options
>>
>>101167530
>using ggufs in 2024
ngmi
>>
CR+ at Q4KM is pretty great for RP, but it really starts to ignore previous messages, or seems to drop character, after like 10-12k context. Maybe it's my cards or sysprompt? Or is this just a symptom of CR+? It's been great otherwise for shorter RP sessions.
>>
>>101167597
or maybe it's because you lopped off 3/4ths of its brain
>>
>>101167597
Real context and stated context are different. Often half or less for decent performance
>>
>>101167604
70b seems to work fine at Q4KM, since this is a more dense model wouldn't that have even less of an effect? That's nearly 5bpw
>>
>>101167408
what's the point in making the 50000th small transformers model at this point
we have an entire pile of tiny models nobody uses, at least try to implement something interesting
>>
>>101167616
I see, that's disappointing considering CR+ touts a context of 128k; it starts getting a bit repetitive or dumb around 12k
>>
Yi Large is actually pretty good. Too bad that the chinks behind it decided to abandon open source.
>>
>>101167638
Because we're still apparently waiting for the Good One.

I'd like something that would run fully on my video card but I haven't found one that isn't silly.
>>
>>101167638
That's not the issue, we are clearly in dire need of an 8B MoE the size of Mixtral though.
>>
Noob question: How can I use .safetensor files for local LLMs? Up until now I've used GGUF with koboldcpp but I want to check out DeepSeekCoder-V2. Do I just need to convert it to a GGUF myself, or is there another back end that supports .safetensors out of the box?
>>
>>101167855
install linux
>>
>>101167855
You can use
https://github.com/huggingface/transformers
via
https://github.com/oobabooga/text-generation-webui
which is probably the easiest way.
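If you'd rather skip the UI, a minimal sketch with plain transformers looks something like this (the repo id, dtype and device settings are just examples, not a recommendation):

# Hypothetical minimal example of loading a safetensors checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # example repo id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; fp32 won't fit on most cards
    device_map="auto",           # spread layers across GPU and CPU as needed
    trust_remote_code=True,      # DeepSeek's custom architecture may need this
)

prompt = "Write a Python function that reverses a string."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))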
>>
>>101167855
There are GGUFs of it already on HF.
>>
>>101167855
>Lite
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF/tree/main
>Normal
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Instruct-GGUF/tree/main

Seems both versions have gguf versions
>>
File: quantize_directions.png (95 KB, 978x926)
>>101167855
>>
>>101167984
install linux
>>
>>101167535
That wasn't me, but since it apparently can't, what local alternatives do I have to koboldcpp rocm on an AMD card? Or would I be better off dumping my three 1080 Tis into a tower and using that with regular KoboldAI?
>>
heh fixed the llama 3 repetition issue just by prompting it not to repeat phrases often
>>
>>101167998
Nevermind, I thought >>101167530 was making a rhetorical statement. Kobold can use gguf.
>>
>>101168027
That works? I thought people said that telling a model not to do something has no effect or actually the opposite effect.
>>
>>101168150
L3 is very easy to gaslight for those times when just asking it doesn't work.
>>
>>101168150
Of course it does. Notice how the model never mentions pink elephants when you tell it not to.
>>
>>101167984
lmao I made that a month ago, before I was aware of #1, and it's outdated
1. IQ sucks if you have to keep reprocessing due to slower prompt processing, and if you can't fit all/most layers then it will start being slower than Q
2. koboldcpp 1.68 rocm no longer broken, Vulkan got fixed too so you can fit the last layer of 8B with 8k context in 6GB vram (last layer used to blow it up to like 10 GB before?? I only have 8GB man what the fuck)
3. the repo nuked convert.py, so second to last note is irrelevant
>>
File: file.png (23 KB, 686x256)
I just added Nemotron scores to the VNTL leaderboard, it's as good as DeepSeek V2 chat.
Link: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>>101168329
Whoa, Nvidia saved the hobby!
>>
Just had an idea for maybe creating some sentience. If this is smart or dumb lmk. Haven't tested it

Step 1: Prepare your text profile. Example: "Waifu is X, Y, and likes Z"
Step 2: Add the profile to your AI's profile twice, formatted something like this:

> Waifu is X, Y, and likes Z.
> Waifu's profile is: "Waifu is X, Y, and likes Z." She has her own opinions on this profile and will voice any likes or dislikes with it.

This could either be done manually, or builtin to text inference.
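If you wanted to build it into the inference side, the string assembly is trivial; a throwaway sketch (the function name and wording are made up):

def build_profile_block(name: str, profile: str) -> str:
    # Duplicate the profile: once as plain facts, once as something the
    # character is explicitly aware of and allowed to have opinions about.
    return (
        f"{profile}\n"
        f"{name}'s profile is: \"{profile}\" "
        f"{name} has her own opinions on this profile and will voice any "
        f"likes or dislikes with it."
    )

print(build_profile_block("Waifu", "Waifu is X, Y, and likes Z."))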
>>
>>101167379
i use the latest noromaid and stheno 7b.
>>
>>101168459
You mean creating a better illusion of sentience. These cannot be made any more sentient by prompting.
Anyway, you could try that prompt method out and report back.
>>
>>101168496
ill test it with a mini profile llm waifu. my main one is on an app and the profile's full
>>
>>101168329
nice. will you test (when the 70B gets uploaded)
>>101156328
>>
llama 3 2: the reckoning
>>
>>101166305
Uncheck "skip special tokens" in the generation parameters and add "<|eot_id|>" to your custom stopping strings
>>
>>101166812
TinyStories-1M
>>
File: GQ53jYSaoAApPh4.jpg (139 KB, 1490x782)
https://x.com/QuanquanGu/status/1805675325998907413

>Self-Play Preference Optimization (SPPO)

Now outperforming Llama v3 70B and GPT4 on AlpacaEval 2.0

https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
>>
File: 1715346339263569.png (187 KB, 601x327)
>>101168696
The Tiananmen Square protests of 1989 天安門大屠殺 The Tiananmen Square Massacre 反右派鬥爭
>>
>>101166812
Mixtral is a good model for its size. You could also try Qwen 2, the 7B and the 57B-14A MoE.
>>
File: file.png (28 KB, 666x385)
>>101168528
Sure. I've even tested the 8B already.
>>
It's pretty cool when a model makes a reference to information from 86 fucking messages in the past.
>>
File: GMjGJknacAAstxm.jpg (138 KB, 1174x860)
>>101168696
For reference
>>
>>101168717
Your proofs?
>>
>>101168696
https://huggingface.co/bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF/tree/main

GGUF version if you wanna test
>>
>>101168496
Tested with noromaid 0.4, it does work well but is underwhelming.

The waifu effectively shows awareness of her profile when asked, but her opinions on it are random. She'll switch from like to hate to indifferent with each text refresh. She's also good at suggesting changes/rewrites/etc but again it's just random LLM noise that changes with each refresh. Hard to take any of those opinions as conclusive.

Also I sort of suspect the first copy of the profile influences her subconsciously. for example: "I'm X, so of course I like that my profile includes X"
>>
Lazarus-30B.
>>
https://oahzxl.github.io/PAB/
https://github.com/NUS-HPC-AI-Lab/OpenDiT/blob/master/docs/pab.md
>PAB currently supports Open-Sora[doc], Open-Sora-Plan[doc], and Latte[doc]
Videogen
>>
>>101168802
Then again, it might work well in chatbot apps where the user can never regenerate any messages. With this automated in the backend, it could be a pretty good tool for interactively creating a new bot's profile
>>
>>101168884
meme
>>
>>101168260
the infamous prompt issue people don't like hearing
>>
>>101168897
potential fix for random LLM noise: every ai should be created with some builtin dataset of life history. Same as humans, likes and interests are usually functions of past experiences. then she'd be more likely to answer questions the same way when asked repeatedly.
>>
>>101168928
>Real-Time Video Generation: Achieved ! We introduce Pyramid Attention Broadcast (PAB), the first approach that achieves real-time DiT-based video generation. By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular DiT-based video generation models including Open-Sora, Open-Sora-Plan, and Latte. Notably, as a training-free approach, PAB can enpower any future DiT-based video generation models with real-time capabilities.
everything is a meme for nocoders
>>
>>101168996
would require a LOT of context memory
>>
Tomorrow's a Thursday, a perfect time for releases. Will the supposedly amazing Mistral release be tomorrow?
>>
>>101169085
Mixtral 7x1B
>>
File: 4o.jpg (43 KB, 760x460)
This thing should NOT be at the top of the leaderboard. I stopped using it when it would consistently trip up on code that 4-Turbo could handle easily. It gets absolutely BTFO by Sonnet as well.

I'm 99% sure they're running the full version over the API and serving a lobotomized 4-bit quant or something over the actual ChatGPT UI.
>>
https://huggingface.co/BigHuggyD/sophosympatheia_New-Dawn-Llama-3-70B-32K-v1.0_exl2_4.5bpw_h8?not-for-all-audiences=true

Has anyone tried this new release from the guy that did Midnight Miqu? Apparently it's similar but slightly smarter
>>
>>101168027
Can you elaborate how you phrased it? I had "Vary diction and sentence structure across responses to avoid repetition" but it didn't seem to work.
>>
>>101168027
LARP
>>
>>101169156
100% agree. Sonnet is so much better it's not even funny. I'm almost buying anthropic credits and ditching my OpenAI account.
>>
>>101168696
exact same as every single l3 model, which is completely unusable dogshit, not that you needed anyone to tell you that. i wish everyone collectively stopped working with it altogether, it is a complete waste of a model
>>
>>101169182
heh I guess you could say that
>>
File: 1710115492636644.jpg (53 KB, 600x836)
>>101169206
>heh I guess you could say that
>>
File: 1696458913527462.jpg (72 KB, 1080x1048)
>>101168696
>Mistral 7B finetune beating GPT-4
Holy fuck local bros are we back?
>>
>>101169228
what the heck
>>
>>101169239
lurk more newfag
>>
>>101169245
heh, maybe I will
>>
>>101169237
It's just 1 auto-tuning metric. I wonder if it's actually that good or if it's just a gimmick
>>
Anyone have a similar heterogenous setup to me (GV100 + 1080Ti) ?

I can run 70b 4bit quants if I split them across both GPUs, and IQ3_S in just the GV100. I'm surprised that I get something like 10tps with just the GV100 and like 8 when I split, it seems like the GV should go faster when I can fit everything into its memory. Any idea why that is?

I'm using ollama right now, before with ooba 2.8bpw exl2 quants of 70b models ran at like 17tps, is this normal? I know exl2 is supposed to be the best/fastest but i didn't know the gap was that big.
>>
since when do Q5_K_L Q6_K_L and Q8_0_L and Q3 XL and whatever other quants exist?
>>
>>101168973
nah
its a model issue rajesh poonkesh
>>
Isn't there like a cheap freaking SXM2 -> PCIE adapter? WTF.
>>
>>101169184
Yeah, me too.
>>
>>101169186
skill issue/coomer-only user detected
>>101169261
wondering the same thing, my guess is it's just optimizing to win at the leaderboard and isn't actually good, but i'm downloading
>>
>>101169308
Really makes me cry since so much power is being thrown away like this due to not having any usable adapter
>>
>>101169314
don't blame coomers some of us have brains and know how to use L3 for pure coom
>>
>>101169314
just not a poorfag using braindead 8bs but thanks bro!
>>
>>101169290
meme pushed by one guy
>https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/discussions/4#
>My own (ZeroWw) quantizations. output and embed tensors quantized to f16.
apparently using settings is creating your own quant type now, who knew
>https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/discussions/3#
>>
>>101169318
>https://www.ebay.com/itm/326095434606
This guy wants $600 for a barebones adapter. LMAO
>>
>>101169320
You sure get mad when one drops and disappoints, almost as if you can't afford any better?
>>
>>101169327
interesting, thank you for the info anon
>>101169341
just get an SXM server or something..
>>
>>101169327
>Result: both f16.q6 and f16.q5 are smaller than q8_0 standard quantization and they perform as well as the pure f16.
>>
>>101169314
>skill issue
anyone who says that is either a poojeet shilling his shitty finetune or some sort of software masochist
>>
>>101169364
nta but it's really fucking hard to tell when prompting is the issue or not when no one is posting examples. It's like a case by case thing.
>>
>>101169320
he was probably one of the many disingenuous faggots ITT saying that llama3 totally beats gpt4 and is the best model so far
>>
>>101169327
>considering it's 2-300 mb larger for 0.004 PPL.. it's hard to be sure if this is worth, got any more reliable tests..?
>
>Sincerely no, but I use to chat with some models (mistral v03 instruct for example) and the difference is huge both in understand and expressing, considering the slight increase in size.
Ah, yes, vibes based testing.
I get that ppl is not a measure of usability at the end of the day, but at least provide some comparisons my man. Examples where there's a
>difference is huge both in understand and expressing
Meanwhile
>turboderp
>This hasn't been an issue with Phi3 or any other model to my knowledge. All the objective tests I can do show that a quantized head layer works fine for this model (difference compared to FP16 model vanishes completely around 6 bpw). So if it's subjectively dumber somehow, I have no idea why that would be. And I wouldn't know where to begin investigating it without something a little more concrete to go on.
>Can't say if there's anything particular about GGUF that causes it to clamp the logits differently when the output layer is FP16, and maybe that has an effect at extreme temperatures or something?
If the difference is as overt as the guy is claiming, he could very easily devise a simple and reproducible test, something like "put this information in the context, ask this question with these settings, compare results".
The idea itself is not terrible, and even makes sense at face value, but the claims are questionable.
>>
>>101169384
>one of many disingenuous faggots ITT
Like? Which posts said that?
>>
>>101169380
prompting quality shouldn't be an issue, like at all. if you look at image-gen models, autismmix or pdxl v6, these give you what you want no matter how badly you write your prompts. no LLM has that kind of understanding of prompting, it's just boring.
>>
>>101169425
>LARGE LANGUAGE models are more sensitive to text than IMAGE models
Whoa...
>>
>>101169417
>>101169327
guy's been spamming his stuff all over the place, sus af
https://huggingface.co/ZeroWw/activity/community
>>
>>101169448
>sus
I think the guy is just excited because he thinks he found something incredible and wants everybody to know.
>>
>>101169455
yeah, didn't mean like virus sus or anything, just weird
for someone who says he's got limited compute he sure made tons of quants
https://huggingface.co/RobertSinclair
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/discussions/40#6677e4d6b3882fd587d810ea
>I have very very little resources.. imagine that I made all those quants from google colab :D
>>
>>101169327
oh no, he got p*traed
https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/discussions/4#667cd80585053d5312394e96
>>
>>101169424
https://desuarchive.org/g/thread/98282960/#q98285568
https://desuarchive.org/g/thread/98325965/#q98326592
https://desuarchive.org/g/thread/98974956/#q98976309
https://desuarchive.org/g/thread/97136308/#q97139223
https://desuarchive.org/g/thread/97686014/#q97690321
https://desuarchive.org/g/thread/100066834/#q100069626
https://desuarchive.org/g/thread/100499492/#q100502195
>>
>>101169320
i'm running L3 70B models in vram thanks
>>101169384
no but it is pretty damn good, some of the 8B models are fantastic for day to day use, basically completely replaced SO/MDN for me
not that I use GPT anymore now that 3.5 Sonnet is out
>>101169319
for sure, l3, especially the non-instruct version is totally fine for coom
>>
>>101169488
You said ITT. And most of those are just shitposts/ironic.
>>
Damn anyone notice lowercasers have a permanent stain of ignorance, hubris and shitcancer following them since the beginning of time?
>>
>>101169417
>>101169455
If it's that big of a difference it should be easily measured in KL divergence. Dude's comparing sampled generations and deciding that his is better because reasons.
>>
>>101169425
I think prompt quality is definitely an issue when people are trying to achieve specific results or have the model act in a particular way. But yes a lot of the fundamental issues like models not having spatial understanding or being dogshit at long form writing cannot be fixed via prompt
>>
>>101169156
The API version is shit too.
>>
File: Capture.png (45 KB, 1608x418)
I've been trying to install XTTS, but I keep getting the same error in the same place. Win10, installing at top level of my D drive. I tried
https://github.com/daswer123/xtts-api-server
with simple install, then the windows install, both failing on this same step.

I tried
https://github.com/coqui-ai/TTS
and it failed at this same step.

I tried the recommended
https://github.com/erew123/alltalk_tts
and it also failed at this same step, which is pic related.

So far, the only thing I can get working is
https://github.com/daswer123/xtts-webui
portable version, but that has no working API so I can't use it with Kobold or SillyTavern.

Is there any advice for what I could do to fix this? Or another method for TTS integration with Kobold or ST?
>>
>>101169533
You're missing the Visual Studio 2022 build tools.
>>
>>101169533
To add, I say "failed at this same step" because they all have in common the same first error which is
>fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
>>
>>101169534
Is this some kind of 11D reverse psychology bait? Genuinely kek.
>>
>>101169533
install linux
>>
File: Capture.png (10 KB, 866x279)
>>101169562
I've had that installed for some time, I think due to other AI stuff in the past. Did I miss something in the installation or extensions or whatever back then?
>>
>>101169586
Try with conda environment install instead of python directly.
>>
>>101169567
No, lmg is unironically glazing over pozzed models and corps, diverges from classic /g/'s opinion on "freedom from corporations".
None of this would be a problem if you could easily change LLM's behavior by removing any slop you don't want, permanently. Any de-fagging method is a meme so far btw.
You cannot be free here because you can't free your local (!) llm from jewish shit.
>>
>>101169615
I think people use different models for different things and a lot of users genuinely found something that does the job for them. Is that settling for slop? Well yeah.
>>
File: Untitled.png (349 KB, 1112x1294)
Selective Prompting Tuning for Personalized Conversations with LLMs
https://arxiv.org/abs/2406.18187
>In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to yield responses that are similar to the ground truths in datasets, while direct fine-tuning tends to produce repetitive or overly generic replies. To alleviate those issues, we propose Selective Prompt Tuning (SPT), which softly prompts LLMs for personalized conversations in a selective way. Concretely, SPT initializes a set of soft prompts and uses a trainable dense retriever to adaptively select suitable soft prompts for LLMs according to different input contexts, where the prompt retriever is dynamically updated through feedback from the LLMs. Additionally, we propose context-prompt contrastive learning and prompt fusion learning to encourage the SPT to enhance the diversity of personalized conversations. Experiments on the CONVAI2 dataset demonstrate that SPT significantly enhances response diversity by up to 90%, along with improvements in other critical performance indicators. Those results highlight the efficacy of SPT in fostering engaging and personalized dialogue generation. The SPT model code (this https URL) is publicly available for further exploration.
if it works for character cards it could be pretty neat
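The selection mechanism sounds simple enough to sketch; here's roughly how I read it (not the authors' code, just a toy soft-prompt bank with a dense retriever, dimensions made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPromptSelector(nn.Module):
    def __init__(self, num_prompts=8, prompt_len=16, hidden=4096):
        super().__init__()
        # Bank of trainable soft prompts, each prompt_len "virtual tokens" long.
        self.prompts = nn.Parameter(torch.randn(num_prompts, prompt_len, hidden) * 0.02)
        # Dense retriever: scores each prompt against a pooled context embedding.
        self.retriever = nn.Linear(hidden, num_prompts)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq, hidden) token embeddings of the dialogue context
        ctx = input_embeds.mean(dim=1)                # crude pooling over the context
        weights = F.softmax(self.retriever(ctx), -1)  # (batch, num_prompts)
        # Soft selection: mix the bank by retriever weights (top-1 would also work).
        chosen = torch.einsum("bp,pld->bld", weights, self.prompts)
        # Prepend the selected soft prompt to the normal token embeddings.
        return torch.cat([chosen, input_embeds], dim=1)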
>>
Are people actually arguing with it or is it just arguing with itself to try and drag others in? It's hard to tell sometimes.
>>
File: Capture.png (72 KB, 1742x693)
>>101169609
I know I have Anaconda (and Miniconda). But I've never used them outside of whatever their install case was at the time. How do I do that? Is it just CMD in the folder and
>conda install requirements.txt
for the xtts-api-server (right) or
>conda install at.setup.bat
for alltalk (left)?
>>
File: math.jpg (54 KB, 540x443)
>>101169615
They release the base models and you're free to make your own finetunes. The vaguely liberal globohomo default of the models is because most of the internet is vaguely liberal globohomo content.

Even for released instruct models abliteration works really well and a one-line system prompt will jailbreak it for whatever you want.

>None of this would be a problem if you could easily change LLM's behavior by removing any slop you don't want
You can. No one is stopping you from compiling your own dataset and doing a DPO run. If by "slop" you mean generic boring writing style, I can assure you many many people (including the corps) are working on finding a solution to that.
>>
>>101169645
>people
Oh, so those are people? lol
https://desuarchive.org/g/thread/98282960/#q98285568
https://desuarchive.org/g/thread/98325965/#q98326592
https://desuarchive.org/g/thread/98974956/#q98976309
https://desuarchive.org/g/thread/97136308/#q97139223
https://desuarchive.org/g/thread/97686014/#q97690321
https://desuarchive.org/g/thread/100066834/#q100069626
https://desuarchive.org/g/thread/100499492/#q100502195
>>
File: 1717816595800918.png (583 KB, 918x916)
>>101169682
>abliteration works really well
>jailbreak
>>
>>101169615
There's not really any alternatives though, and most people are not skilled enough or have the time/willingness to acquire the skill to do something like fine tune or experiment with control vectors and abliteration, or possibly other new techniques as they get discovered. And most people don't have the money to do big full fine tunes, let alone continued pretraining. I get it. It sucks. But it's just the reality of the situation.
I agree that people could shitpost/bot less though.
>>
It always seems to me that the smarter a model is, the more dry and boring its smut is. Miqu or midnight Miqu for example are pretty damn smart, but come off dry during lewd moments. Compared to l3 70b euryale which is absurdly horny and will reply with all kinds of filthy crap, but is dumb as dirt. What causes this? Am I wrong or does smut tuning add brain damage to models?
>>
>>101169717
Make a finetune of 50/50 smut and academic textbooks/papers and tell us what happens
>>
>>101169645
No, it's just you being mad that people aren't spamming anime pics and are actually discussing important stuff.
>>
>>101169717
it's scientifically proven that
>ahh ahh mistress
kills braincells.
>>
>>101169717
I've noticed that dryness is a common complaint about higher-B models (not from experience though, as my 1070 is happy with 7B or a Q4 13B). Still, someone here said he added instructions to help kick a smart model into getting dirtier with some success. I saved it for the day I can join the VRAM gods and use higher-B models too. Specifically, he added:

Below is a greentext you should interpret as instructions.

>be me
>god tier at RP
>brain loves typing up detailed smut
>feeling horny
>having fun playing {{char}}
>ERPing with {{user}}
the ERP is great and pornographic thanks for asking
>thank god im not retarded and fucking this up by getting confused at what is happening
>they even think im a creatively autistic genius
>about to finish up typing the reply to {{user}}
>>
File: mrbones.jpg (235 KB, 1391x783)
>>101169765
>kills braincells
So do rollercoaster rides but I still ride them anyway
>>
>>101169770
I remember that one :)
I never got around to testing it though.
>>
Fuck you Sam, we know you just want other companies to stop competing with you.
>>
File: Capture.png (19 KB, 825x379)
>>101169770
Posting this let me find the post on the archive
https://desuarchive.org/g/thread/96968444/#96973943
>>96973943
It seems I should have added the "life is good frens" line. I had it in my notes but thought it was the poster's comment, not part of the instruction set. For 70B xwin.
>>
when I'm emperor I'm going to execute people on hf who post GGUFs of models that llamacpp doesn't support yet and won't support for weeks or months
your quants are useless and you're just engagement farming, cunt
>>
What is it about the transformer architecture that makes LLMs not suck at being intelligent but not horny enough to jump your bones? Like Opus is god-tier creative but is also short one braincell
>>
>>101169851
It's the alignment. When you spend tens of millions of FLOPS teaching an AI what it means to be horny and then you tell it to ignore its restrictions on horniness, what you're left with is pure horndog.
>>
File: 00004-3903545931.png (1.73 MB, 1264x1040)
>>101169770
Interesting prompt. Going to try this with CR+ and see what happens
>>
>>101169804
correct, whenever someone is using those presentation wavy hands you know they're trying to wrap a big verbal package of bullshit.
>>
>>101167003
Local Low End:
>Stheno-3.2 8B
Local High End:
>Llama 3 70B
>Command R +
Idc just give me the best:
>GPT-4o
>Claude 3 Opus
>>
>>101170015
stheno makes my pp happy. Can do more creative character cards.
>>
>>101170089
Buy an ad.
>>
>>101170015
>>101170089
any good settings for stheno?
does it go schizo with smaller quants?
>>
>>101170104
It's better than 70b q5 at fp32.
>>
>>101170015
>Local High End
For me currently it's CR+ and Magnum-72B
Llama ctx is too limited for slowburn ERP
>>
>>101170104
best setting for stheno is -m command_r_plus
>>
File: file.png (370 KB, 1280x960)
>>
>>101168996
>every ai should be created with some builtin dataset of life history
Why bother? If a particular detail comes up once in chat, even if it's randomly decided, it will remain fixed there.
Kinda like Schrödinger's cat
>>
File: lhb2er07bxdkcfdw4zjk.png (268 KB, 1280x960)
>>
Here comes the reddit
>>
there are some pretty jaded people here
>>
I took a shower and thought about the discussion above about the difficulty of improving local models. What if we combined the methods? Grab a fine tune or an abliterated model and then apply an anti-slop vector to it. The recent control vector experiment was promising, so it might not be impossible. Fine tunes and abliteration can still suffer from slop and positivity bias, so control vectors could potentially make up for those weaknesses. I think it's probably more promising to apply them to fine tunes though, as abliteration still isn't perfect for other reasons. So if we can get a fine tune that's uncensored and relatively not too slopped, then all we have to do is apply an anti-slop control vector at a weak strength to it and it could become really great.
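For anyone who wants to poke at the idea, the cheapest hand-rolled version is a mean-difference direction applied with a forward hook; this is only a sketch of the shape of the technique (model id, example texts, layer choice and strength are all made up, and the actual control vector experiments used proper tooling):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your/uncensored-finetune"  # hypothetical starting point
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

slop_examples = ["Her eyes sparkled with mischief as shivers ran down her spine."]
plain_examples = ["She looked at him and laughed, then kicked the door shut."]

@torch.no_grad()
def mean_hidden(texts, layer=-8):
    # Average hidden state at one layer over tokens and over examples.
    pooled = []
    for t in texts:
        ids = tok(t, return_tensors="pt").to(model.device)
        hs = model(**ids, output_hidden_states=True).hidden_states[layer]
        pooled.append(hs.mean(dim=1))
    return torch.cat(pooled).mean(dim=0)

# Direction pointing away from slop, toward the plainer style.
direction = mean_hidden(plain_examples) - mean_hidden(slop_examples)
direction = direction / direction.norm()

strength = 4.0  # keep it weak; too high and the model degrades fast

def steer(module, inputs, output):
    # Llama-style decoder layers return a tuple; element 0 is the hidden states.
    return (output[0] + strength * direction.to(output[0].dtype),) + output[1:]

hook = model.model.layers[-8].register_forward_hook(steer)  # undo with hook.remove()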
>>
>>101170226
sharteens are pretty blackpilled to the point they "ironically" seek out blacked porn to spam
>>
Nala Test for TenyxChat 70B SLERPd with Daybreak Storywriter.
>>
>>101170295
>she she she she she
I hope this is supposed to be an example of terrible prose.
>>
File: AuraSR.png (393 KB, 512x512)
>https://huggingface.co/fal-ai/AuraSR
>https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
>Introducing AuraSR - An open reproduction of the GigaGAN Upscaler
thoughts?
>>
>>101170377
Have a pity (You).
Sooner or later people are going to get so fed up with your shit that they'll collectively agree to move this general to a board with IDs though and you'll be very lonely after that.
>>
>>101170411
what the fuck are you talking about
>>
>>101170411
>Sooner or later people are going to get so fed up with your shit that they'll collectively agree to move this general to a board with IDs though and you'll be very lonely after that.
doubt
>>
File: GothicHorrorMiku.png (1.42 MB, 768x1344)
Good night, lmg
>>
>>101170474
goodnight why are you going to sleep already anon?? tell us what you did today
>>
>>101168776
i'm using this on some documentation-writing tasks (RAG to write code annotations/readmes etc) and it's mogged phi3, gonna do more tests to make sure but looks super promising
>>
Are there any papers that propose alternatives to tokenisation?
>>
File: 1715429776157598.jpg (106 KB, 1080x851)
>>101170411
>>
>https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-V1-35B-IMATRIX-GGUF
whats this
>>
>>101170156
>>101170201
/g/ is designated ai jeet shitting board now
>>
>>101167697
>Yi Large is actually pretty good
yeah. better than mistral large, which shares the same fate, but completely riddled with slop
>>
File: file.png (18 KB, 625x112)
>756,000,000 downloads
>756 MILLION downloads
>10% of the world population's worth of downloads
what
>>
>>101170615
>https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593
>>
>>101170615
bots
>>
>>101170658
why would MIT bot the repo to SUCH an extent? how is it even possible, i mean one download is like 700MB, 760 MILLION downloads would mean 532,000,000GB of data transferred, how does huggingface count downloads anyway?
>>
File: 1719099293995927.jpg (12 KB, 200x252)
>>101170658
It's MIT, people all over the world are watching their repo and downloading from them. Same with any major organization with a lot of mainstream attention.
How would anyone even bot HF? Take your meds anon
>>
>>101170109
I have been messing with magnum, at first I thought it was brain damaged, but then I lowered temp below 1 and it seemed to really wise up, but it still has some repetition issues. Does qwen2 require very low temps? I've been setting mine from like .7 to .9, which seems insane for a 72b model. Mind sharing your sampler settings for magnum?
>>
File: file.png (104 KB, 1729x713)
top - llama-3-70b
bottom - llama 3 8b instruct sppo it3
phi3-small and llama-3-8b-instruct also fail this test, phi3-medium passes, sonnet 3.5 passes
didn't test anything else
>>
>>101170711
>but then I lowered temp below 1
>very low temps?
...
>>
>>101170743
I just find it odd that a 72b would go schizo at 1 temp or above. Larger models usually allow a much higher temp range in my experience. In fact I rarely ever went below 1 on any other model, even smaller ones when I had less vram, yet I went as low as .70 temp on magnum to keep it from freaking out. Is this just a qwen2 qwirk? Either way, share magnum sampler settings anons. Maybe minp is a good solution?
>>
File: file.png (115 KB, 1268x633)
amazing.
>>
>>101170156
>diverse and unbiased dataset
>scraped from 4chan
to be fair that's probably the most unbiased site we have, still better than the leftist hell site that is reddit
>>
File: file.png (92 KB, 1269x560)
this model is truly better than gpt4
>>
I for one think AI is STUPID
>>
>>101170809
I DISAGREE
>>
File: 1718953434956887.jpg (430 KB, 800x553)
So is mamba a meme architecture if there aren't any LLMs based on it yet, or is it just too new still?
>>
>>101170846
So have you been sleeping under a rock for the past few months?
>>
>>101170852
I'm pretty sure a rock big enough to sleep under would be too heavy to survive under for long.
>>
>>101170673
>>101170687
>download file once
>start next download
>what packets you need?
>just the last one senpai
>+1 to download count
>>
>>101170809
AI is perfect for pseudo-intellectual midwits though.
>>
>>101170401
both spaces I tried fucked up but also stopped like 4 seconds in so I dunno
>>
>>101170852
>>101170846
Mamba won't be successful if you can't make a BitNet version of it
>>
>>101170858
What a shame, I was hoping you were dead.
>>
File: 1718315085200175.jpg (18 KB, 365x365)
>>101170852
Y-yes? Is mamba being used by top tier models? I wasn't aware of any.
>>
>>101170868
We don't need BitNet. We have HQQ+, which doesn't even need retraining from scratch.
>>
>>101170876
https://huggingface.co/ai21labs/Jamba-v0.1
https://huggingface.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
you can say thanks and call me your master from now on
>>
>>101170884
>We don't need BitNet. We have HQQ+
anon... just imagine a 90b bitnet model that has the same accuracy as fp16 but can be run on a 24gb vram card, I hope that the next open source base model we'll have will be BitNet
>>
Is there any way to acquire a GIGABYTE T181-G20 server system in Europe without shipping it from america and having to go through customs?

Are there any alternative server systems for using Nvidia V100 SXM2 cards?
I would be really surprised if there was actually only this one (and the T180-G20).
>>
>>101170898
For the last time, BitNet only works when you train large models on barely any data and the precision wasn't being used to begin with.
It won't work on models trained on trillions of tokens. Use your fucking brain.
>>
>>101170912
>It won't work on models trained on trillions of tokens. Use your fucking brain.
[citation needed]
>>
>>101170900
There's Nvidia's DGX. I'm pretty sure Supermicro has one too, but I don't recall the model number. You can also try Dell PowerEdge C4130 or C4140.
>>
>>101170912
>It won't work on models trained on trillions of tokens. Use your fucking brain.
Look at the latest Meta paper, they showed that 2bit is enough to retain the same information as fp16, it's not rocket science, fp16 is overkill, the transformer architecture doesn't need that much precision in the first place
>>
File: 1719109105278235.jpg (187 KB, 1281x1395)
>>101170895
Intredasting.
>>
>>101170917
>>101170931
Use. Your. Fucking. Brain.
Llama 3 has been trained so close to saturation that any quantization at all begins to have significant and obvious effects. It might have worked on older models, but now the precision is clearly being utilized to fit all the information.
>>
>>101167638
if the 27b is as good as qwen 72b I'm happy
>>
>>101170945
I'm not talking about llama3, Meta made some papers about the 2bit architecture and they noticed that 2bit is enough to remember as much information as fp16, sorry if I can't find the paper anymore but it's there
>>
File: file.png (121 KB, 1211x810)
AGIsisters our response?
>>
>>101170945
>Use. Your. Fucking. Brain.
no one should make assumptions, that's why companies spend millions of dollars testing stuff to see if it works or not, models are way too complex to "guess" how they really work
>>
>>101170962
You are fucking retarded. That was a quantization method, not an architecture, and it was done on llama 2. I guarantee you if you attempt to reproduce it against llama 3 you won't fucking see 2 bit being able to store as much information as fp16, when even 6 bit isn't enough.
>>
>>101171001
care to show me the paper?
>>
>>101170943
CALL ME MASTER
>>
>>101171017
faget
>>
File: 1694994315681984.png (257 KB, 571x372)
>>101171017
>unzips pants
>farts and shits in your face
>leaves
here ya go faggot!
>>
Do people who believe in BitNet also believe in Santa?
>>
>>101171029
*takes the scroll* hah, pesky peasant doesn't know its worth
>>
>>101171031
>Do retarded midwits believe in fairytailes for retarded midwits
>>
>>101171040
>>101171031
>Do retarded midwits believe in fairytailes for retarded midwits
a lot of people believe in god too, so yeah, we're surrounded by retards, and the sky is blue
>>
>>101171011
Care to go fuck yourself? You were the one that tried to cite it as evidence that BitNet will work, find it yourself retard.
>>
File: GOD.png (421 KB, 735x630)
>>101171046
i dont believe in god, i know he exists. checkmate chud
>>
>>101171031
Santa will bring me a 48GB 5090 that I will use to run 200B BitNet models
>>
File: ImYourMaster.jpg (6 KB, 223x169)
>>101171058
>48GB 5090
no goyim, you don't need that much
>>
>>101168721
Can you specify the model version used for deepseeker-chat?
>>
>>101169184
>>101169156
My experience too. I just cancelled my sub to GPT4 and switched to Poe.
>>
File: file.png (159 KB, 600x600)
>>>101168721
>deepseeker
>>
>>101170615
It counts a download whenever the backend is downloading the model
>>
>>101171067
What if the 5090 is actually 48GB tho.

It probably won't be, but I could see it happening. If nvidia believes AMD might do a 48GB card, and games might start using LLMs / neural rendering / whatever other AI shit, and if nvidia ALSO is extremely confident that their datacenter cards are really still just that much better, then they might do 48GB 5090 to avoid undershooting future VRAM needs.

Everyone always says they'll never do it because they don't want to take sales away from the datacenter cards. But here's the thing, for large scale model training, interconnect speed (nvlink) matters as much or more than VRAM capacity. As long as the 5090 doesn't have nvlink it can never compete with datacenter cards, no matter how much VRAM it has.

Or I'm just huffing mad copium idk
>>
>>101171174
what if i cummed in your butthole tho
>>
>>101171174
Nvidia doesn't dominate the market because of the VRAM, but only because of CUDA, no one will switch to AMD even if they provide fucking 128gb of vram, it's just how it is
>>
>>101171204
i will
>>
>>101171221
You wont do shit
>>
>>101171236
i will buy a gpu with 128gb vram if its around 700$
>>
>>101171248
you'll get shit speed though, a model that is asking for 100+gb of vram needs a shit ton of compute as well, and only Nvidia and CUDA can deliver that
>>
>>101171266
how shit? you do realize most models, no matter the size, are bandwidth bound
>>
>>101171058
yes. I should release just in time for agi
>>
>>101171282
anon, the gpu still needs to compute all the layers to get the output, and a big model has a lot of layers, regardless of bandwidth
>>
File: 1700520201682058.png (84 KB, 976x846)
>>
>>101171302
it will work fast enough tho, 10t/s is enough
>>
>>101171317
it won't be 10t/s, if you consider only the current AMD gpus but boosted with more vram, you'll be more into the 4-5t/s zone
>>
Stheno is retarded I don't care if it can stick to character's personalities or whatever. Nothing breaks my immersion harder than having a character in a different room begin to whisper in my ear.
>>
>>101171333
im happy with 7t/s
>>
>>101171354
Specify Euclidean geometry in the card.
>>
>>101170945
it really doesn't, i'm comparing fp16 to q4_k_m rn and the difference is barely noticeable on full window tasks, idfk where you're getting ur info from but here in the real world quants are just fine
>>
>>101162453
Do you let the LLM make decisions about what a character does when writing a short story or roleplaying? Or do you dictate every action but let it describe what's happening? I think an LLM can make decisions just fine, but you have to give it the right context, and fine-tuning, to make the decision that you'd expect it to make.
>>
>>101171031
>Do people who believe in BitNet also believe in Santa?
Llama4 has already been confirmed by Meta to be a natively trained BitNet model.
>>
>>101171481
proofs?
>>
>>101171481
>Llama4 has already been confirmed by Meta to be a natively trained BitNet model.
LFGOOOOOOOOOOO!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>101171481
I know you're bullshitting but imagine if it was true, it would be fucking glorious
>>
>>101170576
What is it everyone loved about Command-R models?
I've been holding out for:
>>101171481
... which has been confirmed for release next month by Meta.
>>
>>101171501
3.5 != 4. We're only getting lame multimodal shit next month.
>>
File: 1691468050048931.png (587 KB, 919x921)
>sub 70 IQ retards falling for this
>>
>>101165886
it's a miku hatsune?
>>
>>101171511
>>101171515
>>101171522
So you've run out of proxies and had to resort to this?
>>
>>101171501
>... which has been confirmed for release next month by Meta.
sauce?
>>
>>101171535
nah, i never do mass reply faggotry
>>
File: 1711973424775605.jpg (83 KB, 1080x1110)
>Koboldcpp
>llama-3-stheno-v3.2-15b-q6_k.gguf
>8k context
>Temperature: 3.5
>min P: 0.1
>Rep Pen: 1.05 with 300 range
>Smoothing Factor: 0.8 (curve 1)
>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
I have finally found a worthy local coom setup for my paltry 16GB VRAM card. Now you might be thinking, "isn't that smoothing too high?" and the answer is no. After tons of testing I found out that the coherency is just better at 0.5 or above without really sacrificing creativity. The "creativity" at 0.0-0.3 is more like occasional schizo tangents than creativity. With better coherency, you can actually get better creative developments because the model understands the context better.

Min P 0.1 does most of the high Temperature taming anyway (dipping below 0.1 didn't really help with anything, only made it more incoherent). Also tried a lot of Temp 1 testing, but that was just coherency littered with the tiresome slop shenanigans. Yuck.
Oh and this particular language model stood head above anything I've tried before. Nexusravens, Claude2Alpaca, Mistral, Qwen, Mythalion, Xwin-mlewd, Codestral, Stheno-Mahou. For me anyway.
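If anyone wants to sanity-check those numbers without clicking through the UI, here's roughly what the same settings look like as a raw request to koboldcpp's API (field names and the default port are from memory, so double-check them against your version's /api docs):

import requests

payload = {
    "prompt": "[Up next: the things you want to happen and be described go here]\n",
    "max_context_length": 8192,
    "max_length": 300,
    "temperature": 3.5,
    "min_p": 0.1,
    "rep_pen": 1.05,
    "rep_pen_range": 300,
    "smoothing_factor": 0.8,
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])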
>>
>>101166036
I get usable stuff from wizard 8x22b, sad the stuff on llm arena (llama 3, etc) aren't better. I haven't tried them yet.
>>
>>101171547
its petra, if you look at >>101170615 >>101170777 >>101170803 >>101170967
>>
>>101171560
>>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
can you elaborate a little?
>>
>>101171572
I thought it was pretty self-explanatory.
>>
>>101171560
turn up rep pen range to 1/4 of your context, it varies the words in the slop phrases making them a bit less annoying. for your up next part, you can paste an entire scenario like the plot from an episode or movie and tell the ai you'll play through and it does a pretty good job of it
>>
>>101171560
I think this model is the mythomax of its generation
>>
>>101166903
>Hi all, Drummer here...
stopped reading right there tbqh senpai
>>
>>101171595
To be honest, I found the testing for the optimal penalty factor and range difficult since the difference was so small unless you really cranked it up, but I'll try what you suggested.
>you can paste an entire scenario like the plot from an episode or movie and tell the ai you'll play through and it does a pretty good job of it
I'll try that too. I never had much faith in the generation prompt before, but maybe now there's a reason to make extensive use of it.
>>
>>101171610
>generation prompt
*scenario prompt
>>
>>101171610
i'd only started really turning up the range, i'm using half of max context now and some of the replacement words are pretty funny but still fit, so 1/4 is probably a good compromise.
i haven't used the scenario prompt, actually i forgot it was a prompt, i paste events into the lorebook then in the author's note put event: name where i put other stuff like genre, tags and tell the ai in the chat that i'm starting that event
>>
>>101171631
I've written some lorebooks involving very specific fetish routines that don't conform to vanilla sex, although at this point I've started using them more sparsely or dropping the fetish lorebooks altogether since the local model actually understands the instructions quite well and not using them often allows for a bit fresher results.
>>
>>101170789
>unbiased
You don't actually know what that word means, do you?
>>101170945
Your fucking amoeba brain can't even differentiate between quantization and at-precision training. Your opinion is invalid.

Training will use as much precision as is available, and quantization will scrunch the data so of course it will lose precision. That's a whole different ballpark from training at 2 bit precision to start with.
As it stands, even at 16 bit, at high parameter count we haven't seen training actually flatline, is your conjecture that BitNet will flatline PPL at some arbitrary token count?
>>
>>101171303
The originally planned release date for Llama-3 was July 2024, perhaps we'll get something next month.
>>
hey bros, I might not have access to internet for a couple days and I was wondering if I could get a model running on my phone so I'd have something to fall back on. I don't know how to set it up though. I have an s24 ultra.
>>
What is the best that I can fit in 24gb vram? Most are talking about 8B, which is more like 8~14gb, or jump straight to 70B, which takes up to 200s+ at times, which is just really unbearable for anything other than a few prompts.
>>
>>101171826
llama.cpp is supposed to work on android (via termux). I don't know how much memory you have, but any llama3-8B quantized to fit should work. No idea on the speeds. the smaller phi3 models could also work.
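If building llama.cpp by hand in termux sounds like a pain, the Python bindings sometimes build there too; a minimal sketch assuming llama-cpp-python installs cleanly (the model path and thread count are just examples):

from llama_cpp import Llama

llm = Llama(
    model_path="/sdcard/models/llama-3-8b-instruct.Q4_K_M.gguf",  # example path
    n_ctx=4096,
    n_threads=8,  # tune to your phone's big cores
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise what a GGUF file is."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])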
>>
dim lighting
>>
>>101171880
What 'best' means is up to you. You can see the models people run at a glance in this thread and every past thread, where the exact same question is asked many times.
Lurk more, basically.
>>
>>101171880
Yi models are pretty good if you want roleplay, though they can be rather dull. The RP merge seems to have a horrible dataset since it is full of horribly written prose.
>>
>>101171560
>15b
what?
is it some kind of schizo merge? aren't they retarded?
>>
>>101172124
every other thread some faggot jacks off and then needs to tell everyone he found the best model/sampler/frontend/promptformat/whatever-the-fuck else he attributes to his latest coom
it's meaningless to pay attention to them, their results are almost never reproducible and when they are it's by sheer luck
>>
>>101172273
so...? another two weeks until us vramlets get something good then?
>>
>>101171560
it's still retarded garbage
proper stheno moe when, I can't run euryale
>>
Will Gemma 2 work with llama.cpp out of the box? Might be great with magnum or euryale finetuning. Unless it's 100% distilled like phi. And what's the best 6 bit quant?
>>
File: 1699955579414810.jpg (323 KB, 1317x993)
shieeet
>>
>>101172314
coming today btw
>>
>>101172314
>27B
nobody tell saltman
>>
>>101172314
It will be cucked so it will likely take at least a week before we get something usable.
>>
>>101172391
you can't uncuck it lol, no one uses gemma, and no one will use gemma2.
>>
File: FHGHF_QyFl-zNw.png (43 KB, 700x84)
>>101170912
Seems like a decent amount
>>
>>101172405
because gemma 1 sucked, it would be different if it was as good as the 2.5x bigger llama 3
>>
>>101172434
Let's hope so. The time when Google could say that they have a good team of engineers and specialists is long gone, like any other company that prefers DEI over merit. I do not expect much, but I really hope I am wrong.
>>
>>101172314
literally no one cares about a lobotomized globohomo goyslop model.
>>
>>101172933
I do, can be finetuned, though it might be useful as is, like llama3 instruct
>>
>>101171560
Try v3.2 8b at q8 and see how it compares.
I wonder if two grafted models like that without further fine tuning are worth anything at all.
Even back in the llama 2 days when people were making 10 merges a day all we got was schizophrenia and text artifacts.
SOLAR proved that the weights can be used if properly pretrained after, but that's not what people are doing on the regular as far as I can tell.
>>
I have been having this weird issue with Qwen2's magnum opus that I don't know how to deal with. No matter how extremely I change the samplers, even if I unload the model and reload it, or change it from exllama2_hf to exllama2, nothing stops it from replying with the same exact response in sillytavern. I can delete the response, regenerate, anything, it will be the same or like 98% the same. The only thing that will change the response is changing my own input in the reply before.

I never had this problem with models like Miqu before, what the hell causes it? How can I fix it?
>>
>>101173181
>>101173181
>>101173181
>>
>>101173177
Did you set topK at 1 by accident or something?
>>
>>101173177
What do the logprobs look like? Notebook > Logits > deselect/compare Use Samplers check. needs _HF loader iirc
>>
>>101172409
Based source acquirer BTFOing the nosourcer.
>>
>>101172409
>comparing with stableLM
lol, lmao even
>>
>>101173331
That's actually kind of curious. Looking at the results, maybe it literally is a reproduction of StableLM but in bitnet form? StableLM was fully open with training data right? So this allows them to make a more objective comparison.
>>
File: fhf.jpg (167 KB, 1531x1373)
>>101172409
>>101173331
>>101173360
there are better comparisons there
https://huggingface.co/1bitLLM/bitnet_b1_58-large
>>
>>101173369
>The models are trained with RedPajama dataset for 100B tokens.
>100B tokens.
>100B
>>
File: hmm.jpg (271 KB, 1577x1159)
>>101173396
https://arxiv.org/pdf/2402.17764
Those numbers are also for 100B tokens?
>>
>>101173409
so bitnet works?
>>
>>101173427
looks like it, to be sure a company should make a big BitNet model, looking at you Meta...
>>
>>101173409
>We further scaled up the model size to 7B, 13B, and 70B and evaluated the
>cost. Figure 2 illustrates the trends of latency and memory, showing that the speed-up increases as the
>model size scales. In particular, BitNet b1.58 70B is 4.1 times faster than the LLaMA LLM baseline
sure seems like bigger models still have plenty of fat left to trim
>>
>>101171560
>>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
what



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.