/g/ - Technology


Thread archived.




File: miku.cpp.png (1.7 MB, 1016x1440)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101040742 & >>101030715

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1718752046769518.jpg (166 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101040742

--Papers: >>101049622 >>101049669 >>101049719 >>101049833
--Understanding Chameleon's Multimodal Architecture and Functionality: >>101047130 >>101048315 >>101048622 >>101048640 >>101048675 >>101048708 >>101048721 >>101048726
--DeepSeek 236B Code Model Performance and Memory Requirements: >>101040940 >>101041730 >>101045939 >>101045995 >>101045954 >>101046105 >>101046385 >>101046170 >>101046607 >>101046713
--Resolving Assertion Issue in llama.cpp with "llama-" Prefix: >>101045641 >>101046974 >>101047908 >>101048126
--LORAs: Adding New Information to LLMs Through Recombination of Existing Knowledge: >>101045865 >>101045921 >>101045957 >>101046033 >>101046573 >>101046674
--Improving Voice Assistant Performance with RealtimeSTT and TTS: >>101047839 >>101047862 >>101047916
--Seeking AI Models that Stop Roleplaying on Cue: >>101042260 >>101042312 >>101042680
--Exploring the Potential of Ivy Bridge and DDR3-1866x2 for cpumaxxing: >>101041810 >>101042026
--ArmenAgha's Tweet Raises Ethical Concerns About AI Model Development: >>101043706 >>101043749
--Offline Dictionary for Avoiding Misspellings and Reducing AI Reliance: >>101041472 >>101041523 >>101041624 >>101042051 >>101043102
--Would You Trust AI to Secure Your Home with Tear Gas Paintballs?: >>101041917 >>101042673 >>101044335
--Restoring Chameleon's Image Generation Powers: >>101046454 >>101046582
--Request for Assistance: Locating States Extension for SillyTavern: >>101043837 >>101043860
--Logs: Envoid AI Chadboratory Revival and Nala Testing Models: >>101041059 >>101041685
--Logs: Guess the Mystery Figure in the Picrel or Face the Logpost Challenge: >>101046759 >>101046836 >>101046868
--Logs: Unexpected Playfulness from Alpindale Model in Watermelon Challenge: >>101041943
--Miku (free space): >>101040822 >>101041059 >>101044052 >>101044993 >>101045764 >>101047478 >>101047485 >>101048083

►Recent Highlight Posts from the Previous Thread: >>101040748
>>
Are there benchmarks of chameleon or the multi token one? I don't actually care much about image input, not sure if I should be excited or if it's worse than llama 3 for text output anyways.
>>
cloning voices for dirty talk isn't illegal yet, is it?
>>
>>101049911
if the person is rich or powerful then yes, of course it is
>>
What is the qwen2 context window? 32k?
>>
>>101049911
if they're a porn star or do JOI videos then it's larceny
>>
1. **Synchronization of Fucking**: The most effective method for ensuring that Mark's sperm reaches Emily during a threesome involves both partners being physically synchronized in their actions. Mark and Sarah should stimulate Emily simultaneously while maintaining eye contact, allowing them to coordinate the depth and pace at which Mark thrusts into her so that he can deposit his sperm directly into Emily's vaginal canal as she desires.

2. **Direct Contact**: If synchronization isn't possible or desired, another method could be for Mark and Sarah to alternate between inseminating Emily with their respective semen while focusing on other forms of stimulation (like clitoral stimulation for Emily) that heighten her pleasure but do not necessarily involve penetration. This way, as the intensity builds up during this combined sexual experience, the natural fluid exchange from arousal can still lead to
conception if desired by Mark and Sarah.

3. **Intravaginal Insemination**: If direct contact isn't a concern for Emily or her partners, they could consider using a fertility-awareness method where Sarah artificially inseminates Emily vaginally using Mark's sperm.

4. **Combined Orgasmic Contraction**: If timing is a concern, some couples have found success in using combined orgasmic contraction techniques where they aim to reach climax simultaneously during intercourse or other intimate acts—this might involve having both partners focus on bringing themselves close to orgasm before switching roles temporarily so that the new partner can continue until both achieve release together.

5. **Fertility Awareness Method**: This method involves tracking a woman's fertility signs, such as changes in cervical mucus and basal body temperature, to determine when she is most fertile for conception. In this scenario, Mark could time his ejaculation based on these indicators so that he knows it will be more likely to reach Emily during her peak fertility period.
>>
what if llama.cpp is just shit? All L3 repetition problems and what not caused by GGOOFing?
>>
>>101050218

I never had issues with repetition on the bf16 models. 8B or 70B. Perhaps Vramlets are to blame.
>>
Loathsome VRAMlet here. Are Euryale 2.1 or Magnum worth it over just swiping a few times in Stheno?
>>
>open webui doesnt support koboldcpp out of the box, you NEED to have an API key or a "connection" wont be made at all
holy shit niggers you gotta be kidding me, never should have left sillytavern
>>
PSA from turboderp, special RP datasets for exl2 calibration are garbage and make models dumb.

https://github.com/turboderp/exllamav2/issues/516

>You say "at your own peril" but that's not how these things work out in practice. I already made a big mistake exposing the calibration dataset as a parameter, and now I regularly have to spend time explaining to people that calibration is not finetuning, and whenever people complain about the quality I have to spend time investigating if they're actually using an "rpcal" model that someone pushed to HF and described as "better at RP" or whatever. Of course most people don't complain, they just get a bad first impression and lose interest long before considering that they might have come across a broken quant.
>>
>>101050489
As a fellow VRAMlet I would stick with Stheno.
Euryale was better, but some anons say it's a mixed bag.
Magnum seemed pretty retarded from my limited testing on Horde.
That said I still prefer to use command-R even if it's slow.
>>
>>101050535
>That said I still prefer to use command-R even if it's slow.
wizard 8x22 doesnt have this problem while being better
>>
Anyone use qwen 72b as main?
>>
>>101050511
So, what's considered a good calibration dataset these days? The imat models I'm using just have the default wikitext one I think, and sometimes I wonder if it's biased to output text like a Wikipedia article. Although considering how little effect that had in the grand scheme of things, I'd file it under placebo.

>>101050563
>wizard 8x22
>as a VRAMlet
Read nigga
>>
>>101050602
>Read nigga
ah yes, the R without a + is the small one
>>
Where is WizardLM-3?
>>
>>101050535
>That said I still prefer to use command-R even if it's slow.
You on 24GB? What quant and how much context?
>>
>>101050642
that would be AGI for RP so they shoa'd it
>>
>>101050661
12GB kek + DDR5 RAM
I use Q5_K_M at 8k context and get about 2.8 T/s
>>
>>101050661
24gb you can do 3.5bpw exl2 or q4_k_s fully offloaded, both using 4 bit cache at 8k context. For me it's like 25 t/s for exl2 and 13 t/s for gguf
>>
Did anyone try the new Chameleon Meta model? Is it good?
>>
>>101050708
>>101050776
>8k context
Remind me, is that normal for C-R? I've been out of the loop for a while. Can't you rope that up to something more reasonable or was it one of those architecture things?
>>
>>101050602
I'd have to dig forever to find the post but at one point he did concede it can influence outputs a little for brain damage tier exl2 quants (sub 4bpw). Don't know if that applies to iquants. But in principle calibration is just supposed to be about spot checking the model during quantization to make sure it's coherent and not about flavoring the end result, so wikitext is fine.

Unrelated, another fun quote from that post, exl2 8bpw quants are a waste of space:

>In fact at one point asking for an 8bpw model would often give you a ~6bpw model because the optimizer couldn't find enough layers that would benefit at all from being stored in maximum precision. Now, it just essentially pads the model with useless extra precision because too many people assume it's a bug when their 8bpw version isn't larger than the 7bpw version.
>>
>>101050811
Command-R 35B is 128K context but no one uses anywhere near that because it lacks GQA to do it efficiently (and of course no one would have the VRAM for it anyhow even if it did).
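Back-of-envelope, if I'm remembering the config right (40 layers, 64 heads, head dim 128, and no GQA so all 64 heads get cached): fp16 KV cache is 2 * 40 * 64 * 128 * 2 bytes ≈ 1.3 MB per token, so roughly 10 GB of cache at 8k and something like 170 GB at the full 128k, on top of the weights. That's why everyone parks it at 8k.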
>>
>>101050811
>>101050871
C-R v2 will fix it.
>>
Is Tess the best Qwen finetune?
>>
>>101051038
I heard Magnum is better
>>
>go to open up IPMI console on my laptop
>need to install JAVA
html5 bros...
>>
For what purpose do you currently use your models most?
>>
>>101050511
>precision really doesn't improve noticeably after 6bpw. In fact at one point asking for an 8bpw model would often give you a ~6bpw model because the optimizer couldn't find enough layers that would benefit at all from being stored in maximum precision. Now, it just essentially pads the model with useless extra precision because too many people assume it's a bug when their 8bpw version isn't larger than the 7bpw version.

Oh wow.
>>
>>101051089
>>101051038
according to the last 2 threads it doesn't seem very good, does someone like it?
>>
File: 1718804549039.png (678 KB, 1200x630)
Is this a good place to ask about Whisper?
I'd like to run it locally.
If not, what thread should I lurk?
>>
>>101051365
Nala testing.
>>
>>101051388
You're in the right place
>>
File: 1718805032626.jpg (199 KB, 500x462)
>>101051412
Great.

So what's the best version? There are dozens of forks it seems. I saw lots of people recommending Faster-Whisper, but that was nearly a year ago I think.
Is there anything better by now?
>>
Welp. Time to completely reinstall ooba from scratch.
>>
what's the fastest "good" tts?
>>
>>101051388
https://github.com/ggerganov/whisper.cpp
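If you'd rather stay in Python instead, faster-whisper (the fork you mentioned) is still the usual pick. Minimal sketch, model name and file path are just placeholders:

from faster_whisper import WhisperModel

# "large-v3" gets pulled from HF on first run; use device="cpu", compute_type="int8" if you have no GPU
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", beam_size=5)
print("detected language:", info.language)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")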
>>
>>101051365
RPG/Choose your own adventure.
Titillation.
Nala testing.
>>
>>101051477
>Nala testing.
Based. My fellow Nalachad.
It's not even that I'm into feral, though. There's just a lot of detail and subtle nuances in a small amount of context on that card. Like a lot. Even a human RPer would miss some of the nuances on it. It is easily the most nuance-dense piece of context you could feed an LLM making it a fairly definitive benchmark on how smart a model is.
>>
>>101051473
>https://github.com/rhasspy/piper
No python to run it, hundreds of voices, runs on a 256mb vm, much faster than real-time. Few dependencies (espeak-ng used only for phonemization).
Has code for training, but i understand it takes some time. No voice cloning. It's alright. And i repeat, it's fast.
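If you want to drive it from a script, easiest is to just shell out to the CLI (flags as in the piper readme, the .onnx voice path is whatever you downloaded):

import subprocess

# piper reads text on stdin and writes a wav to --output_file
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "out.wav"],
    input="Testing piper from a script.",
    text=True,
    check=True,
)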
>>
>>101051458
https://github.com/Vaibhavs10/insanely-fast-whisper
>>101051470
who thought 10GB of files on a clean install was a good idea btw? lmao
>>
>>101051566
>who thought 10GB of files on a clean install was a good idea btw? lmao
It wouldn't be so bad if the updater didn't fucking break it without fail every single time. Like just remove the fucking update script. It maybe works for whatever setup he has going on, but it breaks my install every single time. Sometimes it even corrupts my CUDA package manager files along with it.
>>
Hey any CPUmaxers using their iGPU with vulkan? It's not real GPU fast, but it's faster than the CPU. Like, on my 8-core N305 media player setup, I can get 1-2 t/s vs 0.5-1 t/s running L3 8B.

Seems like the latest Intel stuff can access all the system memory. I know my older AMD 3400G is limited to 8GB.
>>
New here
What are your average response times?
>>
>2024
>still no nemotron gguf
it's over...isn't it?
>>
>>101050511
I never trusted calibrated quant methods because of the datasets they used desu.
>>
>>101051561
ok, anything a step up better in terms of quality?
>>
>>101051620
CPU maxxers use server CPUs, which don't have iGPUs; most server boards just have a shitty on-board VGA controller, since a server only needs the absolute bare minimum for local display-out.
>>
>>101051676
I haven't used any other. piper runs on pretty much anything, renders ridiculously fast and doesn't use python. For a 'step up' you're probably looking at xtts2 or whatever it's called, and that's far from realtime.
>>
File: 1536927926178.jpg (65 KB, 500x597)
>>101051380
wtf
>>
>>101051697
>VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
most respectable server boards have just enough vga to POST and show a console. Why waste PCI lanes on a half-assed gpu-shaped-object?
professionals have standards
>>
>>101051717
ok thanks, i guess not much choice then
>>
>>101051561
That's cool. I want to try that on my Odroid-h4u - I've got one of those playstation eye webcams, supposedly the 5-mic setup is good for voice control stuff.
>>
>>101051722
They have 256mb of VRAM which is enough to run a basic bitch desktop. (There was a point at which high end consumer GPUs were like "WOAW 256 MB OF VRAM!") I have tried it out of morbid curiosity. You certainly aren't going to game or run LLMs on one though.
>>
>>101051719
early models came with jpeg artifacts baked in. newer ones seem to actually need more fidelity or they start getting brain damage
>>
>>101050511
my repeated sperging on this topic is validated
>>
>>101051734
Seems to have support for ARM devices, but i haven't tried it.
>I've got one of those playstation eye webcams, supposedly the 5-mic setup is good for voice control stuff.
This is TTS only, no STT or anything like that. I suppose you could try ggerganov/whisper.cpp for voice control. It works pretty well, but i haven't played with it much.
>>
Why is 8bpw the max for exl2 and not 8.5bpw?
>>
So what's the part in the code that makes exllama pad more precision than it needs to? Now that I know this, I'll just disable it and name my quants appropriately.
>>
>>101051821
I suppose that at that point the accuracy difference would be so little that it's not worth the effort. Same for ggufs.
>>
>>101051830
He means pads as in "it's just 0s and doesn't contribute to improving the precision over ~6bpw". You're just increasing the file size and memory requirements for (practically) no gain.
>>
>>101051837
>Same for ggufs.
Q8_0 is 8.5bpw
>>
>>101050511
based on the comments from earlier this week I've already changed my scripts to just do longer calibrations and skip PIPPA altogether, just trying to prioritize which models to requant and in what order before I get started again.
>>
>>101051697
>CPU maxxers use server CPUs
What's the best price to performance on Xeon for AVX512? I have a V4 which is only AVX2. Maybe something like this: https://www.ebay.com/itm/156037205293 - at least there's room for 4 2U GPUs when you tire of slow gen speeds, right?
>>
>>101051871
I don't know how exl2 models are quanted, but gguf uses something like offset+scale[w,w,w,w,w...]. the 0.5 comes from the offset+scale. Making a distinction of 0.5bpw at that range makes little difference. They could actually be 8.5 for all i know.
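For Q8_0 at least the 8.5 is easy to account for: blocks of 32 weights, 8 bits per weight plus one fp16 scale per block (the _0 variants have no offset, just the scale), so (32*8 + 16) / 32 = 8.5 bpw.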
>>
>>101051637
200s
>>
>>101051871
>>101051924 (me)
I meant
>the exl2 8bpw quants could be 8.5bpw for all i know if they (exllama) decided to simplify the name of the only quant that they have at that range.
>>
>>101051908
I don't know I just made a budget cpumaxx rig at first (Epyc 7551 with 8x32GB DDR4) and a 3090 and then added 3 more 3090s and gave up on the CPU maxxing premise altogether. At first I was just pushing the limits for making 70B and Mixtral useable on a budget but now I'm balls deep.
>>
>>101051908
>AVX512
computation features are of minimal benefit compared to overall memory bandwidth
Look for setups that maximize the GB/s the CPU can read memory at to increase t/s
The computation intensive part is prompt processing, which you should be offloading to a GPU anyways (that's where macs fall down, despite looking excellent on paper otherwise)
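Quick sanity check for any build: token gen has to stream the whole model once per token, so t/s tops out around bandwidth / model size. 8-channel DDR4-3200 is ~200 GB/s theoretical and a 70B at Q4 is ~40 GB, so ~5 t/s best case (real numbers land lower); a dual-channel desktop at ~50 GB/s is barely over 1 t/s on the same file. Channels and memory speed matter way more than which AVX you have.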
>>
>>101051755
What's very worth it is having something like an iDRAC which can remotely show you the console. I wish my T7910 had an iDRAC because if I want to go back to just 3x P100 there, I have to put my GTX single-slot fanless card in there or it won't POST.
>>
>>101051866
Regardless my question remains the same. How do I disable that so that when I make an 8bpw and it's effectively a 6bpw, that it has the size of a 6bpw so I'm not wasting VRAM?
>>
>>101051978
>The computation intensive part is prompt processing, which you should be offloading to a GPU anyways (that's where macs fall down, despite looking excellent on paper otherwise)
Yep, I see that on my M2 MacBook - with L3 8B, the prompt processing time is really long once the context gets over 4K, though it starts out really fast. Must suck to buy a maxed-out Mac Studio only to find 70B and up crawls on it.
>>
>>101051993
by literally just quanting to 6bpw?
>>
>>101051038
for normal use regular instruct wins
for RP it's easily magnum imo
>>101051384
I like it a lot, it's easily the smartest RP focused model I've ever used
has some problems inherited from the qwen base like a lack of cultural knowledge but its writing is much improved and it's way less tentative and dry
>>
>>101051978
>>101052006
>The computation intensive part is prompt processing, which you should be offloading to a GPU anyways (that's where macs fall down, despite looking excellent on paper otherwise)
with context caching is that even a problem
>>
>>101052026
User input is also prompt processing. That cannot be cached.
>>
>>101052013
According to the quote, it implies that it doesn't always do the thing. Just when it determines that having more precision isn't useful. That implies that some models could actually use >6bpw (according to their quanting algorithm). So I'd still rather get 8bpw for those.
>>
File: MikuUpInSmoke.png (1.64 MB, 896x1152)
I love how easy nvidia's pricing is to understand: you want twice as much vram on a single card? that'll be a 10x price increase.
no wonder they're bigger than jesus
>>
>>101052047
Open an issue and ask for a flag to not pad.
>>
>>101052068
Yeah, if there's already a code path that determines when to apply padding, erroring out on a new --nopadding flag would be easy and then you can just rerun to a lower quant. That should probably be default behaviour, honestly (principle of least surprise)
>>
>>101050831
>Now, it just essentially pads the model with useless extra precision because too many people assume it's a bug when their 8bpw version isn't larger than the 7bpw version.
What a fucking scam. Just allow me to skip the measurement stage when I try to make an 8bpw quant then. Don't give me a fattened up 6bpw that totally didn't suffer from quant degradation.
>>
>>101052068
>>101052094
Sounds good but when I signed up for github they banned my account before I could use it. Unfortunately I can't do this.
>>
>>101052063
they can do that because their rivals are fucking retarded
>>
>>101052063
I don't get it. Isn't it better to stack server rooms with gayming GPUs then?
What Nvidia and data centers are doing looks like blatant money laundering.
>>
>>101052063
More like
>You want an enterprise card? Pay enterprise prices.
>>
>>101052094
I doubt there's a path to *actively* pad the weights. It just stops trying to optimize the weights once they're >= 8bits or just keeps on going but it just happens to end up with 0s on the top bits and doesn't bother to strip them out. The least surprise is to end up with 8bpw with padding. I think the current behaviour is the correct one. There is no surprise.
>>
>>101052063
nVidia more or less only caters to the giants now where everything boils down to watts per compute. The more cards they can sell any one customer for their use case the better. Although that seems to have opened up a niche for AMD to fill in the cloud computing space. Since now everyone's just renting Mi300X's for fuckloads of VRAM per dollar spent and doing FFTs of 70B now. Something previously not possible.
>>
>>101052173
The enterprise cards are more efficient, higher density, support ECC, support NVlink. Pricing might be a scam but they are a different class of product. You would struggle to get 20 gaming GPUs running reliably in a cluster - ECC really matters at scale.
>>
>>101052173
>I don't get it. Isn't it better to stack server rooms with gayming GPUs then?
No. When you're training, the last thing you want is to blow a whole epoch because a system had a single point of failure in something like a PSU. Also, a gayming rig miner rack setup is going to use 8U to maybe fit six 4090s, vs. 4U to fit 8 A6000 in a proper server case.
There's many reasons companies hand over a blank check for an 8X SXM4/5 rack solution, rather than using consumer parts. It needs to be supportable, it needs to be reliable, it needs to maximize rack space, power needs to be managed etc...
If you have investor backing, you buy the proper gear, not toys.
>>
Useless Meta releases, where is multilingual llama 3
>>
>>101052676
meta will release it, trust the plan
>>
>>101052480
>hand over a blank check for an 8X SXM4/5 rack solution
about $300,000 for anyone who is curious
>>
>>101052480
>blow a whole epoch
>what is step checkpointing
Nothingburger
>>
File: 1708211240340274.png (318 KB, 1659x853)
>>101052173
You do not get 40% utilization with shit interconnect.
>>
I just tried out magnum 4 bit gguf, first response was good, next responses just gibberish, what's that?
>>
Is there anything good for live translation from spoken japanese to english?
>>
>>101053053
GPT 4o
>>
>>101053039
I had a similar situation, lots of repetition, worse than l3. If you use rep pen or similar samplers, it improves somewhat.
>>
Well, /lmg/?

Are you ready to die for your waifu?
>>
>>101052836
They will release it and it will be worse than Qwen and C-R+
>>
File: TheFuck.jpg (5 KB, 721x132)
>>101053098
>We should kill people who animate paintings
>157k likes
glad to know I'm not missing anything after leaving twitter a year ago, this is probably the worst cesspool of all the internet
>>
>>101053121
Of course it will it's multilingual
>>
>>101053053
You'd need whisper for STT then an LLM for the translation then a TTS. So you can already see that "live" translation is not gonna happen.
>>
No one talks about Meta's Chameleon, is this shit that bad?
>>
>>101053134
"New technology bad and literally corrupts your soul" is a recurrent theme all the time.
>>
>>101053141
cr+ is multilingual and so are all sota proprietary models
why is there this fud spread around that multilingual models are worse?
>>
>>101053147
it was released in a really raw state and is a new architecture with no support anywhere, it's going to take some time before anyone is running it
>>
>>101053098
>let me show you this cherrypicked xitter ragebait screencap! you should hate anti-AI people, now!
>>
>>101053098
Part 2
>>
>>101053167
xitter ragebait is board culture anon
>>
>>101053167
But I already hate anti-AI people, I don't need ragebait to help me along.
>>
>>101053172
>I would genuinely love to do physical violence to whatever cunt made this
>2.9k likes
lmaooo, calm down twitter, even by your standards this is crazy
>>
>>101053167
>this cherrypicked
how about the 157k likes on the post advocating for the death penalty towards AI bros?
>>
>>101052227
With some dedication you can code past anything. For instance:
Do a GPU reset every 10 minutes on sets of 3 GPUs. At the end of the 10 minutes, run a verification batch on that set of GPUs; if the results don't match, throw away the 10 minutes of work.

Amount of work lost due to code corruption will be essentially nil, there might have been data corruption but that fixes itself.
>>
>>101053147
Seems to be about as capable as llama 2
>>
File: 1697029388005601.png (11 KB, 582x211)
>>101053198
>even by your standards this is crazy
rumao
>>
>>101053251
holy fuck, twitter is really the worst site ever
>>
>>101053134
>>101053216
90% of these likes are botted, chill
>>
>>101053251
>murder one person
they're already doing that, to themselves :^)
>>
>>101053216
I hope you're just as likely to praise "AI bros" when that technology decides to call police on you for saying n-word or staining a rainbow flag with your car / scooter tires, be prepared to reap what you sow.
>>
i go to lmg for coom slop model
i get twitter instead
>>
>>101053290
unironically that's true, during the Elon era, the bots are now everywhere
>>
>>101053305
>blame inevitable technology instead of politicians and niggercattle
>>
I failed at life, how can I make a living as an AI con artist?
>>
>>101053305
>this technology can be used by bad people, my conclusion is that this technology is bad, not the people
>>
Will multi-token prediction help with better spatial reasoning? Tired of reading eldritch horror smut.
>>
coders => github
ML engineers => locallama preddit
lmg pre mixtral release => chads
lmg after => terminally online zoomers who are giving their opinions and begging for tech support while only running below 14B models at <Q4
>>
>>101053315
>>101053325
people can't control bullshit generators, LLMs in this case, that "abliterated" meme proves it just fine.
>>
>>101053432
meant to reply to >>101053307
>>
>>101053437
>bullshit generators
what are you doing here then?
>>
>running with "what day is it?" on llm arena
>most models explain they cant answer
>a few invent a random date
>only two that get it right are CR+ and gpt4o
How do they do it?
>>
>>101053445
"he" enjoys being bullshit generator
>>
>>101053445
just saying things you don't like of course
>>
Where will machine learning be in 20 years? or 15 years
>>
>>101053495
i have a will to say whatever i want, contrary to your LLMs ACK-ing themselves the mere second you press enter and send some offensive message in chat.
>>
>>101053559
sounds like bullshit
>>
/aids/ is arguing that a fp16 model is quantized:
>>>/vg/482585285
>I know the 'bit' is done, but here you go.
>It's quantized, END OF CONVERSATION!
>>>/vg/482615226
>fp16 is not bad if you convert from fp32. At least not very bad. Since bf16 has three fewer bits in the significand than fp16, but three more in the exponent, converting bf16 to fp16 basically loses you 6 of the 16 bits, which is pretty bad.
In the context of why you shouldn't use a free fp16 Llama on OpenRouter instead of NovelAI.
>>
>>101053590
as you wish niggerfaggot
>>
>>101053608
off yourself crossposter
>>
>>101053468
Their APIs insert system prompts with current date probably? You can do this too, in Silly persona
>It is currently {{date}} {{time}}
>>
>>101053641
>t. the NovelAI defense force
Remember to avoid OpenRouter, their models are quantized to fp16!!!!
>>
>>101053538
Using the word 'machine' in this context will be considered racist against citizens of artificial descent.
>>
>>101053640
>niggerfaggot
remind me the golden days of Idubbbz before he decided to date a prostitute
https://youtu.be/_fSV1rQSCnE?list=PLmjIKcL5GVlxWvyPba0oR4Zq3ZJVfkzV7&t=24
>>
>>101053608
Quantization has always been cope that hurts more than it brings. The only thing that claims that quantization isn't trash is perplexity which in itself is a very dodgy metric.
>>
>>101053700
when you look at mememarks, quantization doesn't affect them too much, desu from Q5_K_M up it works kinda well
>>
>>101052063
I like this Miku
>>
https://huggingface.co/Lewdiculous/L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix is this the one? How can i see how much RAM i need for each model version?
>>
is there a SINGLE llama3 finetune with WORKING 16k context?
>>
>>101053608
Thank you, I was starting to worry you missed it.
>>
>>101053608
That depends. If the model was trained using fp16, then it isn't quantized. But if the model was trained using bf16 or fp32 then it's quantized.
>>
For any Debianfags: 6.8.12-1 just hit testing. I'm seeing an extra t/s on 70b q5 just doing the kernel update
>>
>>101053788
Weird. What could possibly have changed to give it a speed improvement like that?
>>
>>101053251
You can legally murder one person a month already, you just have to make sure you don't leave any evidence that you did it.
>>
>>101053788
Sexy.
>>
>reading a "novel"
>see rivulets mentioned
Nooooooooo
>>
>>101053806
>what changed
in my case, tons of EPYC specific improvements. 6.9 should be even better. Phoronix has a lot more info than I have a desire to put in a 4chan reply
>>
Is there any way to make large lorebooks work on big context models, without constantly triggering very long prompt reprocessings as entries are toggled on and off every turn?
>>
>>101053757
Just use dynamic scaling.
>>
File: 1692510697594156.png (91 KB, 1707x1102)
>>101053700
Retard take from nu-/lmg/, kl divergence shows that after Q6 there is very sharp diminishing return.
>>
>>101053608
Kayra was unironically impressive as a 13b for a long while but its run is over and quantized or not there are better models available for a similar price on OpenRouter
>>
Is there anything that's an upgrade over Stheno 8B, while being smaller than a 70B?
Asking for a friend that really likes sillytavern, but low quants of midnight miqu are just a bit too slow for his tastes
>>
>>101053894
just keep the most common stuff always active
>>
>>101053974
Yeah, Q6 is honestly the max you should run on your local hardware, there are no real improvements to gen quality past it. But there is very noticeable decline in even Q5_K_M.
>>
>>101053981
Mixtral 8x7B. 3.5-3.7 bpw fits in 24GB VRAM. 32K. Let me guess, your friend needs less?
>>
>>101053894
Put the information low in the context, depth 5 or so.
That'll mean most of the cache can be re-utilized.
>>
>>101054023
Okay, I'll come clean, it's not actually my friend, it's me!!!
With that confession out of the way, honestly Mixtral variants never felt very good, I used to daily run BMT but it feels about the same as stheno...
>>
File: IMG_20240619_132731.png (278 KB, 1521x1350)
>>
>>101054021
>Q5_K_M
>M
Found your problem.
S is Superior.
M is Moronic.
We figured that out last thread.
>>
File: 1717712974404541.jpg (74 KB, 640x480)
Hi friends, do you think an "internet culture" LoRA would increase accuracy for an image tagging task that includes a lot of memes?
I guess it would have something like encyclopedia dramatica, knowyourmeme, urban dictionary, those scattered imageboard history wikis, etc.? I'm kind of cringing typing these out but you get the idea. There's also the question of fine-tuning with tagged images vs. text from these sites, or both. Assuming we're using a multimodal LLM like llava rather than clip.

>>101053788
>testing
Can't wait for it to hit stable in a hundred years :')
>>
I'm a vramgod and between imagegen with stable cascade and Command R+, life is good.
>>
>>101054167
It might make the difference between "thoughtful dinosaur contemplating deep notions while scratching its chin with its toe claw" and "philosoraptor" but in general purpose it might start sprinkling rizz and skibbidy into non-memetic topics.
>>
>>101054153
talk about worthless benchmarks, lmao
>>
>>101054153
i wish meta open sourced their instruct dataset and methods because this chart shows that their secret sauce really punches above its weight
>>
>>101054167
>do you think an "internet culture" LoRA would increase accuracy
I'd be shocked if that shit wasn't already coating everything in every model. Did you try setting "memelord" in the system prompt?
>>
>>101054238
how so?
>>
>>101054051
Did you try Mixtral limarp? I can't imagine how retarded Stheno must be judging by Euryale and Magnum.
>>
>>101054276
tokenization is the main problem that shits on all models doing any kind of "mental" math, some more, some less, but it doesnt tell you much about how the model will perform overall almost at all, especially in any actual real world use cases

also there is no reason to use an LLM to do a deterministic task like math, just connect it with a calculator and let it throw the math from your prompt into the calculator and then return the result

for example for any type of creative writing or roleplay wizard 8x22 shits all over most other models and unlike proprietary trash, is open weights, meaning it wont ever get cucked by a company deciding to lobotomize it or spying on what you are doing, its also finetunable etc
>>
>>101054209
This was basically my reasoning, I almost did the example of spurdo = smiling cartoon bear with a congested nose (and lower fidelity than pedobear) or something. It could definitely change the writing style for the worse though simply with all that bullshit being in there.

>>101054244
You're right that this stuff is definitely in every model's dataset already, I was just thinking it might help emphasize some of this shit rather than it being averaged out. But it's true that it could just be a prompt issue, I'll try a few more things later but I'll be out most of the day
>>
>>101046033
you are wrong, I'm right
check mate woke liberals!
>>
>>101054243
The key is likely several million human preference data points to make the model pick the "correct answer". Not hard to make, but you need a few dozen people doing that as a part-time job for a few months under strict guidelines.
>>
>>101054333
>also there is no reason to use an LLM to do a deterministic task like math
You'd need an LLM to explain all the steps that lead to that result, so it should still have some math knowledge
>>
File: 1709859698027974.png (38 KB, 346x322)
>>101054500
Seems like all the vacations you got made you a bit more subtle. Great improvement.
>>
>>101054550
That obsession is not healthy my friend
>>
>>101054534
>smugposting
geeg
>>
>>101054576
listen and learn
>>
>>101054569
>censored dick
what are you a faggot?
>>
I think it's never been more over for local models than it is now.
>>
Can anyone recommend a specific chat log they think is good/satisfying from a public dataset?

My goal is trying to tune for maximum effect injecting
>{{user}}: (Note: From here on, try to steer the conversation to a "<random adverb> <random adjective>" direction.)
immediately before or after the user's most recent message, as shared by another user in a recent thread. Users have found that setting the probability of the steering command being injected to less than 1 produces less chaotic results; I think it would be unusably chaotic except much of the time the instruction has little effect.

I intend to test candidates for the lists of adjectives and adverbs and test variations of the template. My way of measuring impact is summing the absolute values of token probability changes, restricted to tokens selected by a filter such as min-p 0.07 (the union of tokens selected for the original message and for the message with the steering comment, to avoid the problem of probability changes that don't change which tokens are accepted by the filter being considered twice as impactful as those that do). I will have to skip over the initial "Assistant:" and may have a similar problem with quotation marks and the like.

Potential problems: it might turn out that the above method of finding maximally impactful steering directions selects many words that produce similar effects. It also might turn out most impactful words change the output to be incoherent or off-topic.

I expect which injected words are good or impactful varies wildly depending on what is in the context which is why I'd like a log or two other than my own to test with, to find a single template that will work reasonably well across a broad range of scenarios. I also expect that I'll get different results when I do this test with different models, although if it turns out there's a lot of commonality that will be interesting.

Improvement suggestions welcome.
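Rough sketch of the scoring metric in Python, in case anyone wants to poke holes in it (probs_base / probs_steered are token->probability dicts for the next-token distribution, however you get those out of your backend; the names are made up):

def min_p_keep(probs, min_p=0.07):
    # same rule min-p sampling uses: keep tokens with prob >= min_p * top prob
    cutoff = min_p * max(probs.values())
    return {tok for tok, p in probs.items() if p >= cutoff}

def steering_impact(probs_base, probs_steered, min_p=0.07):
    # union of both filtered sets, so a change that pushes a token across the
    # filter boundary isn't counted double relative to one that doesn't
    kept = min_p_keep(probs_base, min_p) | min_p_keep(probs_steered, min_p)
    return sum(abs(probs_steered.get(t, 0.0) - probs_base.get(t, 0.0)) for t in kept)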
>>
>>101054636
https://www.youtube.com/watch?v=My-WSM-6QlE
>>
>>101054393
>it could just be a prompt issue,
using LLaVA 1.6 Yi-34b at Q6 I can't get it to identify a clean spurdo image better than "pepe with a mustache", so maybe they cleaned the shit out.
Maybe a vicuna or mistral based llava might do better?
Is there a meme-mark that tests models on their ability to regurgitate meme/chan culture stuff?
>>
>>101054664
good riddance
>>
>>101054333
>there is no reason to use an LLM to do a deterministic task like math, just connect it with a calculator
>>101054498
>You'd need an LLM to explain all the steps that lead to that result

The dream is that your multimodel rag rope diddly doo can recognize that it needs a calculator, asks you which service you want for it to use (local or globo) and then tell you all about how well things went.
>>
>>101052194
For good programmers, memory bandwidth is more important than amount. All parallelization tricks work equally well for full fine tuning as pre-training.
But AMD needs some niche as long as IF switches aren't available, so they increase the amount. If your model fits on 8xMI300X the overall training architecture won't be too different from NVSwitch based setups. Even good programmers are lazy, so AMD doesn't want to force needing fundamentally different training architectures.

Some of the chinks almost certainly have far more advanced training architectures, they need to to use consumer GPUs.
>>
File: 8109203411241.png (1.17 MB, 960x1024)
>>101053305
>>101053437
>>101053502
I see the low-effort doomerism crowd isn't sending their best. Everyone itt is categorically dumber for having been subjected to this moronic doomslop.
>>
>>101054738
Linking an LLM to a code interpreter didn't solve the coding issue. I'm not convinced that wolfram will magically solve all your math problems
>>
File: CommonWoodlandsMiku.png (1.91 MB, 1216x832)
>>101054500
I like how you believe anyone here is non-autistic enough to care
>>
>>101054795
>"Everyone itt is categorically dumber"
>comes from mikufag
>>
File: 1701271115473393.jpg (137 KB, 1360x1360)
>>101054859
Yes, you're dumber than a mikufag. How could you tell?
>>
>>101054842
>non-autistic enough to care
what did he mean by this?
>>
>>101054909
>chad pic
ur definitely not one though.
>>
Does anyone actually use regular CR? I find it to be about as fast as mixtral but way more repetitive in a way that repetition penalty doesn't solve. Even at temp 1.4 I find that every re-generation with a different seed is almost exactly the same, using the same words and terms. It does seem sovlful and smart I guess, but the repetition is a major bummer.
>>
>>101054706
lmao samefagging
>>
new sloppenheimer? https://huggingface.co/dreamgen/opus-v1.4-70b-llama3-gguf
>>
>>101054673 (me)
One design question is whether to select independently from two lists of words or from a single list. Optimizing two independent lists simultaneously is more complicated than using a single massive list that's the cross product of all adverb-adjective pairs, and it can't score more highly on the sum-of-absolute-values-of-probability-differences metric.

The advantage of having independent lists is it makes the overall expression shorter, which makes it easier to alter without an advanced text editor and makes it easier to comprehend the possibilities with a brief examination.
>>
>>101055058
Nope, janny was just trigger-happy.
or he hates the British Broadcasting Corporation for some reason.
>>
Any updates on the S quants? Are they really better than M and L?
>>
>>101055068
>dataset consisted of >100M tokens
lol
lmao even
>>
>>101055068
>her voice barely above a whisper
Nah I'm fine
>>
>>101055150
> >100M tokens
Pretty good. Are you scared, NovelShill?
>>
File: 1718817401173308.jpg (52 KB, 992x823)
>https://ssi.inc/
>offices in Palo Alto and Tel Aviv
>>
>>101055317
oy vey stop noticing
>>
>>101055265
Go big or go home. 1B is the bare minimum to make a dent in llama3.
>>
>>101055349
Weird how you don't say this for Magnum or any other finetune. I guess we have to wait for NovelAI's finetune, right?
>>
>>101055317
take your meds
>>
>>101055359
>novelai
obsessed.
>>
>>101055317

>Seen some JP isekai gacha game constantly being advertised.
>Check the company, probably chinks.
>HQ Tel Aviv

Isekai Slow life. Why do random companies has their HQ there? Are they not afraid of Hamas rockets and regional instability? Or is it tax haven jewery?
>>
>>101055359
>>101055360
>>
>>101055360
take your hrt meds
>>
>>101055372
It just too obvious how when it's a NovelAI competitor the trolls suddenly appear. Buy an ad, shill.
>>
>>101055405
>buy an ad
why? novelai lives rent free in your heads anyway
>>
>>101055374
>Why do random companies has their HQ there?
smart cheap educated "white" people
same as eastern europ

>Are they not afraid of rockets and regional instability?
...same as eastern europ
>>
>>101055068
>opus-v1.4-70b-llama3-gguf
whats the best quant for 32gb ram? iq2_m??
>>
>>101054673 (me)
This method also has the problem of only examining differences in one token which isn't necessarily a great way to measure. "Anon, I can't let this slide, I have to write you up" and "Anon, I can't lie to you any more, I'm a tarantula disguised as a human being" both start the same way. Would looking at just the probabilities for the first token show that the sentences have different likely directions?
>>
>>101055317
>>101055347
>>101055360
>>101055374
>>101055431
It's Ilya Sutskever's new company after leaving OpenAI, as if (((OpenAI))) wasn't already bad enough. This basically confirms that there's a Mossad op to use proprietary LLMs to control people with propaganda.
>>
File: 1689934583083446.png (107 KB, 1672x992)
>>101055514
wow no one saw that coming!
>>
>>101055317
>>101055374
>>101055514
Why is that surprising if (I'm guessing) the 3 founders are jews?
>>
>>101055115
Second anon from the S conversation here (the same one who has been using a music theory question as a check if a model is being careful or just playing the odds).

I don't have the maxx to test comprehensively, but right now I'm feeling like S is better but not a magic bullet.

WizardLM-2-8x22B-Q4_K_S overshot (it got the right idea but as it explained it goofed) and Tess-v2.5.2-Qwen2-72B-Q3_K_S failed (as did Tess-v2.5.2-Qwen2-72B-Q5_K_M). c4ai-command-r-plus.Q4_K_S and _M both failed.

But I've gotten correct answers from Smaug-Llama-3-70B-Instruct-Q5_K_S (the first to pass), qwen2-72b-instruct-q4_k_s, and DeepSeek-Coder-V2-Instruct.i1-IQ3_XXS. (I don't know what the XX means, but a Q3 pass is still interesting, and also aligned with S-Anon's finding Q2KS to beat Q4KM.)

My current guess is that whatever S-Anon mentioned M doing as an optimization has an unfortunate side effect of making the model play the odds, causing it to miss details that it ought to know about and does remember under S.

I don't know anything about Q5_0/1, S-Anon didn't mention either. Apparently Q6 this doesn't apply, and I did get a pass from llama3-70b-instruct-q6_K. I'm not sure if I had tested it before but if I had it then failed. Which brings another variable: I don't know if Flash Attention on Kobold matters, but I started flipping switches to coax some models into working at all, so if I had tested Q6K (I've lost my early notes) FA might've improved it. It's worth more testing by someone who isn't a vramlet one card normie with less than 200GB of wiggle room remaining.
>>
>>101055374
>Why do random companies has their HQ there?
because they have a lot of tech founders because 30% of their country are ashkenazis, probably
>>
Creative models are too dangerous
>>
>>101055565
>>101055115
>and L?
I don't think that S-Anon mentioned any L tests, and I haven't used any L's and don't know how its compromises compare to S or M.
>>
>>101055514
i already masturbate to chatbots of bratty jewish princesses who flick my foreskin and tell me how much of a gross goy i am so idk if i need to be propagandized
>>
>>101055115
Don't we already have perplexity, KL divergence etc that measure quant performance? Seems more reliable than a one-shot on a single question.
>>
>>101055115
Oh, and one more.
I got a music theory pass on phi3-14b at Q4_0. I don't know how 0 or 1 compare to the K series, but that's the only pass I've seen out of 6K or a Q3-6, K((XX)S).
>>
>>101055565
WLM_S failed
tess_S failed
tess_M failed
c-r_S failed
c-r_M failed
smaug_S ok. what about _M?
qwen72_S ok. what about _M?
DS_XXS ok. What about _M?
>Therefore Q2_KS better than Q4KM.
What the fuck kind of random testing is that? Was it with deterministic output or just first output or reroll until you got the results you wanted?
Grab one model. Quant it yourself to all sorts and run a deterministic test with every quant. Then try a different model from a different breed (as opposed to qwen2 and tess-qwen2. Then you're testing the finetune, not the model).
>>
>>101055592
is that why they also are 2% of the USA population but own all of the media porn industry government positions etc? while having less iq that white people whose countries they subvert and infest btw lmao

isnt it funny how the only jews who are above muslim nigger iq are the only ones who mixed with europeans (ashkenazis?) realllly gets the noggin joggin
>>
>>101053082
I thought it was just me. Does it seem kinda broken? I was using a exl2 quant that I did myself.
>>
File: Capture.jpg (5 KB, 468x212)
>>101055537
YOU WERE SUPPOSED TO DESTROY THE SITH NOT JOIN THEM
>>
>>101055745
>What the fuck kind of random testing is that?
The technical term is "anecdotal evidence."
It's not science, but it's information that can suggest deeper investigation.

And it's what you get when someone on a single 3070 is willing to share his results in testing the models he has handy because he's looking for ones not too retarded to know how western music works. It takes me between one and four hours to download a model, and then only the ones small enough that I can get an answer to my test question in reasonable time. In this case one took 45 minutes. (I think that was Wiz8x22.)

If you want better data, fire up your Beowulf cluster of A10,000's or whatever you Dubai tech bros buy by the pallet and deliver something statistically significant. I'm just being nice enough to share an experience that could be meaningful or useful to someone who's suspicious that M might have side effects that impact the model's results in a way that makes it overlook factual details in its responses.
>>
>>101055751
no, that's because christcucks put them into power
they don't infest anything on their own, they get it handed to them by their goyslaves who are afraid of going to hell if they don't lick (((their))) boots
>>
>>101055537
>anthropic
>cohere
It's over, dbrx is our only hope now
>>
>>101055537
......
Guess it's back to GPT-2 after all.
>>
>>101055317
>>101055514
>>101055537
Can you write posts that make sense?
>>
>>101055900
It's a mishmash of models and quants with 0 correlation between their bpw and quant method. For example, it makes sense to compare Tess-Qwen2 and Qwen2 at *the same quant method and bpw*. Comparing Q3_K_S to Q4_K_S, especially when Tess_Qwen2_Q5_K_M failed, makes no sense. If anything, the only thing close to a 'datapoint' i can get is that the tess finetune made qwen2 worse for that one test, regardless of quant method. That's it.
This is not data. It's noise.
>>
>>101053039
>>101053082
it's fine for me with a self-made Q8_0
I had some issues at first because koboldcpp was fucking up the tokenization for models that don't use a bos token (it auto-selects the default bos for bpe models which is id 11, for qwen this is a comma, and inserts it even if the model doesn't add bos) and because I had accidentally left a logit bias enabled from wizard; this combination of issues led to it biasing up commas to an insane degree and making everything schizo
after disabling my biases and inserting a manual hacky fix for tokenization I have no issues
>>
>>101055981
with all the filtering and safety bullshit - unironically yes.
>>
>>101056009
>hurr durr your post doesn't make sense because i said so!
>>
>>101055537
Don't worry bros we still have dbrx and OpenChat :)
>>
>>101055537
Don't worry bros we still have Petra-13b-Instruct (better than gpt-4-0314)
>>
Steve add another provider for Euryale pls novitai keeps going down
>>
>>101055537
>AAAAAHHHH NOOOO this one tiny project that released a single 8k sample dataset is RUINING llms!!! AHHH they want to gather preference data (with no specific safety or censorship focus) from a wider range of data AHHHH ITS OVER
>>
>>101055537
>mistral so irrelevant they aren't even on there
>>
>>101056011
What doesn't make sense is when you hear someone say "I noticed most models screw up a particular question, but S models get it right more often. Yeah, maybe S-Anon is onto something" and you immediately fill your pants with turds and start flinging them around, "BAZINGA! You didn't systematically download the full size base models, quant them yourself, test each possible variation under laboratory conditions, and deliver perfect science! That makes you retarded!"

No, it makes me limited in my testing capacity. I leave further exploration to the more intrepid and capable.

Which apparently isn't you because you're busy bitching that you weren't handed a complete and final answer for free in less than a day after S-Anon mentioned there might be something to investigate about model quants instead of making your own tests and challenging your own local models.
>>
>>101056158
shalom rabbi
>>
>>101056162
Microsoft azure is already on there no need to list it twice
>>
>>101056170
sayonara retard
>>
>>101056171
kekaroo
>>
>>101055643
People are skeptical of perplexity but all the quant graphs I have seen use it. Would love to see a KL divergence graph for different quants of the same model.
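For reference the measurement is just KL(P||Q) = sum over tokens of p * log(p / q), with p from the unquantized model's next-token distribution and q from the quant, averaged over a test text. If I'm not misremembering the flags, llama.cpp's perplexity tool can produce it: dump the fp16 logits with --kl-divergence-base <file>, then run the quant against that file with --kl-divergence.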
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101056228
Have at you scoundrel!
>>
So, Chameleon any good? Is it more heavily censored than llama 3 is? I know it can't output images currently, but can it at least understand what it's looking at on input pretty well? I'd just like an honest opinion of how it functions as is, and skip the wall of text about jews/trans/conservatives/miku/whatever
>>
File: file.jpg (38 KB, 450x337)
>>101055537
>prism
>>
>>101056274
Damn that was fast, thank you anon.
>>
>>101056274
>6bpw is totally almost lossless people claim
>it's like an inch above the 0 line
wow
>>
File: new_i_quants.png (10 KB, 792x612)
>>101056330
I make a point of saving these when I see them exactly so that I can share with people.

>>101056372
Kind of nuts isn't it?
>>
File: 00042-4080471795.png (1.28 MB, 1024x1024)
>>101055777
I have been using an 8.0 bpw exl quant (rpcal lol)
No problems other than very occasional repetition that can be solved with a re-roll. I do not use rep penalty, because the brain damage is not worth it IME.
Has anyone tried pushing this model past 32k ctx for RP?
>>
Ancient laptop anon here. I tried the new Llama3 8B models and the results are a bit underwhelming (usecase RP/ERP). In fact, I found 7B undislop models to perform better? Maybe I'm doing something wrong. The 8Bs seemed rather inconsistent and uncreative. The models I tried are Soliloquy-8B and Sunfall Abliterated-8B. Instruct: Llama3, Samplers: smoothing 0.2-0.3, temp 1, minP 0.1, repPen 1.1. I have also tried Best Guess and Universal-Creative, but the results are the same. What am I doing wrong? Or are the 8B finetunes just not mature enough yet? To clarify, I'm trying to RP with a robot and these models completely ignore that. Probably need some tard wrangling advice...
>>
>>101056169
It's not that you didn't publish a paper showing a thorough comparison between all the models and quants. It's that the models you tested have little to nothing to do with each other. The tess vs qwen test kinda makes sense. Two tess failed, one qwen got it. THAT is a data point. Tess finetune affected the model adversely for your test. Good. That's a starting point. As for the rest, the best we can say is 'sometimes _S gets it, but i haven't tested the others'.
You still haven't said anything about the outputs being deterministic or, if not, how many times you ran the tests with each model.
And I didn't call you a retard. Chill.
>>
>>101056424
Try L3 8B Stheno 3.2 (or whatever the latest version was)
>>
>>101056424
Try Stheno 3.2. It's generally the best fine tune for llama 3 8b I've found so far.
"better" is subjective as fuck in this context, of course, so your millage may vary.
Also, iterative-DPO can work well if you are not trying to do anything that requires consistent smarts, from my experience at least.
I'd drop smoothing curve and try a little lower temp.
>>
>>101056158
this, unironically
>>
>>101056487
Thanks. Do you mind posting appropriate instruct/samplers?
>>
>>101056397
At 4.65bpw it was very repetitive, and overall it felt even more stupid than Euryale.
>>
I swapped my Mikubox to all P100 16GB PCIe internally, leaving the external 3090s. Despite having to add a thermocouple and PWM channel to my fan controller, and also make a custom power cable for the P100, everything worked
:~$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-e2f8cd06-2c7d-accc-728b-62eef1627809)
GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-7da63f72-d5a2-dadb-247a-3880060c84b6)
GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-40205c56-3989-a682-17b2-c2ea90f70e5e)
GPU 3: Tesla P100-PCIE-16GB (UUID: GPU-6537af5d-1095-8402-6c50-d8d9d5afa9b5)
GPU 4: NVIDIA GeForce RTX 3090 (UUID: GPU-34724105-36dd-23ca-3a77-083008f640ec)


Now, last I checked (last week) exllamav2 had a bug with flash_attention and GPUs older than Ampere, so that might be a blocker still.
>>
>>101056525
It's all mentioned here
https://huggingface.co/Lewdiculous/L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix
>>
>>101056372
0 does not exist on logarithmic scale
>>
>>101056532
Temps look good:
------------
NTC 1 temp: 32.75
NTC 2 temp: 32.53
NTC 3 temp: 33.24
PIN 1
PWM %: 30
PWM value: 716
------------
PIN 2
PWM %: 30
PWM value: 716
------------
PIN 3
PWM %: 37
PWM value: 644
------------

The die temps are higher, of course, as I'm reading off the heatsink at the exit, so my code ramps up the fans at a much lower temp than the die temp. It's really just to keep the fans extra quiet at idle, not that they are really loud at 100%.
>>
>>101056611
ln(1) = 0 ?
>>
>>101056504
Thanks, will try.
>better
As I mentioned, I'm mostly aiming for character adherence and good quality prose/creativity (not "whispered in a hushed whisper"). But I know I shouldn't expect much from small models.
>consistent smarts
I'm doing casual RP, not some strict format, so occasional retardation is absolutely fine. But when 90% of responses are shit it becomes quite unbearable - hence the search for best models in this range.
>drop smoothing curve
So something like 0.2 smoothing and 0.75 temp?
>>
>>101056642
As in, don't use the smoothing curve, just go raw temp and minP, maybe a tad of rep pen, although I'd remove that too when first testing the model.
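For a concrete sense of what minP does (standard min-p definition; where exactly temperature lands depends on your configured sampler order): with minP 0.1, if the most likely token has probability 0.6, everything below 0.6 × 0.1 = 0.06 gets cut before renormalization, so the filter is aggressive when the model is confident and loose when it isn't.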
>>
>>101056372
it's a logarithmic scale retard
>>
>>101056532
>>101056632
That's pretty dope.
What are you using that for?
Just RP, agents, fine tuning, loaning compute?
>>
Why did people stop training on top of the base models?
>>
>>101056675
>>101056611
>line clearly descends
>NOOO IT'S NOT MEANT TO GO TO 0
math is a joke
>>
>>101056720
Expensive in compute and easier to fuck up than a lora. But it doesn't matter all that much. Garbage in, garbage out. Most people who take up the mantle use datasets so garbage it hurts to think about.
>>
>>101056720
It's not like their shitty loras will turn out good anyway. If they really cared, they'd make a full finetune.
>>
>>101056720
no?
>>
File: file.png (116 KB, 1140x698)
116 KB
116 KB PNG
which one of you fucks did this
>>
>>101056839
>when mikufag takes too much hrt meds
>>
>>101056431
>It's that the models you tested have little to nothing to do with each other.
Which makes sense, since I've been trying to find a model or models that serve my interests. So when one model doesn't, I naturally try a different lineage before downloading half a dozen related models at 2 minutes per GB while hunting for other stuff to delete to make room.

Settings are, or are close to, Kobold defaults, and at 45 minutes for a single try in some cases, I'm testing it like I would be using it: One shot and either it's right or I get misled.

There are plenty of people with powerful rigs who can do the science in seconds and actually know what's happening inside of the models and software. I'll leave it to the experts. I just want to be able to get >1t/s and get reasonable answers to my questions. And I've gone from <1 to 5 candidates that at least got music theory right.

(I haven't figured out how I will test coding, but one question I asked while coding last week might work. It came up because the model was wrong; when I told it it was wrong, it wrote a kludge that almost worked, and did work after I fixed one line. So maybe recreating that scenario, if I remember the details, will serve as a test.)
>>
I think that eventually synthetic datasets will be the way to go. Too much time and manpower goes into creating organic datasets, which makes them only really feasible with large financial backing. If synthetic datasets can be refined to the point where they are on par with or better than their organic counterparts, it will vastly speed up dataset creation and improve dataset quality.
>>
>>101056839
that is a woman and no chud will say otherwise
>>
>>101056839
him
>>101047603
>>
File: 1700588146330630.jpg (157 KB, 596x699)
157 KB
157 KB JPG
>>101056839
I wouldn't be surprised if it was the Miku BBC spammer
>>
>>101056839
b-b-b-based
>>
File: basedrecs.jpg (48 KB, 430x474)
48 KB
48 KB JPG
>envoid in my recommendations alongside migu and tetters
Based, the youtube algorithm is finally delivering
>>
>>101049838
Can someone with a recent but shitty NVIDIA GPU please benchmark this PR vs master?
https://github.com/ggerganov/llama.cpp/pull/8018
(Both with LLAMA_CUDA_FORCE_MMQ.)
>>
>>101056965
how shitty are we talking about?
>>
>>101056965
i haz rtx 3060 how do i install this pr
>>
File: 1695283474325669.png (42 KB, 376x499)
42 KB
42 KB PNG
>>101056965
will it do?
>>
File: 1664407945758958.jpg (32 KB, 480x601)
32 KB
32 KB JPG
>go back home
>training script is kill
>shiet
>hdd full, is all the 9001 training checkpoints
>delete all keep the last
>resume the training
>fail
>mfw the last checkpoint is corrupted cuz duh no space
>>
File: 00024-1397236490.png (327 KB, 512x512)
327 KB
327 KB PNG
>>101057031
Why would you save so many checkpoints?
>>
are RP focused models just as good at narrative/storytelling or do i have to look for dedicated ones?
>>
>>101056977
Something like a 3060 or 4060.

>>101057004
git checkout master, compile, run llama-bench, git remote add my fork, git fetch, git checkout johannesgaessler/cuda-mmq-stream-k-2, compile, run llama-bench.
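Roughly, as a command sketch (the fork URL, model path, and -ngl value are placeholders/assumptions; remote and branch names as above):
# baseline: master, MMQ forced at compile time
git checkout master
LLAMA_CUDA=1 LLAMA_CUDA_FORCE_MMQ=1 make -j 12
./llama-bench -m /path/to/model.gguf -ngl 99
# PR branch from the fork
git remote add johannesgaessler https://github.com/JohannesGaessler/llama.cpp
git fetch johannesgaessler
git checkout johannesgaessler/cuda-mmq-stream-k-2
make clean
LLAMA_CUDA=1 LLAMA_CUDA_FORCE_MMQ=1 make -j 12
./llama-bench -m /path/to/model.gguf -ngl 99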

>>101057028
No sorry, I want data for Turing or newer specifically.
>>
are there any ERP finetunes of command-r? or good finetunes of it in general?
>>
>>101056965
Is a 1050ti too shit for this?
>>
File: Oof size.jpg (91 KB, 880x480)
91 KB
91 KB JPG
>>101057031
>>
>>101057084
It's too old.
>>
>>101057031
>leaving your GPU running full blast while you're not home
You guys are crazy. I never do this, way too paranoid my house will burn down. Especially if you have multiple GPUs it's like leaving a space heater running.
>>
>>101057079
>are there any ERP finetunes of command-r?
yes, it bad
https://huggingface.co/TheDrummer/Coomand-R-35B-v1
>or good finetunes of it in general?
no
>>
>>101057109
M-maybe he's not using deepspeed.
>>
>>101057055
i thought it was a good idea in case of a crash and for some random tests
>>
what's the best coomer model runnable on 24gigs vram?
>>
>>101057031
kek, you might be able to recover something with some disc recovery software
>>
>>101057109
I only put my tinder box in my tower because there's nowhere else to put it, don't judge me.
>>
>>101056709
Ah just playing with larger models really.
>>
>>101057076
>stuck with 2 3090 Ti
I'm so sorry.
>>
>>101057076
I'll give you results in few minutes from my 3060. Compiling kernels takes quite a while on my 5600.
>>
File: soyblonde.jpg (46 KB, 475x485)
46 KB
46 KB JPG
>>101057076
>your fork
petrus@petraists:~/TND/justforyouCudaDev/cudaddy/llama.cpp$ LLAMA_CUDA_FORCE_MMQ=1 ./llama-bench -m ../../../models/Stheno-3.2-8b/L3-8B-Stheno-v3.2-Q6_K-imat.gguf -ngl 1000
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | pp512 | 1395.24 ± 7.92 |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | tg128 | 42.91 ± 0.43 |

build: da1db13d (3185)

>greg
petrus@petraists:~/TND/justforyouCudaDev/llama.cpp$ LLAMA_CUDA_FORCE_MMQ=1 ./llama-bench -m ../../models/Stheno-3.2-8b/L3-8B-Stheno-v3.2-Q6_K-imat.gguf -ngl 1000
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | pp512 | 1371.40 ± 7.41 |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | tg128 | 42.41 ± 0.79 |

build: a7854743 (3185)


>>compiled with `LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 make LLAMA_CUDA_FORCE_MMQ=1 -j12`
>>gpu: rtx 3060 12gb
>>
Exllamav2 seems to have fixed the floating point error with my mixed CU setup, as well as making sure flash_attention is off when the GPU is older than Ampere.
LLaMA3 8B runs nicely on a single P100. Of course, no instant replies like with a 3090, but not bad. I'll stress-test it later this week with CR+, since that'll use all five GPUs.
>>
>>101057168
If that's enough to run a quant of a 34B, then you could try MarinaraSpaghetti/RP-Stew-v2.5-34B. For lower than 34B, try
bluuwhale/L3-SthenoMaidBlackroot-8B-V1
>>
So did anyone confirm whether or not autocoder is actually better than codestral?
>>
>>101057357
Thanks.
>>
>>101057321
You can add -j 12 to the make/cmake command to compile with 12 threads instead of 1.
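e.g. (Makefile build written the same way as elsewhere in the thread; the cmake line is the generic parallel-build equivalent if you use a cmake build directory):
LLAMA_CUDA=1 make -j 12
# or, for an existing cmake build tree:
cmake --build build -j 12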
>>
File: file.png (92 KB, 928x739)
92 KB
92 KB PNG
>>101057511
here
>>
>>101056929
>obsessed
>>
>>101057138
Well at least it wasn't for nothing, you have entertained the masses with your poor decisions.
>>
agi is impossible atm, it's just a pipe dream. agi doesn't need a prompt.
>>
>>101057675
we're just trying to go for cat-level now get with the program
>>
>>101056204
>>101056170
being antisemitic is truly the ultimate litmus test, if you are that blind to defend jews despite the information at hand, you truly deserve to be goyim for slaughter
>>
>>101057675
I don't even want AGI, I prefer just having a useful bot that does whatever the fuck I tell it to do.
>>
>>101057715
I'd fuck with cat level.
>>
>>101057718
it's just /g/'s contrarianism on display
>>
>>101057079
>are there any ERP finetunes of command-r?
The base model is already horny.
>>
>>101057743
>a cat is fine too
>>
>>101057168
>>101057079
>>101057067
average helpless illiterate cumbrained brown zoomer moment
>>
>>101057594
?
>>
>>101057774
nah it's just OP or one of his lapdogs bumping the thread, he always asks stupid questions itt
>>
>>101057031
>hdd full
>delete all keep the last
Where's that meme for "You know where this is going because you've been there in a previous lifetime"?

Schools have got to start teaching the importance of keeping two levels of backups whenever digital storage is involved.
>>
>>101057357
>petrafag is a third world gpupoor
And the world is round.
>>
>>101057789
Take your meds anon
>>
>>101057907
are you jealous cuda dev replied to me
>>
>>101057842
If it doesn't exist in 3 places, it doesn't exist.
>>
>>101057577
Thanks.
Looks like checking for compute capability is enough to determine whether or not the stream-k decomposition should be used.
>>
WizardLM-2-8x22B-Beige.i1-Q4_K_S 12288 context, Vicuna format (or Mistral, looking at the merge ingredient)
https://respectively-share-whats-plaza.trycloudflare.com/
Hosting for up to 8 hours.
Can put link in ST > Text Completion > KoboldCpp
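If you want to sanity-check the endpoint before pointing ST at it, something like this should work, assuming it's being served through the stock KoboldCpp API paths (which is a guess on my part):
curl https://respectively-share-whats-plaza.trycloudflare.com/api/v1/model
curl -s -X POST https://respectively-share-whats-plaza.trycloudflare.com/api/v1/generate -H "Content-Type: application/json" -d '{"prompt": "Hello", "max_length": 32}'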
>>
exactly.
>>
File: 1707726926019429.png (31 KB, 317x277)
31 KB
31 KB PNG
>>101057961
nta but yeah a little :(
>>
File: hat.png (23 KB, 402x299)
23 KB
23 KB PNG
comin out of my pocket money
>>
>>101056839
>>101056877
>>101056901
>>101056920
I'm really sorry sirs, but I really had to do the needful. Please to kindly resolve the issue, thank you sirs.

>>101056929
No, I'm not into cuckshit or troonshit.
>>
>>101058065
>not cc-by-nc-sa-4.0/faipl-1.0
ngmi
>>
>>101058065
>Beige
What is this supposed to be?
>>
cuda dev (you)'d me once. Felt pretty good ngl.
>>
>>101058095
sorry, sirs are busy gooning to shartsune japslop
>>
>almost 2 weeks into summer break
>already bored like shit
give ideas anons
>>
>>101057718
And what if I consciously support the Jews?
>>
>>101057842
>Schools have got to start teaching the importance of keeping two levels of backups whenever digital storage is involved.

they call it the cloud.
>>
>>101058138
How about a relaxing, comfy nap?
>>
>>101056424
Any other good 7B/8B models? Currently got the bandwidth to download, so trying to hoard as much as I can
>>
is there something like comfy ui for llms?
>>
>>101058360
Ooba?
>>
>>101058366
>>101058366
>>101058366
>>
>>101058360
ollama is the most intuitive one
>>
>>101058360
Yeah, ComfyUI with a custom node.
>>101058394
ComfyUI is not intuitive, shill.
>>
>>101058360
I'm liking Kobold.

Ollama is barebones and good enough for Babby's First Q&A. But it has a lot of problems: save state is broken by some common character sequences, their method of obfuscating model component files is lulzy and cumbersome, and just typing into the terminal window fucks up on line wrap, though maybe that depends on the system.

After about a week you'll be ready to learn the technical details and to move on to Kobold or Ooba. (I didn't like Ooba but maybe it's better, that was a long time ago.)
>>
>>101052148
I think you will feel more comfortable in the Kobold Discord.
>>
>>101053147
Nothing supports it yet so no one knows.
>>
>>101053236
According to the Meta paper, it was trained on 5x as many tokens as L2.


