/g/ - Technology


Thread archived.




File: miku.cpp.png (1.7 MB, 1016x1440)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101040742 & >>101030715

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1718752046769518.jpg (166 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101040742

--Papers: >>101049622 >>101049669 >>101049719 >>101049833
--Understanding Chameleon's Multimodal Architecture and Functionality: >>101047130 >>101048315 >>101048622 >>101048640 >>101048675 >>101048708 >>101048721 >>101048726
--DeepSeek 236B Code Model Performance and Memory Requirements: >>101040940 >>101041730 >>101045939 >>101045995 >>101045954 >>101046105 >>101046385 >>101046170 >>101046607 >>101046713
--Resolving Assertion Issue in llama.cpp with "llama-" Prefix: >>101045641 >>101046974 >>101047908 >>101048126
--LORAs: Adding New Information to LLMs Through Recombination of Existing Knowledge: >>101045865 >>101045921 >>101045957 >>101046033 >>101046573 >>101046674
--Improving Voice Assistant Performance with RealtimeSTT and TTS: >>101047839 >>101047862 >>101047916
--Seeking AI Models that Stop Roleplaying on Cue: >>101042260 >>101042312 >>101042680
--Exploring the Potential of Ivy Bridge and DDR3-1866x2 for cpumaxxing: >>101041810 >>101042026
--ArmenAgha's Tweet Raises Ethical Concerns About AI Model Development: >>101043706 >>101043749
--Offline Dictionary for Avoiding Misspellings and Reducing AI Reliance: >>101041472 >>101041523 >>101041624 >>101042051 >>101043102
--Would You Trust AI to Secure Your Home with Tear Gas Paintballs?: >>101041917 >>101042673 >>101044335
--Restoring Chameleon's Image Generation Powers: >>101046454 >>101046582
--Request for Assistance: Locating States Extension for SillyTavern: >>101043837 >>101043860
--Logs: Envoid AI Chadboratory Revival and Nala Testing Models: >>101041059 >>101041685
--Logs: Guess the Mystery Figure in the Picrel or Face the Logpost Challenge: >>101046759 >>101046836 >>101046868
--Logs: Unexpected Playfulness from Alpindale Model in Watermelon Challenge: >>101041943
--Miku (free space): >>101040822 >>101041059 >>101044052 >>101044993 >>101045764 >>101047478 >>101047485 >>101048083

►Recent Highlight Posts from the Previous Thread: >>101040748
>>
Are there benchmarks of chameleon or the multi token one? I don't actually care much about image input, not sure if I should be excited or if it's worse than llama 3 for text output anyways.
>>
cloning voices for dirty talk isn't illegal yet, is it?
>>
>>101049911
if the person is rich or powerful then yes, of course it is
>>
What is the qwen2 context window? 32k?
>>
>>101049911
if they're a porn star or do JOI videos then it's larceny
>>
1. **Synchronization of Fucking**: The most effective method for ensuring that Mark's sperm reaches Emily during a threesome involves both partners being physically synchronized in their actions. Mark and Sarah should stimulate Emily simultaneously while maintaining eye contact, allowing them to coordinate the depth and pace at which Mark thrusts into her so that he can deposit his sperm directly into Emily's vaginal canal as she desires.

2. **Direct Contact**: If synchronization isn't possible or desired, another method could be for Mark and Sarah to alternate between inseminating Emily with their respective semen while focusing on other forms of stimulation (like clitoral stimulation for Emily) that heighten her pleasure but do not necessarily involve penetration. This way, as the intensity builds up during this combined sexual experience, the natural fluid exchange from arousal can still lead to
conception if desired by Mark and Sarah.

3. **Intravaginal Insemination**: If direct contact isn't a concern for Emily or her partners, they could consider using a fertility-awareness method where Sarah artificially inseminates Emily vaginally using Mark's sperm.

4. **Combined Orgasmic Contraction**: If timing is a concern, some couples have found success in using combined orgasmic contraction techniques where they aim to reach climax simultaneously during intercourse or other intimate acts—this might involve having both partners focus on bringing themselves close to orgasm before switching roles temporarily so that the new partner can continue until both achieve release together.

5. **Fertility Awareness Method**: This method involves tracking a woman's fertility signs, such as changes in cervical mucus and basal body temperature, to determine when she is most fertile for conception. In this scenario, Mark could time his ejaculation based on these indicators so that he knows it will be more likely to reach Emily during her peak fertility period.
>>
what if llama.cpp is just shit? All L3 repetition problems and what not caused by GGOOFing?
>>
>>101050218

I never had issues with repetition on the bf16 models. 8B or 70B. Perhaps Vramlets are to blame.
>>
Loathsome VRAMlet here. Are Euryale 2.1 or Magnum worth it over just swiping a few times in Stheno?
>>
>open webui doesnt support koboldcpp out of the box, you NEED to have an API key or a "connection" wont be made at all
holy shit niggers you gotta be kidding me, never should have left sillytavern
>>
PSA from turboderp, special RP datasets for exl2 calibration are garbage and make models dumb.

https://github.com/turboderp/exllamav2/issues/516

>You say "at your own peril" but that's not how these things work out in practice. I already made a big mistake exposing the calibration dataset as a parameter, and now I regularly have to spend time explaining to people that calibration is not finetuning, and whenever people complain about the quality I have to spend time investigating if they're actually using an "rpcal" model that someone pushed to HF and described as "better at RP" or whatever. Of course most people don't complain, they just get a bad first impression and lose interest long before considering that they might have come across a broken quant.
>>
>>101050489
As a fellow VRAMlet I would stick with Stheno.
Euryale was better, but some anons say it's a mixed bag.
Magnum seemed pretty retarded from my limited testing on Horde.
That said I still prefer to use command-R even if it's slow.
>>
>>101050535
>That said I still prefer to use command-R even if it's slow.
wizard 8x22 doesnt have this problem while being better
>>
Anyone use qwen 72b as main?
>>
>>101050511
So, what's considered a good calibration dataset these days? The imat models I'm using just have the default wikitext one I think, and sometimes I wonder if it's biased to output text like a Wikipedia article. Although considering how little effect that had in the grand scheme of things, I'd file it under placebo.

>>101050563
>wizard 8x22
>as a VRAMlet
Read nigga
>>
>>101050602
>Read nigga
ah yes, the R without a + is the small one
>>
Where is WizardLM-3?
>>
>>101050535
>That said I still prefer to use command-R even if it's slow.
You on 24GB? What quant and how much context?
>>
>>101050642
that would be AGI for RP so they shoa'd it
>>
>>101050661
12GB kek + DDR5 RAM
I use Q5_K_M at 8k context and get about 2.8 T/s
>>
>>101050661
24gb you can do 3.5bpw exl2 or q4_k_s fully offloaded, both using 4 bit cache at 8k context. For me it's like 25 t/s for exl2 and 13 t/s for gguf
>>
Did anyone try the new Chameleon Meta model? Is it good?
>>
>>101050708
>>101050776
>8k context
Remind me, is that normal for C-R? I've been out of the loop for a while. Can't you rope that up to something more reasonable or was it one of those architecture things?
>>
>>101050602
I'd have to dig forever to find the post but at one point he did concede it can influence outputs a little for brain damage tier exl2 quants (sub 4bpw). Don't know if that applies to iquants. But in principle calibration is just supposed to be about spot checking the model during quantization to make sure it's coherent and not about flavoring the end result, so wikitext is fine.

Unrelated, another fun quote from that post, exl2 8bpw quants are a waste of space:

>In fact at one point asking for an 8bpw model would often give you a ~6bpw model because the optimizer couldn't find enough layers that would benefit at all from being stored in maximum precision. Now, it just essentially pads the model with useless extra precision because too many people assume it's a bug when their 8bpw version isn't larger than the 7bpw version.
>>
>>101050811
Command-R 35B is 128K context but no one uses anywhere near that because it lacks GQA to do it efficiently (and of course no one would have the VRAM for it anyhow even if it did).
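Back-of-envelope, if I'm remembering the config right (40 layers, 64 heads, head dim 128, and no GQA so all 64 heads get cached): fp16 KV cache is 2 * 40 * 64 * 128 * 2 bytes ≈ 1.3 MB per token, so roughly 10 GB of cache at 8k and something like 170 GB at the full 128k, on top of the weights. That's why everyone parks it at 8k.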
>>
>>101050811
>>101050871
C-R v2 will fix it.
>>
Is Tess the best Qwen finetune?
>>
>>101051038
I heard Magnum is better
>>
>go to open up IPMI console on my laptop
>need to install JAVA
html5 bros...
>>
For what purpose do you currently use your models most?
>>
>>101050511
>precision really doesn't improve noticeably after 6bpw. In fact at one point asking for an 8bpw model would often give you a ~6bpw model because the optimizer couldn't find enough layers that would benefit at all from being stored in maximum precision. Now, it just essentially pads the model with useless extra precision because too many people assume it's a bug when their 8bpw version isn't larger than the 7bpw version.

Oh wow.
>>
>>101051089
>>101051038
according to the last 2 threads it doesn't seem very good, does someone like it?
>>
File: 1718804549039.png (678 KB, 1200x630)
Is this a good place to ask about Whisper?
I'd like to run it locally.
If not, what thread should I lurk?
>>
>>101051365
Nala testing.
>>
>>101051388
You're in the right place
>>
File: 1718805032626.jpg (199 KB, 500x462)
>>101051412
Great.

So what's the best version? There are dozens of forks it seems. I saw lots of people recommending Faster-Whisper, but that was nearly a year ago I think.
Is there anything better by now?
>>
Welp. Time to completely reinstall ooba from scratch.
>>
what's the fastest "good" tts?
>>
>>101051388
https://github.com/ggerganov/whisper.cpp
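If you'd rather stay in Python instead, faster-whisper (the fork you mentioned) is still the usual pick. Minimal sketch, model name and file path are just placeholders:

from faster_whisper import WhisperModel

# "large-v3" gets pulled from HF on first run; use device="cpu", compute_type="int8" if you have no GPU
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", beam_size=5)
print("detected language:", info.language)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")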
>>
>>101051365
RPG/Choose your own adventure.
Titillation.
Nala testing.
>>
>>101051477
>Nala testing.
Based. My fellow Nalachad.
It's not even that I'm into feral, though. There's just a lot of detail and subtle nuances in a small amount of context on that card. Like a lot. Even a human RPer would miss some of the nuances on it. It is easily the most nuance-dense piece of context you could feed an LLM making it a fairly definitive benchmark on how smart a model is.
>>
>>101051473
>https://github.com/rhasspy/piper
No python to run it, hundreds of voices, runs on a 256mb vm, much faster than real-time. Few dependencies (espeak-ng used only for phonemization).
Has code for training, but i understand it takes some time. No voice cloning. It's alright. And i repeat, it's fast.
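If you want to drive it from a script, easiest is to just shell out to the CLI (flags as in the piper readme, the .onnx voice path is whatever you downloaded):

import subprocess

# piper reads text on stdin and writes a wav to --output_file
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "out.wav"],
    input="Testing piper from a script.",
    text=True,
    check=True,
)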
>>
>>101051458
https://github.com/Vaibhavs10/insanely-fast-whisper
>>101051470
who thought 10GB of files on a clean install was a good idea btw? lmao
>>
>>101051566
>who thought 10GB of files on a clean install was a good idea btw? lmao
It wouldn't be so bad if the updater didn't fucking break it without fail every single time. Like just remove the fucking update script. It maybe works for whatever setup he has going on, but it breaks my install every single time. Sometimes it even corrupts my CUDA package manager files along with it.
>>
Hey any CPUmaxers using their iGPU with vulkan? It's not real GPU fast, but it's faster than the CPU. Like, on my 8-core N305 media player setup, I can get 1-2 t/s vs 0.5-1 t/s running L3 8B.

Seems like the latest Intel stuff can access all the system memory. I know my older AMD 3400G is limited to 8GB.
>>
New here
What are your average response times?
>>
>2024
>still no nemotron gguf
it's over...isn't it?
>>
>>101050511
I never trusted calibrated quant methods because of the datasets they used desu.
>>
>>101051561
ok, anything a step up better in terms of quality?
>>
>>101051620
CPU maxxers use server CPUs, which don't have iGPUs; most server boards just have a shitty on-board VGA controller, since a server only needs the absolute bare minimum for local display-out.
>>
>>101051676
I haven't used any other. piper runs on pretty much anything, renders ridiculously fast and doesn't use python. For a 'step up' you're probably looking at xtts2 or whatever it's called, and that's far from realtime.
>>
File: 1536927926178.jpg (65 KB, 500x597)
>>101051380
wtf
>>
>>101051697
>VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
most respectable server boards have just enough vga to POST and show a console. Why waste PCI lanes on a half-assed gpu-shaped-object?
professionals have standards
>>
>>101051717
ok thanks, i guess not much choice then
>>
>>101051561
That's cool. I want to try that on my Odroid-h4u - I've got one of those playstation eye webcams, supposedly the 5-mic setup is good for voice control stuff.
>>
>>101051722
They have 256mb of VRAM which is enough to run a basic bitch desktop. (There was a point at which high end consumer GPUs were like "WOAW 256 MB OF VRAM!") I have tried it out of morbid curiosity. You certainly aren't going to game or run LLMs on one though.
>>
>>101051719
early models came with jpeg artifacts baked in. newer ones seem to actually need more fidelity or they start getting brain damage
>>
>>101050511
my repeated sperging on this topic is validated
>>
>>101051734
Seems to have support for ARM devices, but i haven't tried it.
>I've got one of those playstation eye webcams, supposedly the 5-mic setup is good for voice control stuff.
This is TTS only, no STT or anything like that. I suppose you could try ggerganov/whisper.cpp for voice control. It works pretty well, but i haven't played with it much.
>>
Why is 8bpw the max for exl2 and not 8.5bpw?
>>
So what's the part in the code that makes exllama pad more precision than it needs to? Now that I know this, I'll just disable it and name my quants appropriately.
>>
>>101051821
I suppose that at that point the accuracy difference would be so little that it's not worth the effort. Same for ggufs.
>>
>>101051830
He means pads as in "it's just 0s and doesn't contribute to improving the precision over ~6bpw". You're just increasing the file size and memory requirements for (practically) no gain.
>>
>>101051837
>Same for ggufs.
Q8_0 is 8.5bpw
>>
>>101050511
based on the comments from earlier this week I've already changed my scripts to just do longer calibrations and skip PIPPA altogether, just trying to prioritize which models to requant and in what order before I get started again.
>>
>>101051697
>CPU maxxers use server CPUs
What's the best price to performance on Xeon for AVX512? I have a V4 which is only AVX2. Maybe something like this: https://www.ebay.com/itm/156037205293 - at least there's room for 4 2U GPUs when you tire of slow gen speeds, right?
>>
>>101051871
I don't know how exl2 models are quanted, but gguf uses something like offset+scale[w,w,w,w,w...]. the 0.5 comes from the offset+scale. Making a distinction of 0.5bpw at that range makes little difference. They could actually be 8.5 for all i know.
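For Q8_0 at least the 8.5 is easy to account for: blocks of 32 weights, 8 bits per weight plus one fp16 scale per block (the _0 variants have no offset, just the scale), so (32*8 + 16) / 32 = 8.5 bpw.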
>>
>>101051637
200s
>>
>>101051871
>>101051924 (me)
I meant
>the exl2 8bpw quants could be 8.5bpw for all i know if they (exllama) decided to simplify the name of the only quant that they have at that range.
>>
>>101051908
I don't know I just made a budget cpumaxx rig at first (Epyc 7551 with 8x32GB DDR4) and a 3090 and then added 3 more 3090s and gave up on the CPU maxxing premise altogether. At first I was just pushing the limits for making 70B and Mixtral useable on a budget but now I'm balls deep.
>>
>>101051908
>AVX512
computation features are of minimal benefit compared to overall memory bandwidth
Look for setups that maximize the GB/s the CPU can read memory at to increase t/s
The computation intensive part is prompt processing, which you should be offloading to a GPU anyways (that's where macs fall down, despite looking excellent on paper otherwise)
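Quick sanity check for any build: token gen has to stream the whole model once per token, so t/s tops out around bandwidth / model size. 8-channel DDR4-3200 is ~200 GB/s theoretical and a 70B at Q4 is ~40 GB, so ~5 t/s best case (real numbers land lower); a dual-channel desktop at ~50 GB/s is barely over 1 t/s on the same file. Channels and memory speed matter way more than which AVX you have.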
>>
>>101051755
What's very worth it is having something like an iDRAC which can remotely show you the console. I wish my T7910 had an iDRAC because if I want to go back to just 3x P100 there, I have to put my GTX single-slot fanless card in there or it won't POST.
>>
>>101051866
Regardless my question remains the same. How do I disable that so that when I make an 8bpw and it's effectively a 6bpw, that it has the size of a 6bpw so I'm not wasting VRAM?
>>
>>101051978
>The computation intensive part is prompt processing, which you should be offloading to a GPU anyways (that's where macs fall down, despite looking excellent on paper otherwise)
Yep, I see that on my M2 MacBook - with L3 8B, the prompt processing time is really long once the context gets over 4K, though it starts out really fast. Must suck to buy a maxed-out Mac Studio only to find 70B and up crawls on it.
>>
>>101051993
by literally just quanting to 6bpw?
>>
>>101051038
for normal use regular instruct wins
for RP it's easily magnum imo
>>101051384
I like it a lot, it's easily the smartest RP focused model I've ever used
has some problems inherited from the qwen base like a lack of cultural knowledge but its writing is much improved and it's way less tentative and dry
>>
>>101051978
>>101052006
>The computation intensive part is prompt processing, which you should be offloading to a GPU anyways (that's where macs fall down, despite looking excellent on paper otherwise)
with context caching is that even a problem
>>
>>101052026
User input is also prompt processing. That cannot be cached.
>>
>>101052013
According to the quote, it implies that it doesn't always do the thing. Just when it determines that having more precision isn't useful. That implies that some models could actually use >6bpw (according to their quanting algorithm). So I'd still rather get 8bpw for those.
>>
File: MikuUpInSmoke.png (1.64 MB, 896x1152)
I love how easy nvidia's pricing is to understand: you want twice as much vram on a single card? that'll be a 10x price increase.
no wonder they're bigger than jesus
>>
>>101052047
Open an issue and ask for a flag to not pad.
>>
>>101052068
Yeah, if there's already a code path that determines when to apply padding, erroring out on a new --nopadding flag would be easy and then you can just rerun to a lower quant. That should probably be default behaviour, honestly (principle of least surprise)
>>
>>101050831
>Now, it just essentially pads the model with useless extra precision because too many people assume it's a bug when their 8bpw version isn't larger than the 7bpw version.
What a fucking scam. Just allow me to skip the measurement stage when I try to make an 8bpw quant then. Don't give me a fattened up 6bpw that totally didn't suffer from quant degradation.
>>
>>101052068
>>101052094
Sounds good but when I signed up for github they banned my account before I could use it. Unfortunately I can't do this.
>>
>>101052063
they can do that because their rivals are fucking retarded
>>
>>101052063
I don't get it. Isn't it better to stack server rooms with gayming GPUs then?
What Nvidia and data centers are doing looks like blatant money laundering.
>>
>>101052063
More like
>You want an enterprise card? Pay enterprise prices.
>>
>>101052094
I doubt there's a path to *actively* pad the weights. It just stops trying to optimize the weights once they're >= 8bits or just keeps on going but it just happens to end up with 0s on the top bits and doesn't bother to strip them out. The least surprise is to end up with 8bpw with padding. I think the current behaviour is the correct one. There is no surprise.
>>
>>101052063
nVidia more or less only caters to the giants now where everything boils down to watts per compute. The more cards they can sell any one customer for their use case the better. Although that seems to have opened up a niche for AMD to fill in the cloud computing space. Since now everyone's just renting Mi300X's for fuckloads of VRAM per dollar spent and doing FFTs of 70B now. Something previously not possible.
>>
>>101052173
The enterprise cards are more efficient, higher density, support ECC, support NVlink. Pricing might be a scam but they are a different class of product. You would struggle to get 20 gaming GPUs running reliably in a cluster - ECC really matters at scale.
>>
>>101052173
>I don't get it. Isn't it better to stack server rooms with gayming GPUs then?
No. When you're training, the last thing you want is to blow a whole epoch because a system had a single point of failure in something like a PSU. Also, a gayming rig miner rack setup is going to use 8U to maybe fit six 4090s, vs. 4U to fit 8 A6000 in a proper server case.
There's many reasons companies hand over a blank check for an 8X SXM4/5 rack solution, rather than using consumer parts. It needs to be supportable, it needs to be reliable, it needs to maximize rack space, power needs to be managed etc...
If you have investor backing, you buy the proper gear, not toys.
>>
Useless Meta releases, where is multilingual llama 3
>>
>>101052676
meta will release it, trust the plan
>>
>>101052480
>hand over a blank check for an 8X SXM4/5 rack solution
about $300,000 for anyone who is curious
>>
>>101052480
>blow a whole epoch
>what is step checkpointing
Nothingburger
>>
File: 1708211240340274.png (318 KB, 1659x853)
>>101052173
You do not get 40% utilization with shit interconnect.
>>
I just tried out magnum 4 bit gguf, first response was good, next responses just gibberish, what's that?
>>
Is there anything good for live translation from spoken japanese to english?
>>
>>101053053
GPT 4o
>>
>>101053039
I had a similar situation, lots of repetition, worse than l3. If you use rep pen or similar samplers, it improves somewhat.
>>
Well, /lmg/?

Are you ready to die for your waifu?
>>
>>101052836
They will release it and it will be worse than Qwen and C-R+
>>
File: TheFuck.jpg (5 KB, 721x132)
>>101053098
>We should kill people who animate paintings
>157k likes
glad to know I'm not missing anything after leaving twitter a year ago, this is probably the worst cesspool of all the internet
>>
>>101053121
Of course it will it's multilingual
>>
>>101053053
You'd need whisper for STT then an LLM for the translation then a TTS. So you can already see that "live" translation is not gonna happen.
>>
No one talks about Meta's Chameleon, is this shit that bad?
>>
>>101053134
"New technology bad and literally corrupts your soul" is a recurrent theme all the time.
>>
>>101053141
cr+ is multilingual and so are all sota proprietary models
why is there this fud spread around that multilingual models are worse?
>>
>>101053147
it was released in a really raw state and is a new architecture with no support anywhere, it's going to take some time before anyone is running it
>>
>>101053098
>let me show you this cherrypicked xitter ragebait screencap! you should hate anti-AI people, now!
>>
>>101053098
Part 2
>>
>>101053167
xitter ragebait is board culture anon
>>
>>101053167
But I already hate anti-AI people, I don't need ragebait to help me along.
>>
>>101053172
>I would genuinely love to do physical violence to whatever cunt made this
>2.9k likes
lmaooo, calm down twitter, even by your standards this is crazy
>>
>>101053167
>this cherrypicked
how about the 157k likes on the post advocating for the death penalty towards AI bros?
>>
>>101052227
With some dedication you can code past anything. For instance:
Do a GPU reset every 10 minutes on sets of 3 GPUs. At the end of the 10 minutes, run a verification batch on that set of GPUs; if the results don't match, throw away the 10 minutes of work.

Amount of work lost due to code corruption will be essentially nil, there might have been data corruption but that fixes itself.
>>
>>101053147
Seems to be about as capable as llama 2
>>
File: 1697029388005601.png (11 KB, 582x211)
>>101053198
>even by your standards this is crazy
rumao
>>
>>101053251
holy fuck, twitter is really the worst site ever
>>
>>101053134
>>101053216
90% of these likes are botted, chill
>>
>>101053251
>murder one person
they're already doing that, to themselves :^)
>>
>>101053216
I hope you're just as likely to praise "AI bros" when that technology decides to call police on you for saying n-word or staining a rainbow flag with your car / scooter tires, be prepared to reap what you sow.
>>
i go to lmg for coom slop model
i get twitter instead
>>
>>101053290
unironically that's true, during the Elon era, the bots are now everywhere
>>
>>101053305
>blame inevitable technology instead of politicians and niggercattle
>>
I failed at life, how can I make a living as an AI con artist?
>>
>>101053305
>this technology can be used by bad people, my conclusion is that this technology is bad, not the people
>>
Will multi-token prediction help with better spatial reasoning? Tired of reading eldritch horror smut.
>>
coders => github
ML engineers => locallama preddit
lmg pre mixtral release => chads
lmg after => terminally online zoomers who are giving their opinions and begging for tech support while only running below 14B models at <Q4
>>
>>101053315
>>101053325
people can't control bullshit generators, LLMs in this case, that "abliterated" meme proves it just fine.
>>
>>101053432
meant to reply to >>101053307
>>
>>101053437
>bullshit generators
what are you doing here then?
>>
>running with "what day is it?" on llm arena
>most models explain they cant answer
>a few invent a random date
>only two that get it right are CR+ and gpt4o
How do they do it?
>>
>>101053445
"he" enjoys being bullshit generator
>>
>>101053445
just saying things you don't like of course
>>
Where will machine learning be in 20 years? or 15 years
>>
>>101053495
i have a will to say whatever i want, contrary to your LLMs ACK-ing themselves the mere second you press enter and send some offensive message in chat.
>>
>>101053559
sounds like bullshit
>>
/aids/ is arguing that a fp16 model is quantized:
>>>/vg/482585285
>I know the 'bit' is done, but here you go.
>It's quantized, END OF CONVERSATION!
>>>/vg/482615226
>fp16 is not bad if you convert from fp32. At least not very bad. Since bf16 has three fewer bits in the significand than fp16, but three more in the exponent, converting bf16 to fp16 basically loses you 6 of the 16 bits, which is pretty bad.
In the context of why you shouldn't use a free fp16 Llama on OpenRouter instead of NovelAI.
>>
>>101053590
as you wish niggerfaggot
>>
>>101053608
off yourself crossposter
>>
>>101053468
Their APIs insert system prompts with current date probably? You can do this too, in Silly persona
>It is currently {{date}} {{time}}
>>
>>101053641
>t. the NovelAI defense force
Remember to avoid OpenRouter, their models are quantized to fp16!!!!
>>
>>101053538
Using the word 'machine' in this context will be considered racist against citizens of artificial descent.
>>
>>101053640
>niggerfaggot
remind me the golden days of Idubbbz before he decided to date a prostitute
https://youtu.be/_fSV1rQSCnE?list=PLmjIKcL5GVlxWvyPba0oR4Zq3ZJVfkzV7&t=24
>>
>>101053608
Quantization has always been cope that hurts more than it brings. The only thing that claims that quantization isn't trash is perplexity which in itself is a very dodgy metric.
>>
>>101053700
when you look at mememarks, quantization doesn't affect them too much, desu from Q5_K_M up it works kinda well
>>
>>101052063
I like this Miku
>>
https://huggingface.co/Lewdiculous/L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix is this the one? How can i see how much RAM i need for each model version?
>>
is there a SINGLE llama3 finetune with WORKING 16k context?
>>
>>101053608
Thank you, I was starting to worry you missed it.
>>
>>101053608
That depends. If the model was trained using fp16, then it isn't quantized. But if the model was trained using bf16 or fp32 then it's quantized.
>>
For any Debianfags: 6.8.12-1 just hit testing. I'm seeing an extra t/s on 70b q5 just doing the kernel update
>>
>>101053788
Weird. What could possibly have changed to give it a speed improvement like that?
>>
>>101053251
You can legally murder one person a month already, you just have to make sure you don't leave any evidence that you did it.
>>
>>101053788
Sexy.
>>
>reading a "novel"
>see rivulets mentioned
Nooooooooo
>>
>>101053806
>what changed
in my case, tons of EPYC specific improvements. 6.9 should be even better. Phoronix has a lot more info than I have a desire to put in a 4chan reply
>>
Is there any way to make large lorebooks work on big context models, without constantly triggering very long prompt reprocessings as entries are toggled on and off every turn?
>>
>>101053757
Just use dynamic scaling.
>>
File: 1692510697594156.png (91 KB, 1707x1102)
>>101053700
Retard take from nu-/lmg/, kl divergence shows that after Q6 there is very sharp diminishing return.
>>
>>101053608
Kayra was unironically impressive as a 13b for a long while but its run is over and quantized or not there are better models available for a similar price on OpenRouter
>>
Is there anything that's an upgrade over Stheno 8B, while being smaller than a 70B?
Asking for a friend that really likes sillytavern, but low quants of midnight miqu are just a bit too slow for his tastes
>>
>>101053894
just keep the most common stuff always active
>>
>>101053974
Yeah, Q6 is honestly the max you should run on your local hardware, there are no real improvements to gen quality past it. But there is very noticeable decline in even Q5_K_M.
>>
>>101053981
Mixtral 8x7B. 3.5-3.7 bpw fits in 24GB VRAM. 32K. Let me guess, your friend needs less?
>>
>>101053894
Put the information low in the context, depth 5 or so.
That'll mean most of the cache can be re-utilized.
>>
>>101054023
Okay, I'll come clean, it's not actually my friend, it's me!!!
With that confession out of the way, honestly Mixtral variants never felt very good, I used to daily run BMT but it feels about the same as stheno...
>>
File: IMG_20240619_132731.png (278 KB, 1521x1350)
>>
>>101054021
>Q5_K_M
>M
Found your problem.
S is Superior.
M is Moronic.
We figured that out last thread.
>>
File: 1717712974404541.jpg (74 KB, 640x480)
Hi friends, do you think an "internet culture" LoRA would increase accuracy for an image tagging task that includes a lot of memes?
I guess it would have something like encyclopedia dramatica, knowyourmeme, urban dictionary, those scattered imageboard history wikis, etc.? I'm kind of cringing typing these out but you get the idea. There's also the question of fine-tuning with tagged images vs. text from these sites, or both. Assuming we're using a multimodal LLM like llava rather than clip.

>>101053788
>testing
Can't wait for it to hit stable in a hundred years :')
>>
I'm a vramgod and between imagegen with stable cascade and Command R+, life is good.
>>
>>101054167
It might make the difference between "thoughtful dinosaur contemplating deep notions while scratching its chin with its toe claw" and "philosoraptor" but in general purpose it might start sprinkling rizz and skibbidy into non-memetic topics.
>>
>>101054153
talk about worthless benchmarks, lmao
>>
>>101054153
i wish meta open sourced their instruct dataset and methods because this chart shows that their secret sauce really punches above its weight
>>
>>101054167
>do you think an "internet culture" LoRA would increase accuracy
I'd be shocked if that shit wasn't already coating everything in every model. Did you try setting "memelord" in the system prompt?
>>
>>101054238
how so?
>>
>>101054051
Did you try Mixtral limarp? I can't imagine how retarded Stheno must be judging by Euryale and Magnum.
>>
>>101054276
tokenization is the main problem that shits on all models doing any kind of "mental" math, some more, some less, but it doesnt tell you much about how the model will perform overall almost at all, especially in any actual real world use cases

also there is no reason to use an LLM to do a deterministic task like math, just connect it with a calculator and let it throw the math from your prompt into the calculator and then return the result

for example for any type of creative writing or roleplay wizard 8x22 shits all over most other models and unlike proprietary trash, is open weights, meaning it wont ever get cucked by a company deciding to lobotomize it or spying on what you are doing, its also finetunable etc
>>
>>101054209
This was basically my reasoning, I almost did the example of spurdo = smiling cartoon bear with a congested nose (and lower fidelity than pedobear) or something. It could definitely change the writing style for the worse though simply with all that bullshit being in there.

>>101054244
You're right that this stuff is definitely in every model's dataset already, I was just thinking it might help emphasize some of this shit rather than it being averaged out. But it's true that it could just be a prompt issue, I'll try a few more things later but I'll be out most of the day
>>
>>101046033
you are wrong, I'm right
check mate woke liberals!
>>
>>101054243
The key is likely several million human preference data points to make the model pick the "correct answer". Not hard to make, but you need a few dozen people doing that as a part-time job for a few months under strict guidelines.
>>
>>101054333
>also there is no reason to use an LLM to do a deterministic task like math
You'd need an LLM to explain all the steps that lead to that result, so it should still have some math knowledge
>>
File: 1709859698027974.png (38 KB, 346x322)
>>101054500
Seems like all the vacations you got made you a bit more subtle. Great improvement.
>>
>>101054550
That obsession is not healthy my friend
>>
>>101054534
>smugposting
geeg
>>
>>101054576
listen and learn
>>
>>101054569
>censored dick
what are you a faggot?
>>
I think it's never been more over for local models than it is now.
>>
Can anyone recommend a specific chat log they think is good/satisfying from a public dataset?

My goal is trying to tune for maximum effect injecting
>{{user}}: (Note: From here on, try to steer the conversation to a "<random adverb> <random adjective>" direction.)
immediately before or after the user's most recent message, as shared by another user in a recent thread. Users have found that setting the probability of the steering command being injected to less than 1 produces less chaotic results; I think it would be unusably chaotic except much of the time the instruction has little effect.

I intend to test candidates for the lists of adjectives and adverbs and test variations of the template. My way of measuring impact is summing the absolute values of token probability changes, restricted to tokens selected by a filter such as min-p 0.07 (the union of tokens selected for the original message and for the message with the steering comment, to avoid the problem of probability changes that don't change which tokens are accepted by the filter being considered twice as impactful as those that do). I will have to skip over the initial "Assistant:" and may have a similar problem with quotation marks and the like.

Potential problems: it might turn out that the above method of finding maximally impactful steering directions selects many words that produce similar effects. It also might turn out most impactful words change the output to be incoherent or off-topic.

I expect which injected words are good or impactful varies wildly depending on what is in the context which is why I'd like a log or two other than my own to test with, to find a single template that will work reasonably well across a broad range of scenarios. I also expect that I'll get different results when I do this test with different models, although if it turns out there's a lot of commonality that will be interesting.

Improvement suggestions welcome.
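Rough sketch of the scoring metric in Python, in case anyone wants to poke holes in it (probs_base / probs_steered are token->probability dicts for the next-token distribution, however you get those out of your backend; the names are made up):

def min_p_keep(probs, min_p=0.07):
    # same rule min-p sampling uses: keep tokens with prob >= min_p * top prob
    cutoff = min_p * max(probs.values())
    return {tok for tok, p in probs.items() if p >= cutoff}

def steering_impact(probs_base, probs_steered, min_p=0.07):
    # union of both filtered sets, so a change that pushes a token across the
    # filter boundary isn't counted double relative to one that doesn't
    kept = min_p_keep(probs_base, min_p) | min_p_keep(probs_steered, min_p)
    return sum(abs(probs_steered.get(t, 0.0) - probs_base.get(t, 0.0)) for t in kept)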
>>
>>101054636
https://www.youtube.com/watch?v=My-WSM-6QlE
>>
>>101054393
>it could just be a prompt issue,
using LLaVA 1.6 Yi-34b at Q6 I can't get it to identify a clean spurdo image better than "pepe with a mustache", so maybe they cleaned the shit out.
Maybe a vicuna or mistral based llava might do better?
Is there a meme-mark that tests models on their ability to regurgitate meme/chan culture stuff?
>>
>>101054664
good riddance
>>
>>101054333
>there is no reason to use an LLM to do a deterministic task like math, just connect it with a calculator
>>101054498
>You'd need an LLM to explain all the steps that lead to that result

The dream is that your multimodel rag rope diddly doo can recognize that it needs a calculator, asks you which service you want for it to use (local or globo) and then tell you all about how well things went.
>>
>>101052194
For good programmers, memory bandwidth is more important than amount. All parallelization tricks work equally well for full fine tuning as pre-training.
But AMD needs some niche as long as IF switches aren't available, so they increase the amount. If your model fits on 8xMI300X the overall training architecture won't be too different from NVSwitch based setups. Even good programmers are lazy, so AMD doesn't want to force needing fundamentally different training architectures.

Some of the chinks almost certainly have far more advanced training architectures, they need to to use consumer GPUs.
>>
File: 8109203411241.png (1.17 MB, 960x1024)
>>101053305
>>101053437
>>101053502
I see the low-effort doomerism crowd isn't sending their best. Everyone itt is categorically dumber for having been subjected to this moronic doomslop.
>>
>>101054738
Linking an LLM to a code interpreter didn't solve the coding issue. I'm not convinced that wolfram will magically solve all your math problems
>>
File: CommonWoodlandsMiku.png (1.91 MB, 1216x832)
>>101054500
I like how you believe anyone here is non-autistic enough to care
>>
>>101054795
>"Everyone itt is categorically dumber"
>comes from mikufag
>>
File: 1701271115473393.jpg (137 KB, 1360x1360)
>>101054859
Yes, you're dumber than a mikufag. How could you tell?
>>
>>101054842
>non-autistic enough to care
what did he mean by this?
>>
>>101054909
>chad pic
ur definitely not one though.
>>
Does anyone actually use regular CR? I find it to be about as fast as mixtral but way more repetitive in a way that repetition penalty doesn't solve. Even at temp 1.4 I find that every re-generation with a different seed is almost exactly the same, using the same words and terms. It does seem sovlful and smart I guess, but the repetition is a major bummer.
>>
>>101054706
lmao samefagging
>>
new sloppenheimer? https://huggingface.co/dreamgen/opus-v1.4-70b-llama3-gguf
>>
>>101054673 (me)
One design question is whether to select independently from two lists of words or from a single list. Optimizing two independent lists simultaneously is more complicated than using a single massive list that's the cross product of all adverb-adjective pairs, and it can't score more highly on the sum-of-absolute-values-of-probability-differences metric.

The advantage of having independent lists is it makes the overall expression shorter, which makes it easier to alter without an advanced text editor and makes it easier to comprehend the possibilities with a brief examination.
>>
>>101055058
Nope, janny was just trigger-happy.
or he hates the British Broadcasting Corporation for some reason.
>>
Any updates on the S quants? Are they really better than M and L?
>>
>>101055068
>dataset consisted of >100M tokens
lol
lmao even
>>
>>101055068
>her voice barely above a whisper
Nah I'm fine
>>
>>101055150
> >100M tokens
Pretty good. Are you scared, NovelShill?
>>
File: 1718817401173308.jpg (52 KB, 992x823)
>https://ssi.inc/
>offices in Palo Alto and Tel Aviv
>>
>>101055317
oy vey stop noticing
>>
>>101055265
Go big or go home. 1B is the bare minimum to make a dent in llama3.
>>
>>101055349
Weird how you don't say this for Magnum or any other finetune. I guess we have to wait for NovelAI's finetune, right?
>>
>>101055317
take your meds
>>
>>101055359
>novelai
obsessed.
>>
>>101055317

>Seen some JP isekai gacha game constantly being advertised.
>Check the company, probably chinks.
>HQ Tel Aviv

Isekai Slow life. Why do random companies has their HQ there? Are they not afraid of Hamas rockets and regional instability? Or is it tax haven jewery?
>>
>>101055359
>>101055360
>>
>>101055360
take your hrt meds
>>
>>101055372
It just too obvious how when it's a NovelAI competitor the trolls suddenly appear. Buy an ad, shill.
>>
>>101055405
>buy an ad
why? novelai lives rent free in your heads anyway
>>
>>101055374
>Why do random companies has their HQ there?
smart cheap educated "white" people
same as eastern europ

>Are they not afraid of rockets and regional instability?
...same as eastern europ
>>
>>101055068
>opus-v1.4-70b-llama3-gguf
whats the best quant for 32gb ram? iq2_m??
>>
>>101054673 (me)
This method also has the problem of only examining differences in one token which isn't necessarily a great way to measure. "Anon, I can't let this slide, I have to write you up" and "Anon, I can't lie to you any more, I'm a tarantula disguised as a human being" both start the same way. Would looking at just the probabilities for the first token show that the sentences have different likely directions?
>>
>>101055317
>>101055347
>>101055360
>>101055374
>>101055431
It's Ilya Sutskever's new company after leaving OpenAI, as if (((OpenAI))) wasn't already bad enough. This basically confirms that there's a Mossad op to use proprietary LLMs to control people with propaganda.
>>
File: 1689934583083446.png (107 KB, 1672x992)
>>101055514
wow no one saw that coming!
>>
>>101055317
>>101055374
>>101055514
Why is that surprising if (I'm guessing) the 3 founders are jews?
>>
>>101055115
Second anon from the S conversation here (the same one who has been using a music theory question as a check if a model is being careful or just playing the odds).

I don't have the maxx to test comprehensively, but right now I'm feeling like S is better but not a magic bullet.

WizardLM-2-8x22B-Q4_K_S overshot (it got the right idea but as it explained it goofed) and Tess-v2.5.2-Qwen2-72B-Q3_K_S failed (as did Tess-v2.5.2-Qwen2-72B-Q5_K_M). c4ai-command-r-plus.Q4_K_S and _M both failed.

But I've gotten correct answers from Smaug-Llama-3-70B-Instruct-Q5_K_S (the first to pass), qwen2-72b-instruct-q4_k_s, and DeepSeek-Coder-V2-Instruct.i1-IQ3_XXS. (I don't know what the XX means, but a Q3 pass is still interesting, and also aligned with S-Anon's finding Q2KS to beat Q4KM.)

My current guess is that whatever S-Anon mentioned M doing as an optimization has an unfortunate side effect of making the model play the odds, causing it to miss details that it ought to know about and does remember under S.

I don't know anything about Q5_0/1, S-Anon didn't mention either. Apparently Q6 this doesn't apply, and I did get a pass from llama3-70b-instruct-q6_K. I'm not sure if I had tested it before but if I had it then failed. Which brings another variable: I don't know if Flash Attention on Kobold matters, but I started flipping switches to coax some models into working at all, so if I had tested Q6K (I've lost my early notes) FA might've improved it. It's worth more testing by someone who isn't a vramlet one card normie with less than 200GB of wiggle room remaining.
>>
>>101055374
>Why do random companies has their HQ there?
because they have a lot of tech founders because 30% of their country are ashkenazis, probably
>>
Creative models are too dangerous
>>
>>101055565
>>101055115
>and L?
I don't think that S-Anon mentioned any L tests, and I haven't used any L's and don't know how its compromises compare to S or M.
>>
>>101055514
i already masturbate to chatbots of bratty jewish princesses who flick my foreskin and tell me how much of a gross goy i am so idk if i need to be propagandized
>>
>>101055115
Don't we already have perplexity, KL divergence etc that measure quant performance? Seems more reliable than a one-shot on a single question.
>>
>>101055115
Oh, and one more.
I got a music theory pass on phi3-14b at Q4_0. I don't know how 0 or 1 compare to the K series, but that's the only pass I've seen out of 6K or a Q3-6, K((XX)S).
>>
>>101055565
WLM_S failed
tess_S failed
tess_M failed
c-r_S failed
c-r_M failed
smaug_S ok. what about _M?
qwen72_S ok. what about _M?
DS_XXS ok. What about _M?
>Therefore Q2_KS better than Q4KM.
What the fuck kind of random testing is that? Was it with deterministic output or just first output or reroll until you got the results you wanted?
Grab one model. Quant it yourself to all sorts and run a deterministic test with every quant. Then try a different model from a different breed (as opposed to qwen2 and tess-qwen2. Then you're testing the finetune, not the model).
>>
>>101055592
is that why they also are 2% of the USA population but own all of the media porn industry government positions etc? while having less iq that white people whose countries they subvert and infest btw lmao

isnt it funny how the only jews who are above muslim nigger iq are the only ones who mixed with europeans (ashkenazis?) realllly gets the noggin joggin
>>
>>101053082
I thought it was just me. Does it seem kinda broken? I was using a exl2 quant that I did myself.
>>
File: Capture.jpg (5 KB, 468x212)
>>101055537
YOU WERE SUPPOSED TO DESTROY THE SITH NOT JOIN THEM
>>
>>101055745
>What the fuck kind of random testing is that?
The technical term is "anecdotal evidence."
It's not science, but it's information that can suggest deeper investigation.

And it's what you get when someone on a single 3070 is willing to share his results in testing the models he has handy because he's looking for ones not too retarded to know how western music works. It takes me between one and four hours to download a model, and then only the ones small enough that I can get an answer to my test question in reasonable time. In this case one took 45 minutes. (I think that was Wiz8x22.)

If you want better data, fire up your Beowulf cluster of A10,000's or whatever you Dubai tech bros buy by the pallet and deliver something statistically significant. I'm just being nice enough to share an experience that could be meaningful or useful to someone who's suspicious that M might have side effects that impact the model's results in a way that makes it overlook factual details in its responses.
>>
>>101055751
no, that's because christcucks put them into power
they don't infest anything on their own, they get it handed to them by their goyslaves who are afraid of going to hell if they don't lick (((their))) boots
>>
>>101055537
>anthropic
>cohere
It's over, dbrx is our only hope now
>>
>>101055537
......
Guess it's back to GPT-2 after all.
>>
>>101055317
>>101055514
>>101055537
Can you write posts that make sense?
>>
>>101055900
It's a mishmash of models and quants with 0 correlation between their bpw and quant method. For example, it makes sense to compare Tess-Qwen2 and Qwen2 at *the same quant method and bpw*. Comparing Q3_K_S to Q4_K_S, especially when Tess_Qwen2_Q5_K_M failed, makes no sense. If anything, the only thing close to a 'datapoint' i can get is that the tess finetune made qwen2 worse for that one test, regardless of quant method. That's it.
This is not data. It's noise.
>>
>>101053039
>>101053082
it's fine for me with a self-made Q8_0
I had some issues at first because koboldcpp was fucking up the tokenization for models that don't use a bos token (it auto-selects the default bos for bpe models which is id 11, for qwen this is a comma, and inserts it even if the model doesn't add bos) and because I had accidentally left a logit bias enabled from wizard; this combination of issues led to it biasing up commas to an insane degree and making everything schizo
after disabling my biases and inserting a manual hacky fix for tokenization I have no issues
>>
>>101055981
with all the filtering and safety bullshit - unironically yes.
>>
>>101056009
>hurr durr your post doesn't make sense because i said so!
>>
>>101055537
Don't worry bros we still have dbrx and OpenChat :)
>>
>>101055537
Don't worry bros we still have Petra-13b-Instruct (better than gpt-4-0314)
>>
Steve add another provider for Euryale pls novitai keeps going down
>>
>>101055537
>AAAAAHHHH NOOOO this one tiny project that released a single 8k sample dataset is RUINING llms!!! AHHH they want to gather preference data (with no specific safety or censorship focus) from a wider range of data AHHHH ITS OVER
>>
>>101055537
>mistral so irrelevant they aren't even on there
>>
>>101056011
What doesn't make sense is when you hear someone say "I noticed most models screw up a particular question, but S models get it right more often. Yeah, maybe S-Anon is onto something" and you immediately fill your pants with turds and start flinging them around, "BAZINGA! You didn't systematically download the full size base models, quant them yourself, test each possible variation under laboratory conditions, and deliver perfect science! That makes you retarded!"

No, it makes me limited in my testing capacity. I leave further exploration to the more intrepid and capable.

Which apparently isn't you because you're busy bitching that you weren't handed a complete and final answer for free in less than a day after S-Anon mentioned there might be something to investigate about model quants instead of making your own tests and challenging your own local models.
>>
>>101056158
shalom rabbi
>>
>>101056162
Microsoft azure is already on there no need to list it twice
>>
>>101056170
sayonara retard
>>
>>101056171
kekaroo
>>
>>101055643
People are skeptical of perplexity but all the quant graphs I have seen use it. Would love to see a KL divergence graph for different quants of the same model.
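For reference the measurement is just KL(P||Q) = sum over tokens of p * log(p / q), with p from the unquantized model's next-token distribution and q from the quant, averaged over a test text. If I'm not misremembering the flags, llama.cpp's perplexity tool can produce it: dump the fp16 logits with --kl-divergence-base <file>, then run the quant against that file with --kl-divergence.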
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101056228
Have at you scoundrel!
>>
So, Chameleon any good? Is it more heavily censored than llama 3 is? I know it can't output images currently, but can it at least understand what it's looking at on input pretty well? I'd just like an honest opinion of how it functions as is, and skip the wall of text about jews/trans/conservatives/miku/whatever
>>
File: file.jpg (38 KB, 450x337)
>>101055537
>prism
>>
>>101056274
Damn that was fast, thank you anon.
>>
>>101056274
>6bpw is totally almost lossless people claim
>it's like an inch above the 0 line
wow
>>
File: new_i_quants.png (10 KB, 792x612)
>>101056330
I make a point of saving these when I see them exactly so that I can share with people.

>>101056372
Kind of nuts isn't it?
>>
File: 00042-4080471795.png (1.28 MB, 1024x1024)
>>101055777
I have been using an 8.0 bpw exl quant (rpcal lol)
No problems other than very occasional repetition that can be solved with a re-roll. I do not use rep penalty, because the brain damage is not worth it IME.
Has anyone tried pushing this model past 32k ctx for RP?
>>
Ancient laptop anon here. I tried the new Llama3 8B models and the results are a bit underwhelming (usecase RP/ERP). In fact, I found 7B undislop models to perform better? Maybe I'm doing something wrong. The 8Bs seemed rather inconsistent and uncreative. The models I tried are Soliloquy-8B and Sunfall Abliterated-8B. Instruct: Llama3, Samplers: smoothing 0.2-0.3, temp 1, minP 0.1, repPen 1.1. I have also tried Best Guess and Universal-Creative, but the results are the same. What am I doing wrong? Or are the 8B finetunes just not mature enough yet? To clarify, I'm trying to RP with a robot and these models completely ignore that. Probably need some tard wrangling advice...
>>
>>101056169
It's not that you didn't publish a paper showing a thorough comparison between all the models and quants. It's that the models you tested have little to nothing to do with each other. The tess vs qwen test kinda makes sense. Two tess failed, one qwen got it. THAT is a data point. Tess finetune affected the model adversely for your test. Good. That's a starting point. As for the rest, the best we can say is 'sometimes _S gets it, but i haven't tested the others'.
You still haven't said anything about the outputs being deterministic or, if not, how many times you ran the tests with each model.
And I didn't call you a retard. Chill.
>>
>>101056424
Try L3 8B Stheno 3.2 (or whatever the latest version was)
>>
>>101056424
Try Stheno 3.2. It's generally the best fine tune for llama 3 8b I've found so far.
"better" is subjective as fuck in this context, of course, so your millage may vary.
Also, iterative-DPO can work well if you are not trying to do anything that requires consistent smarts, from my experience at least.
I'd drop smoothing curve and try a little lower temp.
>>
>>101056158
this, unironically
>>
>>101056487
Thanks. Do you mind posting appropriate instruct/samplers?
>>
>>101056397
At 4.65bpw it was very repetitive, and overall it felt even more stupid than Euryale.
>>
I swapped my Mikubox to all P100 16GB PCIe internally, leaving the external 3090s. Despite having to add a thermocouple and PWM channel to my fan controller, and also make a custom power cable for the P100, everything worked
:~$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-e2f8cd06-2c7d-accc-728b-62eef1627809)
GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-7da63f72-d5a2-dadb-247a-3880060c84b6)
GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-40205c56-3989-a682-17b2-c2ea90f70e5e)
GPU 3: Tesla P100-PCIE-16GB (UUID: GPU-6537af5d-1095-8402-6c50-d8d9d5afa9b5)
GPU 4: NVIDIA GeForce RTX 3090 (UUID: GPU-34724105-36dd-23ca-3a77-083008f640ec)


Now, last I checked (last week) exllamav2 had a bug with flash_attention and GPUs older than Ampere, so that might be a blocker still.
>>
>>101056525
It's all mentioned here
https://huggingface.co/Lewdiculous/L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix
>>
>>101056372
0 does not exist on logarithmic scale
>>
>>101056532
Temps look good:
------------
NTC 1 temp: 32.75
NTC 2 temp: 32.53
NTC 3 temp: 33.24
PIN 1
PWM %: 30
PWM value: 716
------------
PIN 2
PWM %: 30
PWM value: 716
------------
PIN 3
PWM %: 37
PWM value: 644
------------

The die temps are higher, of course, as I'm reading off the heatsink at the exit, so my code ramps up the fans at a much lower temp than the die temp. It's really just to keep the fans extra quiet at idle, not that they are really loud at 100%.
>>
>>101056611
ln(1) = 0 ?
>>
>>101056504
Thanks, will try.
>better
As I mentioned, I'm mostly aiming for character adherence and good quality prose/creativity (not "whispered in a hushed whisper"). But I know I shouldn't expect much from small models.
>consistent smarts
I'm doing casual RP, not some strict format, so occasional retardation is absolutely fine. But when 90% of responses are shit it becomes quite unbearable - hence the search for best models in this range.
>drop smoothing curve
So something like 0.2 smoothing and 0.75 temp?
>>
>>101056642
As in, don't use the smoothing curve, just go raw temp and minP, maybe a tad of rep pen, although I'd remove that too when first testing the model.
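For a concrete sense of what minP does (standard min-p definition; where exactly temperature lands depends on your configured sampler order): with minP 0.1, if the most likely token has probability 0.6, everything below 0.6 × 0.1 = 0.06 gets cut before renormalization, so the filter is aggressive when the model is confident and loose when it isn't.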
>>
>>101056372
it's a logarithmic scale retard
>>
>>101056532
>>101056632
That's pretty dope.
What are you using that for?
Just RP, agents, fine tuning, loaning compute?
>>
Why did people stop training on top of the base models?
>>
>>101056675
>>101056611
>line clearly descends
>NOOO IT'S NOT MEANT TO GO TO 0
math is a joke
>>
>>101056720
Expensive in compute and easier to fuck up than a lora. But it doesn't matter all that much. Garbage in, garbage out. Most people who take up the mantle use datasets so garbage it hurts to think about.
>>
>>101056720
It's not like their shitty loras will turn out good anyway. If they really cared, they'd make a full finetune.
>>
>>101056720
no?
>>
File: file.png (116 KB, 1140x698)
116 KB
116 KB PNG
which one of you fucks did this
>>
>>101056839
>when mikufag takes too much hrt meds
>>
>>101056431
>It's that the models you tested have little to nothing to do with each other.
Which makes sense, since I've been trying to find a model or models that serve my interests. So when one model doesn't, I naturally try a different lineage before downloading half a dozen related models at 2 minutes per GB while hunting for other stuff to delete to make room.

Settings are, or are close to, Kobold defaults, and at 45 minutes for a single try in some cases, I'm testing it like I would be using it: One shot and either it's right or I get misled.

There are plenty of people with powerful rigs who can do the science in seconds and actually know what's happening inside of the models and software. I'll leave it to the experts. I just want to be able to get >1t/s and get reasonable answers to my questions. And I've gone from <1 to 5 candidates that at least got music theory right.

(I haven't figured out how I will test coding, but one question I asked while coding last week might work. It came up because the model was wrong; when I told it it was wrong, it wrote a kludge that almost worked, and did work after I fixed one line. So maybe recreating that scenario, if I remember the details, will serve as a test.)
>>
I think that eventually synthetic datasets will be the way to go. Too much time and manpower goes into creating organic datasets, which makes them only really feasible with large financial backing. If synthetic datasets can be refined to the point where they are on par with or better than their organic counterparts, it will vastly speed up dataset creation and improve dataset quality.
>>
>>101056839
that is a woman and no chud will say otherwise
>>
>>101056839
him
>>101047603
>>
File: 1700588146330630.jpg (157 KB, 596x699)
157 KB
157 KB JPG
>>101056839
I wouldn't be surprised if it was the Miku BBC spammer
>>
>>101056839
b-b-b-based
>>
File: basedrecs.jpg (48 KB, 430x474)
48 KB
48 KB JPG
>envoid in my recommendations alongside migu and tetters
Based, the youtube algorithm is finally delivering
>>
>>101049838
Can someone with a recent but shitty NVIDIA GPU please benchmark this PR vs master?
https://github.com/ggerganov/llama.cpp/pull/8018
(Both with LLAMA_CUDA_FORCE_MMQ.)
>>
>>101056965
how shitty are we talking about?
>>
>>101056965
i haz rtx 3060 how do i install this pr
>>
File: 1695283474325669.png (42 KB, 376x499)
42 KB
42 KB PNG
>>101056965
will it do?
>>
File: 1664407945758958.jpg (32 KB, 480x601)
32 KB
32 KB JPG
>go back home
>training script is kill
>shiet
>hdd full, is all the 9001 training checkpoints
>delete all keep the last
>resume the training
>fail
>mfw the last checkpoint is corrupted cuz duh no space
>>
File: 00024-1397236490.png (327 KB, 512x512)
327 KB
327 KB PNG
>>101057031
Why would you save so many checkpoints?
>>
are RP focused models just as good at narrative/storytelling or do i have to look for dedicated ones?
>>
>>101056977
Something like a 3060 or 4060.

>>101057004
git checkout master, compile, run llama-bench, git remote add my fork, git fetch, git checkout johannesgaessler/cuda-mmq-stream-k-2, compile, run llama-bench.
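Roughly, as a command sketch (the fork URL, model path, and -ngl value are placeholders/assumptions; remote and branch names as above):
# baseline: master, MMQ forced at compile time
git checkout master
LLAMA_CUDA=1 LLAMA_CUDA_FORCE_MMQ=1 make -j 12
./llama-bench -m /path/to/model.gguf -ngl 99
# PR branch from the fork
git remote add johannesgaessler https://github.com/JohannesGaessler/llama.cpp
git fetch johannesgaessler
git checkout johannesgaessler/cuda-mmq-stream-k-2
make clean
LLAMA_CUDA=1 LLAMA_CUDA_FORCE_MMQ=1 make -j 12
./llama-bench -m /path/to/model.gguf -ngl 99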

>>101057028
No sorry, I want data for Turing or newer specifically.
>>
are there any ERP finetunes of command-r? or good finetunes of it in general?
>>
>>101056965
Is a 1050ti too shit for this?
>>
File: Oof size.jpg (91 KB, 880x480)
91 KB
91 KB JPG
>>101057031
>>
>>101057084
It's too old.
>>
>>101057031
>leaving your GPU running full blast while you're not home
You guys are crazy. I never do this, way too paranoid my house will burn down. Especially if you have multiple GPUs it's like leaving a space heater running.
>>
>>101057079
>are there any ERP finetunes of command-r?
yes, it bad
https://huggingface.co/TheDrummer/Coomand-R-35B-v1
>or good finetunes of it in general?
no
>>
>>101057109
M-maybe he's not using deepspeed.
>>
>>101057055
i thought it was a good idea in case of a crash and for some random tests
>>
what's the best coomer model runnable on 24gigs vram?
>>
>>101057031
kek, you might be able to recover something with some disc recovery software
>>
>>101057109
I only put my tinder box in my tower because there's nowhere else to put it, don't judge me.
>>
>>101056709
Ah just playing with larger models really.
>>
>>101057076
>stuck with 2 3090 Ti
I'm so sorry.
>>
>>101057076
I'll give you results in few minutes from my 3060. Compiling kernels takes quite a while on my 5600.
>>
File: soyblonde.jpg (46 KB, 475x485)
46 KB
46 KB JPG
>>101057076
>your fork
petrus@petraists:~/TND/justforyouCudaDev/cudaddy/llama.cpp$ LLAMA_CUDA_FORCE_MMQ=1 ./llama-bench -m ../../../models/Stheno-3.2-8b/L3-8B-Stheno-v3.2-Q6_K-imat.gguf -ngl 1000
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | pp512 | 1395.24 ± 7.92 |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | tg128 | 42.91 ± 0.43 |

build: da1db13d (3185)

>greg
petrus@petraists:~/TND/justforyouCudaDev/llama.cpp$ LLAMA_CUDA_FORCE_MMQ=1 ./llama-bench -m ../../models/Stheno-3.2-8b/L3-8B-Stheno-v3.2-Q6_K-imat.gguf -ngl 1000
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | pp512 | 1371.40 ± 7.41 |
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 1000 | tg128 | 42.41 ± 0.79 |

build: a7854743 (3185)


>>compiled with `LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 make LLAMA_CUDA_FORCE_MMQ=1 -j12`
>>gpu: rtx 3060 12gb
>>
Exllamav2 seems to have fixed the floating point error with my mixed CU setup, as well as making sure flash_attention is off when the GPU is older than Ampere.
LLaMA3 8B runs nicely on a single P100. Of course, no instant replies like with a 3090, but not bad. I'll stress-test it later this week with CR+, since that'll use all five GPUs.
>>
>>101057168
If that's enough to run a quant of a 34B, then you could try MarinaraSpaghetti/RP-Stew-v2.5-34B. For lower than 34B, try
bluuwhale/L3-SthenoMaidBlackroot-8B-V1
>>
So did anyone confirm whether or not autocoder is actually better than codestral?
>>
>>101057357
Thanks.
>>
>>101057321
You can add -j 12 to the make/cmake command to compile with 12 threads instead of 1.
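e.g. (Makefile build written the same way as elsewhere in the thread; the cmake line is the generic parallel-build equivalent if you use a cmake build directory):
LLAMA_CUDA=1 make -j 12
# or, for an existing cmake build tree:
cmake --build build -j 12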
>>
File: file.png (92 KB, 928x739)
92 KB
92 KB PNG
>>101057511
here
>>
>>101056929
>obsessed
>>
>>101057138
Well at least it wasn't for nothing, you have entertained the masses with your poor decisions.
>>
agi is impossible atm, it's just a pipe dream. agi doesn't need a prompt.
>>
>>101057675
we're just trying to go for cat-level now get with the program
>>
>>101056204
>>101056170
being antisemitic is truly the ultimate litmus test, if you are that blind to defend jews despite the information at hand, you truly deserve to be goyim for slaughter
>>
>>101057675
I don't even want AGI, I prefer just having a useful bot that does whatever the fuck I tell it to do.
>>
>>101057715
I'd fuck with cat level.
>>
>>101057718
it's just /g/'s contrarianism on display
>>
>>101057079
>are there any ERP finetunes of command-r?
The base model is already horny.
>>
>>101057743
>a cat is fine too
>>
>>101057168
>>101057079
>>101057067
average helpless illiterate cumbrained brown zoomer moment
>>
>>101057594
?
>>
>>101057774
nah it's just OP or one of his lapdogs bumping the thread, he always asks stupid questions itt
>>
>>101057031
>hdd full
>delete all keep the last
Where's that meme for "You know where this is going because you've been there in a previous lifetime"?

Schools have got to start teaching the importance of keeping two levels of backups whenever digital storage is involved.
>>
>>101057357
>petrafag is a third world gpupoor
And the world is round.
>>
>>101057789
Take your meds anon
>>
>>101057907
are you jealous cuda dev replied to me
>>
>>101057842
If it doesn't exist in 3 places, it doesn't exist.
>>
>>101057577
Thanks.
Looks like checking for compute capability is enough to determine whether or not the stream-k decomposition should be used.
>>
WizardLM-2-8x22B-Beige.i1-Q4_K_S 12288 context, Vicuna format (or Mistral, looking at the merge ingredient)
https://respectively-share-whats-plaza.trycloudflare.com/
Hosting for up to 8 hours.
Can put link in ST > Text Completion > KoboldCpp
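If you want to sanity-check the endpoint before pointing ST at it, something like this should work, assuming it's being served through the stock KoboldCpp API paths (which is a guess on my part):
curl https://respectively-share-whats-plaza.trycloudflare.com/api/v1/model
curl -s -X POST https://respectively-share-whats-plaza.trycloudflare.com/api/v1/generate -H "Content-Type: application/json" -d '{"prompt": "Hello", "max_length": 32}'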
>>
exactly.
>>
File: 1707726926019429.png (31 KB, 317x277)
31 KB
31 KB PNG
>>101057961
nta but yeah a little :(
>>
File: hat.png (23 KB, 402x299)
23 KB
23 KB PNG
comin out of my pocket money
>>
>>101056839
>>101056877
>>101056901
>>101056920
I'm really sorry sirs, but I really had to do the needful. Please to kindly resolve the issue, thank you sirs.

>>101056929
No, I'm not into cuckshit or troonshit.
>>
>>101058065
>not cc-by-nc-sa-4.0/faipl-1.0
ngmi
>>
>>101058065
>Beige
What is this supposed to be?
>>
cuda dev (you)'d me once. Felt pretty good ngl.
>>
>>101058095
sorry, sirs are busy gooning to shartsune japslop
>>
>almost 2 weeks into summer break
>already bored like shit
give ideas anons
>>
>>101057718
And what if I consciously support the Jews?
>>
>>101057842
>Schools have got to start teaching the importance of keeping two levels of backups whenever digital storage is involved.

they call it the cloud.
>>
>>101058138
How about a relaxing, comfy nap?
>>
>>101056424
Any other good 7B/8B models? Currently got the bandwidth to download, so trying to hoard as much as I can
>>
is there something like comfy ui for llms?
>>
>>101058360
Ooba?
>>
>>101058366
>>101058366
>>101058366
>>
>>101058360
ollama is the most intuitive one
>>
>>101058360
Yeah, ComfyUI with a custom node.
>>101058394
ComfyUI is not intuitive, shill.
>>
>>101058360
I'm liking Kobold.

Ollama is barebones and good enough for Babby's First Q&A. But it has a lot of problems: save state is broken by some common character sequences, their method of obfuscating model component files is lulzy and cumbersome, and just typing into the terminal window fucks up on line wrap, though maybe that depends on the system.

After about a week you'll be ready to learn the technical details and to move on to Kobold or Ooba. (I didn't like Ooba but maybe it's better, that was a long time ago.)
>>
>>101052148
I think you will feel more comfortable in the Kobold Discord.
>>
>>101053147
Nothing supports it yet so no one knows.
>>
>>101053236
According to the Meta paper, it was trained on 5x as many tokens as L2.


