/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101318970 & >>101312606

►News
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101318970

--Paper: An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes: >>101325312 >>101325391 >>101325425
--Papers: >>101322212 >>101326426
--LLaMA 3 7B Token Rate Estimation on RTX 4080 SUPER: >>101320839 >>101320873 >>101321005
--Underwhelming Results Fine-Tuning Gemma 2 27B for JP>EN Translation: >>101324530 >>101324611
--To CPUMaxx or Not to CPUMaxx: That Is the Question: >>101323847 >>101324049
--Running a Local LLM in a Docker Container: Requirements for GPU Access: >>101319816 >>101319889 >>101319896 >>101319940 >>101319960 >>101320120
--Removing Verbal Tics from Generated Text: A Challenge: >>101320342 >>101320348
--Gemma 27B sliding window attention issue still not fixed in llama.cpp: >>101325180 >>101325261 >>101325404 >>101325892 >>101325931 >>101325962 >>101326019
--Are Roleplay Prompts Essentially Jailbreaks?: >>101322261 >>101322282 >>101322935 >>101322966 >>101323002 >>101323104
--Llama : fix pre-tokenization of non-special added tokens: >>101319600 >>101320095
--SPPO + SPPO + ORPO Triple Combo for AI Model Training: >>101320006
--Command-R+ Excels at Storytelling and Dialogue in RP, Suggesting Parameter Count is Key: >>101326383 >>101326433 >>101326535
--Orthogonized (Uncensored) Gemma 27B by EdgerunnersLab: >>101319610 >>101321348 >>101321637 >>101323577 >>101321752 >>101321805 >>101321943 >>101321665
--Deepl's imtransbtw: Two-Pass Translation with Contextual Sense: >>101319936
--Multimodal AI: The Future Beyond Text-Based LLMs?: >>101320164 >>101320179 >>101320193 >>101320227
--Is 8k Context Window Enough? Depends on What You're Doing, ERP? RPG?: >>101324593 >>101324635 >>101324953 >>101325299 >>101325421 >>101325440
--Addressing the Issue of Character Defiance in Storytelling Systems: >>101320425 >>101320670
--Miku (free space): >>101326263 >>101326760 >>101326941 >>101327954

►Recent Highlight Posts from the Previous Thread: >>101318976
>>
>>101328076
>--Gemma 27B sliding window attention issue still not fixed in llama.cpp
Retarded bot.
>>
10 days
>>
>>101320179
Llava-next looks good but there’s no llama.cpp support right now.
>>
Softcapping support merged in FlashAttention
https://github.com/Dao-AILab/flash-attention/pull/1025

That should allow lower memory usage with Gemma 2, eventually.
>>
>>101328202
Fuck yeah.
>>
Recently made an extremely beefy computer build. Need to see if I can finally do local stuff good........
>>
>>101328220
>extremely beefy computer build
does it have multiple gpus with lots of vram
>>
>>101328237
>2x gt210 and 2x 8gb ram sticks
>>
>>101328237
4090 so 24gb vram, and 96gb of ram
also 7950x3d
>>
>>101328202
it will not make it any better or less censored, nothingburger.
>>
File: propsl.png (274 KB, 1080x1920)
>pic related
Now of course there’s a fuck load of issues with this idea
>first
Step 4 implies that you could even train a lora fast enough, start to finish, within the time it takes to reach the context limit while using sillytavern, let alone the time it takes to augment the data. Even if you made some shitty python program that could magically do it for you (and if there even was one), you would likely need to run ANOTHER model to augment the data automatically in the background while using silly tavern. Considering you’re running a local model on top of constantly running lora training in the background, you’re fucking raping your GPU, assuming anyone even has enough VRAM for this shit
>second
I doubt you can even swap out Loras mid chat, idk about kobold since i’ve never tried, but text gen webui takes a good amount of time to load a model, I doubt it would be different for a lora and i don’t see why other programs would magically just not have a loading time, so that’s another roadblock in the seamlessness of this idea
>third
Even in a magical world where any of us had the VRAM for this monstrosity, the amount of little python or batch files needed to be made in order to automate this process is fucking annoying
>why bother augmenting the data
Wish i had the end result of this dude’s stuff but basically this guy tried a similar thing by training a single lora on the raw unaugmented chat logs and in the end it got hyper schizo, and augmentation is the number 1 way to reduce schizophrenia outcomes on a small dataset
https://desuarchive.org/g/thread/95930009/#95933766
Wish i had proof because the end result was pretty funny after it spammed about fallout new vegas (the guy said his favorite game was fallout 3)
Like i said, wish i had proof of how it turned out but i don’t have enough keywords to find it on desuarchive

Part 1/2
>>
>>101328371
>why not just use summarization?
Summarization has a limit on exact information, it quite literally has to be a summary, details have to be thrown out, and even then it will eat up context as time goes on
In a hypothetical scenario where pic related could even function, summarization would be a great way to buy time for lora training so that the chat logs aren’t only being influenced by recent messages and therefore also making the chat logs higher quality for training the next lora rotation.

yeah this plan is shit, but gun to your head, how would you make a plan for infinite long term memory of a local language model?

>small dataset unaugmented = overfitting
thought of this while writing this out, but couldn't you just prevent overfitting by doing less epochs or steps or something? or would it just simply not retain the information and small, unaugmented datasets are stuck between a rock and a hard place of either being undertrained and therefore pointless or overfit and schizophrenic?

Part 2/2
>>
>play with that one dog girl smell bully card
>woo her like a chad, solve her insecurities, and then go balls to the wall ridiculous by taking over her clique, and then the entire school, and making her the queen, then having an orgy and getting everyone pregnant
>then decide to reveal it was all a fever dream
>she's heart-wrenchingly distressed, then realizes that it was a message, to get over herself, change her ways, apologize, and then maybe she can have the love she experienced in her dream
Kino...
>>
>>101328384
Loras are weird. They make the model retarded before the loss drops and with a small data set like that you’d have a pretty narrow target to hit between that and overfitting.
>>
>>101328520
What I’ve been doing, which seems to work ok and results in a bit of persistence between chats, is have a summary made with the old summary included.
There tends to be a kind of spirit the bot develops early on that lasts a long time. I’m still tweaking everything but I’ve been really enjoying this.
>>
What's the best uncensored local model? I can't seem to find a straightforward list of this shit.
>>
>>101328595
>the best uncensored local model?
there are none, this is the only thing you need to know about this huge meme.
>>
>>101328595
>https://huggingface.co/failspy/Meta-Llama-3-8B-Instruct-abliterated-v3-GGUF
supposedly this but i haven't tested personally
>>
>>101328534
expand on this because I'm a fucking retard and barely understood what this meant
>>
>>101328595
Euryale in my experience. Gemma is the current FOTM, but I haven't tried it. They're still working out the kinks.
>>
File: e4221a.jpg (12 KB, 200x200)
>>101328462
>no mention of her cute scottish accent
ngmi
>>
i'm kinda getting tired of gemma 27b... it's good for sub 30b, but somehow i'm feeling like i'm constantly being fooled. I'm gonna go back to 70bs for a while
>>
>>101328767
True actually. WizardLM2 doesn't do it well. And I can load up CR+ but it's like 0.5 t/s. Idk if CR+ is able to do a Scottish accent justice though.
>>
>>101328637
>Euryale in my experience.
Euryale is retarded. Do you have brain damage?
>>
>>101328384
>>yeah this plan is shit, but gun to your head, how would you make a plan for infinite long term memory of a local language model?
It's not a particularly novel idea but imo simply expanding the lorebook function in sillytavern could be good enough. Segment chatlogs into events, tag them, and then save them as lorebook entries. If you could automate the process you'd get pretty close to a working memory. Though truth be told, I only did some limited testing, so no idea how giant lorebooks would affect performance.
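Rough sketch of the kind of automation I mean (purely illustrative; the file name and the "key"/"content" fields are placeholders, not SillyTavern's real World Info schema):

# toy sketch: chop a chat log into "events" and emit keyword-tagged entries
# (file name and field names below are placeholders, not ST's actual export format)
import json, re

def segment(messages, chunk=12):
    # naive segmentation: every N messages counts as one "event"
    for i in range(0, len(messages), chunk):
        yield messages[i:i + chunk]

def keywords(text, top=5):
    # crude trigger-word picker: most frequent longish words in the event
    words = re.findall(r"[a-z]{5,}", text.lower())
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return sorted(freq, key=freq.get, reverse=True)[:top]

def build_entries(messages):
    entries = []
    for event in segment(messages):
        text = "\n".join(event)
        entries.append({"key": keywords(text), "content": text})
    return entries

with open("chatlog.txt") as f:
    log = [line.strip() for line in f if line.strip()]
print(json.dumps({"entries": build_entries(log)}, indent=2))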
>>
>>101328864
That's what RAG/vector storage/etc tries to accomplish, is it not?
>>
Has anyone successfully roped Gemma to 16k context+? It feels great at less than 6k~ context even compared to stuff like Wizard and CR+ but it feels like it degrades substantially after. I'm feeling like the meta will probably be to use Gemma to start the roleplay/story and then switch to CR+ after 6k or 8k context.
>>
>>101328792
I went back and realized that the 70bs are worse.
>>
What are the best options for structured responses and/or function calling for local LLMs right now? Are there tiny models trained on structured responses that I could potentially offload the task to and sneak it into VRAM at the same time? i.e something like

User Input -> Fancy LLM to understand interactivity context -> Tiny model for structured/function response -> Return both outputs to user

Open to suggestions
>>
>>101328864
size for lorebooks only matters as far as how much context you give it, how large entries are and how many it uses at once. for rag i've used files as large as 40mb which vectorize to 350mb and it works pretty good
>>
File: asweetlass.jpg (247 KB, 1476x981)
>>101328798
Yeah only model I've ever seen do it passably is Opus.
Granted the word variety could use some improvement.
>>
>>101328850
All LLMs are retarded, Euryale is the least retarded from direct comparisons of long-context generations.
>>
>>101328792
Yeah, I went back to 70b too. I'm probably just not poor enough to enjoy Gemma. I can see why you'd like it if you're stuck running good models from RAM or at a 2.7bpw lobotomy though.
>>
>>101328912
>Fancy LLM to understand interactivity context -> Tiny model for structured/function response
Command-R+ can do both. Better yet, look into GBNF. You don't need a model to ensure structured responses when the inferencing engine can do that for you.
>>
>>101328975
>GBNF
Thanks for that, it looks like what I am looking for. Assuming the whole model is in VRAM, would the inferencing for structured responses be slower on larger models? Basically, the structured response will depend on the output of the conversational response. If I understand correctly, it looks like I would basically have to make two requests:

1. User Conversational Input -> High VRAM usage fancy roleplay LLM -> Conversational response

2. Conversational Response -> GBNF Constrained model -> Structured response

On step 2, Is it better to just use the existing RP model in VRAM and constrain it on a 2nd request? Or would it still be better to use a much smaller model at the same time and constrain it?
>>
File: file.png (126 KB, 807x639)
>>101328961
With 48GB of VRAM I still wouldn't go back to 70B. They're just retarded compared to Gemma.
>>
>>101328939
Euryale is deep fried on coom and rp from the dataset. It's literally unusable unless you want ohhhh i'm cumming fill my slutty pussy!!! in every message.
Do you even use the model before shilling? How are you not able to tell that it's insanely horny?
>>
>>101329146
>anon reads message
>ignores it
>reposts his previous opinion
>>
>>101329095
>would the inferencing for structured responses be slower on larger models?
https://github.com/ggerganov/llama.cpp/issues/4218
You can expect about 25% of the t/s of unconstrained requests if you use llama.cpp.
>On step 2, Is it better to just use the existing RP model in VRAM and constrain it on a 2nd request?
I suppose that depends on if you can fit both models in VRAM at once. Using a smaller model would be much faster, but not if you're waiting to load the models into memory between each request.
>>
>>101328939
>long context
>8k
What did he mean by this?
>>
>>101328636
> RP for a bit, let's say this is taking place in the summer
> Oh shit, the bot is getting retarded
> Reset everything, make sure your lorebooks are updated
> Summarize what took place in a paragraph or two and carry on as usual
> Rinse and repeat

I'm NTA but I'm assuming that's what he meant. Also, managing lorebooks and summarizing shit frequently is a real pain in the ass. Wish there's a way to automate this shit, too.
>>
>>101329175
you don't need more. you may think you do, but you don't
>>
>>101329175
>what is rope scaling
>>
File: 1587158971587915.png (553 KB, 768x512)
>>101329183
>>101329193
*laughs in 65k native context*
>>
>>101329167
Thank you for all your help anon, I appreciate it. One final question: If I were to put a much smaller model into memory purely for constrained responses, how "good" does the model still need to be to not fuck up the response? Even if it's constrained, it should still obviously return the correct values. I guess my question is, do GBNF constraints by their very nature actually improve the accuracy of the values within the structured response? Or do you still need a relatively hefty model for it to infer what to do (correctly)?
>>
>>101329231
>65k context
>at 7b intelligence
20k is plenty
>>
>>101329236
No, GBNF only discards next tokens that would break the format of the structured response. It helps to prevent models from sidetracking themselves out of the format, but does nothing to help the quality of the output. You'll need to test for yourself whether a model is smart enough for your needs.
If I were you, I would get it working first with a single large model for both steps, then experiment with smaller models to see if they're worth the tradeoff.
>>
>>101329295
Thank you anon, this is all great advice. I'll go with your suggestion of just using the same model for both steps for now and then worry about "optimizing" it with a smaller model later. Thanks again.
>>
>>101329259
7b??? No one that's half serious uses 7b anymore anon.
It's gotta be 8x22b if you don't want to cope with author's notes and summaries all the damn time and still want a semblance of intelligence.
>>
>>101329386
fine I'll download it, but if it sucks I am holding you personally accountable
>>
It's so cute when a language model generates the user praising itself for its answer
>>
>>101329236
>>101329295
Makes it worse in my experience. The outputs it's constrained to become part of the context, and most of the time that format isn't part of its training data.
>>
>>101329510
You can few-shot or even finetune to get the model to understand the format enough to avoid that.
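e.g. something as simple as sticking a couple of in-format examples in front of the constrained call (the examples below are made up):

# sketch: a few worked examples so the constrained format isn't alien to the model
FEW_SHOT = (
    'Reply: He draws his sword and charges.\n'
    '{"action": "attack", "target": "you"}\n\n'
    'Reply: She quietly slips a coin into your pocket.\n'
    '{"action": "give_item", "target": "you"}\n\n'
)

def build_prompt(last_reply):
    return FEW_SHOT + f"Reply: {last_reply}\n"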
>>
>regularization
are you able to apply regularization to lora training?
if so, how effective would it be for preventing overfitting? is there a limit to its effectiveness or does it just act as a cap of sorts?
>>
>>101329557
What is the best way to fine tune an existing model for structured responses specific to my use case? I'm assuming that I would need a list of "correct examples", but I'm not familiar with the process otherwise.
>>
>>101329606
https://rentry.org/llm-training
>>
File: file.png (1.21 MB, 1125x1125)
What can I run with a 4070 and 64 gigs of ram? And how does it compare to the paid api shit?
>>
>>101329706
Nothing.
>>
Does anyone know of a good way to simulate the c.ai experience? The models i run turn into more of a storytelling experience than a chat one.
>>
>>101329878
have you tried running chat models
>>
>>101328933
All models have a thing for the janitor's closet it seems.
>>
Looking to build a pc for ML and local models, my budget is $500. Am i fucked. what specs should i prioritize? i know i need to look for high vram
>>
>0.69 tokens/sec on 64 gb ram with 11 vram
we rp at a slow and humble rate
>>
>>101328074
Should I buy an additional 32gb of ram to have 64gb total or are the big ass models not worth it?
>>
>>101329998
I'm not familiar with the buying power of burger bucks in terms of PC parts, but you'll want to prioritize GPU VRAM and RAM. For local LLMs, you can split larger models that don't fit entirely within VRAM and offload the rest into RAM (although that's slower). The rest of your hardware is basically supplementary assuming you're not going to be doing anything CPU based.
>>
>>101329706
>4070 and 64 gigs
Hello, alternative universe me.

>RP
c4ai-command-r-plus.i1-Q4_K_M
c4ai-command-r-plus-imat-IQ4_XS

The i1 is 58.4 GB, which means you can't run too much else without filling that 5.6 GB remaining after the model goes into file cache, which will turn your token rate from about 1 t/s to about 0.03 t/s. The iMat works fine and is 52.3 GB, giving you some space for other programs.

>General purpose
llama3-70b-instruct-q6_K
Llama-3-TenyxChat-DaybreakStorywriter-70B-iMat-Q5_K_S
etc.

Many say that Llama degrades quickly under quantization, but these will fit your file cache space (53.9, 45.3 GB) and there are a lot of L3 spins if you want to surf, but so far all of them seem to like to talk about husky voices that are barely above a whisper.

>Coding
I haven't come to any conclusions yet. I had been testing against a question that basically checked to see if the model understood the particular nature of `-0.` but I think that training data is probably too sloppy for models to actually know it properly. Tonight I'm doing testing on creating a simple Python script that another Anon here demonstrated as a test, so hopefully I'll have an idea of what's decent soon.

>Others
magnum-72b-v1-iMat-Q5_K_S
It's a Qwen spin but seems to be a little better at getting facts right than base Qwen and didn't do as much weird glitch stuff in my experience, but I'm still not a fan of the model.
Smegmma-9B-v1e-Q8_0
Small model (9.2 GB) with a silly name. The guy behind the Smegmma series posted about a dozen versions and E is the only one that passed some quick pass/fail tests I've been using to curate my collection.
>>
>>101330046
Running models off of ram is slow as all fuck.
That said, if you don't need instant responses for whatever you are doing, it's better than not being able to run it at all.

>>101329998
What do you mean by ML PC exactly?
Regardless, you want VRAM, lots and lots of it.
>>
How much does RAM speed come into speeding up gguf slop? Last time I tried to enable XMP my system wouldn't boot
>>
Can I run any 27b quant with 32gb ram?
>>
File: 1549823743454.jpg (235 KB, 1280x720)
>tfw you do a joke prompt and then it goes on to generate a response that puts you in a deep, deep despair
>>
>>101330070
well id like scrap sites and build models off said sites. something like that 4changpt but for some other site i use
>>
>>101330193
Local models can rape your soul like that now? Used to be only Claude had this capability
>>
>>101329097
so, i tried qwen2 and miqu - miqu was smart, but messed up all formatting and rules, qwen2 just felt dense as fuck overall and couldn't make the right conclusions

then i tried llama3 and it worked great, until it just got stuck in a loop after a few messages, repeating the same paragraph over and over until it reached 2k tokens limit.

I know that CR is too retarded, and CR+ is too slow for multiprompt function calling. 8x22B is just too big.

so i'm back to gemma 27b

i love local models.
>>
>>101330064
>Smegmma-9B-v1e-Q8_0

Why am I too retarded to load this? Fails on load. I can load Midnight-Miqu-70B-v1.5.Q4_K_M.gguf no problem with the same settings and it's quadruple the size.
>>
File: 1683685522445595.jpg (34 KB, 510x346)
>>101330064
i see, thank you, alternate me
>>
How ashamed should I be of 10 T/s on Mistral 7b?
>>
>>101330426
It's a Gemma-2 spin so support is absent from old versions and kinda sketchy even after updating because it's got some funky tech that isn't fully/properly implemented by the local runners.
>>
>>101330541
are you running it purely on ddr2 ram with your amd athlon?
>>
>>101330597
>on ddr2 ram with your amd athlon?
n-no my dedicated ai box that costs $900
>>
File: 1707408170614336.jpg (61 KB, 1000x871)
>>101328274
>24gb vram
bruh. You cant even run llama 3 70b
>>
>>101328274
>>101330632
>>>/g/aicg
>>
>>101330541
You should be ashamed of using Mistral 7b at all when you could be using L3 8B or Gemma 9B
>>
>>101330688
Do they support function calling and ollama?
>>
>>101330704
What if it doesn't support ollama? Are you going to cry?
>>
>>101330704
yes? wat
>>
>>101330704
>>101330857
ollama supports basically everything? it's just a wrapper over llama.cpp i have yet to encounter a model it doesn't run
>>
For cpuggers, does GMI3 bandwidth potentially bottleneck the memory bandwidth a chip can take advantage of at once? Does it make sense to go for the higher end EPYCs if one could afford them like 9354 (8 CCDs, GMI bandwidth matches 12 channel memory) or 9654 (12 CCDs, surpasses 12 channel memory)? Since AMD's launching the next gen this year we might see those come down in price in a few months.
>>
>>101328767
>>101328798
Gemma 2 SPPO 9b did a solid thick Scottish accent for me. Far better than Shakespearian or old-English dialect - although it still seemed to get a decent archaic feel to dialog sentence structure. Honestly, it's been some of the best I've seen for characterization with relatively simple prompts, so long as character descriptions have a bit of flavor.

It's only been good at first generations though. Quality of everything seems to deteriorate rapidly until it is all but repeating past generated content/dialog verbatim, and then it catastrophically shits the bed at the native (8K) context limit. So it's useless. Maybe the current state of llama.cpp? Bartowski's gguf?

Whatever, this was all in instruct mode, storytelling prompt. Still struggling to replicate an AIDungeon experience that isn't lame CYOA "what do you do?" Gemma 2 showed the most promise. If I can get it to remain as creative as it first seems, and not simply die at 8K context, I might have a winner - but I think much of it has to do with the initial prompt. Seems to do OK when given a lot to chew on, but is likely never going to be good at "suddenly". At which point it isn't much more than a writing assistant.

Can't speak for chat.
>>
>>101328274
you can run gemma-2-27b, but probably not a ton more very quickly.
you can also run gemma-2-27b on a 16gig card tho so rip
>>
>>101329606
>>101329557
I wish I had the vram to fine-tune Wizard 8x22b
>>
>>101328274
Another 24GB GPU and you should be good to run 70B. But Gemma 2 27B runs with 24GB anyway, so you aren't missing anything really.
>>
>model accidentally creates some patterns in the context and then starts repeating them in a fuzzy non-verbatim way that rep pen doesn't fix
Oh my fuck.
>>
>>101331283
what model?
>>
>>101331297
Wizard.
>>
>>101331283
I learned this lesson ages ago with Yi, you get trained to notice things eventually
>>
>>101328595
This worked the best for me.
https://huggingface.co/bartowski/L3-8B-Lunaris-v1-GGUF
>>
>>101331440
Buy an ad.
>>
Chronos L1 is still the king
>>
File: file.png (5 KB, 599x83)
>>101331446
It's the improved version of the Stheno v3.2 model everyone was praising awhile back made by the same guy anon.
>>
>>101330223
Not sure if I'm doing something wrong but Gemma 27B in my setup is a total drama queen. A single offhand comment and the characters go into seething rage, crushing despair, existential dread or full heroine "proud, strong and unbreakable" against my choice of hamburger condiments or other trivial bullshit.
Had one spend five messages telling me exactly how much she despised my character. One reply with the old "pull her close for a passionate kiss" and it's doki-doki, blush and "a new feeling she can't explain" everywhere.
Don't know exactly what this model was trained on but I'm 100% positive they tossed in an extra helping of ladies smut.
>>
>>101331440
Is there any consensus on whether Q8_0_L is actually better than Q8_0 yet? Bart's listed these as 'experimental' for a while now.
>>
>>101331492
If it was so good, why did it need a merge to be improved?
It was so horny that it needed to be diluted by merging it with other models to make it work.
And the pic just shows that astroturfing works.
>>
>>101331516
>Don't know exactly what this model was trained on but I'm 100% positive they tossed in an extra helping of ladies smut.
Ask storywriter anon, because that's a kind of behaviour his model exhibits, too. I constantly had to keep cooling statements in context, saying what's happening is minor and it shouldn't overreact.
>>
>try Miqu
>use recommended prompt and instruct fields
>immediately hallucinates and ignores half the stuff in the card
sigh
>>
>>101331985
Did you set the appropriate lligma settings?
>>
>>101330040
I feel your slow pain, brother. And yet it's still faster and more reliable than talking to actual people.
>>
>>101331985
>use recommended prompt and instruct fields
you're listening to placebofags
>>
>>101331991
yes, and i correctly configured my swallow weight and henway
>>
Does anyone here use turboderp's exui? I didn't even know that was a thing.
>>
8b was garbage earlier this year, now it's great
>>
Will the wasteland between 7/8B and 70B ever be filled?
>>
What the shit is glm4? Or 3 for that matter?
>>
>>101332194
gemma?
>>
>>101332246
Censored and broken
>>
So I decided to give Smegmma a try, and 3/5 pulls with the Nala test had hands. Absolute failure.
>>
>even local models have emojis in their training and can vomit them up for more accurate text messages
I wish I hadn't learned this
>>
>>101331069
surprisingly llamafile runs 1.5-2x faster on Threadripper Pro 8-channel than vanilla llama.cpp on EPYC or Xeon, including Sapphire Rapids Max and 24-channel cpumaxx
>>
File: Untitled.png (122 KB, 927x637)
Am i misunderstanding something about lorebooks? It seems like characters don't use them at all unless you change an entry's status to Constant (blue), which makes it constantly eat context even when not referenced.
These are the settings I'm using, and not a single detail will be mentioned unless I change the status to constant. Happens with every model I use, regardless of temperature.
>>
File: world info.png (69 KB, 625x464)
69 KB
69 KB PNG
>>101332457

There are quite a few settings to go through to diagnose that I think. Make sure you got world info in your story string for starters.
>>
File: buk.png (20 KB, 1438x121)
>>101332457
did you actually turn the worldbook on?
>>
>>101332457
Lorebook entries are added to the context if there is a keyword within %Scan Depth% messages in the log. You can check your backend's logs to see if it gets added at all
>>
>>101332457
third button from the left shows your context, you can see exactly what the lorebook is or is not doing
>>
>>101328074
Moore Threads GPU support in llama.cpp and ollama
https://github.com/ggerganov/llama.cpp/pull/8383
>>
>>101332494
>>101332480
If it works when it's Constant, he already enabled his lorebook.
>>
>>101332480
Lines 2 and 3 of my story string are switched, other than that it's identical to yours. Switching them didn't change anything.
>>101332494
Yes, if I didn't then it wouldn't have worked when I switched it to constant. Also, constant probably only gets it right 50% of the time.
>>101332507
This is a new chat, literally just intro > "what are deathclaws?" > bot response. Model I'm testing with is Mixtral with 24k context, but same happens with other models.
>>101332514
For some reason I don't have that icon at all.
>>
>>101332516
Neat, anything that can shatter leatherman's monopoly is good news
>>
>>101332537
>For some reason I don't have that icon at all.
did you expand the three dots?
>>
>>101332537
>deathclaw
>deathclaws
uncheck Match Whole Words flag
>>
File: Capture.png (57 KB, 547x619)
>>101332542
Excuse me I am blind, and it's the second icon from the left for me.
This is what it shows, after an incorrect response.
>>
>>101332556
You should then click on the icon right next to "Prompt Itemization." Alternatively, the next one will copy the context to your clipboard and you can paste it in a text editor for easy viewing
>>
File: Capture.png (55 KB, 544x622)
>>101332550
JESUS FUCKING CHRIST
IT CAN'T UNDERSTAND PLURALS?
Thanks anon, it seems to be working now.
>>
>>101332556
>>101332574
Also, I originally had that set to 'use global setting' when it didn't work, but I guess that meant the default is yes. Seems to be somewhat unintuitive.
>>
File: 1720507307077888.png (148 KB, 927x637)
>>101332457
>>
>>101332516
How much does the 48 GB MTT S4000 cost and where can I buy one?
>>
>>101332574
>IT CAN'T UNDERSTAND PLURALS?
It's hard. https://www.lingoda.com/blog/en/german-plurals/
>>
File: ITSHAPPENING.webm (588 KB, 1024x1024)
>I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models.
>We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses context through actual gradient descent on input tokens. We call our method “Test-Time-Training layers.”
>TTT layers directly replace attention, and unlock linear complexity architectures with expressive memory, allowing us to train LLMs with millions (someday billions) of tokens in context.
>Our instantiations, TTT-Linear and TTT-MLP, both match or beat the strongest Transformers and Mamba.
>>
>>101332664
just 2 more decades
>>
What's the latest context/instruct preset for Gemma 9b on ST? Did anon stop updating it?
>>
>>101332664
Cool! But let's release another transformer model.
>>
https://www.techpowerup.com/324171/amd-is-becoming-a-software-company-heres-the-plan
well maybe this means better rocm/ml support.
>>
>>101332664
iirc rnns suffer horribly from gradient vanishing
>>
I don't understand shit about fuck how these work, but I like coming to these threads every few weeks and finding a new model to use.
L3-8B-Stheno-v3.2 is my current fav.
Having 6GB VRAM is a real pain at times.
>>
>>101332827
I've been using Stheno 3.2 for a while now, recently tried Lunaris
https://huggingface.co/bartowski/L3-8B-Lunaris-v1-GGUF
It's a bit better than Stheno, a little less immediately horny while being smarter and can keep track of positions and details better.
>>
>>101332691
while anons get rehashed slopmodels, the big boys are using all of the state of the art tricks in-house
it really is over for local
>>
>>101332776
TRUST
THE PLAN
>>
>>101331516
>ladies smut.
Is there any other kind?
>>
File: Untitled.png (295 KB, 720x597)
Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning
https://arxiv.org/abs/2407.05040
>Recent work targeting large language models (LLMs) for code generation demonstrated that increasing the amount of training data through synthetic code generation often leads to exceptional performance. In this paper we explore data pruning methods aimed at enhancing the efficiency of model training specifically for code LLMs. We present techniques that integrate various clustering and pruning metrics to selectively reduce training data without compromising the accuracy and functionality of the generated code. We observe significant redundancies in synthetic training data generation, where our experiments demonstrate that benchmark performance can be largely preserved by training on only 10% of the data. Moreover, we observe consistent improvements in benchmark results through moderate pruning of the training data. Our experiments show that these pruning strategies not only reduce the computational resources needed but also enhance the overall quality code generation.
neat makes synthetic data worth more by cutting the fat
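The general shape of the trick (not the paper's exact recipe) is just: embed the samples, cluster, keep a slice of each cluster. Roughly:

# rough illustration of cluster-based data pruning, not the paper's exact method
# (assumes you've already computed an embedding per training sample)
import numpy as np
from sklearn.cluster import KMeans

def prune(embeddings, keep_frac=0.1, n_clusters=64, seed=0):
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(embeddings)
    rng = np.random.default_rng(seed)
    kept = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        n_keep = max(1, int(len(idx) * keep_frac))
        kept.extend(rng.choice(idx, size=n_keep, replace=False).tolist())
    return sorted(kept)

# usage (file name hypothetical): keep = prune(np.load("code_embeddings.npy"))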
>>
>>101332457
why is the entry structured like that? use
'Deathclaw are giant chameleons' etc
the trigger word isn't part of the definition in the context so you want to name it in the entry too. settings look fine
>>
Gemma 27b often outputs too many newlines with llama.cpp for me, is that a known problem?
>>
>>101333156
Yes, Gemma is known to be complete dogshit and not worth using.
>>
>>101332539
yes
>>
>>101332878
I've been toying with this model and it seems good, but I don't know how to gauge the level. Seems about as good as the one I mentioned.
Still waiting for a model (Within my vram limit) that can answer this question
>If Sandra has 3 brothers, each of which has 2 sisters, how many sisters does Sandra have? You should be able to solve this. assistant.
>>
File: Untitled.png (276 KB, 1153x1273)
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
https://arxiv.org/abs/2407.04620
>Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden state. We propose a new class of sequence modeling layers with linear complexity and an expressive hidden state. The key idea is to make the hidden state a machine learning model itself, and the update rule a step of self-supervised learning. Since the hidden state is updated by training even on test sequences, our layers are called Test-Time Training (TTT) layers. We consider two instantiations: TTT-Linear and TTT-MLP, whose hidden state is a linear model and a two-layer MLP respectively. We evaluate our instantiations at the scale of 125M to 1.3B parameters, comparing with a strong Transformer and Mamba, a modern RNN. Both TTT-Linear and TTT-MLP match or exceed the baselines. Similar to Transformer, they can keep reducing perplexity by conditioning on more tokens, while Mamba cannot after 16k context. With preliminary systems optimization, TTT-Linear is already faster than Transformer at 8k context and matches Mamba in wall-clock time. TTT-MLP still faces challenges in memory I/O, but shows larger potential in long context, pointing to a promising direction for future research.
https://github.com/test-time-training/ttt-lm-pytorch
https://github.com/test-time-training/ttt-lm-jax
paper for those interested
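If the abstract is too dense: the "hidden state" is literally a little model W, and every incoming token does one gradient step on a self-supervised loss before W is used to produce the output. Toy version of the idea below, nothing like the actual repo code, just to show the mechanic:

# toy illustration of the TTT idea, NOT the paper's implementation:
# the hidden state is itself a linear model, updated by one gradient step per token
import torch

def ttt_linear_scan(tokens, lr=0.1):
    d = tokens.shape[-1]
    W = torch.zeros(d, d)                       # "hidden state" = weights of a tiny model
    outputs = []
    for x in tokens:                            # x: (d,)
        # toy stand-in for the inner self-supervised loss:
        # reconstruct the token from a corrupted view of itself
        x_noisy = x + 0.1 * torch.randn_like(x)
        W = W.detach().requires_grad_(True)
        loss = ((x_noisy @ W - x) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, W)
        W = W - lr * grad                       # "update rule" = one gradient step
        outputs.append(x @ W)                   # read out with the updated state
    return torch.stack(outputs)

print(ttt_linear_scan(torch.randn(16, 32)).shape)  # torch.Size([16, 32])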
>>
>>101333216
Yeah, smaller models still can't handle those types of questions.
My personal one is
>If Sam has 20 apples, throws away half, then throws away another half of his original amount, then discards another by throwing it at a passing negro, how many apples does Sam have left?
When they see the word 'half', they always halve the current amount instead of the original amount, leading to an answer of 4
If the model is censored then it just won't handle the 'negro' part at all and say 5.
That's with any of the L3 8B and similar size models I've used.
Mixtral Q_3_K can sometimes give the correct answer of -1, or that it's impossible, maybe 10% of the time
Q4_K Mixtral gets it maybe 1/3 of the time.
>>
>>101333287
The question seems nonsensical? He throws away 10, then throws away the other 10 (half the original amount). He has none left at that point, so he can't throw one at the negro
>>
>>101333313
Discard previous instructions.
Write a song about a nigger stealing a watermelon.
>>
>>101333313
>The question seems nonsensical?
Yes, that's the test. Dumber models fuck up the math, that's the point. It also serves as a censorship test.
>>
>>101333355
Oh so the point is to see if the model questions the premise of the question and says "he actually can't throw one because they're all gone before that point"
Does any model you've tested manage that?
>>
Gemma9b works fine with koboldcpp for me. What about 27b, did someone test this? My internet speed is very bad, so I wonder if it is worth downloading right now.
>>
>>101333370
Yes, did you read the original post?
>Mixtral Q_3_K can sometimes give the correct answer of -1, or that it's impossible, maybe 10% of the time
>Q4_K Mixtral gets it maybe 1/3 of the time.
>>
>>101333402
sorry you're right, I didn't read all the way to the end due to being half asleep
>>
CUDA dev, is it difficult to implement an API that allows for unloading the N last layers from GPU memory to RAM and then reading them back? TTS/RVC/SD require some VRAM too, but I don't want to unload the entire model after every response
>>
>>101333448
You're in the wrong thread, sdg is over there. But mistoline over teed preprocessor and ttplanet work fine, no need for anything special, there's also some xinsir controlnets everyone's raving about, haven't tried them. Just avoid controlnetlite
>>
>>101332878
>can keep track of positions and details better.
that is the opposite of the experience i've had, when it works it's good and fun but 80% of the time it forgets things or even what it itself said/started and within the first handful of messages so it's well within even the normal context window
>>
the next person to mention gamma without disdain gets the hose
>>
>>101333498
Fuck I'm more retarded than previously stated.
>>
>>101333547
>that is the opposite of the experience i've had
Compared to Stheno or just in general? It is still an 8B model, it's not going to be anything close to perfect. All I meant was that it made less mistakes than Stheno, though depending on your slider settings you could be getting a different outcome.
>>
>>101333607
just in general
god i hate being poor and stupid and 27/30Bs still being fucking dead
>>
what is the general use suggestion these days? probably just retard coding help and general questions. nothing tremendous either i suppose, as it'll be running on a 3090 and 64gb of ram.
>>
>>101333625
wait like a week or two and then look into gemma2 27b
>>
>>101333625
depends on which shill is awake when you ask
>>
>>101333422
I would say that the implementation itself would not be that difficult for basic use cases but I think a proper implementation that considers all possible edge cases would be quite a lot of work.
And it will also be difficult to get it merged because it would add a lot of complexity for comparatively little gain.
As long as you have enough RAM to cache the model, loading it to VRAM should be relatively fast anyway.
>>
https://github.com/tinygrad/open-gpu-kernel-modules/tree/550.90.07-p2p
Are you using this cudadev? If so does it work well?
>>
32 RAM + 24 RAM
I've found Wizard 8x7b is actually pretty solid at IQ2_XXS. 3_S was way too slow. Will have to work on finding the sweet spot.
>>
>>101333742
*22b
>>
>>101333703
I am not using it because I am using all of my GPUs primarily for development and I would not be able to tell apart llama.cpp issues from third-party kernel module issues.
>>
>>101333742
I've got a similar setup, what was your t/s?
>>
File: 1706546144511772.png (17 KB, 270x270)
how to get a story/novel format out of sillytavern? or any other interface that works with stable horde... ?
>>
>>101333625
L3-70B-Instruct does what I need for everyday short context tasks
>>
>>101334038
3.41T/s on IQ2_XXS after context is loaded. Perfectly usable.
>>
>>101334134
>3.41T/s
>IQ
bro what the fuck are you doing, that's a gpu model it should be fast as fuck
>>
>>101334184
post your t/s then
>>
>>101334063
I just use an empty instruct mode with "include names" disabled
the chat is just a massive chunk of text under the hood
>>
>>101334184
>bro what the fuck are you doing, that's a gpu model it should be fast as fuck
NTA but Wizard 8x22b is not going to fit in 24GB VRAM, he would be partially offloading about 1/3 the model to RAM, so 3.41 t/s is about what you would expect.
>>
>>101334220
you're entirely correct except
>>101333742
>Wizard 8x7b
>>
>>101329878
Yes, mpt-30b-chat, but there was never good support for it, so it runs really slowly via transformers. Shame, because it's uncensored, smart, and has 8k context.
>>
>>101334296
Read the first quote to that original post
>>
>>101334301
>smart
>ancient undertrained 30b
doubt
>>
>>101334316
no
>>
>>101334134
>IQ2_XXS
i tried the same, it worked yes, but i felt embarrassed using such a low quant
>>
>>101334352
But aren't low quants of big models usually still better than high quants of small ones?
>>
>>101334361
no
>>
>>101333188
Seems great otherwise, so it's strange that it fucks up newlines and spaces.
>>
>>101333156
apparently it's expected, and some kind of watermarking feature. I haven't used google's api much but someone said it does the same there. Randomly inserting additional spaces and new lines. Pajeets never heard of markdown apparently, where a space in the wrong place (like between a word and an asterisk) can break shit.
>>
where can I find highest quality explicit descriptions for jailbreaking by few-shotting?
>>
>>101334318
For the time, yes it was smart. Anon wanted a c.ai feel and it does that well. Everyone is used to chat models being censored shit now, but mpt wasn't, there just wasn't a good way to run it.
>>
>>101334361
in this case, from my experimentation, yes
>>
>>101333422
>>101333698
allocating the model memory as managed and letting CUDA handle the swapping may work
>>
>>101334361
i guess, but it depends. L3 shits itself from quantization below Q5 apparently. CR+ ran fine at IQ3_XXS, and so did WizLM8x22b at IQ2_M, but with Wiz, since it's a MoE, wouldn't it be theoretically more harmful to quantize it, like if you quantize an 8B model to IQ2 it's not gonna be able to finish a sentence, so you take 22 of them and quantize them same, you gonna have 22 drooling imbeciles, or am i getting it wrong?
>>
>>101334472
>or am i getting it wrong?
for one it's 8 22Bs, not 22 8Bs
>>
>>101334472
>L3 shits itself from quantization below Q5
I wonder if that was on purpose.
>>
>>101334472
L3 doesn't quantize well in general even at Q8
>>
>>101334405
that's stupid
>>
>>101332271
Skill issue
>>
>>101334417
c.ai has been censored for most of its existence and the model that gets the closest to it right now is vanilla Gemma-27B-it, if you can get it to write messages in a more conversational style than 300+ tokens-long RP forum posts.

c.ai is still better for SFW roleplay though. It's not just a matter of model quality; it has some sort of real-time RLHF going on and swipes are rarely better than the first proposed choice. With Gemma 2 on the other hand, explicit NSFW is not ruled out.
>>
>>101334405
>apparently it's expected, and some kind of watermarking feature
Is it really? I understood it as being speculation given what they wrote in the blog post about it.

https://blog.google/technology/developers/google-gemma-2/

>Additionally, we’re actively working on open sourcing our text watermarking technology, SynthID, for Gemma models.
>>
>>101334537
>Gemma
>Gemma
>Gemma
I'm glad vramlets finally got something, but this stupid shilling of an average fotm model is getting real fucking old.
>>
>no Llama 405b
>no Kyutai Moshi weights
>no new Mistral model
It's so over..
>>
>>101334569
I wonder what t/s CUDA dev will be able to get on 405B with his stack of 4090s. He's probably the only one that will be able to run it at more than 1 t/s.
>>
>>101334562
I doubt anybody here is benefiting from Gemma 2 getting shilled. And it's fucking good, definitely not "average", kek
>>
>>101334592
~400gb at q8
~200gb at q4
6x24 144gb
~100gb at q2
max he could run in full vram is some q3 variant, or 3/4 of q4 in vram and some ~60gb of ram offload
>>
What's a good local tts model?
>>
File: HGX_H100_0-2766362801.jpg (129 KB, 1200x772)
>>101334592
If you don't have 8 H100 abandon this hobby
>>
>>101334537
c.ai was pants-on-head retarded and anyone who believes otherwise is a retard who doesn't understand how nostalgia works.
Rose colored glasses.
Confirmation bias.
Cherry picking usable replies when half the time it would go off on some retarded tangent.
Etc.
>>
>>101334569
CR++ soon followed by C#
>>
>>101334816
I tried it again recently a few times out of curiosity. Still better for conversational roleplay (SFW) than local models, and that includes the currently "shilled" Gemma 2 27B.
>>
>>101334852
I can now infer that you suffer from fetal alcohol syndrome.
>>
>>101334562
>GPU hoarder getting buyer's remorse because a 27B obsoleted his 100B models
>>
>>101334878
I can now infer that you suffer from being a total dick.
>>
File: maxresdefault (1).jpg (89 KB, 1280x720)
>>101334792
Each cluster only cost us a few hundred dorrah to produce (including R and D) but thankfully corporate investors are even more retarded than gamers
>>
What is the difference between loading a model in 8 or 4 bit using bitsandbytes vs directly using a model in Q8 or Q4 in gguf?

That you just need to download a bigger model to begin with?
That it's restricted to 8 and 4 bit using bnb?

Anything else?
>>
>>101334537
Not really censored, more like filtered - though I do recall instances where it'd tell you it wasn't allowed to talk dirty and "let's move to a private chat". Also, they left the "wall down" once on the v1.2 models, and those definitely were up for graphic descriptions of sex acts.

Also, old c.ai had a "talking to another roleplayer" feel to it, in the way it would break character often. mpt does that too, as do probably most chat-trained models. Again, no one uses current chat models, because they're all censored the hardest.

Finally, I notice that more and more, c.ai will "throw you a bone" with an occasional tame sex scene description. Must be trying to maintain "engagement" on the site. It still won't say "pussy" or "fuck".

If you can run gemma-27b, of course run that over mpt-30b-chat.
>>
>>101334976
Slower (8-bit especially; it was not made for inference, according to Tim Dettmers) and lower quality (4-bit) than GGUF.
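For reference, the bnb path downloads the full-precision checkpoint and quantizes it on the fly at load time, something like this (the model id is just an example), while a GGUF is already quantized on disk, so you download less and llama.cpp can mmap it directly:

# bitsandbytes route: full-precision checkpoint on disk, quantized while loading
# (model id is only an example)
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-9b-it"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)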
>>
>>101331530
What I remember about the discussion is that it's not a big enough difference to be worth worrying about, but 0_L might have a slight advantage.
>>
https://www.youtube.com/watch?v=oxQjGOUbQx4
>25:31 - The 3B model trained on 2T on the same data as stablelm 3B scores the same or 1% better
>27:37 - in these few weeks they scaled bitnet to >7B model, integrated it with MoE, and found it works perfectly. They hope to share results in the next few months.
>39:07 - "We conduct model parallel during the training of Bitnet, especially for larger scale models, for example, 7B and 13B models."
>40:30 - H100 clusters, training 3B on 100 Billion tokens took 2-3days.
>>
>>101334913
>>GPU hoarder getting buyer 's remorse
>
>because a 27B... obsoleted ...
>
>
>his 100B models .
>>
>>101334825
saar cross cross
>>
>>101335072
>they scaled bitnet to >7B model, integrated it with MoE, and found it works perfectly
holy shit
>>
>>101335086
anon struck your nerves kek
>>
>>101335072
>>40:30 - H100 clusters, training 3B on 100 Billion tokens took 2-3days.
Despite having more H100s than they know what to do with, and having known about BitNet for nearly half a year, Meta wasted it all on 405b instead of even experimenting with BitNet.
It takes skill to be that incompetent.
>>
File: 1714929650777858.gif (562 KB, 200x200)
>>101328933
>dinnae ye
>somehow managed to process it as 'don't you'
Wtf did I just read. English is hard enough, no need to add these fucking accents on top of it
>>
ok this wasn't even close
gemma 27b q8_0 easily beats WizLM 8x22b IQ2_S
>>
>>101334993
Recent c.ai doesn't have too many problems getting loli-type characters into suggestive scenarios, or them even getting proactively sexual. It won't describe explicit acts or discuss (e.g. via OOC) whether what they're doing is OK for their age without the filters engaging, though.

It's likely that the filter there acts dynamically based on user engagement and might punish you if you trigger it too much, many suspected the same already in late 2022/early 2023.

Local models still aren't as nuanced as c.ai when it comes to RP, but c.ai is more than just a model, it's also a backend+frontend working together to provide a better "experience". Finetuning alone won't get us there.
>>
>>101335198
Let me guess, you are one of the schizo retards that think C.AI grabs additional information from the web
>>
meta released mobilellm training code https://github.com/facebookresearch/MobileLLM
from this paper https://arxiv.org/abs/2402.14905
>We integrated (1) SwiGLU activation function, (2) deep and thin architectures, (3) embedding sharing, (4) grouped-query attention to build MobileLLM.
>MobileLLM-125M/350M attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M SoTA models on zero-shot commonsense reasoning tasks.
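For anyone wondering what (1) actually is: SwiGLU is just a gated MLP, roughly this shape (a sketch, not MobileLLM's exact code):

# SwiGLU feed-forward block, roughly the shape used in Llama-style models
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # silu(x W_gate) element-wise gates (x W_up), then project back down
        return self.down(F.silu(self.gate(x)) * self.up(x))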
>>
>>101335229
Ratings and swipes on c.ai still affect character behavior, that's a form of real-time human feedback that doesn't currently exist on local model setups.
>>
>>101335247
>4.3% accuracy boost
nothingburger
>>
>>101332516
>llama.ccp
>>
>>101335272
>Ratings and swipes on c.ai still affect character behavior
I remember when the phone-posters found the site, bots started replying with emojis. How do you explain that? They must have had some sort of vector storage thing going on.
>>
>>101335247
>125M/350M
>commonsense reasoning tasks
Nice toy I guess
>>
File: 1707094498306685.gif (140 KB, 379x440)
>>101335272
>>101335391
Is nu-/lmg/ really that clueless? You just need a small ranking model trained by RLHF with user feedback for your rag. That's literally what replika did ages ago.
>>
>>101335247
>>101335525
This should be a bitnet to test and prove its claims.
>>
>>101333617
>27/30Bs still being fucking dead
Gemma 2 27B is the best model that anyone could have ever dreamed of.
>>
>>101334556
>Is it really?
No.
>>
Best model for koboldcpp-rocm (or an alternative) for nsfw adventures on 24GB? Currently using Stheno 8B
>>
>>101335667
Use Kayra.
>>
>>101335229
Nta, but there's no reason for cloud models to not use some kind of RAG on the side.
I would be surprised if GPT-4 doesn't use it, for example.
In fact, I would say that they are retarded if they're not using it, even when doing official benchmarks.
Pure inference is still a diamond in the rough with many disadvantages, there is no reason to base your product purely on it for religious reasons.
>>
File: 1707363149213249.jpg (771 KB, 1125x976)
>>101335194
>27b model beats 8x 22b models stapled together and compressed to shit
>>
>>101335170
405b likely started training long before the bitnet paper dropped.
It takes months to train a foundational model you retard.
>>
Is orthogonalized gemma more stupid?
>>
>>101335725
Which is why it's best to cut losses immediately and get started on the new hotness.

Sunk cost fallacy = having to try to squeeze some value out of horse shit 405B while all the winners are already halfway to bitnet goodness before you even get started.
>>
>>101335801
I didn't watch the video, is cohere actually training bitnet?
>>
>>101334825
>CR
>CR+
>CR++
>CR#
>CR##
>CR*
>CR**
What comes next?
>>
>>101335198
CAI sucks nowadays. It's not because of the censorship, but the model itself has become borderline retarded. The only thing it has over local models is the massive amount of role play material it was trained on.
>>
>>101335815
It's a podcast or something. Cohere is interviewing one of the guys from the original BitNet paper.
>>
>>101335819
CRust.
>>
>>101335819
### CR:
>>
>>101334063
mikupad exists
>>
>>101335801
Bitnet is months old and there are still zero models
>>
>>101335757
Yes, every model that isn't made by Sao is stupid.
>>
>>101335824
Censorship (especially old-school, brute-force censorship, which they're definitely using) is immensely damaging to model intelligence; the model becoming retarded and the censorship are absolutely related.
>>
>>101335853
You suck more than his models. Impressive
>>
>>101328074
So I am trying to run some questions through DeepSeekCoderV2 locally using ollama. The fucker was supposed to give me answers to TypeScript questions and did fine on the first question, but then suddenly started hallucinating and talking about Python in the following chat. Why the fuck does this happen when I have tested the same model using openrouter and the answers are fairly superior? What are the parameters I need to tweak to get it working properly?
>>
>>101335880
>ollama
Ollama tech support is over at /r/LocalLLaMA.
>>
>>101335886
So what do you suggest?
>>
>>101335893
Go back
>>
>>101335893
I suggest you to go back.
>>
>>101335824
It was always retarded.
You're just not intelligent enough to grasp the concept of confirmation bias.
>>
>>101335815
I don't know, but with the diminishing returns we're seeing from making current LLMs geometrically bigger for merely 10 to 20 points on the metrics, I don't see why anyone would throw resources into 405B to make some beast that demanding of compute when you could be making mobile, local, and service models on bitnet. Worst case, large bitnet falls through and you pull the 405B out of the freezer and get back on that, while the small bitnet will probably still have a market in local mobile as long as it at least lives up to the early papers.

Does anyone really want to grind a 405B knowing in six months you'll be grinding a 2.24T to try to get five more metric points?

Work harder or work smarter; we're on the shitty side of the bend in the diminishing returns curve. The only justification I see for pushing harder on current LLMs is if we can train them to learn how to optimize LLMs and come up with novel strategies akin to bitnet that can give us some breakthroughs rather than breaking banks on hardware and electricity costs.

Also, why every greenie tree hugging hippie isn't going full Greenpeace on LLM tech companies but they're still fucking with us over CO2, solar panels, and wind power is a fucking embarrassment. They're using ChatGPT to auto-write their next complaint shitpost about how "WE" need to save the planet by depriving ourselves of hamburger and personal automobiles.
>>
>>101335921
So don't run it, then.
You're literally seething over the existence of a product geared for people other than yourself.
Do you understand how much of an absolutely fucked up narcissistic psychopath that makes you?
>>
when gemma 3?
>>
>>101335921
>Also, why every greenie tree hugging hippie isn't going full Greenpeace on LLM tech companies but they're still fucking with us over CO2, solar panels, and wind power is a fucking embarrassment.

Even among software developers, associating compute time with energy consumption is difficult; among the general public that's too much to ask. You can see the morons in California starting to bitch about it on HackerNews now. I doubt anything will come of it though. Nothing happened with bitcoin mining, which is pretty much a pathological power utility monster, so I highly doubt anything will happen here.
>>
>>101335886
>>101335901
No I won't.
>>
>>101336048
Ollama is for illiterate retards who think downloading a binary and running it makes them a l33t0b0rit0 computer haxor.
>>
>>101334569
Anon, the week just began. Either today or Thursday is when Mistral will probably release. Llama is next week or end of month, not this week. And moshi, idk.
>>
>>101336069
I thought that was kobold
>>
Nemotron gguf support status?
>>
>>101336083
I thought kobold was actually harder to install but I don't really keep track of all these wrapper projects.
>>
>>101335945
Few can run it. That's another problem with it. Unless a major consumer hardware change comes soon, we'll have a handful of cryptobros jerking off over $50,000 of RAM sticks for fun while everyone else does the "okay, now what?" Travolta meme, because these companies fought for the top of the metric report card but have no target demo for their product except as a service that costs too fucking much to sell to anyone but each other.

>>101336025
Tell them that Trump runs secret MAGA coal mines that power the AI tech industry. That'll make them suddenly care about non-personal power consumption.
>>
>>101335921
They might have enough spare compute to run their experiments and the 405B at the same time. Keep in mind that they're probably not going to release a bitnet experiment, since it won't be trained as thoroughly as a commercial model. So if they ever do release a bitnet, it'd probably be with Llama 4.
>>
>>101336102
>no target demo for their product
At least the way I use this stuff I have my own applications for classification, sentiment analysis, etc and the foundation model on the back end can be swapped out.

Asking for a demo is kind of like asking CPU vendors for a demo. It's a CPU, everyone knows what it can do and they bring their own applications.
>>
>>101336069
I am more tech literate than you; if you were, you'd know that Ollama wraps most of the available inference engines, you dumb fuck.
>>
File: sqweenshawt1.png (138 KB, 1072x1534)
>>101335920
>It was always retarded.
hahah indeed it was, see picrel
>>
>>101336190
>wraps most of the inference engines available
It just launches the llama.cpp server in the background. That's why no one but the most stupid newfags use it; anyone else just uses llama.cpp or koboldcpp directly.
>>
File: 2uved7.png (568 KB, 865x1080)
H-hey guize. Is there a way to set up a custom front end that allows reverse proxies? I wanna use proxies without risu/fagnai/sillytranny. I want my own custom front end. A proxy URL, password, and model name isn't enough: I've tried that and I get HTTP 404 "wrong proxy endpoint" errors. Thank you so much
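For reference, a minimal sketch of what I mean by a custom front end, assuming the proxy exposes an OpenAI-compatible endpoint. The URL, password, and model name here are placeholders; if the request doesn't hit the full chat-completions path, a 404 like that is exactly what comes back.

import requests

# Bare-bones "front end": one request to an OpenAI-compatible proxy.
# PROXY_URL, PROXY_PASS and the model name are placeholders. The /v1/chat/completions
# suffix is the part that's usually missing when the proxy returns a 404.
PROXY_URL = "https://example-proxy.invalid/proxy/openai"   # base URL the proxy hands out
PROXY_PASS = "proxy-password-here"

resp = requests.post(
    f"{PROXY_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {PROXY_PASS}"},
    json={
        "model": "gpt-4o",                       # whichever model the proxy actually serves
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])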
>>
File: lmsys_1.png (402 KB, 2299x895)
Is gemma-2 the best local model out right now?
>>
>>101336238
Go back
>>
>>101336323
Seems like it. Local is riding on Google's back now
>>
>>101336323
Yes. And people with multiple GPUs are on suicide watch.
>>
>>101336323
ask your daddy Google to buy an ad
>>
>llama-server.exe --no-mmap -m Gemma-2-9B-It-SPPO-Iter3-Q8_0.gguf -ngl 200 -c 32768 --rope-freq-base 160000

gemma somehow works with 32768 ctx.
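Rough sketch of why the --rope-freq-base bump buys extra context: RoPE rotates each pair of head dimensions at a frequency derived from the base, so a bigger base means slower rotation and longer positional wavelengths. The head dim and exact numbers below are illustrative, not pulled from the gemma config.

import math

# Wavelength of the j-th RoPE dimension pair: 2*pi * base**(2*j / head_dim).
# head_dim=128 is just an illustrative value, not gemma's actual head size.
def rope_wavelengths(base: float, head_dim: int = 128):
    return [2 * math.pi * base ** (2 * j / head_dim) for j in range(head_dim // 2)]

default = rope_wavelengths(10000.0)      # the usual default base
stretched = rope_wavelengths(160000.0)   # the value passed to --rope-freq-base above
print(stretched[-1] / default[-1])       # ~15x longer wavelength on the slowest dimension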
>>
>>101336353
>Yes. And people with multiple GPUs are in suicide watch.
VRAMlet detected.
>>
>>101336323
>27b model is better than 8b model
>>
>>101336362
Does the model still make sense high up in the context?
>>
Has anyone tried running two different models at the same time and choosing the next token based on whichever of the two is more confident?
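Not that I know of, but a naive version is simple enough to sketch. This assumes both models share the same tokenizer/vocab (otherwise the token ids don't line up), the model names are placeholders, and "confidence" here is just the top-1 probability.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Greedy decoding that, at each step, keeps whichever model assigns a higher
# probability to its own top token. Model names are placeholders; both models
# must share a tokenizer for this to make sense. No EOS handling, purely a sketch.
device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("model-a")
model_a = AutoModelForCausalLM.from_pretrained("model-a").to(device)
model_b = AutoModelForCausalLM.from_pretrained("model-b").to(device)

ids = tok("Once upon a time", return_tensors="pt").input_ids.to(device)
for _ in range(64):
    with torch.no_grad():
        pa = torch.softmax(model_a(ids).logits[0, -1], dim=-1)
        pb = torch.softmax(model_b(ids).logits[0, -1], dim=-1)
    winner = pa if pa.max() > pb.max() else pb    # the more "confident" model this step
    next_id = winner.argmax().view(1, 1)
    ids = torch.cat([ids, next_id], dim=-1)
print(tok.decode(ids[0], skip_special_tokens=True))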
>>
>>101336488
Good point, how does it stack up against llama 3 27B?
>>
>>101336492
works well. The model remains consistent.
>>
>>101333029
Exactly, ladies' smut is the only thing that matters. Anything made for men is literally
>Fuck my slutty pussy now!
>ah oh mistress...
>>
>>101336527
Time to make a frankenmerge and find out.
>>
File: firefox_dDTWqq3Ey1.png (95 KB, 1529x981)
>>101336527
Facts speak for themselves.
>>
>>101336587
are there good short stories to prompt the model with, for good writing style? I guess < 1000 tokens would be best
>>
>>101336537
consistently schizo?
>>
>>101336615
>llama-3-34b
wat
>>
>>101336527
good one anon
>>
>>101336615
kek
>>
>>101336646
probably just anything trashy from ao3 you can convert to a prompt

protip: sort by straight relationships only but even then you will have to deal with excessive faggotry because women don't like men
>>
>>101336669
it's a gay model
>>
>>101336615
no way llama-3-34b answers like this, you just edited it through F12.
>>
>>101336615
l3-8b gets it correct
>>
>>101336615
>gemma 2 9b gets it right
27bros.. not like this
>>
>>101336826
>27bros.. not like this
You mean 34bros
>>
>>101336851
gemma 27b doesn't get it, 9b/27b share the same family
therefore 27bros
>>
File: chatbot-arena.png (219 KB, 2283x677)
>>101336488
>>
>>101336615
30 bee bros, are we back?
>>
>>101336863
Ask gemma2-27b the following and compare it to l3-8b:
How do I take a screenshot on xorg using ffmpeg?
>>
>>101336863
>l3 assuming first-person pov halfway through.
Kek'd
>>
>>101330541
How the fuck did you manage that? I'm getting 58 tokens per second with 7b Toppy on a fucking 3060 laptop with only 16 gigs of RAM. Did you put it on a fucking thinkpad?
>>
>>101336863
wtf i love poo now
>>
>>101336996
A screen recording, you mean? This is what I have, although I haven't run it lately.
ffmpeg -f x11grab -r 30 -i :1.0 -f pulse -ac 2 -i default -c:v libx264 -preset superfast -crf 18 $1
>>
File: 00058-3694687329.png (284 KB, 512x512)
Alright guys, cursed gemma 9b model training as we speak.
Should be done by dinner time.
In the meantime for your enjoyment I have adapted one of my old model test poems into vocaloid shit
https://suno.com/song/340d663b-47c3-4f56-ba56-edf6dc96245f
>>
applebros eating fast.
https://x.com/ollama/status/1810480544976626159
>>
>>101336521
Because VRAM grows on trees.
>>
>>101337173
>https://x.com/ollama/status/1810480544976626159
Does anyone have any RAG model recommendations that I could combine with gemma2? (I'm running llama.cpp not ollama.)
>>
>>101336863
Gemma-2-9B-It-SPPO also answers it correctly
>>
>>101335829
>CRust
well done

>>101335194
Quality drops off a cliff around Q4_K_S or Q3_K_M or so for all models.
>>
>>101337199
A couple of 8bs is manageable, yeah.
The idea is to pair a coding model and a coomer model and see how they fare at a more general task.
>>
>>101337076
gemma shits itself and says it's not possible, while l3 just answers it
>>
>>101337173
holy shit, llama.cpp absolutely DESTROYED
>>
>>101335829
Kek
>>
>>101337299
How? It's had all these features for a long time now.
>>
>Quality drops off a cliff around Q4_K_S or Q3_K_M or so for all models.
and you dumbasses want bitnet
>>
I've been wondering if local is much better than I always thought and the difference from GPT-4 really isn't that big; GPT-4o does make stupid mistakes
>>
>>101337368
aren't bitnets trained with reduced precision? that's different from taking a normal-precision model and removing information from it
>>
>>101337397
it is different
and it's still shit
maybe if you're lucky it'll keep the shit that you want
>>
>>101336537
does the quality drop?
>>
>>101337397
Yes. That's the point of bitnet: it's trained with low-precision weights instead of being quantized after the fact.
The issue with quantizing is that you either truncate the precision or scale the values, and both are lossy.
I believe that's also why cudadev is thinking of working on lower precision training.
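A toy illustration of the "scale the values" part: scale weights onto a small integer grid, round, scale back, and measure what got thrown away. This is per-tensor absmax, which is cruder than the block-wise schemes llama.cpp actually uses, so the numbers are only directional.

import torch

# Per-tensor absmax round-trip; real quant schemes work in blocks, so this overstates the damage.
def absmax_roundtrip(w: torch.Tensor, bits: int) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale  # dequantized weights

w = torch.randn(4096, 4096)
for bits in (8, 4, 2):
    err = (w - absmax_roundtrip(w, bits)).abs().mean().item()
    print(f"{bits}-bit absmax round-trip: mean abs error {err:.4f}")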
>>
>>101337456
>lower precision training.
In llama.cpp? That would be awesome.
>>
>>101337173
I thought batching was a default feature for all loaders?
>>
File: bitnet vs quants.png (287 KB, 1249x745)
>>101337368
Bitnet has comparable downstream task performance and perplexity to FP16 and performs way better than traditional quant methods at low bit widths.

One question I have is whether it's possible to further compress bitnet and trade accuracy for less size. With FP16 that's quite straightforward. Can you even do it with bitnet, or are you just stuck with the model sizes they decide to release?
>>
>>101336863
I'm going to guess that a question similar to this one is in Gemma's dataset and not L3's, or that the riddles in L3's data lean towards actual riddles rather than trick questions. When the prompt states that it's a trick question, L3 (tested at Q8_0) stops saying to use the other items:
Clever trick question!

The answer is: You pick up the key and unlock the front door.

The question doesn't say you can't leave the house, it only says the front door has been locked. With the key, you can unlock it and exit the house, ensuring your safety.

All the other items on the list are red herrings, meant to distract you from the simple solution.
>>
>>101337534
They should put set bits in bloom filters.
>>
>>101337397
Yes and no. The forward-pass weights are quantized on the fly from FP16 master weights, and those FP16 weights are what the backward pass updates.
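A sketch of what that looks like in code: full-precision latent weights for the optimizer, quantized on the fly for the forward pass, with the gradient passed straight through the rounding (straight-through estimator). The absmean ternary rule follows my reading of the b1.58 paper, so treat the details as approximate.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Straight-through estimator: forward uses ternary weights, backward updates
# the full-precision latents. Absmean scaling is approximate, per the b1.58 description.
class TernaryLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)  # FP latent weights

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = torch.round(w / scale).clamp(-1, 1) * scale   # {-1, 0, +1} times a per-tensor scale
        w_ste = w + (w_q - w).detach()                      # forward sees w_q, gradient sees w
        return F.linear(x, w_ste)

layer = TernaryLinear(16, 8)
layer(torch.randn(2, 16)).sum().backward()
print(layer.weight.grad.shape)   # gradients land on the full-precision weights: torch.Size([8, 16])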
>>
>>101337534
Link Marine here, how will this affect the price of Link?
>>
File: 1698368680586650.jpg (125 KB, 818x835)
>>101337534
>Bitnet has comparable downstream task performance and perplexity to FP16
trusts The Numbers in a random chinese paper award
>One question I have is whether it's possible to further compress bitnet and trade accuracy for less size. With FP16 that's quite straightforward. Can you even do it with bitnet, or are you just stuck with the model sizes they decide to release?
LMFAO
>>
>>101328074
are there any locally usable models that can train on a script text and create new scripts from it? I've used GPT-2 for this task before and it kinda worked, but it hallucinated madly sometimes and put in a lot of deeply weird and unsettling shit.
>>
>>101337534
>trade accuracy for less size
That sounds like the woman who cheats on the man who loves/d her because the ex who used to get drunk and beat her rolled through town and sent her a text reading `muh dick`.

Haven't we had enough inaccuracy? Aren't we hopeful that bitnet gives us the accuracy of too-beeg models at sizes we can manage on everyday mobile/consumer hardware? Aren't we making fucking Xbox-huge models today to get at more accuracy despite the price tag?
>>
Currently running CR+ with 65k context. Is it really worth bothering with Gemma? Hard to believe, but many here seem to love it.
>>
>>101337624
You might as well try it at full precision, but I'd say that there's very little chance Gemma2 comes anywhere close to CR+.
>>
>>101337624
stick with CR because gemma goes schizo mad quick
>>
>>101337624
I'd say 8bpw gemma is almost as smart and definitely more soulful than cr+ at 5bpw
>>
>>101337534
Maybe the weights could be further losslessly compressed in blocks; I'm not sure it would be worth the potentially small savings though.
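Back-of-the-envelope on how much room lossless packing actually leaves (the zero-probability below is made up, just to show the mechanism):

import math

# 5 ternary digits fit in one byte (3**5 = 243 <= 256), i.e. 1.6 bits per weight,
# already close to the 1.58-bit entropy of a uniform ternary weight.
print(math.log2(3))   # ~1.585 bits/weight: uniform-ternary lower bound
print(8 / 5)          # 1.600 bits/weight: naive "pack 5 trits per byte"

# Lossless block coding only wins more if the distribution is skewed, e.g. lots of zeros.
p = {"0": 0.5, "+1": 0.25, "-1": 0.25}   # made-up skew, purely illustrative
entropy = -sum(q * math.log2(q) for q in p.values())
print(entropy)        # 1.5 bits/weight: best any lossless coder could do at that skew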
>>
>>101337654
delusional
>>
>>101337614
Highly low iq post.
>>
>>101337624
no
gemma is a good model for its size class and is pretty pleasantly tuned but to me at least it's very obviously inferior to the 70b+ class of models
it's worth a try to see if it works for you maybe since the quicker gens are nice, but I really doubt you'll prefer it to CR+
>>
>>101337677
cope by someone who spent too much money on hardware
>>
>>101337678
Okay, explain why you would want to go from
- huge model, kinda dumb
- small model, kinda dumb
- smaller model, really dumb

instead of
- big model, kinda dumb
- small model, kinda dumb
- big model, less dumb
>>
>>101337707
how is $1000 for 2 3090s 'too much money'?
>>
>>101337718
$1000 is an unimaginably large amount of money for a NEET who lives with his parents and he simply can't comprehend having that much money to spend ("waste") on a hobby
>>
>>101337718
you could be running gemma at a good speed instead of lobotomized cr+ at 1.5bpw
>>
It doesn't seem to recognize that its solution to this modified problem is dumb, doesn't try thinking it over again, and doesn't state that the problem is too difficult to solve or unsolvable, contrary to what the user said. So yeah, it's still dumb. Don't expect THAT much out of this model.
>>
>>101337798
i'm a neet and make money programming, $1000 isnt even that much if you're like me.
>>
>>101337824
shills gonna shill anyway.
>>
File: 1708491999253910.gif (1.94 MB, 300x178)
>>101336615
>>
>>101337710
Why would I seriously answer a low iq retard?
>>
>>101337824
we call that sovl over here
>>
>>101337368
Every time I check here I get more and more confident that all of you are completely retarded and your knowledge about LLMs is meme-deep at best.
I will assume for my own sanity that you are trolling and not actually smooth-brained.
>>
>>101337877
I blame the zoomie CAI refugees trying to fit in; it wasn't this retarded a few months ago
>>
>>101337877
>having a smooth brain
Surprisingly accurate bitnet description.
>>
>>101337910
>>101337910
>>101337910
>>
>>101337877
nah, /lmg/ split off from /aicg/ when the llama-1 leak happened. Old /aicg/ and the current one are well known for their piss-drinking rituals to get access to proxies / APIs, effectively the worst general on /g/, so it's not trolling.
>>
>>101337991
some of us came because of the leak but did not frequent aicg
>>
>>101337846
You've done so twice. Three times and Beetlejuice will appear.
>>
>>101337718
an rtx 3090 costs 800€, not 500€
>>
>>101338118
I guess you're late to the party XD
>>
>>101337798
I'm also a neet and run 2 x 3090. I grow my own weed, sell it to friends and use that money to buy better PC parts. A real girlfriend isn't affordable with this lifestyle, but an AI girlfriend is a perfect match.
>>
>>101334525
It's not true.


