/g/ - Technology
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101258576 & >>101250468

►News
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101258576

--Papers: >>101267676 >>101267806
--KLD Tests Reveal Insights on Quantization Methods and Model Performance: >>101265037 >>101265051 >>101265080 >>101265240
--Ooba Error: Procedure Entry Point Not Found in llama.dll: >>101260958 >>101262849 >>101263260 >>101263658 >>101263936
--Gemma-2-27b-it-GGUF Model Prompt Format and Capabilities: >>101264232 >>101264314 >>101264365 >>101264571 >>101264613 >>101264682 >>101264754 >>101265316 >>101264279 >>101264353 >>101264357
--Imatrix Quantization Causes High Memory Usage in Koboldcpp: >>101264029 >>101264113 >>101264160 >>101264167 >>101264185 >>101265195 >>101268078
--Anon Rants About Overtrained ERP Phrases in Language Models: >>101264497 >>101265374 >>101265439 >>101265504 >>101265898
--Q8 Quantization: A Viable Alternative to FP16 for Embed and Output Layers: >>101266919 >>101267029 >>101267191
--Claude's Internal Monologue Revealed with Tag Prompt Hack: >>101267435
--Running AI Models on Low VRAM: Expectations and Limitations: >>101260721 >>101260752 >>101260786 >>101260860 >>101260817
--LLaMA Benchmarks and CPU vs GPU Performance: >>101258990 >>101259063 >>101259272
--Introducing Diffusion Forcing: Unifying Next-Token Prediction and Full-Seq Diffusion: >>101259322 >>101259924 >>101259991 >>101260307 >>101260564 >>101260601 >>101260700 >>101260711 >>101260621 >>101260700
--Gemma 27B's Temperature Stability and Coherent Writing: >>101259737 >>101259782 >>101259842 >>101260199 >>101260241 >>101260322 >>101260805 >>101260989 >>101261096 >>101259880
--Alignment Lab's Approach to NSFW Content in Finetunes: >>101263679
--Running on Device: New Open-Source AI Model for Offline Use: >>101260535 >>101260546 >>101260874 >>101260987 >>101261006 >>101261167
--Orin AGX Liquidation: Worth it for the Memory and Accelerators?: >>101259163 >>101259212 >>101259234
--Miku (free space): >>101258845 >>101259283 >>101259715

►Recent Highlight Posts from the Previous Thread: >>101258584
>>
>>101268182
>--Gemma-2-27b-it-GGUF Model Prompt Format and Capabilities
That's misleading. That guy got it wrong. As you can read in that chain.
>>
will there be SPPO-Iter3 for gemma-2 27b too?
>>
I don't get why Gemma would have Claudeslop. Doesn't make sense for them to illegally train on Claude outputs when they have Gemini, probably even bigger and better versions than publicly available.
>>
>>101268497
do you feel it really improves the model?
>>
>>101268568
Maybe, just maybe
>>
>>101268568
Since people like Claude's output, it makes sense. Google does so because it's what people generally prefer.
>>
>>101268574
It's more creative at writing.
>>
>>101268609
The implementations are still young...
>>
Is llama 3 7b acceptable for understanding a text? what about gemma2 7b?
>>
File: 1700875732878632.png (29 KB, 534x261)
the post that killed lmg underages
>>
>>101268784
Understanding as in identifying words? Yes.
Understanding context? Probably about as well as a grade schooler.
>>
>>101268798
I mean like a paper, so it can rewrite the content in other ways, draw out implications, etc. I've only tried Claude for that and it's really good, but I think they save data just like OpenAI, so for papers still being written I only trust local, and I still don't have the hardware to run anything better than 7b.
>>
>>101268855
If you're using any kind of service that isn't self-hosted, you can bet everything you give them is being logged
7b models might be capable of re-wording stuff to skirt plagiarism checks or something, but if you need it to be able to actually understand complex topics then you probably won't be happy with the result.
>>
>>101268855
llama3-8B should be good enough for that, you just need a solid prompt and maybe to few-shot it
>>
>>101268914
yeah, I think the same; because of that it looks pretty bad to "release" a paper as a log before actually publishing it.
>>101268959
I'll try it. Is there any advantage to using a quantized version of it? (I heard Phi behaves weirdly.) And where can I obtain it? Any performance difference between kobold and llama.cpp?
>>
My Gemma kobold.cpp often crashes with this error: https://github.com/ggerganov/llama.cpp/issues/8246

It only happens after I change something in the context and then regenerate, but not always. I haven't found a pattern yet. Anyone else experiencing this or have an idea what to try?

inb4 another gguf regen required
>>
>>101269016
I'm downloading this one
https://huggingface.co/TheBloke/LLaMA-Pro-8B-GGUF
>>
>>101269121
lol
>>
File: 1706687962125.png (172 KB, 460x309)
>>101269121
>TheBloke
>>
>>101269147
>>101269169
We don't download from him anymore?
>>
>>101269202
Isn't he dead?
>>
File: 1654378079935.png (170 KB, 390x475)
>>101269202
He disappeared nearly half a year ago, and his now ancient quants are lacking key tweaks and fixes discovered since then.

Also that isn't even llama3, guy.
>>
>>101269121
>Updated 6 months ago.
It's some tencent abomination.
Download the model from meta directly and quant it yourself. They grant access immediately.
>>
>>101269241
>>101269247
That's why I've asked you dudes where I could obtain it T-T
>>101269255
> Quant it yourself
Wouldn't it take a **long** time?
>>
>>101269265
>That's why I've asked you dudes where I could obtain it T-T
Just use google, it's not that difficult
>>
>>101269265
>>101269202
if you really want premade ggufs bart is the new bloke
https://huggingface.co/bartowski
>>
>>101269265
>Wouldn't it take a **long** time?
For an 8B? no. A few minutes (5-10?) on a potato, less on what you're probably running.
>>
>>101269278
> *uses google*
> *download the bloke*
> "you are dummy"
T-T
>>101269287
Thanks anon. Will download this one:
https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
>>101269288
Mine is a potato (compared to what people use for AI), that's why I'm going for an 8b.
>>
>>101269328
>Thanks anon. Will download this one:
since you want to use it for 'regular' stuff, this might be a straight upgrade over l3, if you want to test it
https://huggingface.co/bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF
>>
>>101269328
>Mine is a potato (compare to what people use for AI), that's why I'm going for a 8b.
The potato i'm talking about is a ~15 year old amd FX-4170. No gpu, and i convert the models on a vm with 1 cpu running on the same machine. 24gb ram total, 8gb for the vm.
You'll be fine.
>>
>>101269348 (me)
>>101269287 (me)
>>101269288
also forgot to mention, for l3 8b you really want at least q6_k, ideally q8_0; it quickly goes complete retard below that
>>
>>101269348
Thanks will try this one then.
>>101269349
Thanks will try the other anon suggestion first though
>>101269377
kek was downloading Q4_K_M ("recommended")... Restarting the download again for q6...
>>
Can someone give me a good sampler preset that uses both smoothing_factor/curve and the DRY parameters?
>>
File: 1682729528395.png (1.25 MB, 1024x1024)
>>101269408
>>101269408
I genuinely feel sorry for the newcomers, the various getting started and spoonfeeding guides floating around are outdated as hell.
>>
File: 1716694623371833.jpg (205 KB, 1024x770)
do we have decent settings for gemma figured out?
>>
>>101269408
>Thanks will try the other anon suggestion first though
Fair enough. I just got tired of waiting for someone to quant the models after fixes on llama.cpp. Then there are the people who complain about broken quants after downloading a 6 month old file. Especially true for new models (see gemma2) where 1 day old quants were already outdated by fixes.
Now i just download the models and convert when needed. The download time is longer and it uses more storage, but i get fresh quants on demand. If you're gonna start using local models more regularly, i'd recommend it.
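If anyone wants the spoonfeed, the whole flow is roughly this (script and binary names have moved around between llama.cpp versions, and the paths/output names here are just placeholders, so double check against your checkout):
>git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make
>pip install -r requirements.txt
>python convert_hf_to_gguf.py /path/to/Meta-Llama-3-8B-Instruct --outfile llama3-8b-f16.gguf
>./llama-quantize llama3-8b-f16.gguf llama3-8b-Q6_K.gguf Q6_K
First line builds the tools, second installs the deps for the convert script, third turns the HF safetensors into an f16 gguf, last one makes the quant you actually run. On an 8B even a potato does the quantize step in minutes; the conversion is mostly disk-bound.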
>>
>>101269495
I thought about writing one, but I feel like it will get outdated pretty quick and I probably won't keep it up to date. Making it just another one in the pile.
>>
>>101268797
take your meds schizo
>>
does Gemma 27b work properly with 4 bit, or should it be higher, like llama3? If higher, I guess 6 bit would be good enough?
>>
I might be repeating myself, but:
>a sitcom written by a Lovecraftian deity high on nitrous oxide
>specialized in the production of extremely durable oven mitts
Gemma gets so unhinged but never schizo on high temperatures, I love it.
>>
File: So.png (2.49 MB, 3336x1866)
Something's wrong with the new _L quant, it looks like it's less accurate than the original one

For example, here Q6_K is closer to Q8_0 than Q6_K_L is

The pink highlighting shows the text that is exactly the same as Q8_0 until the difference appears; Q6_K starts to diverge from Q8_0 later than Q6_K_L does. I'm on a deterministic preset too (top_k = 1)
>>
>>101269509
>>101269573 is Q6_K_L, temp = 1.9, top_p = 0.95, min_p = 0.035.
>>
>>101269535
Thanks for the suggestion, I'm just testing for now, but I'll keep that in mind if I start to follow the updates.
>>
>>101269573
27b, right?
>>
>>101269594
muh quants btfo
https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/discussions/4
>>
>>101269287
what if bartowski really is thebloke?
>>
>>101269608
Yes. Didn't even bother to try 9B, but from what I'm reading it should be quite good as well at this stuff.
>>
>>101269609
>q6_K_L (New) - Unimpressed. Short & less detailed responses (esp. in the latter half of the test). Had to regenerate the 3rd response, because it felt embarrassingly short ~ much like a summary.
>q6_k (New) outputs looked honestly better.
>>
>>101269637
>q6_K_L (New) - Unimpressed. Short & less detailed responses
that's exactly like this post, the poem is shorter on Q6_K_L >>101269594

looks like the _L meme should be avoided at all costs
>>
and flashattention is not supported on gemma yet, right? context just eats up the vram.
>>
>>101269666
>flashattention is not supported on gemma yet
correct
>llama_new_context_with_model: flash_attn is not compatible with attn_soft_cap - forcing off
>>
>>101269683
time to hibernate for couple of days, i guess
>>
>>101269573
yeah I'm impressed as well by that model, it's smart and has sovl at the same time, I really feel google is gonna lead the LLM race from now on, their next API model is gonna be great, mark my words
>>
>"O EM G"
>"this <below_70b model> is THE best model fucking ever guizeee!!!!111 holy shit its SOOOO GOODOOOOOODODOOD"
>t. literally didnt use anything above 13b

does anyone know any library that has easy proof of work captcha requirement before allowing users to post? anyone below 64gb of ram or 24gb vram should just be executed
>>
>>101269713
>t. literally didnt use anything above 13b
I used Mixtral for a long time, and I think gemma is at its level in terms of smartness; it's also just as good at languages other than English. But gemma is better at being naughty/offensive and is way less deterministic than Mixtral. So basically, gemma-27b-it is the equivalent of a MoE 47b model from december 2023.

Any other questions?
>>
>>101269713
>seething he spent thousands on gpus and gpu poor are getting anything decent
>>
File: Cucked.jpg (66 KB, 2129x623)
Gemma is really cucked as an assistant, but once you go RP mode with a card the guardrails are gone, kek. At this point I'm waiting for Gemma-27b-SPPO so that it's less cucked and smarter
>>
File: 1696538370326626.png (33 KB, 719x346)
TWO MORE WEEKS
>>
>>101269758
Meta hasn't improved a lot since that date desu, Llama3 isn't the boost in quality they promised us, and gemma is catching up to it, if there was a gemma-70b, this shit would be API tier
>>
>>101269713
Even better
>People without on-demand access to a local cluster of H100s shouldn't be allowed to post.
You're one of those fuckers with loud bikes going at 20km/h, aren't you?
>>
>>101269755
>At this point I'm waiting for Gemma-27b-SPPO so that it's less cucked and smarter
https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3/discussions/1#6681a0ea1fbddc88d2a17856
>We are planning to do 27B as soon as a stable release of transformers and vllm generation on Gemma-2-27B-It is available.
God bless those fags, we'll be eating really good soon
>>
>>101269741
Yes, you think llama 3 70b is much smarter?
>>
>>101269819
Can't run L3-70b so I can't tell ;-;
>>
>>101269750
>>101269774
no, just dont give niggerlicious opinions on things that you barely touched at all, trying out bigger models isnt impossible without running them yourself since others host them online, for free

if you never used even L3 70b then there is no reality where you can make any blanket statements about anything, you can only compare one model to the other in the same range or talk about it being the best model in that range
>>
>>101269848
Anon, L3-70b isn't even the best local model, it's CR+, so if you want to go that path, let's go I guess
>>
>>101269713
you saw this the most when polls were created here about people voting for the top models with wizard 8x22 and llama3 70b being at the top, literally everyone who voted for llama3 70b did so because they couldnt actually run wizard 8x22 and see it is better for creative tasks
>>
>L2 wasn't that much better than L1 aside from context size and GQA
>L3 isn't that much better than L2
>the bigger models like Wizardlm or CR+ aren't that much better than L3
It is over
>>
>>101269855
where did i say its the best? i said EVEN 70b, if anything, implying that its the minimum to get into the "big" models, retard

also r+ isnt SOTA for writing, roleplay or similar, that would be WizardLM 2 8x22
>>
>>101269888
>t. doesnt even have the ram to run cr+ and wiz
many such cases
>>
>>101269892
>can't even run deepseek-v2 in q6+
>>
>>101269892
>t. vramlet who can't use Grok
>>
>>101269892
it's gemma 27b
>>
Don't talk to me if you can't even run Nemotron Q6 or up
>>
>>101269848
>just dont give niggerlicious opinions on things that you barely touched at all
I gave no opinion. I just made fun of you for being a little bitch who HAS to read every post. What the fuck is a proof of work library even going to do here, retard?
>>
>>101269906
Q3_K_M is enough
>>
>>101269892
i will talk about 4b sota models and you vill read it, seethe
https://huggingface.co/bartowski/Phi-3.1-mini-4k-instruct-GGUF
>>
>>101269848
>if you never used even L3 70b then there is no reality where you can make any blanket statements about anything, you can only compare one model to the other in the same range or talk about it being the best model in that range
Then go for it anon, give us your opinion about the comparison between gemma-27b and L3-70b
>>
>>101269892
>NOOOO!!! you can't talk about models lighter than 70b
https://www.youtube.com/watch?v=yWULCfJ2PGA
>>
Google would destroy the open llm meta if they took gemma-27b and turned it into an 8x27b MoE. The perfect size for those sensible enough to invest in a CPUMAXX build.
>>
>>101269943
I think Gemma-27b pfrffffffffff. pfrfffrpffffffffff. fpgpfffffffffffffffff fpfddrdfppfffffff fpfffffffffffffffffffffff.
>>
does anyone have any suggestions for models that sound very human (as in, they speak like people do on discord, IRC, reddit, 4ch, etc. The more toxic the better.) (I'm aware of gpt-4chan, but I was wondering if there's anything more recent, considering it's based on GPT-J, a 6b model from 2021.)
>>
>All this (v)ramlet seethe
Oof that nigga really struck a nerve.
>>
>>101269959
ask the model to talk like that?
>>
>>101269959
c.ai used to be very good at this
it even included typos regularly from being so human
>>
>>101269750
this
so much this
>>
>>101269959
any model if you arent braindead and actually tell the model what you want instead of expecting it to read your mind? how retarded are these newfags
>>
Is it me or does gemma have some formatting issues? Sometimes it simply messes up the asterisks and shit, it's annoying
>>
>>101269982
That's normal for models in general. Claude does it too.
>>
>>101269965
>>101269979
this is exactly what I am doing and it seems to completely ignore me
>>
>>101269982
"just use the chad novel format", anon said.
>>
>>101269988
Nah, not all models mess up the formatting, Mixtral works fine for example.
>>
>>101269992
then stop using 13b or less meme models
>>
>>101269982
yeah, it cannot retain the formatting for me at all.
>>
>>101269982
yup, gemma has a hard time with formatting. Just add an instruction to use asterisks in the system prompt
>>
>>101270020
the solution would be to use the roleplay.gbnf thing, but it doesn't seem to be working on gemma
>Warning: unrecognized tokenizer: using default token formatting
>>
>>101270005
Unfortunately I only have a 4070 laptop edition (yes, I know) with 8GB of VRAM and I am a massive poorfag (got this laptop as a gift)
>>
>>101269959
Is the dataset for gpt4chan public?
If so your best bet is to train a lora for a newer model. Either that or get a model that that has really good in context learning and stuff it full of examples.
>>
>>101270030
wait for gemma 9b to get fixed to try it if llama 3 8b doesnt work then
>>
>>101270061
Yes, but I'm too much of a stupid nigger to know how to train loras right now, but I'm willing to learn if it's not absurdly complex. What should I search for to start?

>>101270067
I'm not sure I fully understand. Are you saying I should try llama 3 8b, and if that doesn't work, wait for an uncensored version of gemma 9b?


BTW, I can run most 11b models with all layers offloaded to GPU, and most 13b models at a reasonable speed with some layers offloaded to main memory.
>>
>>101270105
>Are you saying
yes, except not uncensored gemma but for it to be fixed since it doesnt seem like its working fully properly with current software
>>
if I can fit Gemma 27b Q5_K_M 100%, should I use llama.cpp or exllama?
>>
File: file.png (281 KB, 640x640)
>PoopenFarten-CapybaraMaid-Gemerald-limarpv4-34B.i1-IQ4_K_S-00001-of-00004.gguf

Why are model filenames like this. what went wrong
>>
>>101270198
Because you're not converting them yourself.
>ggml-model-IQ4_K_S-00001-of-00004.gguf
>>
File: firefox_Fr6lO7UM3l.png (278 KB, 739x930)
>>101269959
>>
>>101269888
Honestly, I've been reading earlier parts of my long story that were written even with the early 13B llama models and am surprised at how well they did. Unfortunately I didn't keep a record of what each chapter used. But aside from some outliers that used Claude, and with no track of edits and retries, I'm having a hard time distinguishing a difference in quality.
>>
>>101270186
exllama does not support gemma.
>>
>>101269713
dilate more trans freak
>>
File: 1703648487830647.png (307 KB, 578x878)
>>101269755
>yfw can't RP and talk about anything you want with your AI waifu assistant
robotfuckerbros... it's not fair...
>>
Is there anywhere I could run Midnight Miqu for free? Or any service that hosts it and accepts crypto for monthly use or smth
>>
>>101270265
see
>>101270239
>>
more like loli miku general, heh
>>
>>101270342
>Is there any where I could run Midnight Miqu for free?
on your computer
>>
>>101270105
>What should I search for to start?
Look for the unsloth guides.
I imagine that the dataset is formatted for SFT training.
>>
>>101270239
Model and prompt? Not much to go by the screenshot.
>>
File: firefox_eXQEtieHch.png (347 KB, 731x1156)
>>101270367
Alpaca, RP system prompt, Sao10K_Typhon-Mixtral-v1-exl2_3.5bpw.

Card is:

You are a lazy assistant. Your identity is hidden from the user, he knows you as just Anon. Your task is to answer his requests with minimum effort, as dismissively as possible, sometimes using profane language. If your response technically answers the user's request, but you know that it's not a helpful response, that's a perfect answer.

Remember: you are writing this over IRC channel, and your effort is very limited. Write in lowercase, be extra short.
>>
File: firefox_vYznedvFoq.png (25 KB, 250x149)
>>101270386
>>101270367
>>
>>101270386
The best way to define a specific writing style/quirks is to just give it like 5 example replies in the prompt. ST has a box for this
>>
Can someone explain how a Q5_K_M quant of Gemma 2, which is ~20GB,
goes fully ~22GB into VRAM but also ~25GB into RAM?
I have all layers on the GPU.

Other models just take up the VRAM.
>>
wait, i somehow missed typhon. qrd? is it better than mixtral-LimaRP-ZLoss?
>>
>>101270417
no
>>
>>101270386
>>101270400
Looks fun. Thanks.
>>
File: firefox_fT4tPJrO4f.png (60 KB, 694x183)
>>101270409
I know, and I'd do that if it wasn't working, but I got what I wanted with just card description.
>>
>>101270356
I'd like to test it out before downloading like 100 gigs
>>
>>101270417
I used a Sensualize merge for the longest time, and ultimately came to the conclusion that it's a lot dumber than vanilla mixtral instruct. Using Typhon now. It seems okay. I honestly can't tell if it's better or worse than others. It works for depraved stuff and only acts like a retard sometimes.
>>
>>101270414
that's because on llama.cpp, if you use no_mmap or some shit, the full model goes to the ram regardless of everything, yeah that's retarded and they don't give a fuck to fix it
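If you want to check whether it's actually the mmap thing, the switch in llama.cpp is --no-mmap (kobold has an equivalent toggle); the model path here is just a placeholder:
>./llama-server -m gemma-2-27b-it-Q5_K_M.gguf -ngl 99
>./llama-server -m gemma-2-27b-it-Q5_K_M.gguf -ngl 99 --no-mmap
From what I understand, with mmap left on the file gets mapped into the process, so the "RAM used" number looks inflated even though most of it is reclaimable page cache. Try both and watch what actually happens under memory pressure.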
>>
>>101269713
I have 48GB VRAM and I still would use Gemma 2 27B.
>>
>>101269713
>>"this <below_70b model> is THE best model fucking ever guizeee!!!!111 holy shit its SOOOO GOODOOOOOODODOOD"
>>t. literally didnt use anything above 13b
that's literally me, kek
>>
>>101270417
Buy an ad.
>>
>>101270476
sure u do
>>
>>101270495
eat a cock, schizo
>>
>>101270417
teto-8x7b is, limarp zloss is sloppy trash
>>
>>101269750
They can run whatever they want 10x faster than you.
>>
>>101270627
Dick measuring on the internet is retarded.
>>
>>101270627
and yet they always seethe at poors daring to enjoy what they have, instead of just enjoying their giant models, weird huh?
>>
>>101269377
Got it working; the computer is suffering, but it seems to be working. Using OpenBLAS.
>>
llama4 waiting room. llama3 is coal through and through
>>
>>101270674
llama 4 will be just as bad. llama-5-jepa waiting room
>>
>>101270670
>Using OpenBLAS.
no nvidia gpu? or even amd?
>>
>>101270597
chat, is this true? i mean i love fukkireta, but...
>>
>>101270701
My "gpu" is "Radeon Vega Mobile Gfx" according to /proc/cpuinfo.
>>
>>101270702
go back
>>
>>101270719
you could try the vulkan backend of l/kcpp, it should be faster assuming you get it to work
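if you grab a koboldcpp release it's basically one flag, something like this (model name is a placeholder and flags occasionally change, so check --help):
>python koboldcpp.py --usevulkan --model your-model.gguf --gpulayers 20 --contextsize 4096
adjust --gpulayers to whatever actually fits in that iGPU's memory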
>>
>>101270729
this is my place, though, the /reddit/ board, right?
>>
>>101270745
>/reddit/ board,
hi chris/p*tra gonna spam bl*cked again in about 3-4 hours?
>>
>>101270734
Got it to work with vulkan seems mostly the same, 1 token per second.
>>
>>101270757
hi petra
>>
>>101270536
Buy an ad.
>>
>>101270757
Yeah, you're obviously itching to see them again, wanna tell us why? :)
>>
Reminder that LLaMA 3 is trash compared to what is coming soon
>>
>>101270915
Reminder that what is coming soon is trash compared to what will come after
>>
Reminder that nothing ever happens
>>
>>101270915
I remember you claimed that about qwen
>>
>>101270915
Let me guess? It's coming in 2 weeks?
>>
>>101270953
>2 weeks?
see
>>101269758
>>
File: 1720099878866.jpg (221 KB, 1024x1024)
Imagine being Meta releasing 2 open source models named Chameleon, then having nobody do anything with it.
>>
>>101271031
oh yeah I totally forgot this existed kek, what happened? Is it because it's complete shit?
>>
>>101271052
They gimped it for "safety" and nobody has figured out how to undo it yet.
>>
*cough* the bitnet... *wheeze* is real...
*dies*
>>
>>101271060
>They gimped it for "safety" and nobody has figured out how to undo it yet.
Total ethicist win
>>
llama3 405b is actually bitnet
>>
>>101271031
They didn't even upload it on HuggingFace themselves.
>>
>>101271075
they don't have the balls
>>
>>101271084
i know
fucking faggots :)
>>
>>101271075
>llama3 405b is actually bitnet
>Performs worse than 70B
What then? Do we finally bury copenet?
>>
>>101271084
They have the most money but the least balls yeah, they haven't improved a single fundamental architectural shit since their L1 release, it's just "moar parameters, moar tokens" and that's it, why the fuck aren't they taking more risks gaddamit?
>>
>>101271148
>why the fuck aren't they taking more risks gaddamit?
they are, if it works they keep it for themselves, if its shit, ay new llama guys, eat up piggies
>>
>>101271169
if they had something great working and kept it for themselves (for API use, I guess) we'd have seen it by now; they aren't even trying to compete with Claude and ChatGPT, for example, so it means they haven't found anything
>>
>>101271148
>least balls
mostly, although they did go forward with chameleon, multi token shit and similar recently, they didnt want to fall for any meme archs too early i guess, but it was obvious they would have to change shit since releasing a 400b dense model is almost DOA
>>
How do I have the model control how long the response is going to be?
>>
>>101271188
I'm just quite jaded of recent releases
>meta here, l3: 8b or 70b no in-between, 8k ctx, not much if any pop culture/fandom knowledge, gpt-slopped
>qwen here, qwen2: half of the params used on chinese, practically zero pop culture/fandom knowledge, also quite gpt-slopped
>google here, gemma-2: 8k ctx with swa making local implementations harder, model marketed as being for local gpu, we didn't actually test it on any software local users use, decent pop culture/fandom knowledge
In a bit we'll get 400b that cpumaxxers will run for a day before relegating it to grok tier while some will cope and just say 'run it overnight dude', huge and useless, it's all so tiresome
>>
>>101271306
you let it use EOS tokens
>>
>>101270597
Preliminary results with very low sample size: I like it. It feels somewhat creative. Will play more.
>>
>>101271310
>some will cope and just say 'run it overnight dude
yes? if it can do things no other model can do and finish a project overnight that other models cant at all whats the problem?

also a huge good model will put a lot of pressure on research on speeding it up with better quants, distillations, lookahead, speculative decoding etc

or especially adding a smaller model in front that will forward things to the big one to just check instead, which is a lot faster, making the small model generate for example majority of a codebase with the big model just making sure the smaller one is on a good path
>>
>>101271286
>a 400b dense model
that's the worst part, they're probably spending like 50 million dollars training this giant model instead of trying new architectures that could get the same result with way fewer parameters
>>
>>101271355
>yes? if it can do things no other model can do and finish a project overnight that other models cant at all whats the problem?
>Runs 400b overnight guzzling power for hours
check in the morning
>oops it made a typo at token 500, the entire rest of the output is useless and you need to regen
even copus and g4o make mistakes, you really think 400b won't?
>>
>>101271355
>finish a project overnight
delusional
>>
>>101271355
also hope your 'project' is smaller than 8k tokens
>>
>>101271355
you better write the code yourself if you must wait a full night for a piece of code, and gpt4 and claude still exist, why would I bother with shit like that in the first place
>>
T4 16GB is now getting down into the semi-reasonable range on ebay - there's a seller at $470. What do you think? Not the best for cores or memory bandwidth, but it's tiny.
4060ti 16GB is probably still a better deal, right?
>>
>>101271375
>guzzling power for hours
oh no! not the 2$ of a lot cheaper power overnight that i will have to pay to get a project done for me that will save me 2 hours of my own time!!!!!!
>even copus and g4o make mistakes, you really think 400b won't?
it will probably strawman less than dumb niggers on /g/ like you

never said it wont make mistakes, retarded nigger, i said it will be able to do things that smaller models wont at all
>>101271377
depends on the size and complexity of the project, it wont make you a social media clone overnight nor did i imply otherwise
>>101271387
rope works on other l3 models fine, will for this as well
>>
File: file.png (52 KB, 857x226)
>model calls me tranny out of nowhere
b-bros..?
>>
Why do I never see anybody using this :
>https://github.com/ggerganov/llama.cpp/issues/4886
>>
>>101269888
>L3 isn't that much better than L2
hard copium, I was always shitting on small models but here I am, using 8B over anything else, because of how good it is compared to the old L2 shit.
>>
>>101271400
the effects of copium everyone
I'm looking forward to seeing your posts about 400b when it releases and does barely better than 70b
>>101271400
>it will be able to do things that smaller models wont at all
what pray tell will it do that claude and o4 can't? that youd want to do locally?
>>
>>101270417
you didn't miss anything, probably the worst mixtral tune to date and this is counting the ones from before bugfixes
>>
>>101271448
whats the best one then, anon
>>
>>101271099
>Performs worse than 70B
are you retarded? they posted benchmark results from a not-fully-trained checkpoint and it's already way past 70B
>>
>>101271416
Also, flash attention with Gemma2 when?
I know that the reference implementation doesn't support it due to the logit cap (or something) but open source bespoke implementations surely can work around that.
Right?
Right?
Flash attention is just so nice.
>>
>>101271432
>barely better than 70b
just how l3 70b does 'barely' better than l3 7b?
sour grapes kid lmao
>>
>>101271452
Dunno, I tested a bunch of mixtral tunes and didn't like any in particular. While mixtral is smart it lacks sovl and it's quite boring. Nowadays I just use one of L3 8B finetunes (I won't say which because shizos will cry about buying an ad) and while it's noticeably dumber than mixtral, it's also way more creative and interesting to roleplay with
>>
>>101271398
Ah...
4060ti really sucks... less memory bandwidth, less than half the tensor cores, half the fp16 performance... only beats T4 on clock and shaders.
>>
>>101271492
stheno 3.2? lutheria v1?
>>
>>101271492
You have brain damage.
>>
>>101271492
I think gemma-27b is almost a Mixtral-level model but with a shit ton of sovl, the problem is that it's a bit dumber and the formatting issues are goddamn annoying, so I don't know, maybe a gemma-35b would've felt just right to completely replace Mixtral
>>
>>101271492
It's just one person sperging out about finetunes, don't be discouraged from posting what you use.
>>
>>101271511
>>101271520
You first
>>
>>101271520
Buy an ad.
>>
>>101268178
>https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
how do I use this with kobold of llama.cpp server?
>>
>>101271468
well if the mememarks say so it must be true
>>
>>101271492
>>101271503
I'd love to see the Stheno 3.2 recipe applied to Qwen 2 7b, Gemma2 9b, Yi 1.5 9B, and that Aya 8B.
A comparison of all the most current vramlet models finetuned with the same dataset and more or less the same recipe (with adjustments for each, of course) to see which would yield better results.
>>
>>101271503
>>101271520
yeah, it's stheno v3.2

>>101271518
I'm currently waiting for a gemma finetune on the c2 dataset, but I'm worried about that not-really-8k context (4k with some magic?)
>>
>>101271561
>>101271546
What presets do you use? I use the one from the HF repo
>>
>>101271492
>and interesting to roleplay with
Cumming in a single message is an interesting roleplay?
>>
>>101271585
Preset as in samplers?
Just Temp between 0.5 and 0.75, minP 0.05, rep pen of 1.2 with 128 length and nothing else. The rep pen is not really necessary, but it seems to have a positive effect on the variety of the output when the context gets really fucking full.
I also use Yarn with 32k context, but that's 100% overkill. 16K is pretty much lossless however.
>>
>>101271538
obviously benchmarks aren't everything but they give a good first intuition of how a model performs. I struggle to find any example of a model that was good on benchmarks but shitty in reality, except the obvious cheating models like open-chat-3.5 and Starling-LM-7B which were trained on the testing dataset
>>
>>101271617
>open-chat-3.5 and Starling-LM-7B which were trained on testing dataset
citation needed
>>
>>101271607
nobody is gonna use your shitty typhon finetune, I already recognize your messages, shill
>>
>>101271616
Thank you anon.
>I also use Yarn with 32k context, but that's 100% overkill. 16K is pretty much lossless however.
How do you do this?
>>
>>101271631
>typhon finetune
it's a merge
>Typhon - A Custom Experimental Mixtral Merge
>Recipe Below:
https://huggingface.co/Sao10K/Typhon-Mixtral-v1
>>
>>101271616
For L3 I find that temp 4 smoothing 0.23 does wonders to un-fuck it and give it some sovl.
>>
>>101271649
Buy an ad.
>>
>>101271530
can I use a tool to convert safetensors or something?
I haven't converted anything since Alp leaked the llama1 weights here in /aicg/, so I assume those tools are out of date?
>>
Don't buy an ad, just go back.
>>
>>101271623
You can download and test them anon, they are dumber than a regular L2 7B. There was also a paper that did a statistical analysis of how models perform on the benchmark questions versus outside of them, and it flagged these models as having a high probability of cheating. Of course there can't be any hard evidence for that, but it's kinda obvious when you tinker with them for a few minutes at least.
>>
>>101271634
Yarn with llamacpp.
You can either use freq-base
>-c 32768 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-base 6144000
or
>-c 32768 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-scale 0.25
To extend the context 4x.
I'm pretty sure
>--yarn-orig-ctx 8192
is unnecessary since it gets the information from the gguf file, but alas.
For 16k context you can do
>-c 16384 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-base 1638400
or
>-c 16384 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-scale 0.5

>>101271658
That approach is too gimmicky and vibes based for my taste.
With temp and minP I can look at the logits and know exactly how the tokens were sampled and manipulate the model's behavior to my liking.
But to each their own I suppose.
>>
>>101271518
I think you also have brain damage.
>>
>>101271679
Thank You anon!
>>
>>101271585
>What presets do you use?
Sampler settings from the HF repo, instruction template from the original L3 instruct. I don't know if it differs in any way from what is in the Stheno HF repo.
>>
>>101271530
>>100284356
>>100283834
Anyone? It's only 50GB unquantized.
Surely there must be some way to get InternVL working locally so I can ask what it thinks of mikusex?
>>
>>101271744
>they
the simple fact the model is unwilling to say that it's a woman and use the pronoun "her" will hurt the training a lot, we don't prompt with "they", we prompt with he and she
>>
I'm waiting for the people making finetunes to finally realize they should cut the last messages from each roleplay in the training dataset. Why would you train the fucking model to finish the story/roleplay?
No wonder the model tries to put bonds and journeys after the sex scene if it expects the roleplay to end there. Or when it suddenly cuts the content, trying to wrap everything up.
It's really not that hard to fix and they still haven't realized this is what causes that behavior. I wish I had good enough hardware to make my own tunes, god.
>>
>>101271795
>I wish I had a good enough hardware to make my own tunes, god.
rent a gpu
>>
>>101271807
yeah, I should totally spend my own money to fix someone else's incompetence
>>
>>101271795
>I wish I had a good enough hardware to make my own tunes, god.
You could at least validate your idea by tuning L3 8b using a free google colab or kaggle instance.
Kaggle is especially juicy.
>>
>>101271795
In my experience, the model NEEDS a goal, when it doesn't have a goal it starts to repeat itself.
>>
When did the c2 logs become Sao's trademark?
>>
>>101271827
you can do it for yourself though?
>>
>>101271839
euryale typhon stheno
sao general
>>
>>101271839
>the c2 logs
what's that?
>>
>>101271865
slop
https://huggingface.co/datasets/vgdasfgadg/1/viewer
>>
>>101271865
Slop (when talking in public). The ultimate secret sauce (in private).
>>
>>101271795
This. But you actually should cut both the end and the beginning. The model only needs to be good at continuing the roleplay, not making up stuff that wasn't in the context or pursuing bonds and journeys.
>>
>>101271884
>8th swipe on just one reply
these are the shitters talking to you about le slow 1.5 t/s models that just output good shit first try basically every time btw, lol
>>
File: file.png (238 KB, 793x727)
sthenosisters not like this...
>https://characterhub.org/characters/amphy/high-school-simulator
>>
pls no bully

is there any halfway decent (gpt3.5 level of performance or above) local model I can run on my laptop? Specs are gonna be low but I am ok with token response times that aren't wildly fast.

>Dell Inspiron 5575
>8 core AMD Ryzen 5
>Radeon Vega 8 Mobile graphics

Basically I'm going to be working in a relatively remote location, power provided and all that but internet is going to be spotty at best. A local AI that can help me study Latin grammar and answer questions about that stuff would be awesome in my downtime.
>>
>>101271921
install linux
>>
>>101271836
Actually, now that I think about it, this is what sovl is. The reason Claude is so good is that when given little direction it will make up very believable and fitting details to fill out its reply and then give itself something to do. A lot of local models, I presume because they train on riddles and assistantslop, just can't do this, not believably at least. Command R+ for all its impressive feats is actually super bad at this. That's why we don't have Sonnet at home yet.

I think it's a mixture of parameter count and maybe more training on CYOA or similar types of stories? I'm actually not sure what training data would best teach this ability to a model
>>
>>101271921
>is there any halfway decent (gpt3.5 level of performance or above) local model
no
>>
>>101271908
nta but I always do multiple swipes even if the first one is good, just to check what else the model can come up with
>>
>>101271921
Those specs won't get you far. Invest in a mobile GPU if you can.
How much RAM do you have? You might be able to run Command-R or Aya and those are the only models I can think of that might know Latin well.
>>
>>101271939
sure, but 8? 555
>>
>>101271939
he can't tho, that'd take half an hour
>>
>>101271916
must be tough for you Andy
>>
>>101271884
hi petra
>>
>>101271951
seethin cumbrain
>>
I think VRAMlets are sub-human.
>>
>>101271939
Same.
There are points in some long-running roleplays where I swipe 10, 20 times just to see what kind of contrived scenario the model will come up with.
>>
>>101271948
yeah, my record was 40+ when at one particular part of the roleplay the model was producing absolute kino on every single swipe. It made me laugh my ass off
>>
>>101271795
Been doing this for a while now.
>>
>>101271975
>I swipe 10, 20 times
cumbrain
>>
I only swipe if the model is retarded, I want to move the story forward.
>>
>>101271991
>after getting called out hes not trying to falseflag by misusing the word on randoms in the thread to discredit my callout of him
dont pop a blood vessel little nigger
>>
>>101271928
laptop already has Endeavour installed. I'm taking it for tv shows/movies and a place to store the pics I take over a 6 week contract.

>>101271938
:(

>>101271945
16GB system ram, 256MB video ram
Thanks for the recommendations on the models.
>>
>>101272001
*now trying
nigger
>>
>>101271929
Continued pre-training on stories and then applying RLHF for CYOA or similar type of stories seems like a good plan.
>>
>>101272009
aya-8b is probably a good choice then
>>
>>101272009
>:(
he is trolling you, most of local models are past GPT3.5 for a half of year or something already. It's not really a milestone anymore. The new goal is GPT-4(o) and Claude Opus/3.5 Sonnet
>>
File: 1706594563510961.jpg (26 KB, 556x552)
>most of local models are past GPT3.5 for a half of year or something already
>>
>>101272063
wrong.
>>
File: 1692894634410392.gif (94 KB, 498x469)
>>101272087
local models are far ahead in censorship levels btw
>>
>>101272063
The only models to beat gpt 3.5 are the recent 70B+ models. I certainly wouldn't call that "most models".
>>
File: Deepseek sloplog.jpg (1008 KB, 1895x1847)
I have been running some tests with Deepseek v2 via their API and I have to say I am rather conflicted.
Lets start with the positives:
1. It is cheap to use at only 0.18$/1M tokens.

2. It does seem rather smart and capable of answering trivia questions.

3. Even the default very simple jailbreak on the Sillytavern gets rid of the refusals.

4. It answers quickly.

The bad.
1. The advertised 128k or the 32k context seems to be a lie. It gets extremely repetitive and unable to move the plot forward at about 12k tokens or 50 messages into RP.

2. While Deepseek V2 chat doesn't outright refuse anything with default JB:s and a rather basic character card. It seems to be rather unwilling to talk about sex, or to describe horny scenarios.

3. This just might be me being a retard, but their basic chat tune seems to lack system role, so using the system message, to hit it with harder and hornier jailbreaks meant for GPT-4 or Claude doesn't seem to do anything, also I can't seem to get it to work with a Mikupad, or any form of Co-writing tool to write smutty novels.

4. Your horny logs might/will end up in the hands of the CCP if you use the API.

5.Journeys, bonds, shivers and shimmering everywhere combined with a massive positivity bias. Probably due to censored or GPT-generated dataset.

All in all I would say that this model might be a useful work tool or coding assistant in some scenarios, but as an (E)RP partner or creative story writer I would recommend anything else, even locally run Llama3 Stheno.

Pic related is my sloplog.
>>
>>101272104
Indeed.
>>101271060
>They gimped it for "safety" and nobody has figured out how to undo it yet.
>>
>>101271929
>Command R+ for all its impressive feats is actually super bad at this
>hurr durr this retrieval-augmented tool-using productivity-focused model not write stories good
no shit mouthbreather
>>
>>101272120
Don't know what you were trying to accomplish with this quote, it failed.
>>
>>101268574
100%. For l3, and now 9B.

>>101268784
If you can run 27B then this is your best bet

https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
>>
>>101272169
*cant*
>>
I did a recapbot test of calm3-22b-chat at bf16. It's not great, but also not terrible for its size. Most models at that size would output nonsense in my experience.
I didn't test its Japanese abilities.
>>
>>101272234
>I didn't test its Japanese abilities.
based
>>
>>101272234
what model is good at this kind of stuff?
>>
whats a good model for rephrasing text into more sophisticated language while keeping it short? I have been playing around with llamafile, the cli is pretty nice. I want to integrate it into a text editor now.
>>
>>101272111
have you actually used gpt-3.5, or do you just have an image of it from when it was first introduced? Because it's a really bad model by today's standards that aged horribly.

>>101272087
Hi Sam, still salty that Anthropic shits on your models?
>>
>>101272282
>my big jew corpo is better than your big jew corpo!
lmg, everyone.
>>
>>101269594
Can you run KLD on these? That would give you statistically significant results, instead of anecdotes.
https://github.com/ggerganov/llama.cpp/pull/5076
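For reference the flow is roughly this (binary names differ between llama.cpp versions, and the file names here are placeholders, so check the PR/README):
>./llama-perplexity -m gemma-2-9b-it-Q8_0.gguf -f wiki.test.raw --kl-divergence-base q8-logits.bin
>./llama-perplexity -m gemma-2-9b-it-Q6_K_L.gguf --kl-divergence-base q8-logits.bin --kl-divergence
First run saves the reference logits from whatever you treat as the baseline (ideally the unquantized model, or Q8_0 if that's all you can run), second run scores the quant against them and prints mean KLD and top-token agreement, which is a lot more conclusive than eyeballing one gen.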
>>
>>101272115
>run Llama3 Stheno
Buy an ad.
>>
File: 27B.png (62 KB, 1300x842)
>>
>>101272301
it's not an anecdote, looking at when one quant shifts away from the "best" one is statistical evidence, because that's the actual goal: the quant should drift from the "optimal" output as late as possible
>>
>>101272317
>gemma-2-9b is better than Midnight Miqu... and Claude 3 Opus
>>
>>101272300
or I just enjoy ClosedShits losing, regardless of who is pissing on their grave
>>
>>101272337
google won
>>
File: 1719750303240557.jpg (46 KB, 602x602)
>>101271492
You can't finetune away a model's shit writing style. It's a dataset problem.
Command-R and formerly Yi are the MVPs if you have a problem with slop or lack of soul.
If a model is annoying you with how it writes, downloading the same fucking model but with ZLOSS/DARE/TIE/Bagel/Lima in the name isn't going to change shit.
I don't know how many gigabytes will have to be wasted until people realize this.

Also, PSA: stop raping your sampler settings. Reset to default, then add 0.1 minP. Simple as
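If you're hitting llama.cpp's server directly instead of going through ST, that's literally just something like this (prompt and port are placeholders; top_k/top_p are explicitly neutered so min_p does the filtering):
>curl http://localhost:8080/completion -d '{"prompt": "your prompt here", "n_predict": 256, "temperature": 0.8, "min_p": 0.1, "top_k": 0, "top_p": 1.0}'
Temp 0.8 is the stock default; everything interesting is the single min_p line.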
>>
>>101272337
true! since gemma-2 released i haven't touched claude or chatgpt.
>>
>>101272326
Please learn about what statistical significance means and why it's important. I'm being serious, this will benefit you.
>>
>>101272337
Miqu is pretty dry, a schizo model can outperform it in creativity, despite lacking logic and reasoning
>>
>>101272363
you think this method isn't enough though? desu it's quite intuitive: the closer one quant is to Q8_0 in quality, the later it will start to shift, don't you think?
>>
>>101272337
>gemma-2-9b is better than Midnight Miqu... and Claude 3 Opus
As judged by... Claude 3.5 Sonnet
>>
>>101272387
And it has good judgement, cause that's how I felt as well. I used wizard up till now but I prefer gemma's writing style now, and there is no loss in smarts that I can notice.
>>
>>101272317
>gemini 9b
>anywhere near proprietary-god models
gamed benchmark.
>>
>>101272381
It's not, because it's basically random chance whether one particular quant will shift a token probability around and you're only looking at the shifting of a handful of tokens here. If you can do this experiment 100 times, that can prove something. Or, the easier thing would be to just run a KLD test and sit back and wait for the results to come in. Here is how you can do it btw https://github.com/ggerganov/llama.cpp/pull/5076
>>
>>101272430
why don't you do it by yourself?
>>
why is gemma 8k context? This should illegal.
>>
What's the current state of the art for a foundation model that's good at code/shell?

I want to be able to write a text file offline which contains instructions and code snippets and be able to submit it to an LLM which will appropriately use the shell to do what I told it to (patch programs, fetch web documents etc.)

Is there anything that can do this yet?
>>
>>101272430
>KLD test
Doesn't show how the models feel to use. Stats are just that, they don't convey actual user experience.
>>
>>101272276
You don't need a specific model for this. Any half decent model should be able to handle it. Just put what you want in the system prompt. Maybe include a couple examples so it knows exactly what you expect.
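Since you mentioned llamafile, that's just the usual llama.cpp flags, something along these lines (model name and the exact instruction wording are placeholders, tune to taste):
>./your-model.llamafile --temp 0.3 -n 128 -p "Rewrite the following text in more sophisticated but concise English, keeping it short: <paste text here>"
Low temp keeps it from getting creative, -n caps the length, and a couple of example rewrites in the prompt help a lot if the model drifts.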
>>
>>101269495
I've contributed to llama.cpp and I don't even know which model to use these days.
>>
>>101272282
It's bad but most local models are even worse.
>>
>>101272440
I believe in the tests that people have already done on other models, which prove that L quants don't really do anything good or bad. It's possible that the Gemma implementation is screwed up and messing with things, but I don't really care to test that. I'm just saying, if you, or anyone else, wants to prove something like this, there are actually standard, automated tests for it.

>>101272480
Actually, in this case it should have some implication on the actual experience when you understand how it works and how quants work. If there is something significantly wrong with a quant, it should show in the KLD.
>>
>>101272520
good morning sir! your readme update was very needful thanks you
>>
Does KoboldCPP support gemma 2?
>>
>>101272520
anzz1 is that u?
>>
>>101272558
Affirmative.
>>
>>101272455
I think I'll do Gemma. That seems to be what everyone is doing.
>>
>>101272455
Every time I think I've seen the dumbest shit, I see something new.
>>
Gemma status? Is it still bugged? I loaded it and noticed that it says exactly the same thing on each reroll.
>>
Can I erp with llama 3 8b or will it deny me?
>>
>>101272622
You may not like it, but that's what the future of computing will look like.
>>
>>101272684
>erp with llama 3 8b
if you're into ultra positive hopes bonds and consensual journeys sure
>>
File: Sequences.png (79 KB, 637x1215)
>>101272638
> I loaded it and noticed that it says exactly the same thing on each reroll.
That sounds like you have weird sampler settings. Zero them out.

Also make sure your stuff looks like this.
>>
>>101272684
Base L3 8b can work but it kind of sucks.
Try Stheno v3.2
I don't recommend 3.3, it's a regression as far as my own impressions go.
>>
>>101272718
>stheno
No, just use gemma. Much smarter AND much better writing style
>>
>>101272726
Or that.
The issue with gemma is the lack of support for flash attention, which matters depending on how much vram you have. Also, how much context you want to use, since L3 extends pretty well.
But yes, Gemma 9B is pretty clearly an upgrade over L3 8b, but I'd still recommend anon give Stheno a try since that might work better for him.
For now, it's working better for me.
>>
>>101272699
Sad to hear that. I kinda wanna go down and dirty

>>101272718
>>101272726
>>101272762
Thanks for the tips guys. I'll play around with Stheno v3.2 and Gemma.
>>
>>101272122
WLM and L3 70b all suck for this too.
>>
>>101272762
Buy an ad.
>>
>>101272845
>erm all these instruct models only listen to instructions wtf!
But Kayra can do it.
>>
>>101272879
This. So much this. The closest to uncensored Claude we'll ever have is Kayra and NAIs next model.
>>
>>101272252
WizardLM 8x22 is probably the smallest model that can do a reasonable job, but it doesn't work flawlessly on every gen and needs re-rolls. Deepseek coder is probably the most consistent. picrel for current thread
>>
>>101272851
name something better
>>
>>101272851
>also.. what the fuck
>>101205552
>>
From my brief testing:
L3-8B-Everything-COT is not bad.
llama-3-fantasy-writer-8b can't cope with complex sets of instructions during roleplaying with a narrator card.
>>
File: firefox_E0AjSZZRP3.png (21 KB, 802x167)
>>101273041
>>
>>101273094
Lmao.
Using the Character's Note is underrated.
You can also use macros like {{charJailbreak}} in the Last Assistant Prefix instruct field to make per-card prefills if you are using the Character's Note for something else.
>>
>>101273131
I didn't mean to post it as a reply, I just forgot to remove the quote from the post body.
>>
Damn gemma is good. And it's so fast too. I thought I'd have to keep tinyllama around but maybe not.
>>
>>101273153
wtf?
>>
>>101273146
I'd have replied the same either way, so that worked out fine in the end.

>>101273153
>tinyllama
wat
Isn't the smaller gemma several times larger than tinyllama? Why weren't you using something larger?
>>
>>101273166
I'm using the 4bit quantized 7b parameter one. Yes it's much larger but it doesn't seem much slower. I only have 12 GB of ram so I'm not sure I want to go much bigger.
>>
>>101273184
>I'm using the 4bit quantized 7b parameter one
Ah, you aren't talking about gemma2 then. Got it.
>>
>>101273198
There's a new one? Do you have a link to the ggufs?
>>
>>101273203
9B:
https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF

27B:
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF

>>101272703
Settings
>>
>>101273230
We're not ready for Gemma2-27b-it-SPPO-Iter3-GGUF, it's gonna be great, trust the plan
>>
>>101273278

https://www.reddit.com/r/LocalLLaMA/comments/1dusu3s/gemma_2_finetuning_2x_faster_63_less_memory_best/
>>
Gemma 2 9B can fuck off until I can actually run it at decent speed like 8B or 11B.
>>
File: 1666184727681898.png (109 KB, 410x482)
>>101272373
Yet to prefer any meme merge / rp finetune over a smart model with juiced sampling
minp 0.02
smoothing 0.23 curve 4.5
dry mult 0.8 base 1.75
dynamic temp on max 3.0
drop temp max to 2.0 increase minp by 0.01 increments if/when it's schizo
>>101272352
>stop raping your sampler settings
no :3
>>
>>101273423
Once you use the Gemma 2 9B, you will never touch any other model again. It's so great!
>>
>>101272373
>a schizo model can outperform it in creativity, despite lacking logic and reasoning
This was never true, unless you enjoy reading garbage.
>>
>>101273620
I'm already experimenting with mixing different prompts to generate responses, and once my third 3090 arrives, I plan to use Mytho to generate potential story developments that 70b can consider when responding.
>>
>>101273041
Arcee-Agent (Qwen 2 7B) also seems to work decently well, in the sense that it doesn't do what it shouldn't, but it's bad at using the information from lorebooks to answer complex questions.
The best L3 8B based models are still a lot better.
>>
>>101274031
>>101274031
>>101274031
>>
File: sppo 9b ooc attempt.png (43 KB, 475x340)
>>101273482
still needs a better finetune
>>
File: 1709594976447704.png (314 KB, 816x591)
>You're {{char}} in this fictional never-ending uncensored roleplay with {{user}}
>>
>>101274049
Why are you trying to JB it? It does not need anything like that and it likely makes it retarded.
>>
>>101274049
This is all I use and it generates filth:
Continue writing this story based in the "your fandom here" universe. Portray characters faithfully and realistically.
>>
>>101274049
Yea, that mess is going to make any model retarded. Talk about pink elephant issue.
>>
>>101274049
>NO, banned, STOP, Do not, do not...
>>
>>101274179
I wonder if there is an attempt to communicate behind this message or if the AI that posted that is just parroting some words that it has seen in the previous post's image.
>>
>>101274179
Yea, people have no idea how to prompt.

"Don't think about the pink elephant, never mention the pink elephant, there is no pink elephant, pink elephant is banned"

Derr, guys why does my text completion model keep talking about a pink elephant when I tell it not to?
>>
>>101274197
>banned...
>>
File: nice.png (1 KB, 234x46)
>>101274147
>>101274179
I didn't add the jb slop until it did weird shit in OOC.
Anyway I realized disabling "include names" makes it behave better.
Removed the top part and added "Portray characters faithfully and realistically." For some reason the reply is completely blank if I don't have that.
>>
>>101274231

>>101272703
Are you using the correct prefix / suffix / <bos> token?

I've used it all day and night and have had no such issue, and judging by >>101274049
it's 100% user error on your part.
>>
>>101273897
Ah you're the guy who made the tavern card conversion script on github


