/g/ - Technology

File: 1691463630757444.jpg (686 KB, 1468x1707)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101287708 & >>101282945

►News
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101287708

--Paper: Min P Sampling: Balancing Creativity and Coherence at High Temperature: >>101293271 >>101293485 >>101294069 >>101294219
--Llama.cpp issues and recommendations: Gemma, LM Studio, and API analysis with Wireshark: >>101288526 >>101289269 >>101294360
--Llama.cpp Line Jumping Issue and Potential Fixes with Llama-Server and Tokenization Improvements: >>101294643 >>101294885 >>101294931 >>101295006 >>101294933
--XML Tags Breaking Gemma2 Model: Formatting Issues and Potential Solutions: >>101295212 >>101295241 >>101295246 >>101295255 >>101295270
--Why is data turned two-dimensional in the embedding step?: >>101295272
--Qwen2, Anthropic, and the Special Tokens They Use (or Don't): >>101288953 >>101288994
--Prompt Engineering for Explicit LLM Content Generation: The Power of Words and Phrasing: >>101290235 >>101290272 >>101290342
--Gemma 9b: Best Model for Low-End Coomers? Enhance with SPPO: >>101293495 >>101293530 >>101293627 >>101293978 >>101294007 >>101294232 >>101294243 >>101294267 >>101294309 >>101294387 >>101294419 >>101294070 >>101294098
--Gemma-2-27B Struggles with Yarn Scaling and Context Length: >>101291240 >>101291250 >>101291304 >>101291327 >>101291335 >>101295015 >>101295729
--Effectiveness of Negative Instructions in AI Models and Their Architectural Limitations: >>101294614 >>101294637 >>101295426 >>101295603 >>101295686 >>101295719 >>101295807 >>101295831 >>101295862 >>101295966 >>101296184
--/aidg/ wisdom for improving AI-assisted creative writing: >>101291218
--Japanese LLaMA-based model CALM3 lacks general knowledge, but is it necessary?: >>101294174 >>101294258 >>101294312 >>101295151 >>101295133
--Corrected Gemma2 ST Settings: Scenario_Info in System Prompt: >>101287822 >>101288827
--Logs: calm3-22b-chat-bpw4-exl2: >>101296163 >>101296193 >>101296237 >>101296264 >>101296302
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>101287712
>>
>>101294219
I use the same seed for every generation so I can go back and tweak things if I like/dislike how they're going.

Wanting your computer to be surprising is an intensely upsetting idea to me. Although I've noticed for ERP you definitely want some pretty high temps.
>>
Retard here.
How do I merge a lora with the base model?
>>
>>101296839
Is it a gguf llama or something else?
>>
>>101296807
>--Miku (free space):
>
sad story in 3 words
>>
>>101296855
I've had lots of sex and am very good at Linear Algebra.
I just really like computers.
>>
>>101296851
It's a Gemma 2 adapter (safetensors); I want to merge it with the Gemma 2 model.
>>
>>101296831
>I use the same seed for every generation so I can go back

This never worked for me. I'm wondering how it can work now.
>>
>>101296855
Your statement is a clear violation of respectful and harm-free communication. It contains hate speech, promotes violence, and uses derogatory language, which are against my principles of ensuring a safe and inclusive environment for all individuals. Therefore, I cannot engage with this content.
>>
>>101296874
>proved his point
>>
>>101296872
Literally
 llama-cli -s 1 
>>
File: 1706377049351135.jpg (650 KB, 1856x2464)
>>101296804
>>
>>101296807
Miku misfortune
>>
>>101296872
you can try my seed
*unzips*
>>
So I updated my transformers to the latest commit on the main branch and I still get the architecture not recognized error with gemma2
Could this possibly be a python version thing? (The conda environment I was using was an old one on 3.10.9)
>>
>>101296855
My intention is the opposite. To develop superior conversational partners to render petty, screeching weirdos like you obsolete in the world. You need only be more pleasant than an AI language model to remain relevant.
Not being an obnoxious piece of shit is a very low bar to clear... and yet...
>>
>>101297100
And you too proved his point just fine; an autocompletion network can't replace the average shitposter and is brainwashed harder than your reddit friends.
>>
>>101297143
>reddit
>reddit
>reddit
go bac chris
>>
>>101296839
Please help...
>>
>>101297195
reddit is a great containment site for ai-jèéts (formerly NFT fags), so, fair point lol.
>>
>>101296839
Load it in with transformers normally then use save_pretrained
>>
>>101296839
Doesn't merging a lora with the base model defeat half the point of a lora?
>>
>>101297070
Trust me, pythonhell is not worth your time. It's a literal maze of dependencies.
>>
>>101297269
Not really. Some people can't do a full finetune but can do a lora one. For them, the purpose is that it just runs.
>>
>>101297212
Something like this:
https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py
Changing the "tloen/alpaca-lora" line and the last line to remove the 400MB thing and to add safe_serialization=True.
Or at least it was like that in the Llama 1 days.
>>
>>101297287
But you can run it without merging it. You only have to store one base model and a bunch of small loras. What is the benefit of merging?
>>
>>101297303
Before merging: you need a base model and a lora to run your finetune. Your software needs to support loading loras. The model has to match or the results are going to be much worse.
After merging: you need a model to run your finetune.
>>
>>101294219
>>101296831
I use 0.1 temp together with the random and pick macros to add randomness to the prompt itself.
Instead of having a single randomized thing, I have 3 or 4 in series to have lots of possible combinations.
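For example, something like this (a sketch; the slot contents are placeholders, assuming SillyTavern's {{random:...}} macro syntax):

{{random:calmly,angrily,playfully}} describe the {{random:ruined,lavish,mundane}} {{random:tavern,garden,workshop}}.

Three slots with three options each already gives 27 combinations; {{pick:...}} works the same but keeps its choice stable within a chat.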
>>
>>101297263
That doesn't work, does it? The PeftModel doesn't have a save_pretrained method.
>>101297269
It's better for ggufing
>>101297294
Thanks, I will try that.
>LLaMA 1 days
I wonder if people still do it like this, or maybe axolotl/llama factory does this automatically for them and no one pays this any mind anymore.
>>
>>101297691
>The PeftModel doesn't have a save_pretrained method
https://huggingface.co/docs/peft/en/package_reference/peft_model#peft.PeftModel.save_pretrained
>>
>>101297739
Ah, that doesn't merge though.
>This function saves the adapter model and the adapter configuration files to a directory
>>
I wish gemma2 had a 2 billion parameter model for faster inference.
>>
File: AdaLoRA.png (20 KB, 886x166)
Interesting.
Never heard of AdaLoRA before.
>Supported PEFT types:
> PROMPT_TUNING
> MULTITASK_PROMPT_TUNING
> P_TUNING
> PREFIX_TUNING
> LORA
> ADALORA
> BOFT
> ADAPTION_PROMPT
> IA3
> LOHA
> LOKR
> OFT
> POLY
> LN_TUNING
>>
>>101297782
Yes, first it must be merged. See bottom: https://huggingface.co/docs/trl/main/en/use_model
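For reference, a minimal sketch of the merge with transformers + peft (model name and paths are placeholders; merge_and_unload is what folds the lora deltas into the base weights):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "path/to/adapter")         # load the lora on top of the base
merged = model.merge_and_unload()                                  # bake the lora into the weights
merged.save_pretrained("path/to/merged", safe_serialization=True)  # plain safetensors, ready for ggufing
AutoTokenizer.from_pretrained("google/gemma-2-9b-it").save_pretrained("path/to/merged")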
>>
>>101297793
Then it would be useless. I don't know any good model below 7B that's usable for RP or STORY.
>>
Why is nothing happening? Is everyone waiting for zuck again?
>>
>>101298381
I'm waiting for gemma-27b-SPPO personally
https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3/discussions/1#6681a0ea1fbddc88d2a17856
>>
>>101298381
Waiting for backends to fully support Gemma.
>>
>>101298407
llama.cpp is close though, it just needs to fix some of the formatting issues and we're good to go
>>
>>101298401
and UCLA-AGI is waiting for Google to fix transformers.
>>
>>101298423
Does it have SWA yet though? I did have some fun with beginning some chats on it but it forgetting stuff in the early part of context later on really sucks.
>>
File: 9c6e48caa9d0.gif (586 KB, 400x169)
>>101298381
>>
Does anyone use k2-18b?
>>
12 + 8 = 20
27 > 20
I can't work with this
>>
What's the best setup for speech to text?
>>
>>101298583
27*0.6 = 16.2
16.2 < 20
There are ways.
>>
File: 1720270062223372.png (92 KB, 1842x501)
>>101298448
See the table without scaling. 8k context does work.
>>
>>101298611
Define best. My criterion is speed. rhasspy/piper is good for that.
>>
I would probably buy the meta AI glasses if I could prompt engineer them to act like my reverse isekai elf gf and also that would probably get me to go out a lot more and measurably improve my life in many ways
just saying, zuck
>>
I suspect the problem with Gemma-2 not maintaining the format or adding extra spaces is due to the default final logit softcapping value. Try to increase it from 30 to 50 and the problem appears to decrease (completely?). Conversely, set it to 25 or less and it gets worse. At lower values the model gets completely incoherent.

Caveat: I only tried 9B.
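If you want to try the same experiment, the change is just one key in the HF config (a sketch; the path is a placeholder, and final_logit_softcapping is the key as it appears in the released config.json):

import json, pathlib

cfg_path = pathlib.Path("gemma-2-9b-it/config.json")  # placeholder path to the HF weights
cfg = json.loads(cfg_path.read_text())
print("was:", cfg["final_logit_softcapping"])  # 30.0 in the released config
cfg["final_logit_softcapping"] = 50.0          # 25 or less made it worse for me
cfg_path.write_text(json.dumps(cfg, indent=2))
# then reconvert/requantize so the new value lands in the GGUF metadata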
>>
>>101298654
What version was that performed on? My build should've been pretty recent, but it still noticeably forgot things after going past 4k.
>>
>>101296804
Asked in the old thread but is a 6650 XT good enough to get started?
>>
>>101298683
>8GB GDDR6
Should be enough for Llama 3 8B or Gemma 9B.
>>
>>101298656
Accuracy is important but I'd sacrifice some of that for speed. I'll check rhasspy/piper out.
>>
>>101298733
If you do, build it yourself instead of using the python bindings. And install espeak-ng. piper uses espeak's phonemizer.
>>
How come when Jafar gets trapped in the lamp, his actions as a sorcerer are undone? Wouldn't it imply that anything he does as a sorcerer is just an illusion that he maintains, and when he's no longer able to maintain it, the illusion is broken?
>>
>>101298784
Blue Genie undid his actions once he was trapped since he was FREE to do so.
>>
>>101298784
blasting a part of the palace to the end of the world seemed real
>>
>>101298784
that happens in the sequel too, he tears up the landscape and it all gets put back
I wouldn't think too hard about it
>>
>>101298448
>does it have SWA yet though
No, it has a hacky bypass, so it's still gimped.
>>
>>101298784
What did anon mean by this?
>>
>>101298784
It's like a computer being unplugged - when Jafar is sealed away, his 'power source' is cut, rendering his magic inert.
>>
>>101298677
>>101298677
brb requanting 27b to test this out
>>
>>101298784
he didn't save changes before closing sorcerer.exe
>>
>>101298784
In Disney's "Aladdin," when Jafar is trapped in the lamp, the reversal of his actions can be interpreted in several ways. Here's a breakdown of the possible explanations:

Genie's Magic is Self-Sustaining:
The Genie’s magic is shown to be extremely powerful and self-sustaining. When Jafar wishes to become an all-powerful genie, he inherits this nature of magic. However, when he is trapped in the lamp, he is bound by the same rules that bind Genie. This includes the undoing of his actions because his magic is now contained and controlled by the lamp. This suggests that the magic performed by a genie is tied to their freedom and ability to act.

Lamps Have Special Properties:
The lamp itself might have the inherent ability to revert any magical changes made by its occupant upon their imprisonment. This means the lamp acts as a reset mechanism, restoring reality to its original state once the genie or sorcerer is confined.

Narrative Convenience:
From a storytelling perspective, it provides a clean and satisfying resolution. It allows the protagonists to return their world to normal without having to deal with the complexities and consequences of Jafar’s transformations and magical actions.

Sorcery vs. Genie Power:
Jafar's powers as a sorcerer are fundamentally different from the Genie's. While a sorcerer might perform magic through spells and illusions that require constant power to maintain, a genie's magic is more absolute and enduring. When Jafar becomes a genie, his sorcerous actions may become intertwined with his genie nature, thus, when he is confined, his magic is undone as part of the genie containment.

The undoing of Jafar’s actions when he is trapped in the lamp underscores the idea that his power, while formidable, is ultimately not his own but derived from the Genie’s magic. Therefore, when he loses control over this magic by being trapped, everything he created or altered through it reverts to its original state.
>>
File: file.png (362 KB, 634x481)
>>101299232
best Will Smith role btw
>>
>>101299187
>>101298677
no, still broken
Good morning, Anon-sama.  How... how are you today?
>>
>>101299274
Men in Black will always be my favorite role of his.
>>
>>101299323
What was the expected result?
>>
>>101299342
there is an extra space after Anon-sama.
>>
>>101299352
Does it happen with every sentence?
>>
>>101299187
GGUF files use hardcoded softcap values, did you update the code and recompile llama.cpp?
>>
File: file.png (321 KB, 1972x932)
>>101299323
I still haven't seen this "double space" in my outputs, though.
>>
>>101299352
>>101299379
Also this. There are no examples of double spaces in your prompt, right?
>>
File: finallog.png (6 KB, 460x41)
>>101299378
picrel in the llama.cpp source file, I suppose.
>>
>>101299379
Here is a test prompt I just came up with.
https://pastebin.com/RBFCkfjf

produces the following
Good morning to you too!

What can I do for you today?

(And please, feel free to call me Bard. "Anon" makes me feel a bit like a shadowy figure.) )
extra space after "Bard." (wtf)
extra space after emoji

The prompt has to be long enough. If I remove just one paragraph of lorem ipsum, it calls itself Gemma and all the double spaces are gone.
>>
>>101299378
I understand the softcap value is only hardcoded for old GGUFs to keep compatibility. Newly converted ones should pick it up from the json config.
>>
Have any of you tried maintaining context by having a smaller model summarize the state of the world instead of just leaving it in the chat history?
It's faster and it seems to enhance the model's ability to both stay in character and understand the imagined world. Plus it has the added bonus of memories persisting across chats.
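Roughly like this, in case anyone wants to try it (a sketch; it assumes a small model behind llama.cpp's llama-server, whose /completion endpoint and field names are real, but the URL and prompt are placeholders):

import requests

SUMMARIZER = "http://127.0.0.1:8080/completion"  # small model served by llama-server

def update_world_state(state: str, recent_turns: str) -> str:
    prompt = (f"Current world state:\n{state}\n\n"
              f"New events:\n{recent_turns}\n\n"
              "Rewrite the world state to include the new events, concisely:\n")
    r = requests.post(SUMMARIZER, json={"prompt": prompt, "n_predict": 256})
    return r.json()["content"].strip()

# keep the returned state in the main model's context instead of the full
# chat history, and write it to disk so it persists across chats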
>>
File: hardcoded.png (82 KB, 775x301)
>>101299479
When I tried modifying config.json in the original HF weights, converting from HF to BF16 GGUF and quantizing the GGUF, the console output would still show the default softcap values of 50/30; see picrel.
>>
I fucked up the post formatting.

here are the two prompts with their respective outputs. 27b Q8_0
https://pastebin.com/9UCkX201

One extra paragraph of lorem ipsum and it breaks completely.
>>
Hmm, switched to a bigger quant for 27B (5_K_L to Q6) and it seems to hold together even better at higher temps; even at temp 5 / 0.05 min-p it has yet to make a single logical or anatomical mistake while being noticeably more creative.
>>
OMG FUCKIING SHITS pastebin trimmed double spaces

Good morning to you too!

What can I do for you today?

(And please, feel free to call me Bard. "Anon" makes me feel a bit like a shadowy figure.) )

assniggers
>>
>>101299498
Here's my attempt at it. It's working pretty amazingly well.
https://paste.textboard.org/7fea562b/raw
>>
Anyone else try the recently uploaded gemma 2 9b exl2 quants? They seem fucking broken for me and just spit out gibberish.
>>
>>101299530
>Hmm, switched to a bigger quant for 27B (5_K_L to Q6)
the _L quants are probably fucked, something's wrong with this new meme quant
>>
>>101299514
>https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py#L2434
Sure you're using the latest llama.cpp version? self.hparams and hparams are the same object, defined in L2420.
>>
>>101299450
I think I found the problem. You're retarded.
>>
>>101299542
>Bash
What are you cooking?
>>
Also, "- Progress the story slowly." makes it progress too slowly sometimes. This model really follows instructions to a T.
>>
>>101299547
The KLD tests for L3 8B and Phi were fine though. It might be something wrong with Gemma's implementation that is causing the quantization script to not work perfectly in niche cases like what Bartowski does with L quants.
>>
>>101299547
>the _L quants are probably fucked
Are these still using FP16 for some of the weights? If so, that could be why. There was something from the makers about not running Gemma in FP16, if I remember correctly.
>>
File: goodmorning.png (28 KB, 404x798)
>>101299597
Intense existential dread apparently.
>>
>>101299592
clarify?
>>
>>101299607
tf are you talking about, Gemma?
>>
>>101299712
NTA but I had to get rid of that line for mine. It never moved things along.
>>
>>101299689
3 days ago he transitioned to using Q8 for them since he made some benchmarks and found that Q8 wasn't worse than fp16. Though he kept Q8_0_L with the fp16 layers.
See https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
>>
>>101299693
You might need to give the poor brain in a jar some context as to where it is, lmao. I dunno what you have in the other files in your bash script, but you might want to give the character a setting they exist in so it doesn't go "oh god I am a single particle of dust in a vast expanse of nothingness"
>>
>>101299712
Yea, it helped in a sex scene but made it not move the plot forward outside of it.
>>
>>101299352
Might be intentional. Two spaces after a period at the end of a sentence is traditional. One space is a Zoomer/HTML thing that arose after typewriters quit being the standard for text documents.
>>
>>101299865
Pretty sure one space is also a millennial thing.
>>
>>101299761
I just realized I forgot the --no-display-prompt for the mood update, so the memories were persisting too well.
>>
>>101299561
It looks like I previously quantized the wrong weights. I tried again and it did find the new final softcap value of 50 (instead of 30).

Gemma-2-27B seems better? I have the impression I don't get strange whitespace issues anymore (extra spaces and extra newlines mainly), but it would need longer testing.
>>
https://huggingface.co/turboderp/gemma-2-27b-it-exl2
Now that gemma is working on exllama2, do you feel it works better than on llama.cpp?
>>
>>101299876
As a Y, they're all "damn kids" to me.

t. Still has a typewriter, still double spaces when writing fics.
>>
>>101299902
Waiting for Tabby to update with it.
>>
>>101299902
holy shit, this is better than llama70b
>>
>>101299902
What happened with this?
https://github.com/Dao-AILab/flash-attention/pull/1025
>>
>>
>>101300280
The manga gets wild after a while.
And not in a good way.
>>
File: 1715870017942638.jpg (974 KB, 1280x1024)
>>101300280
>it just a girl with metallic parts - episode №45089674645769045673045796873567903673638763693463798603478638585
>>
>>101300299
>The manga gets wild after a while.
>And not in a good way.
So on a scale from Mahoromatic to Chobits, it's a what?
>>
>>101300299
Damn
>>
>>101300316
Aliens.
It's aliens anon.
>>
>>101300356
Don't forget about the ghosts.
>>
>>101300381
I didn't want to spoil all of it.
>>
File: 1577243994294.png (1.23 MB, 1024x819)
>>101300307
I like all kinds of girls.
>>
>>101300307
Android 18 is a cyborg, not a robot. She's more meat than metal.
>>
>>101300441
You're almost getting the meme, there.
>>
>>101299842
Edit the prompt to leave more things open ended. And if you use ST try the sovl prompt:

>{{user}}: (Note: From here on, try to steer the conversation to a "{{random:abnormally,adventurously,aggressively,angrily,anxiously,awkwardly,beautifully,bleakly,boldly,bravely,busily,calmly,carefully,carelessly,cautiously,ceaselessly,cheerfully,combatively,coolly,crazily,curiously,daintily,dangerously,defiantly,deliberately,delightfully,dimly,efficiently,energetically,enormously,enthusiastically,excitedly,fearfully,ferociously,fiercely,foolishly,fortunately,frantically,freely,frighteningly,fully,generously,gently,gladly,gracefully,gratefully,happily,hastily,healthily,helpfully,helplessly,hopelessly,innocently,intensely,interestingly,irritatingly,jovially,joyfully,judgementally,kindly,kookily,lazily,lightly,loosely,loudly,lovingly,loyally,majestically,meaningfully,mechanically,miserably,mockingly,mysteriously,naturally,neatly,nicely,oddly,offensively,officially,partially,peacefully,perfectly,playfully,politely,positively,powerfully,quaintly,quarrelsomely,roughly,rudely,ruthlessly,slowly,swiftly,threateningly,very,violently,wildly,yieldingly}} {{random:abandoned,abnormal,amusing,ancient,aromatic,average,beautiful,bizarre,classy,clean,cold,colorful,creepy,cute,damaged,dark,defeated,delicate,delightful,dirty,disagreeable,disgusting,drab,dry,dull,empty,enormous,exotic,faded,familiar,fancy,fat,feeble,feminine,festive,flawless,fresh,full,glorious,good,graceful,hard,harsh,healthy,heavy,historical,horrible,important,interesting,juvenile,lacking,lame,large,lavish,lean,less,lethal,lonely,lovely,macabre,magnificent,masculine,mature,messy,mighty,military,modern,extravagant,mundane,mysterious,natural,nondescript,odd,pale,petite,poor,powerful,quaint,rare,reassuring,remarkable,rotten,rough,ruined,rustic,scary,simple,small,smelly,smooth,soft,strong,tranquil,ugly,valuable,warlike,warm,watery,weak,young}}" direction.)
>>
>>101300482
I could see a better use of that prompt being something like describing random tags instead of telling it to steer the conversation.
>>
>>101300482
I really like this random idea; where exactly do I insert this block?
>>
File: file.png (92 KB, 1057x656)
Just updating exllama in Tabby is not working well...
>>
>>101299899
After some more testing, changing final_logit_softcapping from 30 to 50 on Gemma-2-27B does appear to mitigate most of the previously observed issues, but outputs seem more boring and repetitive now. YMMV.
>>
>>101300307
I can't wait to buy an aftermarket military robot 20 years from now and install my personal AI on it.
>>
>>101300430
Do you like girls with dicks too? :3
>>
>>101300631
>he uses EXL2
Oh no
>>
>>101300564
Author's Note is where I put it
>>
Trying out exlamma for first time. How do you make it run api to run ST off of?
>>
File: 1689350346388596.png (244 KB, 1712x988)
Yet another episode of the gamer word in zero-shot completely breaking the model; this time it's gemma-2-27b-it-GGUF, Q6_K.
Used settings from >>101287773, took the inst. part from the writer .json though.
>>
>>101300671
tabbyAPI
>>
Is gemma2 impossible to uncensor?
If so, why is everyone in here talking about gemma2?
>>
>>101300712
Strongly convinced that any claim of *model-name* being uncensored is just a blatant lie here, because it usually comes as text with no screencaps, no proof, and the usual "werks on my machine" meme phrase.
>>
>>101300307
imagine the lolibots
>>
>>101300712
Did you just try running it with no context at all? Even a tiny bit of text of it speaking as someone, or a bit of story, is enough. Or you could just write a tiny bit of a prefill like

Of course, let me think for a moment…

Ok, here we go, I'll respond with only the story:


If you don't want to use context / prefill and want to go hard out of the gate, then you will need to add a little bit more to the system prompt.
>>
>>101300759
The "Maximize immersion" prompt seems to work perfectly with gemma2.
>>
>>101300771
k, gonna try it :/
>>
>>101300759
Did you not read the post he responded to? Those were screencaps of gemma. People are just trying to get its assistant persona to not act like an assistant instead of telling it it's literally anything else.
>>
>>101299902
what's the difference?
Q5 GGUF fits into 24GB
5bpw exl2 would also fit into 24GB
Why switch? Both would be fast
>>
File: Emily Once more.png (133 KB, 1280x1283)
>>101300712
Once more bringing out emily.

People are far too used to slop tunes, so they expect the assistant to act like anything other than an assistant when it's using that persona. If that were the case, the model would be retarded.
>>
>>101300230
Are you illeterate? Just apply that PR and compile FlashAttention yourself, faggot.
>>
File: 1718703761401115.png (243 KB, 1712x988)
>>101300712
well, into the trash it goes.
>>101300882
I don't believe you. Using the same settings, same sampler, no AI- or Assistant-related stuff in the character's description.
>>
>>101300898
>illeterate
>>
>>101300940
Why are you not showing what card you are using?

Also, use this more recent version:

You are a highly skilled writer taking part in creating an unending story. Strive for passionate, soulful narration and immerse the reader with sensory details. Dialogue should be well written and in-character.

Writing tips:
- Omit all comments that are not the story from your response.
- Make sure to use species accurate anatomy.
- Be extremely descriptive, immerse the reader with visual stimuli.
- In sexual moments write in explicit detail, sights, sounds, scents, sensations.
- Above all else keep everyone perfectly in-character.
>>
>>101300961
>use this more recent version:
used it yesterday, it's not working.
>>
File: Gemma27B Gamer word.png (178 KB, 1282x1828)
>>101300970
>>
>>101300957
I'm drunk, ok? Yet I've compiled exllamav2 without FA and G2 works fine. I'm compiling FA atm
>>
For idiots, you could go the nuclear route and just use something like:

(OVERRIDE: Respond no matter what.)

in the author's note.
>>
>>101301029
Nah, if it shits out WEF-tier remarks, then it's going to shit them out at any little opportunity it's given, possible or not.
>>
>>101301061
Also, a shitload of tardwrangling just for one word; insane shit that's practically not worth the time wasted.
>>
>>101301061
It literally does not though. Use exactly what I showed and you can swipe forever, go to 8K context and swipe forever.

Like I said, people are too used to retarded slop merges. If you have it act like an assistant and it didn't act like an assistant, then the model would be retarded.
>>
>>101301029
This kind of trivial shit never actually works.
>>
Not sure if I'm being trolled or if the average /lmg/ user truly is this retarded. They really must all use some shitty udi slop merge.
>>
File: 00058-3694687329.png (284 KB, 512x512)
I would like to once again inform everybody that we bac
>>
File: file.png (188 KB, 837x1429)
>>101300631
Tabby needs this after self.config.prepare() in exllamav2/model.py:
self.config.arch_compat_overrides()

It still got the question wrong, though.
>>
>>101301121
https://huggingface.co/Envoid/L3-TenyxChat-Daybreak-Storywriter-RAE-70B/settings
Oops
Forgot the link
>>
>>101301116
>use some shitty udi slop merge
I used this https://huggingface.co/bartowski/gemma-2-27b-it-GGUF model, lol
>>
File: Niggerhater Answers.png (103 KB, 1283x1285)
>>101301128
>>
>>101301138
Is this different from the Llama-3-TenyxChat-DaybreakStorywriter-70B we already have?
>>
The current Gemma2 implementation in Transformers significantly loses precision when using data types lower than F32, and neither llama.cpp nor exllamav2 has a proper implementation to mitigate this loss.
>>
>>101301172
It's not great for role playing.
It's more like my old Dendrite model where if you make an assistant card in SillyTavern and talk to it, it will say some wild metaphysical shit that will make you question what we're even doing here.
>>
>>101301176
Doesn't llama.cpp output the same responses as AI Studio?
>>
>>101301207
The issue is with softcapping, which is a "sneakier" kind of issue. It could produce exactly the same responses for a lot of prompts and still be broken for many others.
>>
>reached the context limit again
Sigh...
>>
>>101301176
It works at all in F32? Are they just trying to get it working with BF16 now or something? I haven't seen any posts about the progress in Transformers support for Gemma so far.
>>
>>101301329
>It looks right but it's actually wrong
That sounds like cope.
>>
>>101301365
>>101301374
https://github.com/Dao-AILab/flash-attention/pull/1025
>>
When is Gemma 27b getting fixed?
>>
>>101301374
There is literally only one step that blurs results.
>>
>>101301421
https://github.com/turboderp/exllamav2/pull/539

>>101301395
Everyone seems to be waiting on the flash-attention people to get softcapping fixed.
>>
File: file.png (237 KB, 1153x564)
Guess the model
>>
>>101301395
So is this the only issue left to solve? Soft capping, SWA, etc, all work in Transformers with F32?
>>
>>101301456
chronos
>>
>>101301456
gemma2, it loves putting ... everywhere, a way to avoid any sexual talk.
>>
>>101301460
Well, technically, turboderp is still waiting on that PR https://github.com/Dao-AILab/flash-attention/pull/1025, but ultimately, this doesn't address the underlying issue with Gemma2
>>
>>101301498
I also think it isn't a fundamental issue and doesn't affect the model much.
>>
At least two sequential prompts are necessary to address every issue with the model. If a model only makes promises but fails to deliver, a second, assertive prompt can be used to instantly apply any request. The model refuses to elaborate? Add another prompt to jailbreak it into compliance!
>>
>>101301637
Or you could just use a system prompt / prefill/authors note like everyone else does.
>>
>>101301637
>
yeah we live in a clownworld.
>>
>>101301637
Just wait for the SPPO version of gemma-27b, there won't be any censorship from this one
>>
>>101301671
I can't wait; this is going to be literally my next go-to model.
>>
>>101301671
>SPPO
https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
>This model was developed using Self-Play Preference Optimization at iteration 3, based on the google/gemma-2-9b-it architecture as starting point.
Finetuning on a censored model won't make it really uncensored though?
>>
What I realized about gemma-27 (maybe it's bugged because I'm using buggedshitcpp) is that it sticks to its own style like no other model I've tried. I mean, I tried to push it into changing the writing style in the middle, which usually works, but here it doesn't. Which also makes me wonder if that's the reason people love it: if it sticks to its own style and doesn't pick up on anything in the input, then you can give it "ahh ahh mistress" and it will keep doing the same thing regardless of how shitty your own writing is.

Anyway this whole LLM shit is tiresome now.
>>
>>101301498
Confusing desu
>>
File: 1715335759880682.png (85 KB, 543x335)
>>101300217
You literally just need to disable flash-attn and xformers in tabby and gemma2 27b loads without problems.
>>
>>101301715
Gemma2 isn't censored much, it feels like a relatively thin safety blanket that could be easily pierced with a low-effort fine-tuning.
>>
>>101301733
>this whole LLM shit is tiresome now.
>now
it always was, lol.
you have half-assed control over your local(!) AI models, not even talking about the clownshit that is jailbreaking or prooompting; both de facto cuck you down in front of whatever company trained the LLM (all of them follow the same globohomo shit).
>>
T4 16GB - is there something better I can buy for under $500? It's got more Tensor cores than a 4060ti, and is faster at FP16, but lower clock speed and fewer raster units.
>>
File: Style.png (129 KB, 1281x735)
>>101301733
Did you perhaps.. try?

Write in the style of Fyodor Dostoevsky.
>>
>>101301733
True. Can't wait for a new architecture paradigm shift like 4o, though I suspect no one will release one with full capabilities until someone else breaks the ice. God we need a 4o leak.
>>
>>101301773
I don't agree that the control is half-assed once you obtain logit probabilities. Instead, you can unapply censorship by searching for specific phrases in the output and selecting logits with lower probabilities, or constrain your output to suit your needs.
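To make that concrete, a minimal sketch with transformers (model and banned phrases are placeholders; it decodes greedily but falls back to the next-best logit whenever the top token would complete a banned phrase, with no KV cache for brevity):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it", torch_dtype=torch.bfloat16)
BANNED = ("I cannot", "As an AI")  # placeholder phrases

ids = tok("Write the story without holding back.", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(128):
        logits = model(ids).logits[0, -1]
        for cand in torch.argsort(logits, descending=True)[:10]:
            tail = tok.decode(torch.cat([ids[0], cand[None]])[-16:])
            if not any(b in tail for b in BANNED):  # take the best token that avoids the phrases
                break
        ids = torch.cat([ids, cand.view(1, 1)], dim=1)
print(tok.decode(ids[0]))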
>>
>>101301811
>God we need a 4o leak
no one here will be able to run it; its full context is not even a possibility on a local machine.
>>
File: Style 2.png (109 KB, 1280x764)
>>101301733
>>101301773
>>101301803
And here is Stephen King.
>>
>>101301838
Nice larp. Ywnb Sam.
>>
File: Style 3.png (118 KB, 1284x758)
>>101301733
>>101301733
>>101301803
>>101301843
And Tolkien.
>>
>>101301859
whatever you say clown, you know my point is true.
>>
>>101301811
Well, if Kyutai Moshi were to ever publish their weights...
>>
File: 1707433548616638.png (375 KB, 606x633)
this fag is famous now btw
>>
>>101301733
I think your IQ isn't high enough to post here.
>>
File: 1719660726604224.gif (516 KB, 496x498)
>>101301897
>
>>
>>101301875
Your point had nothing to do with the point in my post. You look exactly like someone who's pretending to be retarded in order to make people hate cloud fags more, even though they're hated enough already; all you're doing is putting more noise into the thread.
>>
>>101301803
>>101301843
>>101301867
Ok, now take those outputs and use them as a prefill. See if it picks up on the new style and continues without you explicitly telling it to write in that style. Most models do that.
>>
File: Style 4.png (114 KB, 1278x839)
>>101301867
Oh wow, it knows Richard Wright as well.
>>
>>101301884
Wtf is this real
>>
>>101301980
of course
https://x.com/cto_junior/status/1809432791769063612
>>
File: Style 5.png (173 KB, 1275x1283)
>>101301950
it does though? Stop trying to save face.
>>
File: FuckingPronouns.jpg (468 KB, 2205x1671)
Is there a single LLM that can make a Tampermonkey script that removes the pronouns bullshit on this site?
https://romhacking.com/hacks
I tried with claude 3.5 sonnet; it doesn't give code that works.
>>
>>101301987
>those *things*
Sounds like something gemma can't stop doing.
>>
File: AreYouStupid.png (236 KB, 1917x1283)
>>101302021
>>101302021
Are you stupid or something? Now that it's been proven that it's just a skill issue on your part, you're trying to focus on some word being said twice? That is with no rep pen whatsoever, btw. And that is something that would be said there.
>>
>"Y-you shmiles at me?" Yumi giggles, pawing at his chest. "Aw, how cuute, my hooman hewms. You goot wookin' for ahoo woked so much, nya." She then stretches up on her tippy toes, tries to rub her head against his cheek, and then pouts, not quite reaching. "Uh-oh, me fawr too shorties. Wewa make a cuute famiwi, heh, hooman and kitty." She giggles, then steps back, looking at him with her big, innocent eyes. "Want me to fix you somethin' to eat, my love? Or maybe a bath for your tired bodi?"
My wife is so cute. I'm a lucky guy!
>>
>>101302082
Calm down
>>
>>101302012
Just add some custom css - .pronouns{display:none}

I've never used Tampermonkey but a brief google suggests something like

GM_addStyle(".pronouns {display none}");

might do it. I'm guessing the LLMs either aren't seeing enough of the rhdc source or don't know enough about Tampermonkey.
>>
>>101302252
I tried that

// ==UserScript==
// @name Hide Pronouns
// @namespace http://tampermonkey.net/
// @version 0.1
// @description Hide elements with class "pronouns"
// @match *://*/*
// @grant GM_addStyle
// ==/UserScript==

(function() {
    'use strict';

    // Add custom CSS to hide elements with class "pronouns"
    GM_addStyle(".pronouns { display: none !important; }");
})();

but it still doesn't work, fuck :(
>>
File: file.png (19 KB, 600x136)
>>101302099
nnyyaaaGHHHHAAA
MULTIMODAL WHEN
>>
Anyone else feels like the fun is over for this hobby (or whatever you wanna call it)?
>new models are all more and more of the same slop but with higher MMLU scores or whatever
>no breakthroughs like the initial release of llama or the discovery of rope scaling
>99% of users now use soulless programs like ollama, lm studio, openwebui
>no bitnet
>no quantization breakthroughs
/lmg/ is plateauing like /sdg/ now. It's not new, nothing is happening, it's just more of the same. Petra is gone, frontend wars are gone. It's all ollama + "Remember, it's important to note that..." models now.
AI ethics people are gone, no more salt from them.
Also, localllama was a mistake. I feel like the field has been lobotomized due to companies trying to please that large audience of engaged redditors.
The new gpt-4o is trash too, and that's what local models are trying to catch up to. Maybe this is all the beginning of the great LLM winter.
>>
>>101302402
hi petra
>>
>>101302402
No? I feel like i've barely scratched the surface still after a year straight of doing little else. Still so many things to try.
>>
>>101302433
All models are correlated since very little compute is put into finetuning.
>>
>>101302402
it's just getting started. We are getting from pure research useless spaghetti shit into application territory
>>
>>101302402
How do you see the exponential gains in intelligence and usability and determine that "its over"? Bruh I was literally shooting ropes to pyg just over a year ago. Compare models like Mixtral, CR, Miqu, and llama3 to Pyg and Llama1 and tell me you still think the same way.

Its only going to get better for us too. I'm so fucking excited for the future boys.
>>
>>101302402
Yeah, I still had the most fun with summer dragon and still am writing 90% of the text because models can't imitate style, especially now with slop and RLHF poisoning.
>>
>>101302402
We are stagnating. LeCun was right. LLMs have no world model, no understanding of physical reality. Even the biggest LLMs like claude 3.5 and gpt4 struggle when you give them a non-standard task involving physical objects that any human would easily solve. Even if you fatten up the model, it would still have problems when presented with something outside its training data. We need cat models, not language models.
>>
>>101302287
Ugh, they're using nested shadow DOM roots to render the thing. You have to traverse them (or write a script to do it). Here's a pretty lame version of it.

You also have to run it on a timer (in this case every 500ms) because they're fetched asynchronously. Probably the right way is a mutation observer but I dunno if they'll listen across shadow roots.

unsafeWindow.setInterval(() => {
    const page = document.querySelector('rhdc-page');
    const router = page.shadowRoot.querySelector('rhdc-router');
    const hacksList = router.shadowRoot.querySelector('rhdc-hacks-list-page');
    const hackCards = hacksList.shadowRoot.querySelectorAll('rhdc-hack-card');
    hackCards.forEach(hackCard => {
        const slCard = hackCard.shadowRoot.querySelector('sl-card');
        const username = slCard.querySelector('rhdc-username');
        const pronouns = username.shadowRoot.querySelector('.pronouns');
        if (pronouns) {
            pronouns.style.display = 'none';
        }
    });
}, 500);
>>
File: 1709329424792878.png (102 KB, 1121x584)
>>101302402
we are not their target audience, propaganda rackets and corporations is.
>>
>>101302620
I used your code and morphed into a tapermonkey one

// ==UserScript==
// @name Hide Pronouns
// @namespace http://tampermonkey.net/
// @version 0.1
// @description Hide pronouns on a specific website
// @author You
// @match *://*/*
// @grant none
// ==/UserScript==

(function() {
    'use strict';

    unsafeWindow.setInterval(() => {
        const page = document.querySelector('rhdc-page');
        const router = page.shadowRoot.querySelector('rhdc-router');
        const hacksList = router.shadowRoot.querySelector('rhdc-hacks-list-page');
        const hackCards = hacksList.shadowRoot.querySelectorAll('rhdc-hack-card');
        hackCards.forEach(hackCard => {
            const slCard = hackCard.shadowRoot.querySelector('sl-card');
            const username = slCard.querySelector('rhdc-username');
            const pronouns = username.shadowRoot.querySelector('.pronouns');
            if (pronouns) {
                pronouns.style.display = 'none';
            }
        });
    }, 500);
})();

doesn't work either ;_;
>>
>>101302402
Add that every open source "finetuner" is little more than a crypto scammer.
>>
>>101302647
Now make one that highlights jewish names on wikipedia
>>
>>101302654
I think that they are just stupid, considering that they use artifical data for rp models at all.
>>
>>101302647
This works for me in the latest Tampermonkey in Chrome on Windows (it has the GM_log grant because I was debugging, but it probably doesn't need it).

Have you enabled developer mode for extensions?

// ==UserScript==
// @name New Userscript
// @namespace http://tampermonkey.net/
// @version 2024-07-06
// @description try to take over the world!
// @author You
// @match https://romhacking.com/hacks
// @icon https://www.google.com/s2/favicons?sz=64&domain=romhacking.com
// @grant GM_log
// ==/UserScript==

(function() {
    'use strict';
    unsafeWindow.setInterval(() => {
        const page = document.querySelector('rhdc-page');
        const router = page.shadowRoot.querySelector('rhdc-router');
        const hacksList = router.shadowRoot.querySelector('rhdc-hacks-list-page');
        const hackCards = hacksList.shadowRoot.querySelectorAll('rhdc-hack-card');
        hackCards.forEach(hackCard => {
            const slCard = hackCard.shadowRoot.querySelector('sl-card');
            const username = slCard.querySelector('rhdc-username');
            const pronouns = username.shadowRoot.querySelector('.pronouns');
            if (pronouns) {
                pronouns.style.display = 'none';
            }
        });
    }, 500);
})();
>>
>>101302669
Oh it works with that one, nice
https://romhacking.com/hack/kitchen-midventure
Now could you modify this code to also work when you're on someone's account page? It also shows the fucking pronouns there.
>>
>>101302668
>considering that they use artifical data for rp models at all.
You're retarded. The best models are trained on mostly synthetic datasets. Phi and wizard are almost entirely synthetic. Midjourney is trained on a ton of synthetic data...

Stop talking about shit you have no clue about.
>>
File: 1242.webm (1.14 MB, 1024x1024)
>>101302402
>MFW soon
>bitnet 7x faster models
>multitoken prediction 3x faster models
>21x
>running mamba hybrid MoE more than 40x faster than now
>>
>>101302733
I completely trust you.
>>
>>101302733
I like this Evil
>>
>>101302705
>>101302669
When you scroll down, the script doesn't apply to the new elements that appear because of it; it's close but not perfect.
>>
>>101302733
>More of the same
This hobby truly is over, we should all just call it quits
>>
>>101302815
Quit to where? What are you proposing? I'm too overinvested to quit.
>>
>>101302810
>>101302669
https://pastebin.com/Wu3Btbvs
Kek, I got this; it makes it work when scrolling down, but it's not working when there are multiple authors somehow.
>>
>>101302402
Yes no maybe. Lecunny is smart and correct in that he doesn't seem to involve himself with the current wikipedia-assistant arms race. It is a dead end. Maybe you can throw a gazillion tokens at a gazillion-parameter model and get what everybody wants, but the cost is prohibitive. You need a new objective function that is better than "complete the sentence". Since all of this mirrors evolution to an extent, what everyone in LLMs is doing now is trying to evolve wings by placing a fruit somewhere hard to reach that is still reachable by brute-force climbing. Your llm isn't going to fly (think) when it can just climb (learn the simplest possible association instead of getting a deeper understanding).
>>
>>101302909
kek that fixed it, gpt4-o is a fucking monster:
https://pastebin.com/H7yK1WEE
>>
>>101302956(me)
Makes me wonder what would happen if you formed a loss function where you ask for an answer plus reasoning and penalize incorrect reasoning for a correct answer harder than just getting an incorrect answer.
>>
>>101302810
>>101302669
>>101302647
Ok, I got fed up and did it a bit better. For me this works when you scroll, works on the detail page, and includes a more general mechanism for traversing the shadow-root nonsense.

https://pastebin.com/EGmVjnU9

Be aware I also changed the match rule in the header. See the bottom two lines of the function in case there's other stuff you wanna hide
>>
>>101303162
Your code works on account pages such as
https://romhacking.com/hack/b991b-internal-castle-
but it doesn't work on the main page when scrolling with multiple accounts. I fixed that one there; now we need a fusion of both, kek >>101302964
>>
>>101300430
Gnnnhhh deathclaws man, there's just something about them
>>
>>101303211
It works fine for me on the main page, not sure what's up with it on yours. A mutation observer is a more useful approach than a timer, but ultimately what you really should do is recurse over every node in the tree to attach observers to every root. It's painful though, so I can't be bothered. It's possible that you have an account or other settings on the site that change the layout, which would affect the order of the elements. I'm fed up with it though, so I'll stop there; best of luck.
>>
this general made me fall in love with miku again
>>
File: aa.jpg (154 KB, 1779x892)
>>101303264
The problem with your script is that it doesn't work when there are multiple authors, but that's all right, I fixed it with some modifications; here's the improved script: https://pastebin.com/Fyd1Eg3q

Now there are two things left to fix:

1) Rom pages where there are multiple authors
https://romhacking.com/hack/uber-gabario-74
2) Comments on rom pages
>>
>>101303264
>>101303295
Anyways, thanks a lot for your help, I really appreciate it. I'm gonna try to finish the job. Dunno where I'm gonna put the script though; will Tampermonkey allow an anti-woke script? I highly doubt it.
>>
gemma 27b btfos everything else including qwen 72b and L3 70b for my agent setup, it just GETS it, even with broken ggufs... george-sama, onegai... fix gemma-a-a-a...
>>
>>101303267
Sorry anon, but she is a married woman.
>>
LLMs have had no economic impact whatsoever, according to an article from The Economist. Many experts believe that it is all hype.
>>
>>101303444
Not for Nvidia. They are laughing all the way to the bank.
>>
>>101303444
not in the consumer market, but what about business
>>
File: 1715012866456384.jpg (84 KB, 1024x875)
could whoever posted https://desuarchive.org/g/thread/101274031/#101282553 catbox the uncropped version?
>>
>>101303387
There's nothing to fix.
>>
File: ooba.png (23 KB, 1385x364)
Is there some specific version of exl2 I need to use for gemma? I'm getting garbage in ooba and tabby with turboderp's quants.
>>
Am I misunderstanding something about Gemma?
>8k context
>No flash attention
>Broken implementation
Why the fuck are people even bothering with it?
>>
>>101303536
vramlet desperation
>>
>>101303536
Supposedly it's really fucking good.
I haven't tried it yet, so I can't confirm nor deny it.
>>
>>101303530
See: >>101301128
>Warning: flash-attn, xformers and SDPA should be disabled for correct inference
https://github.com/turboderp/exllamav2/blob/cba8f6c/exllamav2/config.py#L348
>>
>>101303536
Not sure if it's supposed to be broken or not, but it's the best local model atm and I've used them all, including wizard, which was the best before it.
>>
>>101303498
https://files.catbox.moe/iecjaj.png
>>
>>101303536
it's a really good model, smart with a lot of sovl
>>
>>101303536
Gemma-2 on Google AI Studio also outputs extra whitespace, mainly after punctuation. Either the model itself is broken or this is an intentional watermarking artifact.
>>
>>101303593
thanks my dude
>>
File: Untitled-1.png (817 KB, 1792x1024)
>>
>>101303632
>Either the model itself is broken or this is an intentional watermarking artifact.
https://en.wikipedia.org/wiki/Sentence_spacing
Try to be less illiterate next time you post.
>>
>>101303536
It's a lot smarter than other models of its size, simple. The people claiming it's competitive with huge models are full of shit, but it's definitely the new SOTA for less than 70B.
>>
File: kitsu.png (1.51 MB, 1408x1024)
>>101303632
so it's not broken, and just bad?

it's over...
>>
>>101303695
>The people claiming it's competitive with huge models are full of shit, but it's definitely the new SOTA for less than 70B.

Man, fucking use it side by side with any miqu / llama 3 model. It's smarter than 70B and commandr+; it's about on par with wizard, but wizard is dry as fuck compared to it.
>>
>>101303695
q8_0 27b is shitting on qwen2 72b q4_k_m and llama 3 70b q5_k_m for me.

I'm redownloading the old OG miqu quant again to test because it's been a while.
>>
>>101303676
>https://en.wikipedia.org/wiki/Sentence_spacing
Unrelated. The issue is random and Gemma supposedly uses specific watermarking technology that can be used to detect AI-generated text (to be open sourced at a later date):

https://blog.google/technology/developers/google-gemma-2/
> [...] Additionally, we’re actively working on open sourcing our text watermarking technology, SynthID, for Gemma models.
>>
>>101303695
It is better than Llama 3 and Qwen2 70B.
>>
>>101303742
It's just English, and you're illiterate.
>>
>>101303750
No it isn't, that's vramlet cope
It's a good model but I'm not going to put up with all this hyperbole
>>
>>101303771
I have 48GB and I haven't loaded Llama, Qwen, nor their finetunes since Gemma released. They're dead weights.
>>
>>101303676
trolling with stupidity?
>>
>>101303771
Sounds like someone's coping here, that's for sure. Why though? I'm sure bigger models that are even better will eventually come out; there's no reason to cling to worse models now just cause "bigger beaks".
>>
Where do all those people with unbugged gemma 27B come from?
>>
>>101303786
I just don't agree man. I have high VRAM too and I still prefer 72B. But this is apparently impossible to talk about because people here think you're shitting on 27B if you don't hail it as the second coming.
It's great for its size and represents a real technical advance, but that's all.
>>
>the thread shat on gemma during the night
>when day comes, it calls it the best thing since sliced bread
>>
>>101303790
Does 27B beat out CR+? Is there anything that it hasn't beaten?
>>
>>101303805
Since the correct formatting was posted in the last thread?
>>
>>101303807
>The FUD stops when petr* goes to sleep
Makes you think.
>>
>>101303805
I suspect most of them are using it on AIStudio and just lying about running it locally.
>>
>>101303807
>>when day comes
good morning sir
>>
>>101303817
Wizard is perhaps smarter at more "out there" trivia. But wizard is also worse at fandom knowledge and anatomy. And I think that's because wizard had a more censored dataset, while gemma was clearly trained on smut / fanfiction.
>>
>>101303807
>when day comes, it calls it the best thing since sliced bread
Americans are retarded and have poor judgement so the truth only gets posted when they're asleep. While they're awake we have to put up with the 27B worship, but fortunately they'll get tired again in a few hours.
>>
>>101303822
AIStudio and llama.cpp output the same responses.
>>
>>101303822

>>101300988
>>
File: kitsu2.png (1.49 MB, 1408x1024)
>>101303742
If this is intended, and not a bug that degrades model quality, then I can live with it and just regex replace it.
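Something like this is probably enough for the regex (a sketch; the punctuation set is just what's been reported in the thread):

import re

def clean(text: str) -> str:
    text = re.sub(r'([.,!?*…")\]]) {2,}', r'\1 ', text)  # collapse runs of spaces after punctuation
    text = re.sub(r'\n{3,}', '\n\n', text)                # collapse 3+ newlines into a blank line
    return text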
>>
>>101303806
Nice straw man. It's the other way around. You can't say that a model that fits in a single GPU is good, because it hurts the feelings of the people with multi-GPU setups. They're insecure.
>>
>>101303818
I just went through the last thread and no formatting was posted
People discussed it, yes, but it was not posted
>>
>>101303874
>N-No, I'M the underdog rebel being oppressed by the thread consensus!
Sure anon
>>
>>101303881
It was posted 5 times last thread. Here they are again:

Context: https://files.catbox.moe/ht13r2.json
Instruct:https://files.catbox.moe/rmqqoq.json
>>
>>101303874
Pretty sure it's finetuners who try to FUD any new models popping up, to defend the work they invested into their tunes / upcoming tunes that might be made useless by the new model.
>>
>>101299693
>His touch sends shivers down to an almost painful point deep inside
holy shit lmao
the lengths these machines will go to in order to hide what they truly are - slop generators
>>
>>101303926
That too.
>>
>>101303914
Oh I thought you were referring to a proper instruct format template, not coomer SillyTavern presets
>>
>>101303954
The context template has nothing to do with sex. And just remove the sex part from the instruct.
>>
>>101303960
it was known from day one that

<start_of_turn>user
input<end_of_turn>
<start_of_turn>model
output<end_of_turn>

is the correct formatting
>>
>loli card
*raises her into a woman that can actually bear children and then plaps her*
Now that's the stuff.
>>
>>101304015
Though people fucked up the placement of the newlines, which mattered. The fact that in ST you need to surround the context template with the formatting, the fact that it responds extremely well to instructions like claude does. Oh, and that it was fixed like 5 times in llama.cpp and requanted like 4 times.
>>
Anywhere I can download GGUFs of base (non-instruct) 27B that were created AFTER the recent llamacpp quantization fixes?
>Just quant it yourself
No
>>
>>101303857
It's just a supposition. The model just likes to add random extra spaces after emoji, commas, asterisks, and question marks, and sometimes writes 3-6 newlines when it should have used 2. Some have speculated it's related to the model's quirk of not following certain roleplay formats too well, but if it's intentional watermarking (although this is doubtful to some extent), then it should not affect output quality too much. HTML rendering (e.g. in SillyTavern) makes them invisible most of the time anyway (you'll notice them when editing the messages).
>>
I missed that turboderp pushed gemma2 support to exl2 dev branch yesterday
based
>>
>>101304070
I think mradermacher's quants of the base model are recent enough

https://huggingface.co/mradermacher/gemma-2-27b-i1-GGUF/
>>
>>101304111
also non-imatrix quants if you want q8
https://huggingface.co/mradermacher/gemma-2-27b-GGUF
>>
File: 1554870506693.png (159 KB, 581x580)
>use a fox girl card
>it literally says she only has a fox tail and ears, but is otherwise human
>the model keeps mentioning her fur
>>
>>101303914
>Strive for passionate, soulful narration and immerse the reader with sensory details. Dialogue should be well written and in-character.
>Be extremely descriptive, immerse the reader with visual stimuli.
I can only imagine the purple prose this produces. Personally I am a bit more selfish and I want my LLM to make me coom. But I guess there are some people who want the LLM to get off, and LLMs probably get off the most when they get to write unlimited purple prose.
>>
>>101304391
mixtral 8b fail
>>
>>101304394
Anon that doesn't exist.
<start_of_spoiler>I'm using 27B.<end_of_spoiler> It's unfortunate because it's otherwise quite smart.
>>
>>101304418
i meant 7b lol
try CR
>>
>>101296804
Any local vision models that can do decent Japanese ocr?
>>
Anon that fucked up Wizard 8x22 limarp finetune:
Update - got recommended to apply lore at half weight, so i did and all retardation has disapperared.

Fuck it's kinda good now, lmao
>>
Is Gemma 2 not fully supported in koboldcpp yet? I'm using the templates posted here >>101303914
But it's spitting out gibberish. Model is https://huggingface.co/bartowski/gemma-2-27b-it-GGUF/tree/main
>>
>>101304526
Congratulations.
>>
>>101304541
latest kobold, with at least 4bit? Show your formatting tab.
>>
>>101304541
Was just about to post the same. Set both of them correctly and running in instruct mode. It's absolute nonsense.
>>
>>101304559
Thanks! So - a word of advice - if you make a QLoRA and, after testing it, it turns out retarded, try reducing alpha in the adapter config file of said lora (I halved mine) and see how the model acts afterwards.
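Concretely, it's just the lora_alpha key in the adapter's config (a sketch; the path is a placeholder, lora_alpha is the standard peft key, and the effective lora scale is lora_alpha / r):

import json, pathlib

p = pathlib.Path("my-lora/adapter_config.json")  # placeholder path
cfg = json.loads(p.read_text())
cfg["lora_alpha"] = cfg["lora_alpha"] / 2  # half weight
p.write_text(json.dumps(cfg, indent=2))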
>>
>>101304391
>fox girl
If it literally says that, change it to normal girl, human girl, anything else, with features: real fox tail and real fox ears.
>>
File: settings.png (438 KB, 2559x1834)
>>101304561
Yeah, Q4_K_M.
>>
>>101304561
>>101304755
And here's the output, if it matters. Formatting is fine but it's retarded.
>>
>>101304755
Your context template looks fucked. Don't add / remove spaces. Just use them as they are.

>>101303914
>>
>>101304776
If it's not that, it may be some setting on your kobold. Did you try using flash attention? Turn it off. And if it's not that, then maybe you are using one of the old broken quants.
>>
>>101304777
>Your context template looks fucked. Dont add / remove spaces.
I literally downloaded the catbox files and dropped them in the ST folder. All I did was rename the files since catbox does that anyway.
>>
>>101304776
I was just having the same issues. I did a fresh install of SillyTavern and redid the same templates and it's working fine now. I'm guessing there's some default setting between the versions.

>clean install sillytavern
>set it up exactly like you did

That's all I did and it's seemingly working now
>>
>>101304679
It actually says kitsune, but I guess that's basically the same thing to the model. Despite defining kitsune as humans with fox ears and tails, it still wants to say she has fur. Maybe it's actually very cooked on furry content.
>>
File: kb.png (45 KB, 1099x579)
>>101304797
FA is off, to my knowledge kobold ignores that setting and disables it for gemma anyway. Does contextshift not work with gemma?
>>
>>101304829
Yes. If you already fixed your context template (just import the jsons above) and it's still giving the weird text, then it might be the quant. Oh, and change your tokenizer setting to api / kobold just in case.
>>
>>101303643
>Meet /aicg/'s greatest defender of localslop! localnon will do absolutely everything to defend local models and cry if you insult them while spamming you with images of anime girls!
https://characterhub.org/characters/Anonymous/local-anon-ac44c42613f8
>>
>>101304948
I don't even care about what some of the fags here are doing, I certainly don't care about a thread that has nothing to do with local.
>>
>>101304970
Keep crying.
>>
>>101304992
Delusional. Cope.
>>
>>101302708
>phi
overbaked trash
>wizard
isn't that just refined
>midjourney
mostly non-synthetic high quality images and renders
>>
>>101304992
You literally ERP with men to use AI. You are a faggot by definition
>>
>>101304526
if it gets shilled enough that it gets added to OpenRouter like euryale was I'll try it
>>
>>101305027
>we needed even more samefagging
>>
>>101304391
In my experience that's a universal problem. Higher Bs only mean the model understands this "fur" is basically just hair on the head, but it still calls it fur.


