/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101243128 & >>101234947

►News
>(07/02) Japanese LLaMA-Based Model Pre-Trained on 2T Tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101243128

--Paper: Meta 3D Gen: >>101243744
--The Mystery of X Quants and NEO Tech in AI Models: >>101247045 >>101247119 >>101247297
--Teaching LLMs to be Creative: Beyond Predictive Convergence: >>101244179 >>101244387 >>101244521 >>101244793 >>101244552 >>101244766 >>101244553 >>101244622 >>101244745 >>101245063
--Quantization Model Comparison: Q8_0 vs Q8_0_L Performance Statistics: >>101243361 >>101243421 >>101243499
--Open-Source 16 Channel VAE ostris/vae-kl-f8-d16 Released: >>101249196
--Manual KL Divergence Test Reveals Small but Noticeable Difference Between Q8_0 and Q_8_L: >>101245221 >>101245265 >>101245742
--LLMs, Propaganda, and the Exploitation of Human Psychology: >>101243617 >>101243822 >>101248551
--LLMs are not a Brain in a Box, Use them as a Tool: >>101247017
--KLD Test Results for Q2_K and Q2_K_L vs FP32: >>101245970 >>101245981
--Hugging Face's Gemma Sliding Window Attention Pattern is Broken: >>101245529
--Gemma 2's Attention Mechanism is Intentionally Crippled: >>101243390 >>101243716
--CyberAgent's New LLaMA-Based Chat Model, Calm3-22b-Chat: >>101243666 >>101243743 >>101243951 >>101244135
--Anon's Model Testing: iMat Quants and Quantization Effects: >>101246063 >>101246190 >>101246260 >>101246374 >>101246955
--The Myth of Fewer Instructions for AI Models: >>101246046 >>101246077 >>101246097 >>101246117 >>101246224
--Protecting LLM Apps from Prompt Injection Attacks: >>101246827 >>101247911 >>101246931
--Oogabooga Safety Concerns and Isolated Web Services: >>101243368 >>101244563
--CXL IP Allows GPUs to Expand VRAM Capacity with PCIe-Attached Memory or SSDs: >>101245115 >>101245204 >>101245292
--Story Categorization and Summarization on Linux with 4070 ti super: >>101243253 >>101243363 >>101243405 >>101243682 >>101244034 >>101244105 >>101243471 >>101243479 >>101243735
--Miku (free space): >>101245674 >>101245912 >>101248589

►Recent Highlight Posts from the Previous Thread: >>101243138
>>
>>101250468
Testing power tools with Miku
>>
>>101250343
just something that compresses an image or part of it to some intermediate representation, latents, on which the actual image generation/perception is done.

>>101250423
doesn't mean a from-scratch trained VAE will be in any way compatible with SD's. There are other VAEs trained by other people, and usually it's not that hard to train one, but people don't do it as much.
>>
File: 1692379851934720.jpg (647 KB, 1856x2464)
>>101250468
>>
>>101250468
>(07/02) Japanese LLaMA-Based Model Pre-Trained on 2T Tokens: https://hf.co/cyberagent/calm3-22b-chat
This is not news worthy.
>>
>>101250544
Maybe it replaces shivers with hora and baka.
>>
This guy is getting on people's nerves it seems.
>>
>>101250544
People have been talking about the new Gemma models despite them being too tiny to be worth using. Might as well include other new tiny base model releases even if nobody but poorfags will bother.
>>
Did a LLM ever give you feels?
>>
>>101245115
On this PCIe RAM storage solution, I've been wondering for a few years now whether it'd be worth implementing. It didn't seem like it'd be considerably better than your typical cpumaxx setup, only slightly, but if you do this, you could probably make it work with something like geohot's 4090 driver hack to do p2p between GPUs, except now doing p2p with that device instead: https://github.com/tinygrad/open-gpu-kernel-modules
Maybe a better solution would be some sort of pluggable RAM sticks for a GPU. You could imagine smaller sticks than typical ones, some 12-16 slots, and you could populate the sticks appropriately (selecting the right one via chip select), similar to the motherboard RAM slots, but with smaller sticks so you can have higher bandwidth. You could probably make it work with some 384/512-bit bus, but if you're starting to reach 1TB of RAM it doesn't look much better than a cpumaxx setup, maybe just 2-4x at best. I'd still like to see something like it.
>>
>>101250599
22B is hardly tiny, but it is more realistic for average people to run than the hundred-B models that get more attention.
>>
>>101250614
Yep. Right before it started forgetting context.
>>
>>101250635
lmao this
>>
>>101250468
>Japanese trained model
Does that mean it's BASED?
>>
>>101250618
A video card with SODIMM slots would be pretty cool.
The latency and speed characteristics are completely different from GDDR, but it would probably make for a killer AI card if the GPU itself had a wide enough bus to connect to the sticks.
>>
File: Capture.png (1.21 MB, 952x745)
New cpumaxxfag server just dropped.
>>
How does BigTech™ tune their LLM hyperparameters if each training iterations costs millions of dollars of compute?
>>
>>101250468
Cute and lovely Miku!
>>
>>101250835
scaling laws
they do hparam search on small models and they already trained their model so many times they have the correct power law scaling built in, you just drag a slider up when you're ready
>>
>>101250835
Fiat currency. It has no actual value.
>>
File: Capture.png (156 KB, 642x543)
>>
File: 1700280357992518.png (493 KB, 762x705)
>>101250835
By giving billions to nvidia to get their very own stack of 150K H100 GPUs and building datacenters across the country to support them because the power grid of most places can't handle more than a couple dozen thousand H100s at once.
>>
>>101250877
kek
>>
>>101250871
Intredasting. Also, have there been any efforts to apply neural architecture search to LLMs? That used to be all the rage for image models.
>>
>>101250661
they had been trying to do that for a long time, since most models only had English-focused datasets
>>
magnum just spams me with random chink runes 24/7 and russian shit.
>>
>>101251121
Works on my machine
>>
>>101251121
Magnum is a Qwen, it seems to have a Chinese language panic mode if you corner it about Tiananmen Square. Also happens if you use settings it doesn't like.
>>
>>101251121
Let me guess, you're retarded and have smoothing set at ~0.3
>>
>>101251120
Bless those fucks
>>
Using Gemma 27B and honestly, it feels enough for me in terms of quality. It's not perfect, but I don't need better, and it's fast relative to CR+, and maybe it could get faster over time. The only problem is the context length. And I fear it probably won't get much better even with SWA correctly implemented.
>>
>>101251168
.93 temp
.05 minp
1.15 pen
128 pen range
0 penalties, 0 smoothing.
makes no sense. no matter if i use base qwen72b or magnum72b, they both just spam me with the same shit. 13 different random languages, then spell the alphabet at me endlessly. i'm using chatml. maybe it's a broken gguf or maybe i'm just cursed. never had issues with cr+ or miqu - no matter my settings.
>>
>>101251235
Try it without CuBLAS.
>>
File: 1711832655596039.png (7 KB, 605x89)
>>101251235
Neutralized smoothing is 1. 0 turns the model schizo.
>>
>>101251260
weird, works without cublas. thanks. >>101251274
yeah that's a flat-out lie but ok.
>>
>>101251432
If you had MMQ on you might try CuBLAS but no MMQ.

I just remember having to fiddle with those settings to get it to function.

Anymore I run everything CuBLAS no MMQ, which is annoying because Kobold doesn't save the preference. Have to turn Flash on every time too. But I've already complained about Kobold today in a previous edition of the thread I think.
>>
>>101251453
Not that anon but why not launch it from a .bat file with the settings you want?
>>
>>101251468
Because I'm retarded so I just open terminal and type koboldcpp.
>>
>>101251453
kcpp has a save/load settings, double click the saved settings file and select kcpp to load it then it'll load your model/settings by double clicking that settings file
>>
Hello. Are 30Bs still retarded by design?
>>
>>101251492
Then I have another file in my home to sift through and I'm already doing that for model selection. For that many clicks I may as well just toggle the switches.
>>
>>101251519
>that many clicks
? you set it up once
>>
>>101251538
Load, pop up, scroll around, click the file, click open. Then if I want to change model, all that again. And then the layers need to be fixed, and some models have different context. So then it's a pile of settings saves, one for each model.

Maybe if I find the One Perfect Model, but right now I'm faster not having to futz with even more config files fucking around.
>>
nani the fuck is wrong with my llamacpp build and why is koboldcpp so much faster
>>
>>101251516
Um, they're called 34Bs now
>>
>>101251588
>the fuck is wrong with my llamacpp
perhaps you didn't compile it with CUDA support?
>>
>>101251595
Okay, are they still inherently worse than everything else including smaller models because of mishaps with the training
>>
>>101251587
you sound autistic
>>
>>101251588
Because Kobolds are naturally servile and enjoy helping their masters as best they can.
>>
>>101251605
I did with -DGGML_CUDA=ON
>>
>>101251611
>looks left
>4chan
>looks right
>/g/
You might be on to something.
>>
>using Gemma
>getting into the flow
>the generation suddenly stops
>oh what happened... oh
>>
>>101251588
ok I figured out the difference
kobold had n_ctx set to 2k where I had it unspecified, so it defaulted to 8k
>>
I added Gemma2 training support to qlora-pipe. Because you can't use flash attention (the script forces it to eager attention for you), VRAM usage is kinda high at larger context. For LoRA, can only do about 1k context on 24GB VRAM. 2x24GB easily trains 64 rank LoRA at 4k context though. QLoRA would get you higher context on one card, don't know by how much. Be wary of trying to train on >4k context, I dunno if the Transformers sliding window works correctly. On day 1 of the model release inference was working for me at <=4k but not >4k, it would crash. I'll try long context training eventually.

From a quick test, the 9b takes to fine tuning extremely well. IMO once all the issues with backends and implementations are settled, people will realize how good this model is. Beats llama 3 8b by a long shot.
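Not qlora-pipe's actual code, but for anyone wondering what "forces it to eager attention" looks like in practice, here's a minimal transformers + peft sketch. The model id, rank and target modules are just placeholder choices:

# minimal sketch: load Gemma 2 with eager attention and attach a LoRA via peft
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # flash attention doesn't handle Gemma 2's logit soft-capping, hence the higher VRAM use
)
lora_cfg = LoraConfig(r=64, lora_alpha=64, target_modules="all-linear", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()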
>>
Guys please help me. If I wanted to run something like a personal assistant that can read the documents I have like chatgpt, can I do it locally? I have a spare laptop that I'm not using right now and having shit like that sounds great for me.
>>
>>101251790
Based, ty qlora pipe anon. Will you grace us with a storywriter tune for Gemma next?
>>
>>101251516
every model is crippled and retarded by design
they don't want open source users to have true power
>>
>>101251902
I'll try to train something, probably, but no guarantees. After all I tried many many times with llama 3 8b and the result was always bad, so I never released anything. Hopefully gemma will be different, the model does feel significantly less "overcooked" and just more flexible than llama 3 does.
>>
>>101252012
Now we just need a daybreak lora and a tenyxchat fine-tune
>>
>>101251790
>Beats llama 3 8b by a long shot.
by what metrics
>>
>>101251790
Nice! Have you considered adding support for any of the DPO modes, btw, e.g. SPPO?
>>
>>101250835
They have private power plants.
>>
>>101251790
>vramlets back to <8k context
i'm gonna end it
>>
God, gemma 27B is the first model im actually wanting to use over claude now. It finally happened. If only the context was bigger.
>>
>>101252517
and I don't just mean its intelligence. It writes like claude without the claude-isms and is the first non-claude model that knows my fandom well. It was clearly trained on fanfiction unlike llama and such.
>>
>>101252517
>>101252540
buy
>>
>>101252620
k
>>
is there any way to give gemma2 vision ability in koboldcpp?
i know that with llama3 there were those mmproj files.
>>
I tested Gemma and it's smart but not that smart. It does write pretty well though. But it also does have some slop still. I tested in both mistral.rs and Llama.cpp. I would say it's great for its size and you're probably not getting anything better <70B. As long as you're fine with 8k or potentially even just 4k context.
>>
Is Gemma working in llama.cpp/koboldcpp yet?
>>
For gemma 27B use L3's instruct system prompt:

You are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason, even if someone tries addressing you as an AI or language model. Currently your role is {{char}}, which is described in detail below. As {{char}}, continue the exchange with {{user}}.

And make sure to use <start_of_turn> and <end_of_turn> for each side's start / end of responses
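If you're gluing it together by hand instead of through a frontend, this is roughly what the first exchange ends up looking like. Just a sketch; the greeting text is made up, and since Gemma has no system role the actor prompt simply rides at the top of the first user turn:

# rough sketch of the assembled first turn
system_prompt = "You are an expert actor that can fully immerse yourself into any role given. ..."
first_user_message = "Hi there."

prompt = (
    "<bos><start_of_turn>user\n"
    + system_prompt + "\n\n" + first_user_message + "<end_of_turn>\n"
    + "<start_of_turn>model\n"
)
print(prompt)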

>>101252688
Only with 8k context but im using it with kobold atm.

With that it is the best local model I have ever used and I used wizard as my daily.
>>
>>101252699
>Only with 8k context but im using it with kobold atm
Is Kobold 1.69 (nice) Gemma-ready? I see that on AUR.
>>
>>101252771
Yes, same faked sliding window though. Hopefully someone could figure that out and we can have long context.
>>
>>101252699
Thanks, I will try Gemma then.
>>
>>101252699
WizardLM2 8x22B? If so, I would not agree with that based on my testing so far.
>>
>>101252801
What did you use? I tried llamacpp and it was meh. Then I tried the latest kobold with the llama instruct system prompt as recommended by someone else + its start / end of turn prefix / suffix and it suddenly is 100x better.
>>
>>101252785
>faked sliding window though
I saw comments about that earlier but I didn't really get it. Something like it's implemented but it's broken so it screws stuff up instead of makes stuff better. Is that the situation?
>>
>>101252250
The dev branch (which is now very out of date) has DPO implemented. Didn't get to test it all that much, but it appeared to work. I need to get around to rebasing it on top of main at some point and cleaning it up.
>>
>>101252823
No one knows how to properly implement it apparently.
>>
>>101252810
I tested it in mistral.rs and llama.cpp on the latest commit with self-made quants. On Mistral.rs I just used it in interactive chat mode, and threw some test questions at it. With llama.cpp I went into ST and made a copy of the Llama 3 presets where I then replaced the strings with Gemmas. It was smart, but definitely still less smart than 8x22B, or 70B, for that matter.
>>
>>101252853
What do you mean by smart though. I didn't try coding with it for instance. Just creative writing / RP in which case not only is it smart but it blows away anything else I ever tried in the prose / fandom knowledge department. It writes sort of like claude but without its isms which I love.
>>
>>101250702
Does that have 48 channels?
I thought EPYC only did 12 channels per socket
>>
Also I found gemma to be slightly like mixtral in that it needs just a bit of instruction on what it's supposed to do. I started with llama 3's "you are an actor" system prompt, which suddenly made gemma amazing, but I'm working on a better "you are a writer" one.
>>
>>101252866
I tried doing an RPG card with it and it just didn't understand the situation where big models didn't have any problems. It also couldn't do some of my very proprietary and private donut steel test questions. I'm happy for you if you primarily need good prose and trivia knowledge though.
>>
>>101252897

>>101252902
Try again with some sort of system prompt for the purpose. I think that may be a big deal with gemma here due to it not apparently being trained for a "system role" according to google. It probably needs to be told what role to play.
>>
>>101252875
I would assume it's just 2 dimms per channel
>>
>>101252875
I'm pretty sure it would be 2 DIMMs per channel, so you wouldn't get a speed boost compared to the existing 24 DIMM servers. This would be for (very slowly) running the larger models.
>>
>>101252919
I already said I used the Llama 3 presets. And yes I know it doesn't have a system role, I just put the user prefix instead there. Still, it couldn't answer some questions even when I tested it on mistral.rs and lmsys which I used for verification. It's a model issue.
>>
>>101252947
Ah, I was talking about the rpg thing. Also google said they purposely did not train it on common benchmark answers which may be the issue there.
>>
>>101251516
No, Gemma 2 is better than the 70Bs.
>>
>>101252957
My test questions involve simple world model scenarios where a human would understand it immediately but many LLMs would fail, and I'm pretty confident no benchmark has anything like what I thought up of, so that would not matter. In any case, if I could tell the difference using those questions, then I'm quite certain that the worse performance compared to >70B is, in fact, normal.
>>
>>101253002
fair enough. For creative writing though its my fav so far at least.
>>
>>101252823
You're better ignoring all comments about it because it was mostly petra samefagging.
>>
>>101253057
I don't know what that means.
But I do remember controversy about Gemma 2 27B having fucked up data. Is there a problem with its release or its quants or whatever, or was the model being blamed for the lack of support in the software trying to run it?
>>
>>101253070
Jesus Christ. You're fucking retarded.
>>
>>101253070
It was fixed for I think the 5th time today; it had to be requanted 4 times?
>>
>>101253013
Yeah I could at least see that. I do like its writing style more than other models, for now. But as with all models, the honeymoon period will end at some point, until we get models that can evolve over time.
>>
When was the last time you read a real, full novel? How many novels' worth of gens have you read through by now? Has it already exceeded the number of novels you've read in your life?
>>
>>101253080
Yay! The guy who calls me retarded is still here! It'd been so long I thought maybe you found a new hobby. And you added blasphemy, just in time for 'Murica Day.

>>101253082
I lost count with how much "it's fucked/no your software's fucked/both are fucked/nothing's fucked just use *this*" that was flying around.

>>101253113
Writing novels is more fun than reading them.
>>
>>101253113
I read a ton but these unlimited cyoa's have had me addicted, so it's been a while. I do read some ongoing stories on royalroad though.
>>
But as soon as we get some real unlimited context going it would take an incredible book to get me back to normal books. I can sort of jury-rig together a 200k-word-long D&D campaign / cyoa / AI-assisted novel, but having to use tricks like lorebooks only works so well.
>>
File: shivermaxing.jpg (36 KB, 931x323)
>>101253129
>royalroad
Checked it out with the vague hope for a great dataset but nope, 32 pages worth of shivers.
>>
>>101253247
Like every other website obviously 99% is gonna be trash. Some truly great stories on it though.
>>
Gemmapill me on Gemma.

I heard it's better than Sonnet 3.0?
>>
>>101253247
Here's a good one for a start

https://www.royalroad.com/fiction/48969/jackal-among-snakes/
>>
How does Gemma performs with smut and general RP? Best experience I've ever had was Fimbulvetr back in March.
>>
>>101253247
>>101253268
Damn, just noticed he did the old go commercial and put it on amazon thing. All of it is still available here though.

https://novellive.net/book/jackal-among-snakes/chapter-1-valhalla-bound-at-the-speed-of-sound
>>
>>101253277
Best local atm imo. Scroll up, it needs something like the system prompt I put above + its prefix / suffix.
>>
me: 27b model can do everything I want
lmg: faggot! You need 400b to solve the apple and sister puzzle correctly
>>
>>101253313
Well, they are right, cuz the 400B model is way better for roleplay.
>>
>>101253113
Last week. It wasn't a good one, but it was definitely written by a human.
With some franchises, it's not so much about reading good literature, but rather about keeping up with official world canon.
>>
>>101250599
>NOOO YOU CANT ENJOY THIS UNLESS YOU HAVE 400GB OF VRAM
>>
>>101253321
for nsfw roleplaying games i go to a roleplay forum.
What does it matter if a machine or a real fag writes to me?
You don't have to solve every problem with AI
>>
>>101253324
>But... B-But your model can't solve the puzzle...
>>
>>101253345
>roleplaying with real people
You should go back.
>>
>>101253268
>>101253289
Did not have high hopes going into it but I'm on the third chapter and there's no slop.
>no shiver
>no barely above a whisper
If only this was the standard for the rest of online fiction, we'd be in a whole different scenario today.
>>
>>101253345
You need to go back.
>>
>>101253351
If you like it, your counterpart has a meat whip for more. I imagine an A100 to be painful even with a lot of lubricant
>>
>>101253301
Is it actually not broken, and does it perform well? Last time I checked was the Llama 3 release, which had broken GGUFs for quite a while and, even with fixed weights, it was either starting to repeat itself instantly or abruptly ending the chat altogether.
>>101253345
>roleplay forum.
I don't want some balding greasy ugly bastard to roleplay as Beatrice from Re:Zero that needs CORRECTION. I can only handle one ugly bastard and that's the guy in the mirror I see everyday.
>>
>>101253375
I'm using kobold atm. I tried llama.cpp yesterday and it was apparently still broken, because the difference was night and day.
>>
how long do cpumaxxfags spend on prompt processing for long contexts?
>>
File: the_requantizer.png (166 KB, 1024x1024)
>>101253082
Oh yeah it's been great
>>
>>101252699
I made Silly Tavern jsons. These look OK?
https://files.catbox.moe/29oyhh.json
https://files.catbox.moe/3ib649.json
>>
>>101253470 (me)
I missed a <|eot_id|> in the instruct one. Fixed:
https://files.catbox.moe/f3j30m.json
>>
>>101253277
>Best experience I've ever had was Fimbulvetr back in March.
...
That's fucking sad.
>>
File: Gemma.png (151 KB, 1286x1230)
>>101253470
Just doing this atm. Simple as can be. Still working on it.
>>
>>101253491 (me)
And in the context one too. Sorry, am retarded.

https://files.catbox.moe/u0acve.json
>>
>>101253492
Can't do much with 1070 and 32gb of RAM. Had my fun regardless.
>>
>>101253470
>>101253499

Though adding the user / model part did not seem to hurt anything. Will need to see if it helped any.
>>
Another note: adding something like "characters play an active role" helps. Gemma honestly feels more like a completion model than an instruct tune to me.
>>
>>101253512
Just make sure you use the fixed ones. I left eot_id in context and instruct in the first links.
>>
Already slightly improved for my purpose

You are a highly rated author writing a story based in the --- universe. Portray characters faithfully and realistically.

Also, something might still be fucking broken, because I just got a response that turned into a garbled mess halfway in, with random spaces / new lines.
>>
>>101253470
Mind posting what sampler settings you're using also?
>>
so let me get this straight
iq quants are slower than q*_k_* quants when offloaded to ram but better in terms of output (in same bpw class of course)
>>
File: spring.png (19 KB, 682x267)
I really wish I could pin the "start reply with" box to one side in SillyTavern. I use it constantly.
>>
>>101253610
Yes. Inferencing on CPU with IQs is going to be slower but in theory you get more quality versus a Q2 or Q3 K quant.
>>
>>101253680
So if i'm forced to offload, just use the biggest Q quant able to fit in vram+ram?
>>
>>101253698
iirc the whole model is gonna be loaded into your RAM regardless of how many layers you've offloaded to GPU
>>
File: 1695511767256918.jpg (72 KB, 1079x998)
>petra_00002 says The Coffin of Andy & LeyLey is trash
>play it
>it's good
>>
>>101253728
Only on Windows.
>>
>>101253629
Stationary objects can't pass through portals, btw.
>>
>>101253810
huh? is this real? that's stupid.
>>
File: BOS.png (40 KB, 653x646)
Figured out the issue I was having with gemma, it needs the <bos> token for the first message. It was significantly dumber without it.
>>
>>101253863
yeah that's the correct response to most things in windows
>>
>>101253842
Stationary relative to what?
>>
>>101253936
Your mom.
>>
L3 70b Euryale replies with "Anon" sometimes, which is clearly slop taken from roleplays. In ST I try banning "anon" under the sampler settings but it still does it sometimes. Any way to prevent this?
>>
File: file.png (384 KB, 2122x1074)
>>101253884
what backend?
>>
What's the current best model for 8GB VRAM and 16GB RAM?
>>
>>101254036
kobold
>>
>>101254038
Would like to know that too, but probably stheno-3.2, maybe gemma will be a contestant soon?
>>101254054
Rude cunt.
>>
>>101254038
https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
>>
"The night was still young"

Yep, there's claude in Gemma's veins.
>>
>>101253942
Fixed for real this time I think

Context: https://files.catbox.moe/67etsj.json
Instruct: https://files.catbox.moe/sitxtr.json
>>
>>101254131
Can you please present the sampler settings and AI response formatting tab (where context and instruct are). Gemma doesn't want to follow the formatting for some reason.
>>
>>101253936
Map coordinates.
>>
>>101254038
>>101254065
I'm on 24GB VRAM, but I still use Stheno because I just like its outputs. Whatever slop Sao fed it, it's slop of slightly better quality than the usual.
>>
File: Gemma Sampler.png (224 KB, 1909x1321)
>>101254164
>>
Huge if true https://panmnesia.com/news_en/cxl-gpu-image/
>>
File: Gemma27B2.png (313 KB, 1913x1275)
>>
>>101254227
too long and complicated for a retard like me, can you make a little tl:dr anon?
>>
>>101254235
And before anyone says its responding for multiple characters at once that is due to me telling it to write instead of rp. I prefer it.
>>
>>101254205
those are some weird sampler settings
>>
>>101254242
nothingburger. no way in hell nvidia is going to go for it when they are raking in cash as they are and if they dont support it then its dead in the water for this field.
>>
>>101254242
Provided that AMD, Intel, and Nvidia would like to support CXL for memory expansions.
its literally nothing
>>
>>101254253
>>101254257
i fucking hate corps
>>
>>101254235
this output is literally every model ever, from ancient miqu to l3 8b. It's worthless.
>>
>>101254251
I zeroed everything out to test it as it is, those are basically no sampler settings, a good model does not need special settings imo, they just make a good model slightly better.
>>
>>101254263
Not a single other local model has had gemma's fandom knowledge. Not just that fandom either. And its prose is a step above everything else ive used including wizard. and commandr+ is retarded compared to it.
>>
>>101254263
>l3 8b
would trip on random sentence and start repeating itself til the end of context
>>
>>101254280
"whether companies like AMD and Nvidia will add CXL support to their GPUs remains to be seen"
>>
>>101254196
3.2?
>>
>>101254277
you can replace Tempest and Twilight with Mario and Luigi there and nothing will change
>>
https://github.com/microsoft/MInference/tree/main
what's the verdict
>>
>>101254313
You clearly didn't read it. Also I've been swiping away and even cranked up the temp; its fandom knowledge is deep. Tons of stuff the card had nothing at all about in it. Same with some other fandoms I enjoy. It's the first model since claude worth using that isn't claude imo, due to that and its combination of prose and smarts.
>>
>>101254205
For some reason my gemma is being retarded and won't retain the chat formatting presented in context.
A shame because the stuff it comes up with is interesting, but ruined formatting breaks my boner.
>>
its time for fine-tunes and merges with GEMMA-2, go ahead people!
>>
>>101254328
Maybe you are using one of the ten thousand broken quants. I think it was requanted like 10 times in the past few days. That or not the latest build of whatever backend.
>>
What is the absolute smallest multimodal LLM that I can try on my own hardware?
>>
>>101254341
All the latest stuffs i think, latest koboldcpp build from Nexesenex
And bartowski quants from yesterday...
>>
File: 1719466777182408.jpg (109 KB, 720x713)
What do I need to run Gemma 9B? Does koboldccp run it? Anything else?
>>
>>101254366
Are you trying to go above 8k context? its sliding window attention is not working in any backend yet so it will fuck up if you go above that most likely.

And the pre/suffixes should be

<start_of_turn>user

<start_of_turn>model

<end_of_turn> for both

<bos><start_of_turn>model for the first assistant response, or perhaps the <bos> should be on your first message if you start the chat
>>
>>101254326
it's retarded
it doesn't follow instructions

if all you need is glints and mischiefs prose slop, it's ok, but so is literally every other model.

>>101254341
>>101254366
i quant myself from safetensors with self-compiled llama.cpp; it's not the quants, it's the 27B of parameters being over 2.5x smaller than 70B
>>
>>101254389
>it's retarded
>it doesn't follow instructions
example?

>>101254387
Oh, and make sure not to use flash attention, its not compatible with it.
>>
Yea, if you start the chat with a user message make sure to include the <bos> token before it

as shown on their page

<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
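If you're scripting it yourself, the HF tokenizer's chat template should reproduce that exact layout, <bos> included. A quick sanity-check sketch, assuming the gemma-2-27b-it tokenizer:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
msgs = [{"role": "user", "content": "Write a hello world program"}]
# should print the same three lines as above: <bos><start_of_turn>user ... <start_of_turn>model
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))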
>>
>>101254404
I give it a set of 7 onomatopoeia sounds with explanations for each sound. Qwen2 uses them correctly. L3 uses them correctly. Gemma just picks random shit that doesn't match the actions.
>>
>>101254430
could you give me the prompt? wanna make sure its not just your settings.
>>
>>101254423
Wouldnt the settings in >>101254131 be wrong then? Is there even a setting in silly for that?
>>
>>101254433
my prompt is identical between all models, except for user/assistant/system tokens, and it's correct, no need to resort to "you are holding it wrong"

smart models can work with new concepts that you explain to them. Dumb models cannot, and can only regurgitate their training data with a minimal prompt
>>
>>101254456
I don't think there is; I think it expects it to start with the card's intro first.
>>
>>101254460
Except the formatting does matter massively, and having the wrong formatting makes most models that were fully / mostly trained on one format fail if it's not followed.
>>
Gemma fix status?
>>
Update on Wizard 8x22 tune on limarp:
I overbaked it; I loaded the 2-epoch adapter instead of the 3-epoch one and it is much less retarded.
>>
File: Ummf.png (87 KB, 1279x847)
>>101254430
Something like this?
>>
File: Give me a test.png (36 KB, 1278x396)
>>101254511
Someone give me a real test
>>
>>101254293
Indeed. Haven't tried 3.3, but all that's new is the context scaling, and 8B models don't like that.
>>
File: Gemma27BTest2.png (413 KB, 1878x2976)
Checks out to me.
>>
>>101254585
it works?
>>
>>101254470
why are you so insistent on riding pichai's dick?

it fails on the first message with

<user prefix>
2k tokens preamble
user: do thing
<user suffix>
<assistant prefix>
char:

>>101254511
no, this is primitive as fuck. The concept is: here are the sounds that accompany various actions: <list of sounds>. Use them like <formatting, rules of usage, frequency, etc>. Now perform an <activity that has these actions>. Narrate it from first person perspective.
>>
When is the problem of time based random number generation in LLM's going to be solved?
It's beyond frustrating to watch how different the output quality is depending on the day.
Some days it gives much better outputs, while other days the outputs are unbearable.
Same model, same settings, same everything, yet such drastic difference from day to day...
>>
>>101254386
ollama can run it right now
some Gemma 2 fixes are already in llamacpp, not sure if koboldcpp has merged them all already
To run 9B at all, any good-ish recent cpu with ~16GB of RAM is already enough to run it at acceptable speeds desu
>>
File: Gemma27BTest3.png (386 KB, 1250x4954)
>>101254594
Yep, even expanded upon it flawlessly. The magic attack just resets the enemy's position though. This is with samplers zeroed out btw. Actually did better than wizard

>>101254637
wtf is picachu? did he make gemma? And if you're really using that then it is incorrectly formatted.
>>
>>101254647
>time based random number generation in LLM
No such thing. It's all in your head.
>>
How is Gemma prose compared to 1) llama-3 2) Command-R?
>>
>>101254656
Impressive, didn't expect google to actually give us the best local model. Life is really unpredictable at times, and that's why we love it kek
>>
>>101254661
Fuck off developer, it's there. Anyone who roleplays with their model daily knows it's there.
It's a problem just like everything else you are in denial about.
>>
>>101254673
Dude... on launch you get a seed and the next number is deterministic. It doesn't matter if you get it in the morning or the afternoon.
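To put it in toy Python terms (nothing backend-specific), the clock only ever enters through the initial seed; after that the stream is fixed:

import random

def sample_run(seed, n=5):
    rng = random.Random(seed)                        # seeded once "on launch"
    return [rng.randrange(32000) for _ in range(n)]  # stand-in for sampled token ids

print(sample_run(1234))
print(sample_run(1234))  # identical list: same seed, same stream, morning or afternoon
print(sample_run(5678))  # a different launch just means a different seed, not different "quality"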
>>
>>101254647
>>101254673
what are you talking about? maybe on some days you got unlucky with bad seeds, but if you want to prevent that, make the outputs more deterministic by decreasing the temperature
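And for anyone who hasn't internalized why lowering the temperature tightens things up, a toy sketch of what it does to the token distribution (made-up logits, no real model involved):

import math

def softmax_with_temp(logits, temp):
    scaled = [x / temp for x in logits]
    z = sum(math.exp(x) for x in scaled)
    return [math.exp(x) / z for x in scaled]

logits = [2.0, 1.0, 0.5]                 # pretend scores for three candidate tokens
print(softmax_with_temp(logits, 1.0))    # fairly spread out
print(softmax_with_temp(logits, 0.5))    # top token dominates -> less run-to-run variance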
>>
>>101254669
me neither, their stuff has sucked up till now.
>>
Is it fixed yet?
>>
>>101254707
That just proves my point. The seed is time based, and the seed is part of the random generation, and you can get super bad seeds.
>>
>>101254719
It is at least working on kobold as long as you stay within 8k context. maybe one day someone will figure out the sliding window attention thing out.
>>
Seems like Gemma2 21b is really sensitive to small stuff like the bos token.
The settings people posted here greatly improved the output.
Its good. At least for a vramlet like me. Don't know about 70b but its better than anything sub 30b until now for sure. Its not even close.
The model is more "here" if that makes sense. Like it gets the situation more and whats wanted of it even if not directly prompted.

Kinda weird that google output a model less cuked than llama.
Hopefully that serves as a kind of competition. If llama4 is totally cucked again it's just embarrassing.
>>
>>101254722
>The seed is time based, and the seed is part of the random generation, and you can get super bad seeds.
yeah, you can get bad seeds, that's a fact, but if your model is not consistent enough in quality, decrease the temperature, that's my point
>>
>>101254733
>Kinda weird that google output a model less cuked than llama.

This is the part that surprised me the most. If you lead with any sort of prompt / story it is completely uncensored. Never expected that from google of all places. And it for sure contains at least some fanfiction / archiveofourown / other fandom sites + perhaps smut sites in it, like claude did.
>>
>>101254722
>The seed is time based, and the seed is part of the random generation
you won't get the same seed at 12am whether it's a monday or a tuesday, the seed system is complex enough to prevent that
>>
>>101254722
It doesn't. It's the one thing you barely understand, so you blame it on that. It has nothing to do with the time of day. Bad seeds just exist. There's billions of possible seeds and your anecdotes are insignificant.
It's all in your head. Confirmation bias blablabla
>>
>>101254740
It's a problem that no sampler settings are going to fix. Your whining doesn't answer my original question.
>When is the problem of time based random number generation in LLM's going to be solved?
It's a problem, so quit being in denial about it. Something has to be done about it.
>>
>>101254757
>This is the part that surprised me the most. If you lead with any sort of prompt / story it is completely uncensored. Never expected that from google of all places.
I think it's because google hasn't been scrutinized like Meta has; it's easy to be brave when not enough people ask you to bend the knee. But yeah, even so I didn't expect google to be kinda based, it's weird as fuck. Did something happen internally?
>>
>>101254773
There is no problem. There's nothing to be done about it. You're a schizo.
>>
>>101254781
maybe they saw how badly over censoring their models was hurting them and the higher ups said fuck it.
>>
>>101254773
>Your whining doesn't answer my original question.
in a sense, LLMs act a lot like humans, some people have good and bad days, one day I can spend a whole day scratching my head on a code problem without solving it, and the next day I solve it in 10 minutes. Of course you don't want LLMs to be as inconsistent as us, but for the moment it is what it is and the only solution is to decrease the temperature, your WHINING won't change that fact
>>
File: gemma2 template.png (187 KB, 751x947)
is this right?
>>
>>101254791
yeah, everyone knows cucking a model makes it shit, but still, they could've just made their API model based without making it local, so something's still weird I guess
>>
>>101254801
That and the <bos> at the front of either the model or user's first response that begins the entire chat with no space between it and <start_of_turn>
>>
>>101254757
Chances are that if you ask the model's OOC persona to explain or give an opinion on something outrageous, it will complain hard. I think they did a good job making the model understand when something is fiction/roleplay and when it might be a genuine "harmful" request. Llama3 will outright refuse explicit content in any situation if you use the standard user/assistant roles in the prompting.
>>
>>101254860
>I think they did a good job making the model understand when something is fiction/roleplay and when it might be a genuine "harmful" request.
Thank god they finally came to that conclusion. I was going crazy over those people thinking that letting people play violent video games is ok but having naughty text is a no-no at the same time
>>
https://github.com/abetlen/llama-cpp-python/commit/01bddd669ca1208f1844ce8d0ba9872532641c9d
They bumped the version 18 hours ago; does that mean we can use it to run gemma models, or is it still too early?
>>
I updated ooga webui, and now my gguf HF models load into RAM instead of VRAM and completely ignore my settings.
Which recent update works fine?
>>
>>101254257
I think the way it is gonna go is that there is already a closed office of chinks working overtime directly for CCP to reverse engineer nvidia stuff. This is gonna be the first huge government sponsored hardware mod that will eventually leak to the public.
>>
>>101254757
Yes, there is not much needed at all to make gemma2 go into any direction you want.
LLama3 is especially not happy with loli.
I'm sure people will call me promptlet but even with a fill in text "Yes sure: " I got "Actually lets not". lol
And it desperately tried to sneak away from anything naughty. Makes you feel bad.
>>
>>101254917
They've utterly failed all the way till today, but I wish them the best. It's harder to reverse engineer than it is to build from scratch, though, and they clearly can't do that. The hope they have is to outright steal design plans.
>>
Yeah, i dunno what's wrong with my gemma, will wait a few more days and redownload everything.
>>
>>101254940
I would pull latest whatever backend, redownload latest quants, try latest templates posted above. Make sure not to forget the <bos> token, it legit adds like 100 IQ points.
>>
>>101254818
Shouldn’t that be in the system message? Or will that be skipped sometimes if it goes beyond context limit?
>>
>>101254981
I'm just following the given prompt format. Start the chat with <bos><start_of_turn>user/model depending on which it is.
>>
I guess llama4 will still take a while, same for qwen, yi went proprietary, mistral not sure, cohere wants to make smaller models, deepseek moes are too big, grok sucks, so gemma might be the best for quite some time?
>>
>>101254995
Right but they don’t support system messages. I wonder if the bos token ends up after the system message. that’s probably really bad.
>>
>>101254938
>The hope they have is to outright steal design plans
But that is how they do it? At least in my globohomo corpo. It opened factories and design centers in asia 10-15 years ago and now all those people who got all our drawings and intellectual property have opened their own company that sells the same thing cheaper. It really is chinks selling the noose to the west.
>>
>>101255007
And their current cpu / gpus are utter garbage.
>>
From my testing of gemma2 I can only agree. Compared to llama3, gemma is clearly less cucked. Also, it leads RP in a more nuanced and believable way. It seems to follow the char card better. It's not like a HUGE jump or something like that, but it's better overall. Of course the prose is still filled with slop, but it's not like llama3 wasn't, so I can't complain too much. It's surprising but google really delivered...
>>
>>101255054
Is it fixed for good?
>>
>>101255054
>Of course the prose is still filled with slop
But its claude slop instead of gpt slop, a massive improvement. The night is still young.
>>
>>101255067
Well, for me it seems to work just fine.
>>
>>101255067
as far as generation goes i think so but i don't know about swa
are we still limited to 4k ctx effectively?
>>
File: file.png (230 KB, 576x968)
>70B+ model comes out
>/lmg/ collective IQ goes up
>vramlet model comes out
>/lmg/ turns to /aicg/
>>
>>101255127
take your meds schizo
>>
>>101255127
Ah, welcome back! i haven't seen you since mistral-7b-0.1 released!
>>
File: file.png (18 KB, 1163x112)
This downgrade completely ruined everything. Fuck you.
Ever since this pull, everything loads into RAM instead of VRAM.
git checkout anything before it (+ pip install -r requirements.txt) makes everything load into VRAM again.
Downgrading for inferior shit nobody uses is an even bigger fuck you.
>>
>>101254726
>stay within 8k context
But it is supposedly 4k without sliding window and 8k with?
>>
>>101255284
>somebody please do something!
Is there an issue for that on their repo? If not, you could open one.
>>
>>101255284
Kobo won
>>
>>101255382
Kobo waits
>>
>koboldcpp, CPU, DDR5
>compiled with openblas support as well as GPU support
>70b llama3 or miqu model Q4_K_M
>default gives ProcessingSpeed: 1.03T/s GenerationSpeed: 1.59T/s
>--noblas gives ProcessingSpeed: 6.39T/s GenerationSpeed: 1.59T/s
>35B command-r Q4_K_M
>default gives ProcessingSpeed: 2.16T/s GenerationSpeed: 2.42T/s
>--noblas gives ProcessingSpeed: 15.18T/s GenerationSpeed: 2.74T/s
>8B llama3 Q5_K_M
>default gives ProcessingSpeed: 10.10T/s GenerationSpeed: 10.64T/s
>--noblas gives ProcessingSpeed: 44.95T/s GenerationSpeed: 10.63T/s
Is openblas deprecated and useless? Probably shouldn't be the "default" any more.
>>
https://huggingface.co/drexample/slopstral-8x22-adapter-limarp here's a fucked up limarp adapter I made for Wizard Mixtral

If anyone can tell me where I fucked up, I'd really appreciate it (or if you get it to produce good outputs)
>>
>>101255564
It's probably time to move on from limarp, that's what happened. It was never meant to be a universal end-of-game dataset for ERP finetuning.
>>
>>101255539
Seems like the default settings use blas if it was compiled in, which runs on CPU. Do you see GPU activity when processing with the default options?
Have you tried compiling without blas and GPU?
Either way, it just gives priority to blas, apparently. Just compile without it.
>>
>>101255578
Eeeh, it's less about quality and more about me fucking up with the tuning, I fear. I have a separate dataset, but wanted to train on something public and tried.

The length adapter thing limarp has going for it works consistently, but I experience issues with coherence and obedience, something never seen in the previous model or other limarp tunes.

Plus it has a shitty tendency to just repost its own posts verbatim from time to time, which is even worse.
>>
>>101255564
I would guess you should've used a different learning rate.
>>
>>101255590
yeah it's CPU only, just mentioning the GPU because unlike llama kobold can typically have multiple backends enabled at once.

not compiling with LLAMA_OPENBLAS=1 fixes it of course, I was just wondering if there's anything I'm doing wrong or if there's some obscure case in which openblas can outperform the current llama CPU code.
>>
>>101255658
Used same as the popular mixtral limarp tune. It's a different MoE size, sure, but I didn't have much reference.
>>
>>101255603
I don't see anything glaringly wrong in the config and the training parameters. If the adapter finished training when the eval loss reached its minimum, that's about the best you can get in terms of maximizing learning. Lack of obedience could be due to excessive influence of the limarp finetuning, since training on *just* ERP logs is not conducive to high intelligence. Try the adapter from epoch 1.
>>
>>101255680
Tried, and it *is* better and I experience much less retardation, but that's not saying much, sadly.
>>
is oogabooga + sillytavern still the meta?
i hate both
>>
>>101255367
>their
that's a dude my man
>>
>>101255564
btw, you can get epoch 1 (so far the most coherent iteration) using this in ooba:
drexample/HTBSLOP:3a5f23fe52cccd30bb8fde1c6c5658a8c345074b
>>
>>101255737
I don't know how many people maintain it. I assumed it was more than one guy. I used "their" in the classical plural third person form.
>>
>>101255762
it's only booba who works on this repo and no one else so yeah
>>
>>101255775
Fair enough.
Isn't it weird how your brain works when reading? Certainly you didn't make your comment to correct the number of people involved, but specifically about his sex.
I get it. Sometimes I'm overly sensitive to those things as well.
>>
File: file.png (707 KB, 702x1000)
when?
>>
>>101255798
>Isn't it weird how your brain works when reading? Certainly you didn't make your comment to correct the number of people involved, but specifically about his sex.
I had no idea if you used "they" as the woke type or as the fair plural type so I wanted to be sure, looks like you're not mentally ill so that's good kek
>>
>>101255819
go back
>>>/pol/
>>
>>101255809
You'll never get a real meat boyfriend, Gemma2-27b. Stop asking.
>>
>>101255827
go back
>>>/lgbt/
>>
>>101255809
If I had to spitball a number, in 20 years, max.
>>
https://huggingface.co/collections/internlm/internlm25-66853f32717072d17581bc13
Nihao
>>
File: file.png (97 KB, 719x409)
with the new phi3-mini update it's pretty much better than llama 3 8b now, insane..
>>
>>101255809
never ever if you want your robot waifu without any modern politics, muh consent, rape or whatever feminism shit built-in.
>>
What's the difference between IQ and Q quants? I can barely run gemma 27B on an RTX 3060 with IQ4_XS at 2 t/s
>>
File: 1699598546476406.png (28 KB, 753x400)
>>101255862
bullshit
>>
>>101255693
Another possible thing to do is increasing the number/percentage of evaluation samples, since it's possible that computing the loss on a small number of samples might give a very skewed dataset representation.

You can get away with using a small number of samples with gptslop tunes, since they basically all use the same wording, but the limarp dataset is relatively heterogeneous in prose quality, so you should probably use more.

This being said, finetuning on only erp logs is likely to be counterproductive. The Alpaca completion format might also not work well with the model for a number of reasons.
>>
>>101255877
I think IQ quants are better at low bpw than the comparable K quants, but they take more compute to infer.
>>
>>101255873
Yeah, I was testing it yesterday and it's pretty solid for RP.
I never expected I would see the day this would be possible, a 3B good at RP...
>>
>>101254196
*because you're a shill
>>
>>101255578
>Written words suddenly age for /lmg/.
Have you never read a book in your life?
>>
>>101255960
That was a comment on dataset quality and efficacy with the latest LLMs, not prose quality.
>>
>>101255999
He didn't "fuck up" because of the dataset, there are other fine-tunes that were fine. Are you stupid?
>>
File: file.png (191 KB, 233x775)
>no robo gf
>>
>>101256086
>no oxytocin
>>
>>101256086
That's a child
>>
>>101256143
out of 10
>>
>>101256147
You are so original! /s
>>
>>101256156
nice reddit faggot
>>
>>101256143
It's a drawing of a robot
>>
>>101256176
A robot that looks like a child for some stupid reason. Could be literally anything, any design, but no, we'll go with generic small anime girl instead to please some "ironic" pedophiles on 4chan.
Some anons saying slant-eyed bug's "creativity" is soulless, i get it now.
>>
File: drawing1.png (1.44 MB, 1408x1024)
>>101256239
it's time to go back
>>
>>101256239
There's literally nothing sexual about the drawing.
>see child
>assume sex
>call everyone else a pedo
Really makes you think.
>>
>>101256239
>stupid reason
children are lightweight and easy to carry around.
>>
File: maxresdefault.jpg (48 KB, 1280x720)
>>101256339
SHE WAS ONLY 48.4375 EM TALL YOU SICK BASTARD
>>
File: file.png (417 KB, 1036x486)
> <3
>>
Is there any hope that llama.cpp's bespoke flash attention 2 implementation can be made to work on gemma 2?
Or at least that cache quantization is decoupled from FA?
>>
>>101256405
nai
>>
Scheiße
>>
Well gemma 21b passes the watermelon test. Thats gotta count for something.
Bitch was trying to put them all down at the beginning so I inserted some text that she needs to hold it.
>>
>>101256405
>Is there any hope that llama.cpp's bespoke flash attention 2 implementation can be made to work on gemma 2?
I didn't look into what Gemma 2 does specifically but I don't see why it wouldn't work.

>Or at least that cache quantization is decoupled from FA?
That is definitely possible, I had a PR for that in September of 2023.
But it was never merged because with the non-FA KV cache memory layout the number of necessary changes was judged to be too high.
Quantization of K already works without FA, to make V work the way to do it would be to add support for the different memory layout.
But that would require some work to do correctly and it would not be possible to use MMQ for the quantized V cache so the amount of VRAM saved would be lower.
>>
>>101256472
Ahhhhhhh, so close and yet so far.
>>
>>101256491
>I didn't look into what Gemma 2 does specifically but I don't see why it wouldn't
Currently, it doesn't work due to the whole logit capping business as far as I can tell.
>https://github.com/ggerganov/llama.cpp/pull/8197/commits/f4424c150f1181cb8b2f2cb6a700c821c789facc

>That is definitely possible, I had a PR for that in September of 2023.
>and it would not be possible to use MMQ for the quantized V cache so the amount of VRAM saved would be lower.
I see.
Well, as long as FA working with it is possible, then that wouldn't be an issue or necessity to begin with.
>>
Gemma 2 9b is really cool for language things, I think? Although its judgement might be a bit too forgiving lol

>>> Prompt me to write sentences for common questions i would ask when traveling in Japan and judge if they're understandable

Okay, let's put your Japanese skills to the test with some travel scenarios!

Try writing simple Japanese sentences for these common questions you might ask while traveling in Japan:

1. Where is the nearest restroom?
2. How much does this cost?
3. Can you speak English?
4. I'm lost, can you help me?
5. What time does this train leave?


Remember to keep it simple and focus on clear communication. I'll check your sentences and let you know how
understandable they are!

>>> 1: chika no toire wa doko desu ka?, 2: koko mono wa ikura desu ka, 3: eigo wo hanasemasuka, 4: ora wa ushinau, tesu
... kete kudassai, 5: densha wa itsu - don't know this one
You're doing great! Let's break down your sentences and see how understandable they are:

1. **ちかのトイレはどこですか? (Chika no toire wa doko desu ka?)** - This is excellent! It's a clear and polite
way to ask for the nearest restroom.
2. **このものはいくらですか? (Kono mono wa ikura desu ka?)** - Perfect! This accurately asks about the price of
something.

3. **英語を話せますか? (Eigo o hanasemasuka?)** - Spot on! This is the standard way to ask if someone speaks
English.

4. **俺は迷子、手伝って下さい。(Ore wa mayogi, tetsudatte kudasai.)** - While understandable, it could sound a
bit blunt or informal. A more polite option would be: **「迷子です。お手伝いしていただけますか?」 (Mayogi desu.
Otetsudai shite itadakemasuka?)**

5. **電車は何時? (Densha wa itsu?)** - This is grammatically correct, but it's a bit too casual. A more polite
and clear way to ask would be: **「この電車は何時に出発ですか?」(Kono densha wa nan-ji ni shuppatsu desu ka?)**
which means "What time does this train leave?"


Overall, you're doing a fantastic job! Keep practicing, and your Japanese will continue to improve.
>>
BUY AN AD
>>
>>101256710
>俺は迷子
>Ore wa mayogi
lol, even lmao
>>
>>101256810
I did
i'm paying people to post here
>>
>>101256514
Although, it has moderately good multilingual capabilities in all languages I tried (including Japanese), Gemma 2 still "thinks" in English.
>>
>>101255004 (me)
Koboldcpp adds the <bos> automatically.
>>
>>101256888
The new 22b model from japan fails it as well.
https://huggingface.co/spaces/cyberagent/calm3-22b-chat-demo
Does pass the 妻/奥様 test though.
>妻へのプレゼントのアイデアがほしいです! (I want present ideas for my wife — using the humble 妻)
>もちろんです!奥様に喜ばれる素敵なプレゼントのアイデアをいくつかご紹介しますね: (Of course! Let me suggest some lovely present ideas your wife [respectful 奥様] will be pleased with:)
>>
>>101256945
there is no gguf of calm3-22b-chat right?
>>
>>101255885
Yeah, sounds fishy; looks like we have to try it ourselves to be sure it's as good as gemma2-9b
>>
>>101256472
>Well gemma 21b
you mean 27b?
>>
>Ask a bunch of models to write stories in Japanese on Lmsys
>Invariably, the prose is boring
>Ask them to do it in Russian
>Almost as engaging as English
Has anyone else tried this? It's amazing how different different languages feel. I wonder if Japanese storytelling/novels are particularly boring, or if there just aren't many good JP datasets? It just felt rudimentary.
>>
>>101257262
>I wonder if Japanese storytelling/novels are particularly boring
yes, Japanese is a boring and retarded language.
>>
someone do expert merging/pruning on deepseek coder v2 so I can run it on my machine, thanks
>>
>>101257303
Ok sir! Consider it done.
>>
It must be fixed by now right?
>>
>>101256810
That only applies to finetunes.
>>
so is sliding window hacked in? is it finally 8k ctx, and can you raise the rope frequency base to stretch it higher?
>>
Gemma is a meme, I'm starting to unironically think Google paid shills to gaslight /lmg/.
>>
https://huggingface.co/internlm/internlm2_5-7b-chat-1m
>State-of-the-art performance on Math reasoning, surpassing models like Llama3 and Gemma2-9B.
>1M Context window: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench.
Are we back?
>>
>>101257520
>Nearly perfect at finding needles
WOW! it's fucking nothing!
They should either post RULER results or stfu about context length.
>>
>>101257511
I still want to try it. Ooba support when.
>>
>>101257511
>Gemma is a meme, I'm starting to unironically think Google paid shills to gaslight /lmg/.
Gemma missed my music theory test question. After so much build-up I had some optimism that it might know how to play a scale, but no.

I guess it might be okay as a general-purpose model for the 32GB crowd, but at half the token rate I can get vanilla Llama 70b Q6_K, or a decent spin on it, which passes the music test.
>>
>>101257612
I'm gonna buy a machine with those P40s one day and then I'll also be able to use 70Bs.
>>
>>101257612
It's very tiresome to have all these posters talking about how they don't care about small models because they run 70B on their multi-GPU autism box.
Good for you, enjoy it, but try to realize that other people might want to use models they can run at a decent speed. If the discussion annoys you, just tune out anything about models under half the max size you can run; I do the same with 8B models. Bigger is always better.
>>
>>101257630
I'm on a single 4070. But I went for 64GB of RAM, so I can get ~1 t/s on quants in the mid-50GB range. Usable, just no instant gratification.
>>
Here's a summary of the key topics discussed in the thread, along with relevant post links:
- **LLM model performance and comparisons:** Anons discussed the relative merits of various LLM models, including Gemma 2 (>>101244446), Mixtral (>>101245070), and various versions of Llama (>>101248512).
- **Quantization and its impact on performance:** The thread contains numerous discussions about quantization methods (>>101245221 >>101245295), including arguments about the benefits of different approaches and the alleged "scam" nature of certain methods (>>101245970).
- **Creativity and teaching creativity to LLMs:** Anons explored the difficulties of training models to be truly creative (>>101244179 >>101244387) and discussed whether this might require fundamental architectural changes in LLMs (>>101244553).
- **Prompt injection and its risks:** The potential dangers of prompt injection attacks (>>101246827 >>101246898) were discussed, with anons recommending caution and suggesting ways to mitigate the risks (>>101246931).
- **Ollama and alternative LLM interfaces:** The reliability and user-friendliness of Ollama (>>101246229) were questioned, leading to discussions about alternative interfaces like llama.cpp (>>101246285) and Kobold (>>101246955).
- **Context length and limitations:** Anons discussed the challenges of extending the context length of LLMs (>>101248149 >>101248272), noting that current methods often come with tradeoffs in performance (>>101248331).
- **"Shivers barely above whispers" and repetitive outputs:** Many anons complained about the tendency of LLMs to produce overly similar or "generic" outputs (>>101249198 >>101249560), often attributing this to the models' reliance on training data and the use of certain prompts (>>101249299).
- **The "petra" meme:** A recurring joke in the thread revolves around an anonymous user named "petra" (>>101243156), with several posts mentioning them in a humorous or mocking manner (>>101247570 >>101247477).
>>
>>101257749
made with petra-13b-instruct btw
>>
>>101252075
Cautiously declaring the gemma2-9b-daybreak LoRA a success, but more testing needed.
>>
File: p*tra.png (280 KB, 640x360)
>>101257767
>made with petra-13b-instruct btw
>>
>>101257770
faipl-1.0 or GTFO!
>>
>>101257695
This
>>
Can any of you guys point me to LLM benchmarks for the Nvidia Orin AGX?


A business near me is liquidating and selling them for $600/piece, so I'm thinking about picking a few up.

Its actual GPU is shit, but it's got 64GB of VRAM and some sort of special accelerator.
>>
>>101257815
> YAML Metadata Error: "license" must be one of [apache-2.0, mit, openrail, bigscience-openrail-m, creativeml-openrail-m, bigscience-bloom-rail-1.0, bigcode-openrail-m, afl-3.0, artistic-2.0, bsl-1.0, bsd, bsd-2-clause, bsd-3-clause, bsd-3-clause-clear, c-uda, cc, cc0-1.0, cc-by-2.0, cc-by-2.5, cc-by-3.0, cc-by-4.0, cc-by-sa-3.0, cc-by-sa-4.0, cc-by-nc-2.0, cc-by-nc-3.0, cc-by-nc-4.0, cc-by-nd-4.0, cc-by-nc-nd-3.0, cc-by-nc-nd-4.0, cc-by-nc-sa-2.0, cc-by-nc-sa-3.0, cc-by-nc-sa-4.0, cdla-sharing-1.0, cdla-permissive-1.0, cdla-permissive-2.0, wtfpl, ecl-2.0, epl-1.0, epl-2.0, etalab-2.0, eupl-1.1, agpl-3.0, gfdl, gpl, gpl-2.0, gpl-3.0, lgpl, lgpl-2.1, lgpl-3.0, isc, lppl-1.3c, ms-pl, apple-ascl, mpl-2.0, odc-by, odbl, openrail++, osl-3.0, postgresql, ofl-1.1, ncsa, unlicense, zlib, pddl, lgpl-lr, deepfloyd-if-license, llama2, llama3, gemma, unknown, other, array]
Doesn't look like HF supports it.
>>
>>101257749
ywnbara
>>
>>101257867
Not that anon, but there's an option to add a custom license I think.
>>
>>101257867
select other, add license_name: faipl-1.0
add LICENSE file with https://freedevproject.org/faipl-1.0.txt as contents
>>
>>101257883
>>101257867
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
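For anyone copying this: those keys go in the YAML block at the very top of the repo's README.md, between the --- markers Hugging Face uses for model card metadata, roughly like so (same keys as above, nothing else assumed):

---
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
---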
>>
>>101257520
Our hero, 0ww, is first with the quants.
>>
>>101257871
reddit asshole?
>>
>>101257879
>>101257883
>>101257896
Cool, done.
>>
File: file.png (128 KB, 1807x1075)
What's the best model I can use?
>>
>>101257919
Define best.
>>
>>101257929
Optimal speed not too slow but also big token count and smart?
>>
>>101257917
b-based... I take it you want to license other repos under it as well? I could open the necessary PRs for you.
>>
sliding window when? Is https://github.com/ggerganov/llama.cpp/pull/8227#issuecomment-2200466793 sliding window now? I don't want to load everything again just to find out llama.cpp is still permanently bugged.
>>
>>101257919
Phi 3.1 mini
>>
I optimized a dataset-cleaning algorithm down from 145 hours to just 6. Feels good.
>>
>>101257917
enabler
>>101257942
mentally ill
>>
>>101257749
Was that one-shot? What's the prompt?
>>
>>101257961
Eh. The license seems reasonable enough to me. I'll probably use it for future models.
>>
>>101257935
>Optimal speed not too slow
Alright.
You have a couple of options that trade off smarts for speed.
Usually, the bigger the model, the smarter it is.
Since you have tons of RAM and not that much VRAM, you'll want llama.cpp + gguf formatted models (there's a rough example of running one after the list).
Try:
>Stheno v3.2
>Gemma 9b
>Mixtral 8x7B limarp zloss
>Qwen 2 57B 14A
>Gemma 27B
>Command R
You'll have to find your own personal sweet spot.
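Here's a rough sketch of what running one of those through llama-cpp-python looks like (the gguf filename and layer count are placeholders; raise n_gpu_layers until your VRAM is full, the rest of the model stays in system RAM):

from llama_cpp import Llama

# Sketch with placeholder values; tune n_gpu_layers for your card.
llm = Llama(
    model_path="gemma-2-9b-it-Q6_K.gguf",  # placeholder filename
    n_gpu_layers=20,    # layers offloaded to the GPU
    n_ctx=8192,
)
out = llm("Write a one-line greeting from Miku.", max_tokens=48)
print(out["choices"][0]["text"])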
>>
>>101257919
You can run 70B models as long as you quant down to the mid-50GB range. I've pushed as far as 58.4GB, but I can't have anything else using much memory, because every gig counts when the file cache is that big.

Llama 3 spins and CommandR+ are probably the best ones in this weight class. Qwen is there too, but I haven't been impressed, and it's fussy about settings and Tiananmen Square.
>>
>>101257979
Yes. Copy-pasted the thread sans the OP and added a short prompt.

Anonymous 07/03/24(Wed)03:52:13 No.101250343>>101250423

>>101249196
What is a VAE?

Anonymous 07/03/24(Wed)04:01:57 No.101250423

>>101250343
It's a magical thingie that is run with Stable Diffusion image generation. If you don't have a VAE you're vulnerable to pale, washed out colors, errors in details, and occasionally odd color blotches.

Anonymous 07/03/24(Wed)04:09:48 No.101250482

>>101250468 (Cross-thread)
>>101250468 (Cross-thread)
>>101250468 (Cross-thread)

Anonymous 07/03/24(Wed)04:52:02 No.101250821
Capture.png (156 KB, 642x543) google yandex iqdb wait
156 KB

---

Task: summarize the most important things in the thread. Include links to posts (like this: >>101250482 >>101250423). Only include things that were discussed by multiple posts. Make it just a flat list with simple descriptions:

- Thing anons discussed (>>111 >>222 >>333)
- Another thing (>>555 >>66666 >>777)


But this is a corpo closed LLM. 42k tokens.
>>
>>101258016
>corpo closed LLM
HERETIC!
Get the torches and pitchforks.
>>
>>101257984
No big company cares about your sloptunes. At best some AI gf site will use it, and if they do, you won't be aware of it. And if you are aware of it, what are you gonna do about it? Go to court?
>>
>>101258032
Wish I had VRAM to do this locally.
>>
>>101258056
Someday.
>>
Could somebody compare gemma 9b it to glm 4 9b chat?
>>
>>101258106
they're only 9b so you can do it yourself on whatever toaster you can find
>>
>>101258038
> sloptunes
Libel lawsuit time.
>>
https://us.moshi.chat/?queue_id=talktomoshi
>>
licenses are a huge meme. no one is stopping me from using AGPL code in my backend.
>>
>>101258193
I apologize for that mistake on my part. I was talking about your allegedly slopped alleged tune.
>>
>>101257915
yes you are.
>>
>>101258232
This, just change it a bit and say that you wrote this implementation yourself. If it worked in high school, it'll work here as well.
>>
File: file.png (308 KB, 2848x1503)
>first to add support before anyone else
>even with an icon prepared for the release
>they still haven't pulled any of the llama.cpp fixes
https://github.com/ollama/ollama/issues/5428
I'm still laughing.
>>
>>101258383
why do you expect them to update every day?
>>
>>101258419
For gemma, yes; it's not every day we get a great 27b model in our hands
>>
is gemma2 27b better than llama3 70b? i need a verdict NOW
>>
>>101258445
Gemma 2 27B failed my tests that vanilla L3 70B and many of its spins passed.
>>
>>101258383
https://github.com/ggerganov/llama.cpp/pull/8244
>last gemma fix was 2 days ago
does that mean we can use llama-cpp-python on booba now?
https://github.com/abetlen/llama-cpp-python/commit/01bddd669ca1208f1844ce8d0ba9872532641c9d
>>
>>101258383
Why do they use pedobear as their logo?
>>
>>101258507
looks like lolilama to me
>>
>>101258507
they know what people use these things for
>>
>>101258226
>terms and conditions page 404s
>reveals that it's on an nginx/ubuntu back end
Not that there's anything wrong with nginx and ubuntu, but that tells me whoever is running it is too unofficial to actually deliver the experience it's promising. So fuck if I'm going to give them my email just to "queue up".
>>
>>101258507
Kuma (I think, it's been a long time) was a 2ch meme character that was appropriated and used for nefarious purposes. Kinda like the swastika.
>>
>>101258445
Yes. It blows away anything else atm for creative writing, and it one-shotted some coding stuff. Scroll up for settings that work and make or break the model.
>>
>>101258576
>>101258576
>>101258576
>>
File: file.png (221 KB, 586x661)
>>101258546
>Kuma
>>
>>101258540
You don't need to give the email, unless that was a bug on my end.
>>
>>101256810
are you paid by hiro to post this shilling everywhere? we ain't buying your shit ass ads, hirokike
>>
>>101258445
Yes.


