/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101337910 & >>101328074

►News
>(07/10) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101337910

--Optimal Format for Storing Bitnet Weights: 8-Bit Integers vs Packed Parameters: >>101338865 >>101338912 >>101338929 >>101339006 >>101339019 >>101339044 >>101339262 >>101339361
--The struggle of quantizing Gemma 27b yourself due to potential model abandonment by developers: >>101342775 >>101342852 >>101342980 >>101343071 >>101343490
--The Future of AI Testing: Beyond Riddles and Tricks: >>101338043 >>101338083 >>101338231 >>101338325 >>101338486 >>101338544 >>101338717 >>101338738
--RTX 3060 vs RTX 3090: VRAM, Bandwidth, and CPU Speed Considerations: >>101338192 >>101339240 >>101339275 >>101339317 >>101339354 >>101339384 >>101339473
--Midnight-Miqu-70B-v1.5 MMLU-Pro Benchmark Evaluation: >>101342270 >>101342322 >>101342346 >>101342404
--Gemma, the Drama Queen, Devastated by Snacktime Burp: >>101344477 >>101344497
--Gemma 2 and its Position Embeddings (or Lack Thereof): >>101338712 >>101338753 >>101338781 >>101338803 >>101338821 >>101338847 >>101338914
--GPT-4o Performance Metrics and SenseNova 5.5: >>101342438 >>101342510 >>101342672 >>101342692 >>101342767
--LLMs vs Doctors: Navigating the Healthcare Landscape and its Challenges: >>101341176 >>101341404 >>101341523 >>101342192 >>101341526 >>101341633 >>101341790 >>101341928 >>101342101 >>101342221
--Correction: A100 SXM2 32GB GPUs in Teslas are likely SXM4 models, not engineering samples: >>101338900
--Anole: Open, Autoregressive, Multimodal Model for Interleaved Image-Text Generation: >>101344297 >>101344370 >>101344404 >>101341577 >>101344424 >>101344461 >>101344499 >>101344558 >>101344361 >>101344767
--PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings: >>101343326 >>101343358 >>101343392 >>101343860 >>101344262 >>101344296 >>101344440
--Miku (free space): >>101338589 >>101339395 >>101340732 >>101342095 >>101342772 >>101344926 >>101345079

►Recent Highlight Posts from the Previous Thread: >>101337920
>>
>use gemma-2-27b-it to simulate talking to my ex after 7 years
>she tells me to fuck off
>>
Mikulove
>>
>>101345838
Listen to your AI ex and move on, anon
>>
Return to nous-hermes-13b
>>
>>101345838
just gaslight the AI like you did her
>>
>>101345838
based non-positivity-biased model
>>
File: 1528433961994.png (401 KB, 559x638)
>>101345838
>>
I don't really like gemma, but I must acknowledge it's one of the few models that doesn't horrendously fail the "kino and sovl" test (simply asking what it means for something to be kino and sovl)
>>
File: b3f8i9eg4sad1.jpg (22 KB, 736x663)
Any opinions/links on the best context/instruct set for gemma 9b on sillytavern?
>>
>>101346220
aka zoomer ebonics test
>>
>>101346220
dayum bruh dat be bussin
>>
Just watched a streamer run a d&d campaign for 3 AI characters.
RP quality was shit cause GPT, but the interactivity of it all was very fun.
He set up his own custom front-end with TTS and STT, gonna start making my own version when I wake up.
Good night /lmg/
>>
>>101346383
Good night not-Miku
>>
File: 1712711490064930.png (353 KB, 640x517)
write a somewhat complex scenario (Alice and Bob are long-lost relatives who are looking for each other while being romantically involved without realizing who they are)
8-16k tokens of slowburn
personalities and scenarios are developed differently each try
lots of hand-crafted text and the story is not allowed to degenerate into slop
when the big reveal comes, Alice is only capable of producing the exact same 3 canned reactions, almost word for word identical with previous tries

wizard 8x22 is smart enough to figure out the twist from just a few subtle hints, but it is incapable of producing anything but canned slop when push comes to shove without very persistent tard-wrangling
>>
>>101346220
Why don't you like it? Like you said, it's kino, it's smart, and adjusts to writing styles well (just use a famous / semi-famous author)
>>
>>101346458
That's why I've since switched to gemma. Wizard is too plain / "goody" / robotic and commandr / miqu are too retarded to do non-human anatomy right.
>>
Did they remove lolis from Chub? A lot of stuff is gone but many NSFW things are still there. I'm talking about the legacy site. I can't tell if they're deleting things intentionally or just incompetent and the site doesn't work correctly.
>>
File: file.png (16 KB, 800x600)
I wonder what Gemma looks like
>>
How are people getting longer context with gemma in llama.cpp? I tried -c 16000 but it just got extremely retarded.
>>
>>101346493
“As an AI I don’t…”
I killed it there to save compute.
>>
>>101346525
>How are people getting longer context
They aren't
>>
>>101346584
Oh I thought it got fixed last week.
>>
is gemma shit or just misunderstood?
>>
>>101346622
There's one guy who shits on every new model just to troll. Literally just try it. People have posted settings / logs the past dozen or two threads, there's also some stuff on reddit.
>>
https://github.com/catid/cuda_float_compress
>If your network link is faster than 10Gbps, then it may not be an improvement over just sending the file uncompressed since it compresses at about 12 Gbps. So, it's well-suited for most kinds of Internet transfers, but maybe less useful to send data between servers that are connected via 100G+ InfiniBand or some other supercomputer-class switched network. I'm personally planning to use this for distributed training on the Internet, so it's a better option for me than a faster CUDA-only approach that gets a worse compression ratio.
neat could be nice for federated training
>>
File: Untitled.png (417 KB, 720x1298)
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
https://arxiv.org/abs/2407.07071
>When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
https://github.com/voidism/Lookback-Lens
it would be interesting if, with this, you could target hallucinations you don't want (made-up historical facts or locations) while keeping hallucinations you do want (model roleplay ability)
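Rough sketch of the feature they describe, if anyone wants to hack it into their own stack (this is just my reading of the abstract, not their repo code; the function, names and shapes are made up):

import numpy as np

def lookback_ratios(attn_per_head, n_context_tokens):
    # For one newly generated token: attn_per_head[h] is that head's attention
    # distribution over all previous positions. The feature is the fraction of
    # attention mass spent on the provided context vs. the model's own output.
    ratios = []
    for attn in attn_per_head:
        ctx_mass = attn[:n_context_tokens].sum()
        gen_mass = attn[n_context_tokens:].sum()
        ratios.append(ctx_mass / (ctx_mass + gen_mass + 1e-9))
    return np.array(ratios)

# The paper then fits a plain linear classifier (e.g. sklearn's LogisticRegression)
# on these per-head ratios, averaged over a span, to flag hallucinated spans.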
>>
>>101346637
it still fucks up the formatting, sad.
why is it so hard for it to place * and " in right places?
>>
also new lilian weng blogpost
https://lilianweng.github.io/posts/2024-07-07-hallucination
https://archive.is/NIm5r
>>
File: 1694863601116390.jpg (27 KB, 828x646)
>>101345838
You're in love with your past, that person doesn't exist anymore
>>
>>101346383
>custom front-end
Not worth it. You'll spend at least 2 months on that shit to get 1/10th of ST options.
>>
>>101346963
ST options are lacking for seamless TTS / STT interactions
>>
File: Retard Apu.jpg (31 KB, 680x546)
>OSError: [WinError -1073741795] Windows Error 0xc000001d

I'm retarded. Why does llama throw this error as soon as I try to gen? Running staging build of ST.

I don't have AVX2 on my CPU. Is that why?
>>
>>101346493
Cute and retarded.
>>
>>101347214
Who's cuter and/or more retarded
Stheno or Gemma?
>>
>>101346477
>non human anatomy
off yourself you mentally ill coomer
>>
>>101347330
you lost buddy?
>>
>>101347349
i'm lost if i don't share your deranged fetishes? this isn't reddit, you're not free from criticism here
>>
>>101347282
Buy an ad.
>>
>>101347197
Yeah you probably need to recompile it with those extensions turned off.
>>
Threadly reminder for P40 users to utilize the PState patch -
https://github.com/sasha0552/ToriLinux/blob/main/airootfs/home/tori/.local/share/tori/patches/0000-llamacpp-server-drop-pstate-in-idle.patch

Drops idle from 50 to 10W and improves temperature levels considerably.
Same automatic PState switching can be added to KoboldCpp as well by adding three or four lines to koboldcpp.py.
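Untested sketch of the same idea around KoboldCpp's generation call, using the nvidia-pstate package from the same author (the helper names here are an assumption on my part, check that package's README for the real API):

from nvidia_pstate import set_pstate_low, set_pstate_high  # assumed API

def generate_with_pstate(generate_fn, *args, **kwargs):
    set_pstate_high()      # full clocks for prompt processing / generation
    try:
        return generate_fn(*args, **kwargs)
    finally:
        set_pstate_low()   # drop back to ~10W idle once the request is done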
>>
File: 123df.jpg (39 KB, 500x360)
>>101346239
pls resbond
>>
>>101347583
gemma is junk, nobody uses it
>>
>>101347583
there were some posted like 2 or 3 threads back, you should be able to find them pretty easily, just ctrl + f for catbox
>>
https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md
>FSDP1 -> FSDP2
neat didn't know they were making this
>>
New to this whole LLM thing. Seeing as I'm a vramlet, I just downloaded that Gemma 27b model and got it to run on ooba booga. This is some amazing stuff ngl. I might have to look at that silly tavern thingamajig you whippersnappers are using. Looking at past threads though, apparently this model breaks down when it goes past some token count? Truth be told, I just tested it very briefly, like 609 tokens before I closed webui.
>>
>>101347660
609 is nothing. For me it goes haywire after like 3-4k tokens. But I'm still tinkering with settings and such
>>
>>101347684

Can that be mitigated by being studious about updating the lorebook and essentially resetting the chat?
>>
>>101347510
How many t/s do you get on a 70B?
>>
>ministrations
It never bothered me until someone pointed it out.
>>
File: 1720600723921561.jpg (41 KB, 889x849)
Behold the future of AI.
>>
>>101347724
not sure how you want to update lorebook with info such as:
>{{char}} entered {{user}}'s house. {{user}} offered her some snacks, but she politely refused.
IG best you can do is to ask model to summarise story, then start a fresh one and inject the short version into chat.

Gemma really likes flowery prose, so it takes like 20-30 messages to reach a point where you need to sum up and start over.
>>
>>101347385
Shut the fuck up, Hiro, we aren't giving you a penny.
>>
>The demon laughs, a horrible cackle that echoes across the mountainside. "別想欺騙我,你這個下等的生物!" ("Don't think you can deceive me, you inferior creature!") it sneers
From Command-R. I was annoyed but the dialog makes sense which makes me kind of wonder how this shit works. Surely the model wasn't trained on a corpus with English narration and Chinese dialog. Is it just random chance that the Chinese dialog was appropriate?
>>
>>101347629
Why are people posting webms instead of jsons?
>>
>>101347730
NTA, but 7t/s, empty context at q6
>>
>>101347887
Wait, can you really fit 70B at q6 into two P40? That doesn't seem right.
>>
>>101347808
There is an option in Silly to see token probabilities. See how likely the moonrune was to appear after the double quote.
>>
>>101347912
It's too late now
>>
>>101347788

Yeah, is that possible with ST? Like, manually add entries of key moments that occurred in your RP. Then, when the bot starts losing its shit, reset chat, enter a message or two, like your summary idea, and still have it pull additional stuff from its lore book or whatever? Either way, it’s off to messing around and learning Silly Tavern for me.
>>
>>101347921
No it's not, you can always put

>The demon laughs, a horrible cackle that echoes across the mountainside. "

into the same context and ask it to continue.
>>
>>101347902
3xp40
>>
>>101347941
Right.

And due to how offloading works, only one card works at a time and the others are waiting for that one, to finish calculating its layers, right? So there'd be no difference between say 2 or 4 cards if using same quants? Do you get coil whine because of constant switching between working/idling?
>>
>>101347769
>hur hur tokenizer is blind to spelling
Everyone less retarded than you is already aware. At least have it rewrite sentences according to grammar production rules. That’s actually insightful about the limitations.
>>
>>101347730
Haven't tested yet, still waiting for some parts to get third card installed. For the screenshot I loaded Gemma 27B 8_0 roped to 32k context on KoboldCpp which gets around 8t/s around 1K context.
That's without FA and any P40 specific optimizations though.
>>
>>101347965
coil whine yeah, and we have to use rowsplit or else we suffer half the t/s so it's exactly as you have described.
q4 is still slightly faster iirc since it's still less work for them at the end, but anything above 5t/s is faster than my reading speed anyway.
>>
>>101348008
>>101347769
I think if they include artificial entries about spelling in the dataset, the model will learn it. Not that it's really needed, of course...
>>
>>101348028
Well, fuck, that's painful. I wonder if there can be an option to keep the card busy with useless calculations just to keep the coil whine at bay.
>>
>>101348049
It doesn't add that much to my already shitty lousy 3x40mm fan setup. I'd gladly take the 10w idle.
>>
>>101347510
Does it work for a 3090?
>>
>>101348209
Don't think you need it, any more modern GPU should be able to handle pstate switching well enough on its own.
>>
>>101347197
Sometimes you have to grab a llama.cpp release from several months back for Windows.
>>
>>101347197
buy a normal up to date PC, retard
>>
>>101347965
>And due to how offloading works, only one card works at a time and the others are waiting for that one, to finish calculating its layers, right? So there'd be no difference between say 2 or 4 cards if using same quants? Do you get coil whine because of constant switching between working/idling?

Depends on how you set --split-mode .
With --split-mode layer (default) it works like you described, with --split-mode row the matrix multiplications are parallelized across the cards.
But whether this is actually faster depends on how fast the GPUs are relative to the interconnect speed; for P40s it should be faster unless you're trying to split a very small model.
>>
So I've noticed hiccups and slight delays as I start typing in my prompts. Should I be emptying the text history as I go? It's currently 76,000 words.
>>
>>101348459
That shouldn't be happening unless you're editing the card's first greeting and it's updating in real time ie it's a new chat.
>>
>>101348416
>with --split-mode row the matrix multiplications are parallelized across the cards.
Oh wow, that's cool. Do memory bandwidth problems come from having to deliver intermediate tensors entirely to all videocards before each attention layer?
>>
File: 3219043203.gif (929 KB, 480x358)
>>101347197
AVX2 is 13 years ago
>>
>>101348485
The current architecture is that there is a single --main-gpu that for each matrix multiplication distributes the activations to all other GPUs and then collects the results afterwards.
Honestly the bandwidth problems to a large degree just come from poor optimization.
>>
>>101348481
>>101348459
My understanding is it should: he's way over the context limit, and silly removes old bits of text from chat history right after the system prompt to fit new text. So the change happens at, say, the first 10% of the context, which invalidates the remaining 90% and its cached calculations. The delays are the prompt processing for that 90% of the context.
>>
>>101348481
Hmm, I wonder what it could be then. According to the last request served info it's not actual processing time that's increasing, but it takes like a whole second to register me sending the prompt and when I start typing the prompt or edit the existing text it takes an equal amount of time to start visually showing me typing. It's accepting input during that downtime, because if I just keep typing through the delay it fills everything in when it catches up.
>>
>>101348517
That absolutely should not affect actually typing the text, though.
>>101348521
Ram usage? Your browser might be cooked.
>>
>>101348530
I have degen tab management habits (259 open tabs), that would make sense. Thanks.
>>
>>101348530
Oh, I have misunderstood his problem. So it freezes when you're typing? I had a similar thing when editing card description, but it's more or less gone now with newer versions of ooba and silly.
>>
I've been using mixtral-8x7b-v0.1.Q4_K_M which is 24GB and it runs really well on my system.
Tried gemma-2-27b-it-Q3_K_M which is 12GB and it ran painfully slow, so I tried gemma-2-9b-it.Q8_0 which is 9GB and it's more usable but still slower than mixtral.
What gives?
What can I read to learn how this shit works?
>>
File: 11__00258_.png (1.52 MB, 1024x1024)
>>101348725
Mixtral is an MoE model.
It's not using all of the parameters at once like a 27b.
It's the same reason why an 8x22b tends to be faster than a 70b - same concept scaled up.
>>
>>101348774
Doesn't explain why one 9b is slower than two 7b experts for him. Although maybe the quant? 8 vs 4?
>>
>>101348725
>>101348783
The generation speed is roughly proportional to the number of active weights times the bits per weight.
Mixtral is faster than Gemma 27b because it has fewer active weights.
Mixtral is faster than Gemma 9b despite having more active weights because it has fewer bits per weight.
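Napkin math, with rough numbers assumed for the models in question (active parameter counts and bits per weight are approximations):

# relative generation cost ~ bytes read from memory per token
# = active parameters * bits per weight / 8
models = {
    "Mixtral 8x7B Q4_K_M": (13e9, 4.8),  # ~13B active (2 of 8 experts)
    "Gemma 2 27B Q3_K_M":  (27e9, 3.9),  # all 27B active
    "Gemma 2 9B Q8_0":     ( 9e9, 8.5),  # all 9B active, fat 8-bit weights
}
for name, (active, bpw) in models.items():
    print(f"{name}: ~{active * bpw / 8 / 1e9:.0f} GB read per token")
# -> roughly 8, 13 and 10 GB per token respectively, which matches the speeds you saw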
>>
>>101347510
Isn't pstate = 16 more power efficient than 8?
>>
File: 1698451366825957.png (116 KB, 1139x1163)
>>101344658
MMAP is bugged and just doubles any model in ram so I keep it off.
>>
>put instruction after last message
>model ignores part of the message
>put instruction before last message
>model ignores part of the instruction
i'm tired
>>
>>101348415
I am literally waiting for new Ryzens.
>>
Vntl Leaderboard anon
Can you test this
https://huggingface.co/LLaMAX
https://github.com/CONE-MT/LLaMAX/
https://huggingface.co/papers/2407.05975
>>
>llama server randomly breaks and doesn't respond to requests since a couple days ago, give up trying to fix it and go do something else
>come back today and finally figure out someone just went through and renamed all the build flags so everything I was using to compile was getting ignored

why
>>
>>101348996
'why' has been deprecated and will be removed in a future version.
>>
>>101348890
yeah, people should disable mmap, mmap wasting memory, especially for big models
>>
>>101349113
*on windows
>>
>>101348996
https://github.com/ggerganov/llama.cpp/pull/8006
>>
>>101348882
>https://pypi.org/project/nvidia-pstate/
16 is "high"/"let driver decide" and 8 is "low" (power).
>>
>>101347583
This seems to work alright. The story string comes from the virt-io stuff that Lewdiculous suggests.
Note that you'll need to insert
<bos>
at the start of the story string, if you aren't using llama.cpp, ollama etc..
>>
>>101349152
Link to the virt io stuff? I would def if I could incorporate it into other models like miqu
>>
File: IMG_2612.jpg (685 KB, 1284x1924)
It looks like used 3090s have gone under $600 now, tempted to get a second one, maybe when they hit $500
>>
>>101349197
Thanks just bought all those listings
jj tho
>>
>>101348920
Temp 0, rewrite until you get what you want.
Use the common techniques to make the model "pay attention" to your instructions, stuff like turning your instructions into a list of tasks.
>>
>>101348774
>MoE
Huh that's an interesting concept. Doubt I'll ever really understand what all that math means.

>>101348783
Tested the Q4 version of the 9b model and it's much better.

>>101348853
I'll have to keep that in mind.
>>
>>101349489
Non-MoE transformers layer (repeated a lot of times):

A. Enrich each token with information about other tokens using attention neural network layer
B. Process each token independently from others using feed forward neural network layer

MoE transformers layer (repeated a lot of times):

A. Enrich each token with information about other tokens using attention neural network layer
B. Choose two out of 8 available feed forward neural network layers and process each token independently from others using those, ignoring 6 others
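Step B in toy code, if that helps (standard Mixtral-style top-2 routing, not any particular implementation):

import numpy as np

def moe_ffn(token_vec, router_w, experts, top_k=2):
    # experts is a list of 8 feed-forward callables; only the top_k chosen ones
    # actually run for this token, the other 6 are skipped entirely
    scores = router_w @ token_vec
    top = np.argsort(scores)[-top_k:]
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(w * experts[i](token_vec) for w, i in zip(weights, top))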
>>
>>101348920
You're probably not differentiating the user's message clearly enough from the instructions. If you're using Gemma-2 try something like this. It's not the "approved" format but it works. Test instructions one by one to make sure they have the correct effect.

<start_of_turn>user
Last user message here.<end_of_turn>
# Instructions for your next response
- Inst 1
- Inst 2
- Inst n
<start_of_turn>model
...
>>
>>101349540
This raises questions for me, but I'll save them. I'll go read some more docs.
All this thinking is headache inducing.
>>
>>101349197
Wait for the flood of 32GB V100s brother
>>
>>101349596
https://archive.is/8r7t9
Good explainer from hf
>>
Anyone happen to have HF files for gemma 27b-it?
>>
>>101349547
>>101349470
thanks, i'll try the list thing. I already use custom "headers" inside messages like [instruction]\n but the rest of the message does look similar to normal messages, especially since everything is in first person.
>>
File: prompt.png (59 KB, 1464x371)
>>101346493
>>
File: Gemma.png (1.69 MB, 1024x1024)
>>101349740
>>
>>101349698
https://huggingface.co/unsloth/gemma-2-27b-it
>>
>>101349752
>repo also has the files included
I'm fucking blind. Thanks, anon.
>>
Is gemma 27B worth using now or is it still kinda broken?
>>
>>101349769
LLMs are a meme in general, so no.
>>
>>101349636
Ah good this is the one I was looking at.
>>
Hey /lmg/. What about [insert current subject that has been talked to death]? Are there any updates? I can't be fucking bothered to scroll up, let alone check previous threads in the op.
>>
>>101349810
2 weeks
>>
What's the verdict on gemma2? I'm still using llama2 btw
>>
>>101349627
two more weeks
>>
>>101349727
One thing that I found works really well for character cards and general, non-character specific instructions, is to not try to address the model behind the character.
You'll often see things like
>You are {{char}} with this and that characteristics
in the character card and
>You will write {{char}}'s next message in such and so way
Try rewriting those as definitions instead of direct instructions to an abstract narrator, like
>{{char}} is so and so and has this and that characteristics
in the character card and
>Write {{char}}'s next message in such and so way
or
>{{char}}'s next message will be such and such
That kind of thing.
Prompting is not a meme as it turns out. You can get even dumb models to focus and do some really impressive things, although the general lack of "intelligence" is just something one has to contend with; it might not matter for most ERP, though.
>>
>>101349794
Why are you zoomers like this?
>>
>>101349836
4 months is the window I've heard about (at least for microsoft's). V100s at this point aren't worth their place in the datacenters since they're actually capacity and not gpu constrained. Doubt you care though since you're using some stale ass meme that only zoomers still latch onto
>>
>>101349846
Boomers regurgitate whatever talking points newsman says, zoomers regurgitate whatever talking points their favorite hugbox youtuber says. Millennials regurgitate whatever talking points their favorite DC faggot league super says. Nothing has changed, really.
>>
>>101349878
You're a prime example of why I generally stop talking to people as soon as I find out they are circumcised.
>>
gemma sucks balls, back to midnight miqu
>>
Got em
>>
>>101349888
What did he do wrong?
>>
>>101349888
what do you have against americans?
>>
>>101349965
blabbling miqu
>>
>>101349888
>circumcised
Are gentiles still mutilating their sons?
>>
ST gemma's templates? anyone?
>>
>>101350141
Americans are evil imperialists
>>
why is it that when i connect sillytavern to oobabooga the streaming does not work.

OS: Endeavour OS
What I did:
>install SillyTavern
>Install Oobabooga
>run oobabooga with --api
>download Qwen/Qwen2-0.5B
>load model
>go to sillytavern, select Text Completion, put in the url, do not click Legacy API
>it works, and shows "qwen2 0.5b"
>select the default card
>type "test"
>nothing happens for 10 secs
>response comes all at once


also tried:
loading different model, loading in gguf

What i did seems reasonable, and it should work, but it don't.
>>
>glm-4
>constantly fucks up basic shit
>constantly becomes retarded and spams "!!!!!!!!!!"
I dunno if llama.cpp is broken or if this model is just garbage.
>>
>>101350308
It should just work if you do that. Nothing else needed. Do you get streaming inside ooba UI?
>>
>>101350345
Yeah, it behaves really, really weirdly.
I'll give it another try today, maybe I'm fucking something up in the context or instruct templates, but it might just be that the model is that bad.
>>
>>101350358
yep, streaming inside ooba works fine.
>>
>>101350386
Try koboldcpp, if nothing else, to isolate whether it's a backend or frontend problem.
>>
>>101350345
buy an ad
>>
>>101349888
>zoomer immediately starts thinking about my cock
kek what a generation
>>
All new models are shit. We must return.
>>
>>101350296
>America, send us financial aid!
>America, send us medicines that your own people can't afford to get!
>America, fight our oppressors!
>America, let us invade your nation!
>America, let us rely on your currency in the world marketplace!

>America, stop touching us with your way of doing things!
>>
>>101350308
Did you actually check the box for streaming in sillytavern?
>>
Dry sampler in Llama.cpp when?
>>
File: file.png (51 KB, 1007x410)
>>101348965
Not good, the model is quite retarded.
>>
>>101350676
You, not we. Just run older models then if they are better, nothing is stopping you.
>>
>>101350676
return to what
>>
>>101350905
GPT-J obviously.
>>
>>101346458
Yep that's pretty much the universal observation of WLM. Very smart, but slopped to the brim
>>
>>101347362
>cat ears and tails are deranged fetishes
>>
>>101349748
gemmy-chan...
>>
>>101349627
>Wait for the flood of 32GB V100s brother
Why so hung up on V100? Yes it has a decent tensor core count and 32GB, but it's nowhere near a 3090, and it if has an issue with something, it's going to be at the back of the line for fixes since it's such a corner case.
Also delusional ebay sellers are just going to continue to be delusional.
>>
>>101347330
>where_do_you_think_you_are.jpg
>>
>>101350988
Hbm2
>>
>>101350857
>To address this, we dedicate 35,000 A100-SXM4-80GB GPU hours in conducting extensive multilingual continual pre-training on the LLaMA series models, enabling translation support across more than 100 languages
Rip
>>
>>101350961
nta, those are shit taste indicators.
>>
File: ThisFuckingGuy.png (64 KB, 1287x591)
>>101348965
>>
>>101350676
https://huggingface.co/EleutherAI/gpt-j-6b
>>
File: 1716470176720287.png (1.9 MB, 1024x1536)
>>101346493
lel
>>
File: VOOOOOOOTE.png (21 KB, 1505x190)
voting matters
/pol/ btfo
>>
>>101351143
rent free
>>
File: Designer (1).jpg (238 KB, 1024x1024)
>>101350996
P100 is HBM2, it's not magic, it doesn't necessarily give it way more bandwidth over GDDR6. It really only helps for training. Are you training models?
>>
>>101351143
Based.
>>
>>101349188
https://huggingface.co/collections/Lewdiculous/useful-65e6a91d5fbfe6b32586d265
lead me to
https://huggingface.co/Virt-io/SillyTavern-Presets
>>
https://www.techpowerup.com/324319/amd-to-acquire-silo-ai-to-expand-enterprise-ai-solutions-globally
Anyone hear of them?
>Silo AI team consists of world-class AI scientists and engineers with extensive experience developing tailored AI models, platforms and solutions for leading enterprises spanning cloud, embedded and endpoint computing markets.
>>
>>101351168
4xv100 32GB sxm will be the play in 4-6 months. Believe it! Local audiogen will break through.
>>
>>101351249
>Local audiogen will breakthrough.
already has
>rentry.org/stableaudio
>>
>>101351266
Go fuck another goat, petra the algerian
>>
File: belief.png (592 KB, 747x800)
>>101351249
>Believe it!
>>
File: file.png (153 KB, 773x987)
>>101351308
mental illness
>>
>>101351217
lol. xAI probably has better people than this literally who company.
>>
>>101349627
>>101351249
Delusional.
>>
Why is everyone in this field talking about "infinite context soon!" when we can barely achieve 64k of coherent context in sota corpo models with hundreds of gigabytes of vram?
>>
>>101351812
Maybe if we rotate the rotation or shove ten embeddings into each context slot or....
>>
>>101346948
Romance is a constant struggle of both partners trying to deceive each other into thinking they are more attractive than they actually are. It is the natural state in the animal kingdom. Therefore "that person doesn't exist" is a natural state of romance.
>>
File: Phi-ATMa-nala.png (95 KB, 928x340)
Interesting result.
Spatial awareness leaves something to be desired and it's a bit schizo at times even on t=0.81
One more epoch of training and we'll see the final result.
>>
>>101351812
It's Pascal's Mugging.
If you promise VCs a 2x return on their investment they will only do it if you can convince them that you have at least a 50% chance of actually being able to do it.
But if you promise them a 10000x return on investment they only need to think that you have a 0.01% chance of being able to do it.
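The expected-value arithmetic, spelled out:

print(0.50   * 2)      # 2x promise at 50% credibility      -> 1.0 (break-even)
print(0.0001 * 10000)  # 10000x promise at 0.01% credibility -> 1.0 (same bet)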
>>
is it possible to uncuck gemma-2 27b somehow
>>
>>101352122
yes, but performance drops
>>
>>101350393
tried it, and the streaming still doesn't work. Seems to be a frontend issue.

>>101350747
I looked, only thing i found was this.
"Smooth Streaming" suggests that it just splits up the tokens into letters, and dispenses them one by one.
I tried it anyways, and it didn't fix the issue. Are you referring to a different "box for streaming"?
>>
>>101352772
There's also Streaming FPS. Mine is 30.

Also make sure stream: true is in the console.
>>
File: Untitled.jpg (27 KB, 330x267)
>>101352772
nta but for me the streaming button is near the token options at the top using llamacpp
>>
>tfw change to a faster MoE model that's now 2 t/s
I don't need more.
>use it more, feeling the limits of 2 t/s
I don't need more.
...
I NEED MORE AHHHHHHHHHHHHHHHHHHH
>>
>>101352801
that fixed it. :)

But it raises the question: why isn't this option enabled by default?
>>
>>101352969
Meanwhile,
>Total:238.05s (1.50T/s)
Nice, this model is really cookin'!
>>
>The powerful LI-DiT-10B will be available through the online platform and API after further optimization and security checks.
its over
>>
>>101353197
>ledit
Of course.
>>
>>101353197
>LI-DiT-10B
Whats this? Chinese diffusion model?
>>
>>101352122
use any context at all / use a tiny bit of a prefill.

Don't feel like reposting shit so just go back a few threads for one of many examples.
>>
>>101345759
>(07/10) Anole, based on Chameleon, for interleaved image-text generation
Did anyone try this?
>>
>>101353834
be the first one
>>
>>101353834
None of the backends normal people use support it, so no.
>>
DeepSeekV2 quality x price is unbeatable, don't miss out on it, anons.
>>
>>101354012
If only I could local the new Coder, but it's too thicc.
>>
>>101353834
I'll wait for hentai finetunes
>>
It's so hard, Anons.
Gemma 2 is nice but too stupid.
LLaMa 3 is smart but can't write for shit.

localfags regressing in context, stuck in 8K token hell. Proxyfags and Corpo shills eating so good with 200K tokens. But don't you dare to be deviant on a paid API...
>>
>>101354571
the shortcomings of anything would always be apparent, don't delude yourself into thinking you could ever be content
>>
Is there *any* way to make gemma2's context longer?
>>
>>101354571
>eating so good with 200K tokens
according to even aicg, claude massively degrades after 16-24k tokens...
>>
>>101354614
It works at 16K just roped. I don't notice any loss in performance.
>>
>>101354716
How do you do this with llama.cpp? I tried yarn and it couldn't even write sentences.
>>
>>101353834
>(07/10) Anole, based on Chameleon, for interleaved image-text generation

Setting this up now, I'll post some gens here. I'll also try to run prompts anyone posts because I'm not very creative.
>>
>>101354741
don't use yarn, it makes it retarded
>>
Has anybody tested flipping the headers around when interacting with some of these "censored" models? Basically you have the model complete the user's message and you write the assistant's message. In principle, only the assistant's responses are filtered, right?
>>
>>101354898
That would likely make it retarded. Just use prefills like normal people.
>>
>>101354826
Oh I see. What frequency are you using? I saw 16k somewhere.
>>
>>101354614
No.
>>
You know what's crazy about Anole? 30 minutes of training, 40m parameters, a dataset of fewer than 6000 images.

Imagine what a more dedicated effort will be able to do.
>>
>>101355115
have you used it?
>>
File: other1.jpg (2 MB, 4309x3456)
>>101355115
the images look like complete shit though
note that these are cherry picked
>>
>>101354823
I hope you have a GPU with over 24gb of vram anon, because that did not work on my 24gb card.
>>
>>101355464
They look like images from a couple of years ago, which is probably the level of training that was SOTA then but is proof of concept today.
>>
>>101354898
The voice answering doesn't seem to matter, it's just looking for some "harmful" direction(s) in the embedding space after which it starts answering in the "refuse" direction.
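For anyone who hasn't seen it, the "direction" trick (what the abliterated models do) is just projecting one vector out of the hidden state; toy version:

import numpy as np

def ablate_direction(hidden_state, refusal_dir):
    # remove the component of the hidden state along the estimated "refusal"
    # direction; that direction is typically estimated by contrasting mean
    # activations on harmful vs. harmless prompts
    d = refusal_dir / np.linalg.norm(refusal_dir)
    return hidden_state - np.dot(hidden_state, d) * d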
>>
>>101355464
>>101355568
note that it was trained on less than 6000 images
>>
File: 1716329112755149.png (674 KB, 1792x1024)
Daily reminder
>>
>>101355464
It looks great anon. What did you expect? SD3 quality from an experimental multimodal model that generates pics without the help of clip?
>>
>>101355464
See >>101355821
>>
File: 1715943594084913.png (54 KB, 628x784)
>>101355821
trvke.
>>
>>101355821
[5 Sam Coins were transferred to your account]
>>
I've implemented conditional prompts and sequential replies in my frontend.

{
"id": "g5ny3qoe",
"reply_after": "user",
"reply_if": "else"
},
{
"id": "g5ny3qoe.e0",
"reply_after": "user",
"reply_if": "**command** | **order** | **must**"
},
{
"id": "g5ny3qoe.e1",
"reply_after": "g5ny3qoe.e0",
}
>>
>>101356130
>>
File: 1709544719912216.jpg (103 KB, 515x793)
It's obvious multimodals are the future. Which backends will support them? Are multimodals able to be quanted? Do you think the quality of text generation will be worse than current local models for first generation multimodals? What kind of hardware requirements do you expect to be able to run these models?
>>
>>101356430
>Its obvious multimodals are the future.
why would you want to be stuck with how one model does everything when you can pick specific ones you like and get a much better result
>>
>>101356430
>Its obvious multimodals are the future.
Nah, they don't bring any performance benefit to the table and they are more expensive to train. I think they will just be an alternative, not the future.
>Which backends will support them? Are multimodals able to be quanted? Do you think the quality of text generation will be worse than current local models for first generation multimodals? What kind of hardware requirements do you expect to be able to run these models?
Dunno.
>>
>>101356130
based...
>>
>>101356430
>Are multimodals able to be quanted?
obviously theres *going to be* a quant
>>
>>101356430
Yes. We're reaching the limits of training with written data, the next thing is adding images and at some point audio.
>>
>>101356478
Vision is kind of nice because you can write everything offline and then just dump it into the computer.
>>
>>101356643
>implying it can read my handwriting
>>
>>101356643
you get a text model and a vision model then. you don't need 1 multimodal model to do both and in pretty much any instance, multiple models you prefer will be better overall
>>
>>101356685
Older vision transformers and gpt-v can read mine well and it’s not great. Clip can’t the way it’s used in llava but it sounds like llava-next might be better about that.
>>
>>101356716
local models?
>>
>>101356633

You mean... AI Agents? Are you guys retarded?
>>
File: file.png (3.4 MB, 1090x1548)
>ITT
>>
>>101356704
There is too much delay between multiple single-modal models chained together. Wouldn't it be more efficient to just use a multimodal?
>>101356633
It seems like the logical conclusion. I don't doubt that text based LLMs can get way better and smarter, but for everyday usage a multimodal just seems like a step up from what we have.
>Can talk to your model like a person and have it talk back with minimal delay.
>Can have your model understand images, including your environment around you.
>>
>>101356761
Yes I have a local one that can although it’s not multimodal.
I'd really love a model I can just submit handwriting to and have it figure out if it needs to go to the shell or vim or compile a report or whatever. That would be cool.
>>
>>101356874
>delay
i don't know for sure but i'm guessing they still process things separately like text first, image second. i'm mostly thinking that specific tunes of any model are still going to be better and preferred vs one model, so you'd end up using a separate model anyways, say for images. and if you do that at all, you're wasting resources on the base model even having that data in it in the first place
>>
>>101356943
I tried to get STT + TTS to work with my preferred model. There's many different implementations, but the common issue is that there is an inherent delay between all of the moving parts which makes speaking with your model very annoying. After watching what GPT4o, Claude Sonnet, and Moshi can do I am convinced that multimodal is the future. Unless a framework or some other technology comes out that allows seamless integration of singlemodal models I don't really see multimodal not becoming the norm.
>>
File: Phi-ATMa-nala.png (137 KB, 932x507)
>>101351922
That extra epoch really did some good.
>>
>>101356995
online stuff might have the advantage of being able to process multiple pieces at once. on your local computer though if the image or voice was processing at the same time as the text, its going to cause all of it to slow down to the speed of your system. whether it processes all at once or in a queue doesn't really matter since its going to take the same amount of time overall.
what i was talking about is if you have a text model, say 70b, then chop part of that off to add in text to image and image to text, voice to text and text to voice, you've dumbed down the text part of the model to allow the rest to fit. so if you like the text and voice of the model, but then want to use another for image, you've got the image portion of the multi model being added into the mix. maybe it won't be a big deal in the future with better hardware or models get smaller (pls bitnet), but right now you want to maximize all the resources you have
>>
what if D&D, but hookers
>>
>>101356478
>get a much better result
no. multimodal is the future and the results will be better.
>>
>>101357223
'safety and alignment' alone ensures this will never be the case
>>
File: hs61gjk1h56d1.png (815 KB, 1024x1024)
>>101357314
picrel
>>
>>101356876
what model exactly? all OCR models i tried are utter trash at recognizing printed text, let alone handwritten
llava was ok
>>
>>101357314
this. uncensoring LLMs is already impossible (see that abliterated meme), prompting makes it dumber or schizo, now imagine a multimodal model, all the parts raped with kosher brainwashing.
image-gen is easier to uncensor because you work with pixels and diffusion there.
tldr: better ai model architecture -> better censorship & (((safety))) methods.
>>
>>101357473
>uncensoring LLMs is already impossible
What's your endgame?
>>
Hey, I'm reading your guides to not be an annoying newfag but I've got one question. There are various places that specify how much ram they need; is this VRAM, system memory RAM, or does either work? I have 32 gb of RAM but only 10 of VRAM.
>>
>>101357814
VRAM: Models up to about 90% of your VRAM will run super fast.
RAM: Models up to 85% of your system RAM will run slowly, but fast enough to be useful if you have other things to do while it processes.
Models larger than that are out of reach.
>>
File: offload_x_performance.png (96 KB, 1536x1152)
>>101357814
With llamacpp and its derivatives (koboldcpp, ollama, etc.) you can split the AI's model between RAM and VRAM.
You want to have as much of the model in VRAM as possible in order to have the fastest prompt processing and inference speeds. Do keep in mind that it's not just the model's weights that occupy space, there's prompt caches, buffers, and all kinds of other things.
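If you want a rough feel for the -ngl number, napkin math like this gets you close (per-layer size is just the file size divided by layer count; reserve_gb is a guess for the KV cache and buffers mentioned above):

def layers_that_fit(model_file_gb, n_layers, vram_gb, reserve_gb=2.0):
    per_layer_gb = model_file_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable / per_layer_gb))

# e.g. a ~16 GB GGUF with 46 layers on a 10 GB card:
print(layers_that_fit(16, 46, 10))   # -> roughly 23 layers for -ngl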
>>
File: 1691106522629463.jpg (2.44 MB, 3012x3580)
>>101345759
>>
>>101357898
1
>>
What's the best LoRA scaling factor and why is it 1:4?
>>
>>101354012
>Price
kek
>>101354571
wizard 8x22 doesnt have this problem
>>101357814
>does either work
yes, except ram is slower
>I have 32 gb of RAM but only 10 of VRAM.
download gemma q4 k s and run it in ram or offload a few layers with llama.cpp / koboldcpp
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF

you're not gonna get a better model for your specs
>>
>>101357776
spreading truth about AI meme is always morally correct.
>>
>>101354571
Jewish hands typed this post.
>>
>>101357861
>>101357869
>>101357916
I see, so it's just slower going but doesn't affect the quality. I've got stuff to multitask with so I don't mind that much at all. Also thanks for the recommendation anon, I'll give that model a shot.
>>
>>101357008
>Are you ready?... (to embark on this bonding journey)
>>
File: file.png (13 KB, 306x192)
>>101356130
why is gemma always inserting extra line breaks where there shouldn't be any. There are examples and all of them have one line break, but gemma ALWAYS inserts two here, and this is the first message
>>
>>101358000
Quality is a function of the model itself and the quantization level.

Above all, the Q number matters, and every digit down compounds the loss of quality.
Q8: As high as it goes. You'll see _0 and _0_L versions, either is fine, with _0_L being experimental but perhaps *slightly* better in metrics.
Q7: Legendary Pokemon that may be hidden beneath a truck.
Q6: Also fine. Available in _0 (old style) and _K (new style).
Q5 and 4: Economy quants, things haven't gotten horrible yet but beyond here be dragons. A few of us think that K_S is better at information retrieval (being right about factual details) than K_M, which would be better for creative writing.

Q3 and down, lone Q quants are too stupid to live. So we go into two things that help.

iMatrix (iMat or i1) makes lower quants "know" what information can be sacrificed.
IQ quants: Designed for low Q numbers, they introduce XS and XXS varieties.

Don't Q under 4 unless it's got IQ quants and imatrix, or it's hopeless. And even then it gets rough fast.
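Since the Q number is roughly bits per weight, you can also estimate file size (and whether it fits your card) before downloading anything; the bpw values below are ballpark figures:

BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "IQ3_XXS": 3.1}

def gguf_size_gb(params_billion, quant):
    return params_billion * BPW[quant] / 8   # approximate GGUF size in GB

for q in BPW:
    print(f"27B at {q}: ~{gguf_size_gb(27, q):.1f} GB")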
>>
>>101355464
I'm learning rubiks cube algos and that would be real handy
>>
>>101358137
it just some weird watermark, don't think about it too much.
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101358139
>>
>>101358157
there's no way it could draw an accurate 3x3 and track the changes, plus you'd have to relay your scramble to it, it wouldn't work
>>
>>101358160
i think extra lines, tabs and spaces are all concatenated into one token anyway, so it shouldn't affect output quality
>>
Is there a method for using AI to improve your own writing quality with a toaster? I only have 48 GB of RAM and an old GPU that only has 8 GB of VRAM. Models are hilariously slow and have poor quality..
>>
>>101358369
>using AI to improve your own writing quality
Wrong direction.
AI is a font of cliche, repetition, and mostly bad writing styles harvested from the Internet commons.
>>
>>101358369
>method for using AI to improve your own writing quality with a toaster
give it text and ask what can be improved, anything else is cope, you shouldnt let it write anything for you

>48 GB of RAM and an old GPU that only has 8 GB of VRAM
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
>>
>>101358429
Okay, thanks!
>>
File: 1696297761013032.webm (1.64 MB, 460x558)
>>
>/lmg/ - local models general
>>
>>101354571
That's why I just use openrouter and switch between models. I use Claude Sonnet for most gens but Euryale takes over during sex. Whenever euryale gets too retarded I use CR. Then back to Sonnet for anything else.

You do have to be a richfag like me for this to make sense tho
>>
>>101358583
I'm a model near you.
>>
>>101358611
You must have pictures to prove your statements
>>
>>101358583
Right, the naming means nothing.
>>
File: logh bitten bullshit.jpg (75 KB, 640x480)
>>101358630
All I have is a picture of Fritz Joseph Bittenfeld.
>>
I realized that novelists will now start to use chatgpt to assist them in their process. "rewrite this sentence for me", "give me a metaphor for x" etc. The resulting slop will make it into books, which will be used in further training. The slopocalypse is inevitable, bros.
>>
>>101358137
proprietary reddit spacing
>>
>>101358481
the uncanny infinity vortex hurts my brain
>>
File: Wiz-32k-test-iq4-xs.jpg (54 KB, 1269x507)
>>101354571
Why do anons still act like the 8k limit is something you have to live with?
Wizard 8x22b can recall information perfectly around 32k, works well with quantized cache too (picrel uses 8-bit cache)
>>101343344
If you were capable of running anything higher than a lobotomized 2-bit quant before writing it off you'd know that it actually works perfectly fine.
Full log: https://files.catbox.moe/y17y6c.txt
>>
>>101358970
sounds really fucking sketchy
>>
>>101358186
For two newlines, yes. For "extra lines, tabs and spaces", no.

['Hello', 'Anon', '.', '\t', '', '\n\n', 'Lucy', 'says', '.']
>>
>>101345759
Running MMLU Pro against Gemma2-9b-it. It's really shit at following instructions. It keeps inserting unasked for formatting and despite telling it to write the answer in a specific way, it deviates multiple times. I patched the code multiple times to allow it to say e.g. "The answer is **(A)**" whereas the code initially would fail to extract the answer from this due to the extra **()** shit.
>>
>>101359128
>anon can't into regex or simple string parsing
Come on, anon. Show the instruction. Let's fix it.
>>
>>101359308
The thing is, MMLU Pro has as part of its test the ability to follow instructions. I don't want to spoon feed this fucker too much. Examples of it failing to format the answer correctly are:
1. Would need to add optional 'closest to', and another formatting alternative '( without **')
**The answer is closest to (A) $838.75.**
The answer is closest to (E).
2. Another variant: **Answer:** The closest answer choice is **(C) 34 hours**.

etc.

I can fix it easily enough. But if the model is asked to respond in a certiain way and it fucking ignores it, should I?
>>
>convinced by anons shilling for gemma2 9b
>it's just as bad as stheno
>mfw
>>
>>101359442
yup, zoomers gonna zoom
>>
>>101359442
skill issue, learn to prompt and write bots
>>
gemma style rigged lmsys arena
>>
>>101359522
Gemma is the best. Everyone says so besides retards who are in ultra-cope mode after spending thousands on expensive hardware they don't need.
>>
>>101359410
What's the MMLU test you're running? Do you have a link? And i still want to see your instruction to answer the questions. My suspicion is that it's more verbose than it needs to be. And for the love of anything you believe, never mention the word "formatting" to the model. Something simple like "You'll be asked some multiple-choice questions. Only show the letter of the correct answer."
My little LLM machine just went offline while i was trying to run a test. I know... i don't believe the timing either...
>>
Gemma cured my cancer and brought my dog back to life.
>>
>>101359554
Lana told me she loved me and offered to suck me off
>>
>>101359554
it also gave you a brain cancer it seems
>>
>>101359552
The instruction was the MMLU Pro default instruction, initially. I added the part in parens at the end:
"The following are multiple choice questions (with answers) about {subject}. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice (note: you MUST use the exact phrasing 'the answer is [A-J]' where [A-J] is the correct letter answer)."
The MMLU test is https://github.com/chigkim/Ollama-MMLU-Pro
>>
>>101359475
nice meme
>>
>>101359475
model issue, transformers issue, etc. the only ones with real skill are the ones who censor LLMs, it cannot be surpassed, its not like you zoomers can comprehend whole implications of this.
>>
>>101359544
Thank you for sharing your opinion about Gemma. It's great to hear that she has such a positive reputation! It's important to remember that everyone has different needs and budgets when it comes to hardware, and what might be unnecessary for one could be quite important for another. Let's try to keep the conversation inclusive and supportive for all preferences and choices.
>>
>>101359599 (me)
I ripped out the paren note as it didn't seem to make a diff.
This is where I'm at. I'm being way too nice to this retard:
import re

def extract_answer(text):  # the benchmark's extraction hook I'm patching
    # common case: "the answer is (A)", with optional ** and ( around the letter
    pattern = r"answer is ([\*]*)([\(]?)([A-J])"
    match = re.search(pattern, text)
    if match:
        return match.group(3)
    # fallbacks for gemma's creative phrasings
    pattern = r"answer is closest to \*\*\(([A-J])\)"
    match = re.search(pattern, text)
    if not match:
        pattern = r"most accurate description is \*\*\(([A-J])\)"
        match = re.search(pattern, text)
    if match:
        return match.group(1)

And it still manages to fail: The answer that best encompasses these challenges is (E).
>>
>>101359554
Gemma restored my foreskin.
>>
File: file.png (818 KB, 768x768)
>>
>>101359759 (me)
... Answer: The best answer here is **(B)**. Here's why:

Also, gemma2-9b is being asked explicitly to do CoT by the instructions, and yet it will very often start off as above, and then start chattering about the problem.
>>
>>101359554
And it did that even when all the loaders are still bugged. I can't even imagine what is gonna happen once the loaders are fixed.
>>
>>101359795
I don't dislike it.
>>
gemma 9b sucks at writing. I even told it to be descriptive, verbose, use award winning prose, but it just droned on and on without ever getting to the point
>>
>>101359884
did you ask it to write something explicit?
it will endlessly filibuster if it doesn't want to comply
>>
>>101359808
The instruction seems concise enough, so disregard my doubts about that.
If (X) is the only thing it keeps consistent, i'd consider that enough to match it.
    pattern = r"(\([A-J]\))"

I assume you're playing with extract_answer()...
But if the model goes on rants, maybe it IS dumb. The potential problem i see is that the test itself asks it to rant about the question. I understand why they do it. but it may confuse chatty models.
>>
>>101359884
>I even told it to be descriptive, verbose
>it just droned on and on

wtf how could this have happened
>>
>>101359884
>without ever getting to the point
That looks exactly like award reading prose. Just today i was reading this:
>https://www.gutenberg.org/cache/epub/32037/pg32037.txt
>Title: Eureka: A Prose Poem
>Author: Edgar Allan Poe
>>
File: s-l400-3408884227.jpg (52 KB, 400x373)
>9b
>>
>>101359912
No, it's all over the place. Sometimes **(A)**. Sometimes (A). Sometimes **A**. Sometimes A. Or, hey, how about this new variant:

**Therefore, the options that are NOT evidence that DNA is the genetic material are B, D, E, F, and H.**

(i.e. lets just exclude what we think is the answer entirely, despite being asked to write it out)
>>
>>101359976 (me)
Actually, just realized it's excluding "GIJ" as well, so I guess it didn't know which was the answer there.
>>
File: a.jpg (69 KB, 784x579)
>>101274994
>>101274094
>>101274250
>>101274496
any sourcecode? llama/silly are kind of ugly messes beyond saving.

>>101274094
>Post your custom frontends anons.
lmfao I use a .env file to set the URLs. Posted this last year on my trip
I can set any IP I want but tkinter is a bit unwieldy, eg the 2 ui settings boxes don't need to be used for every ai. But chatgpt preferred tkinter when I made it and I've only written 1 or 2 functions myself.
Why are you loading the models in this anyway? Is it really that hard to ctrl+c a cli on your mikubox from SSH?
>>101274496
i do something similar. I will care about context again when its infinite and includes video.
========
If I get 5 (you)'s ill cbf to post the relevant part of Forbin's sourcecode because >>101274250 reminded me of the flutter class I used to make the JSON to send over POST
>>
>>101357898
for me, it's migu (male)
>>
go back
>>
>>101360016
code models are good enough now that if you show it the api and tell it what you want, even codestral could come up with a basic ui
>>
>>101359976
I wonder what question it's trying to answer. I searched for 'DNA' and got 3 unrelated questions (recombinant DNA, paroviruses and control of cell division). Searched for 'material' and there's only one question regarding male and female catheters... nothing for 'evidence'.
I think just matching for (X) is enough. Any further and it's not going to be fair. You could also end up with false positives.
>>
>>101360094
it's not searching for anything because it's a 9b and it's fucking retarded
>>
File: a.jpg (86 KB, 1253x388)
>>101345838
>having a normal convo with miku
>need to open SD
>swap model to save VRAM
picrel
>https://files.catbox.moe/3zymc8.mp3
mfw
>>
>>101360099
>it's not searching for anything because it's a 9b and it's fucking retarded
And yet, it has better reading comprehension than you. I'm talking about the MMLU-Pro questions. Is that clear now?
>>
>>101360132
>Running MMLU Pro against Gemma2-9b-it
test is irrelevant when the model is retarded to begin with
>>
>>101360079
im too last for this. new or even old opus could probably do better than tkinter, but i just want to show it a picture and it sends back a zip file of code, just opens an nginx based on a websearch for the API, and i can start chatting.
When will agents be a thing again? Did all development just... <stop>?
>>
lazy* even
>>
stats take up too much of gemma's 4k context.
sad.
>>
>>101360144
The anon running the test wants to know where it lands on the retardedness scale. I don't see a problem with that. Why are you angry? Did the LLM not let you touch it?
>>
>>101350308
are you running ooba behind a reverse proxy?
>>
>>101360148
>show it a picture and it sends back a zip file of code just opens a ngix based on a websearch for the API and i can start chatting
have you tried any of it? the 'based on a picture' part sounds like the hardest to solve, since the rest is automation and using the API
>>
>>101360172
>retardedness scale
all small models are retarded though. it's common knowledge they hallucinate, have no spatial awareness, and can't remember what happened a message ago. if you want to retrieve data or search stuff at least use mixtral 8x7b or command-r. there is no 9b or smaller that is going to do it
>>
>>101345759
is
 export CUDA_VISIBLE DEVICES=0

the same as making a .env file and putting
CUDA_VISIBLE DEVICES=0

???
>>
>>101360094
I don't really mind not being fair, as I plan to use this internally to ensure that models I train do not get dumber than their parents. Kind of disheartening seeing how bad it is at sticking to such a simple instruction though.
>>
>>101360219
>all small models are retarded though
ALL models are retarded. Period.
>no spatial awareness
That's the least of their problems. Are you one of those expert roleplayers?
>can't remember what happened a message ago
Neither can you if you cannot follow the thread.
>if you want to retrieve data or search stuff
I'M NOT THE ONE RUNNING THE TESTS. GET IT NOW? Anon was trying to run the test, I doubted the prompt, he proved that the prompt was simple and that the problem was that the model wasn't following the expected format required by the testing script. I suggested a less rigorous regex while, hopefully, not giving it an unfair advantage.
>>
>>101360280
Don't be too harsh, anon is a heavily quantized 7B model.
>>
>>101360280
>the 9b is retarded
yeah, got it
>That's the least of their problems. Are you one of those expert roleplayers?
i'm a wizard
>>
>>101360265
Yeah. A shame. Can you do the same test with a proven retard model like phi3-mini or something? Here's a crazy idea: it doesn't follow the response format to a T because it wasn't trained on benchmarks. If phi is significantly better, I'd be suspicious. Or gemma2 is, after all, kind of dumb and unruly. Also, I'm not sure if the regex needs .* at the beginning and end to match the rest of the string if there's extra noise in the output.
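For what it's worth, if the script uses re.search (rather than re.match) it shouldn't need the .* padding. A quick throwaway check, not from the benchmark code:

import re

noisy = "Blah blah reasoning... the answer is (B), probably."
# re.search scans the whole string, so no leading/trailing .* is needed:
print(re.search(r"\(([A-J])\)", noisy).group(1))    # -> B
# re.match only anchors at the start, so it WOULD need a leading .*:
print(re.match(r"\(([A-J])\)", noisy))              # -> None
print(re.match(r".*\(([A-J])\)", noisy).group(1))   # -> B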
>>
>>101360280
I patched the code to cache all LLM responses, so I can rerun it with the original (strict) patterns as well, in case people wanna see the results.
I did remove the 'randomize the response and give it a score if it ends up correct' logic. If it can't even produce a response, it gets a 0 score, period.
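The caching itself is nothing fancy, roughly along these lines (file name and helper names are mine, not the benchmark's):

import json, os

CACHE_PATH = "responses_cache.json"   # made-up filename, not from the benchmark script

def load_cache():
    # Reload previously stored completions so reruns skip the model entirely.
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}

def cached_response(cache, question_id, prompt, model_call):
    # Run the model once per question and keep the raw completion;
    # re-scoring it later with a stricter or looser regex is then free.
    key = str(question_id)
    if key not in cache:
        cache[key] = model_call(prompt)
        with open(CACHE_PATH, "w") as f:
            json.dump(cache, f)
    return cache[key]

def score(extracted, correct):
    # No random-guess fallback: if nothing could be extracted, it's simply wrong.
    return 1 if extracted == correct else 0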
>>
>>101360317
Yeah, I could. There are thousands of questions though. It's taking quite a while. Like, half an hour per subject, and there are 14 of them...
>>
>>101360325
>I did remove the 'randomize the response and give it a score if it ends up correct' logic. If it can't even produce a response, it gets a 0 score period.
Seems fair. If anything it seems to be testing the model's 'luck'. It's a weird methodology.

>>101360359
I suggested phi3-mini because it's tiny and seems to do very well in benchmarks. More than a 4B has any right to. Run it on a single subject. It will, at least, give you a baseline of what a 'well behaved' model's output looks like.
>>
>>101360094
>I wonder what question it's trying to answer.

{'question_id': 3361,
 'question': 'Discuss how the quantitative measurements of the dioxy-ribonucleic acid content of cells is evidence that DNA is the genetic material.',
 'options': ['The increase in DNA content following cell division indicates that DNA is the genetic material.',
  'The presence of DNA in mitochondria and chloroplasts, but not in the cell nucleus, is evidence that DNA is the genetic material.',
  'The constant amount of DNA in all body cells and half the amount in germ cells is evidence that DNA is the genetic material.',
  'The varying amount of RNA in cells indicates that DNA is the genetic material.',
  'The ratio of adenine to thymine varies greatly between different types of cells, suggesting DNA is the genetic material.',
  'The presence of histones in cells proves that DNA is the genetic material.',
  'The correlation between the complexity of an organism and the amount of DNA in its cells points to DNA as the genetic material.',
  'The ability to synthesize proteins directly from RNA without DNA involvement demonstrates that DNA is the genetic material.',
  'The consistency of DNA sequences across different species suggests that DNA is the genetic material.',
  'Polyploid tissues have multiple sets of chromosomes, which shows DNA is the genetic material.'],
 'answer': 'C',
 'answer_index': 2,
 'cot_content': '',
 'category': 'biology',
 'src': 'stemez-Biology'}
>>
File: 1536720366445.png (55 KB, 278x248)
>try out a "strong waman who don't need no man" card with the goal of seggs
>first few responses get pretty bad reactions even on swipes
>iterate on the strategy, trying out other different possibilities for my responses
>eventually get into a flow of using good humor and retorts to her seriousness and sass, that also don't step over the line of rudeness
>in the end, break into her shell, getting her laughing and smiling
Huh, did I just get groomed?
>>
>>101358481
Huh, is that Sora or a different model? I don't think I've seen any Luma gens of that level.
>>
>>101360401
Ah. The filter sucks. Thanks.
At least the model didn't reject C :).
Anyway. Give phi3-mini a go just to get a baseline. That one is well known to be trained on textbook-like data, so it should understand multiple choice better, without being actually smarter. Judging a model's "intelligence" is still difficult. Try the more permissive regex to see if you get more actual positives. Gotta split.
Best of luck with your finetune.
>>
>>101360229
Yes, if you run the following:
set -a
source file.env
set +a
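A quick way to check it worked (throwaway Python run as a child process; assumes file.env contains CUDA_VISIBLE_DEVICES=0 and was sourced with set -a as above):

import os

# If `set -a; source file.env; set +a` ran first, this child process sees the
# variable, exactly as it would after `export CUDA_VISIBLE_DEVICES=0`.
# A plain `source file.env` without set -a (or an explicit export) only creates
# a shell-local variable, and this prints None.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))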
>>
I want to do lewd RP on a gaymer PC, so only 8 gigs of VRAM. Do I go for Lunaris, Stheno or Gemma?
>>
>>101360487
That's the strat. Now go fuck her silly and make her reject feminism.
>>
>>101360930
mythomax
>>
>>101361021
>>101361021
>>101361021
>>
File: output.png (1.13 MB, 1024x1024)
>>101355464
It's a crude prototype, but I can see the flickers of something great.
>>
>>101358592
This is the most retarded thing that I've ever read.


