/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101337910 & >>101328074

►News
>(07/10) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101337910

--Optimal Format for Storing Bitnet Weights: 8-Bit Integers vs Packed Parameters: >>101338865 >>101338912 >>101338929 >>101339006 >>101339019 >>101339044 >>101339262 >>101339361
--The struggle of quantizing Gemma 27b yourself due to potential model abandonment by developers: >>101342775 >>101342852 >>101342980 >>101343071 >>101343490
--The Future of AI Testing: Beyond Riddles and Tricks: >>101338043 >>101338083 >>101338231 >>101338325 >>101338486 >>101338544 >>101338717 >>101338738
--RTX 3060 vs RTX 3090: VRAM, Bandwidth, and CPU Speed Considerations: >>101338192 >>101339240 >>101339275 >>101339317 >>101339354 >>101339384 >>101339473
--Midnight-Miqu-70B-v1.5 MMLU-Pro Benchmark Evaluation: >>101342270 >>101342322 >>101342346 >>101342404
--Gemma, the Drama Queen, Devastated by Snacktime Burp: >>101344477 >>101344497
--Gemma 2 and its Position Embeddings (or Lack Thereof): >>101338712 >>101338753 >>101338781 >>101338803 >>101338821 >>101338847 >>101338914
--GPT-4o Performance Metrics and SenseNova 5.5: >>101342438 >>101342510 >>101342672 >>101342692 >>101342767
--LLMs vs Doctors: Navigating the Healthcare Landscape and its Challenges: >>101341176 >>101341404 >>101341523 >>101342192 >>101341526 >>101341633 >>101341790 >>101341928 >>101342101 >>101342221
--Correction: A100 SXM2 32GB GPUs in Teslas are likely SXM4 models, not engineering samples: >>101338900
--Anole: Open, Autoregressive, Multimodal Model for Interleaved Image-Text Generation: >>101344297 >>101344370 >>101344404 >>101341577 >>101344424 >>101344461 >>101344499 >>101344558 >>101344361 >>101344767
--PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings: >>101343326 >>101343358 >>101343392 >>101343860 >>101344262 >>101344296 >>101344440
--Miku (free space): >>101338589 >>101339395 >>101340732 >>101342095 >>101342772 >>101344926 >>101345079

►Recent Highlight Posts from the Previous Thread: >>101337920
>>
>use gemma-2-27b-it to simulate talking to my ex after 7 years
>she tells me to fuck off
>>
Mikulove
>>
>>101345838
Listen to your AI ex and move on, anon
>>
Return to nous-hermes-13b
>>
>>101345838
just gaslight the AI like you did her
>>
>>101345838
based non-positivity-biased model
>>
File: 1528433961994.png (401 KB, 559x638)
>>101345838
>>
I don't really like gemma, but I must acknowledge it's one of the few models that doesn't horrendously fail the "kino and sovl" test (simply asking what it means for something to be kino and sovl)
>>
File: b3f8i9eg4sad1.jpg (22 KB, 736x663)
Any opinions/links on the best context/instruct set for gemma 9b on sillytavern?
>>
>>101346220
aka zoomer ebonics test
>>
>>101346220
dayum bruh dat be bussin
>>
Just watched a streamer run a d&d campaign for 3 AI characters.
RP quality was shit cause GPT, but the interactivity of it all was very fun.
He set up his own custom front-end with TTS and STT, gonna start making my own version when I wake up.
Good night /lmg/
>>
>>101346383
Good night not-Miku
>>
File: 1712711490064930.png (353 KB, 640x517)
write a somewhat complex scenario (Alice and Bob are long-lost relatives who are looking for each other while being romantically involved without realizing who they are)
8-16k tokens of slowburn
personalities and scenarios are developed differently each try
lots of hand-crafted text and the story is not allowed to degenerate into slop
when the big reveal comes, Alice is only capable of producing the exact same 3 canned reactions, almost word for word identical with previous tries

wizard 8x22 is smart enough to figure out the twist from just a few subtle hints, but it is incapable of producing anything but canned slop when push comes to shove without very persistent tard-wrangling
>>
>>101346220
Why don't you like it? Like you said, it's kino, it's smart, and adjusts to writing styles well (just use a famous / semi-famous author)
>>
>>101346458
That's why I've since switched to gemma. Wizard is too plain / "goody" / robotic and commandr / miqu are too retarded to do non-human anatomy right.
>>
Did they remove lolis from Chub? A lot of stuff is gone but many NSFW things are still there. I'm talking about the legacy site. I can't tell if they're deleting things intentionally or just incompetent and the site doesn't work correctly.
>>
File: file.png (16 KB, 800x600)
I wonder what Gemma looks like
>>
How are people getting longer context with gemma in llama.cpp? I tried -c 16000 but it just got extremely retarded.
>>
>>101346493
“As an AI I don’t…”
I killed it there to save compute.
>>
>>101346525
>How are people getting longer context
They aren't
>>
>>101346584
Oh I thought it got fixed last week.
>>
is gemma shit or just misunderstood?
>>
>>101346622
There's one guy who shits on every new model just to troll. Literally just try it. People have posted settings / logs the past dozen or two threads, there's also some stuff on reddit.
>>
https://github.com/catid/cuda_float_compress
>If your network link is faster than 10Gbps, then it may not be an improvement over just sending the file uncompressed since it compresses at about 12 Gbps. So, it's well-suited for most kinds of Internet transfers, but maybe less useful to send data between servers that are connected via 100G+ InfiniBand or some other supercomputer-class switched network. I'm personally planning to use this for distributed training on the Internet, so it's a better option for me than a faster CUDA-only approach that gets a worse compression ratio.
neat could be nice for federated training
>>
File: Untitled.png (417 KB, 720x1298)
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
https://arxiv.org/abs/2407.07071
>When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
https://github.com/voidism/Lookback-Lens
it would be interesting if, with this, you could target hallucinations you don't want (made-up historical facts or locations) while keeping hallucinations you do want (model roleplay ability)
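Rough sketch of the feature they describe, if anyone wants to hack it into their own stack (this is just my reading of the abstract, not their repo code; the function, names and shapes are made up):

import numpy as np

def lookback_ratios(attn_per_head, n_context_tokens):
    # For one newly generated token: attn_per_head[h] is that head's attention
    # distribution over all previous positions. The feature is the fraction of
    # attention mass spent on the provided context vs. the model's own output.
    ratios = []
    for attn in attn_per_head:
        ctx_mass = attn[:n_context_tokens].sum()
        gen_mass = attn[n_context_tokens:].sum()
        ratios.append(ctx_mass / (ctx_mass + gen_mass + 1e-9))
    return np.array(ratios)

# The paper then fits a plain linear classifier (e.g. sklearn's LogisticRegression)
# on these per-head ratios, averaged over a span, to flag hallucinated spans.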
>>
>>101346637
it still fucks up the formatting, sad.
why is it so hard for it to place * and " in right places?
>>
also new lilian weng blogpost
https://lilianweng.github.io/posts/2024-07-07-hallucination
https://archive.is/NIm5r
>>
File: 1694863601116390.jpg (27 KB, 828x646)
>>101345838
You're in love with your past, that person doesn't exist anymore
>>
>>101346383
>custom front-end
Not worth it. You'll spend at least 2 months on that shit to get 1/10th of ST options.
>>
>>101346963
ST options are lacking for seamless TTS / STT interactions
>>
File: Retard Apu.jpg (31 KB, 680x546)
>OSError: [WinError -1073741795] Windows Error 0xc000001d

I'm retarded. Why does llama throw this error as soon as I try to gen? Running staging build of ST.

I don't have AVX2 on my CPU. Is that why?
>>
>>101346493
Cute and retarded.
>>
>>101347214
Who's cuter and/or more retarded
Stheno or Gemma?
>>
>>101346477
>non human anatomy
off yourself you mentally ill coomer
>>
>>101347330
you lost buddy?
>>
>>101347349
i'm lost if i don't share your deranged fetishes? this isn't reddit, you're not free from criticism here
>>
>>101347282
Buy an ad.
>>
>>101347197
Yeah you probably need to recompile it with those extensions turned off.
>>
Threadly reminder for P40 users to utilize the PState patch -
https://github.com/sasha0552/ToriLinux/blob/main/airootfs/home/tori/.local/share/tori/patches/0000-llamacpp-server-drop-pstate-in-idle.patch

Drops idle from 50 to 10W and improves temperature levels considerably.
Same automatic PState switching can be added to KoboldCpp as well by adding three or four lines to koboldcpp.py.
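Untested sketch of the same idea around KoboldCpp's generation call, using the nvidia-pstate package from the same author (the helper names here are an assumption on my part, check that package's README for the real API):

from nvidia_pstate import set_pstate_low, set_pstate_high  # assumed API

def generate_with_pstate(generate_fn, *args, **kwargs):
    set_pstate_high()      # full clocks for prompt processing / generation
    try:
        return generate_fn(*args, **kwargs)
    finally:
        set_pstate_low()   # drop back to ~10W idle once the request is done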
>>
File: 123df.jpg (39 KB, 500x360)
>>101346239
pls resbond
>>
>>101347583
gemma is junk, nobody uses it
>>
>>101347583
there were some posted like 2 or 3 threads back, you should be able to find them pretty easily, just ctrl + f for catbox
>>
https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md
>FSDP1 -> FSDP2
neat didn't know they were making this
>>
New to this whole LLM thing. Seeing as I'm a vramlet, I just downloaded that Gemma 27b model and got it to run on ooba booga. This is some amazing stuff ngl. I might have to look at that silly tavern thingamajig you whippersnappers are using. Looking at past threads though, apparently this model breaks down when it goes past some token count? Truth be told, I just tested it very briefly, like 609 tokens before I closed webui.
>>
>>101347660
609 is nothing. For me it goes haywire after like 3-4k tokens. But I'm still tinkering with settings and such
>>
>>101347684

Can that be mitigated by being studious about updating the lorebook and essentially resetting the chat?
>>
>>101347510
How many t/s do you get on a 70B?
>>
>ministrations
It never bothered me until someone pointed it out.
>>
File: 1720600723921561.jpg (41 KB, 889x849)
Behold the future of AI.
>>
>>101347724
not sure how you want to update lorebook with info such as:
>{{char}} entered {{user}}'s house. {{user}} offered her some snacks, but she politely refused.
IG best you can do is to ask model to summarise story, then start a fresh one and inject the short version into chat.

Gemma really likes flowery prose, so it takes like 20-30 messages to reach a point where you need to sum up and start over.
>>
>>101347385
Shut the fuck up, Hiro, we aren't giving you a penny.
>>
>The demon laughs, a horrible cackle that echoes across the mountainside. "別想欺騙我,你這個下等的生物!" ("Don't think you can deceive me, you inferior creature!") it sneers
From Command-R. I was annoyed but the dialog makes sense which makes me kind of wonder how this shit works. Surely the model wasn't trained on a corpus with English narration and Chinese dialog. Is it just random chance that the Chinese dialog was appropriate?
>>
>>101347629
Why are people posting webms instead of jsons?
>>
>>101347730
NTA, but 7t/s, empty context at q6
>>
>>101347887
Wait, can you really fit 70B at q6 into two P40? That doesn't seem right.
>>
>>101347808
There is an option in Silly to see token probabilities. See how likely the moonrune was to appear after the double quote.
>>
>>101347912
It's too late now
>>
>>101347788

Yeah, is that possible with ST? Like, manually add entries of key moments that occurred in your RP. Then, when the bot starts losing its shit, reset chat, enter a message or two, like your summary idea, and still have it pull additional stuff from its lore book or whatever? Either way, it’s off to messing around and learning Silly Tavern for me.
>>
>>101347921
No it's not, you can always put

>The demon laughs, a horrible cackle that echoes across the mountainside. "

into the same context and ask it to continue.
>>
>>101347902
3xp40
>>
>>101347941
Right.

And due to how offloading works, only one card works at a time and the others are waiting for that one, to finish calculating its layers, right? So there'd be no difference between say 2 or 4 cards if using same quants? Do you get coil whine because of constant switching between working/idling?
>>
>>101347769
>hur hur tokenizer is blind to spelling
Everyone less retarded than you is already aware. At least have it rewrite sentences according to grammar production rules. That’s actually insightful about the limitations.
>>
>>101347730
Haven't tested yet, still waiting for some parts to get third card installed. For the screenshot I loaded Gemma 27B 8_0 roped to 32k context on KoboldCpp which gets around 8t/s around 1K context.
That's without FA and any P40 specific optimizations though.
>>
>>101347965
coil whine yeah, and we have to use rowsplit or else we suffer half the t/s so it's exactly as you have described.
q4 is still slightly faster iirc since it's still less work for them at the end, but anything above 5t/s is faster than my reading speed anyway.
>>
>>101348008
>>101347769
I think if they include artificial entries about spelling in the dataset, the model will learn it. Not that it's really needed, of course...
>>
>>101348028
Well, fuck, that's painful. I wonder if there can be an option to keep the card busy with useless calculations just to keep the coil whine at bay.
>>
>>101348049
It doesn't add that much to my already shitty lousy 3x40mm fan setup. I'd gladly take the 10w idle.
>>
>>101347510
Does it work for a 3090?
>>
>>101348209
Don't think you need it, any more modern GPU should be able to handle pstate switching well enough on its own.
>>
>>101347197
Sometimes you have to grab a llama.cpp release from several months back for Windows.
>>
>>101347197
buy a normal up to date PC, retard
>>
>>101347965
>And due to how offloading works, only one card works at a time and the others are waiting for that one, to finish calculating its layers, right? So there'd be no difference between say 2 or 4 cards if using same quants? Do you get coil whine because of constant switching between working/idling?

Depends on how you set --split-mode .
With --split-mode layer (default) it works like you described, with --split-mode row the matrix multiplications are parallelized across the cards.
But whether this is actually faster depends on how fast the GPUs are relative to the interconnect speed; for P40s it should be faster unless you're trying to split a very small model.
>>
So I've noticed hiccups and slight delays as I start typing in my prompts. Should I be emptying the text history as I go? It's currently 76,000 words.
>>
>>101348459
That shouldn't be happening unless you're editing the card's first greeting and it's updating in real time ie it's a new chat.
>>
>>101348416
>with --split-mode row the matrix multiplications are parallelized across the cards.
Oh wow, that's cool. Do memory bandwidth problems come from having to deliver intermediate tensors entirely to all videocards before each attention layer?
>>
File: 3219043203.gif (929 KB, 480x358)
>>101347197
AVX2 is 13 years ago
>>
>>101348485
The current architecture is that there is a single --main-gpu that for each matrix multiplication distributes the activations to all other GPUs and then collects the results afterwards.
Honestly the bandwidth problems to a large degree just come from poor optimization.
>>
>>101348481
>>101348459
My understanding is it should: he's way over the context limit, and silly removes old bits of text from chat history right after the system prompt to fit new text. So the change happens at, say, the first 10% of the context, which invalidates the remaining 90% and its cached calculations. The delays are the prompt processing for that 90% of the context.
>>
>>101348481
Hmm, I wonder what it could be then. According to the last request served info it's not actual processing time that's increasing, but it takes like a whole second to register me sending the prompt and when I start typing the prompt or edit the existing text it takes an equal amount of time to start visually showing me typing. It's accepting input during that downtime, because if I just keep typing through the delay it fills everything in when it catches up.
>>
>>101348517
That absolutely should not affect actually typing the text, though.
>>101348521
Ram usage? Your browser might be cooked.
>>
>>101348530
I have degen tab management habits (259 open tabs), that would make sense. Thanks.
>>
>>101348530
Oh, I have misunderstood his problem. So it freezes when you're typing? I had a similar thing when editing card description, but it's more or less gone now with newer versions of ooba and silly.
>>
I've been using mixtral-8x7b-v0.1.Q4_K_M which is 24GB and it runs really well on my system.
Tried gemma-2-27b-it-Q3_K_M which is 12GB and it ran painfully slow, so I tried gemma-2-9b-it.Q8_0 which is 9GB and it's more usable but still slower than mixtral.
What gives?
What can I read to learn how this shit works?
>>
File: 11__00258_.png (1.52 MB, 1024x1024)
>>101348725
Mixtral is an MoE model.
It's not using all of the parameters at once like a 27b.
It's the same reason why an 8x22b tends to be faster than a 70b - same concept scaled up.
>>
>>101348774
Doesn't explain why one 9b is slower than two 7b experts for him. Although maybe the quant? 8 vs 4?
>>
>>101348725
>>101348783
The generation speed is roughly proportional to the number of active weights times the bits per weight.
Mixtral is faster than Gemma 27b because it has fewer active weights.
Mixtral is faster than Gemma 9b despite having more active weights because it has fewer bits per weight.
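Napkin math, with rough numbers assumed for the models in question (active parameter counts and bits per weight are approximations):

# relative generation cost ~ bytes read from memory per token
# = active parameters * bits per weight / 8
models = {
    "Mixtral 8x7B Q4_K_M": (13e9, 4.8),  # ~13B active (2 of 8 experts)
    "Gemma 2 27B Q3_K_M":  (27e9, 3.9),  # all 27B active
    "Gemma 2 9B Q8_0":     ( 9e9, 8.5),  # all 9B active, fat 8-bit weights
}
for name, (active, bpw) in models.items():
    print(f"{name}: ~{active * bpw / 8 / 1e9:.0f} GB read per token")
# -> roughly 8, 13 and 10 GB per token respectively, which matches the speeds you saw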
>>
>>101347510
Isn't pstate = 16 more power efficient than 8?
>>
File: 1698451366825957.png (116 KB, 1139x1163)
>>101344658
MMAP is bugged and just doubles any model in ram so I keep it off.
>>
>put instruction after last message
>model ignores part of the message
>put instruction before last message
>model ignores part of the instruction
i'm tired
>>
>>101348415
I am literally waiting for new Ryzens.
>>
Vntl Leaderboard anon
Can you test this
https://huggingface.co/LLaMAX
https://github.com/CONE-MT/LLaMAX/
https://huggingface.co/papers/2407.05975
>>
>llama server randomly breaks and doesn't respond to requests since a couple days ago, give up trying to fix it and go do something else
>come back today and finally figure out someone just went through and renamed all the build flags so everything I was using to compile was getting ignored

why
>>
>>101348996
'why' has been deprecated and will be removed in a future version.
>>
>>101348890
yeah, people should disable mmap, mmap wasting memory, especially for big models
>>
>>101349113
*on windows
>>
>>101348996
https://github.com/ggerganov/llama.cpp/pull/8006
>>
>>101348882
>https://pypi.org/project/nvidia-pstate/
16 is "high"/"let driver decide" and 8 is "low" (power).
>>
>>101347583
This seems to work alright. The story string comes from the virt-io stuff that Lewdiculous suggests.
Note that you'll need to insert
<bos>
at the start of the story string, if you aren't using llama.cpp, ollama etc..
>>
>>101349152
Link to the virt io stuff? I would def if I could incorporate it into other models like miqu
>>
File: IMG_2612.jpg (685 KB, 1284x1924)
It looks like used 3090s have gone under $600 now, tempted to get a second one, maybe when they hit $500
>>
>>101349197
Thanks just bought all those listings
jj tho
>>
>>101348920
Temp 0, rewrite until you get what you want.
Use the common techniques to make the model "pay attention" to your instructions, stuff like turning your instructions into a list of tasks.
>>
>>101348774
>MoE
Huh that's an interesting concept. Doubt I'll ever really understand what all that math means.

>>101348783
Tested the Q4 version of the 9b model and it's much better.

>>101348853
I'll have to keep that in mind.
>>
>>101349489
Non-MoE transformers layer (repeated a lot of times):

A. Enrich each token with information about other tokens using attention neural network layer
B. Process each token independently from others using feed forward neural network layer

MoE transformers layer (repeated a lot of times):

A. Enrich each token with information about other tokens using attention neural network layer
B. Choose two out of 8 available feed forward neural network layers and process each token independently from others using those, ignoring 6 others
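Step B in toy code, if that helps (standard Mixtral-style top-2 routing, not any particular implementation):

import numpy as np

def moe_ffn(token_vec, router_w, experts, top_k=2):
    # experts is a list of 8 feed-forward callables; only the top_k chosen ones
    # actually run for this token, the other 6 are skipped entirely
    scores = router_w @ token_vec
    top = np.argsort(scores)[-top_k:]
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(w * experts[i](token_vec) for w, i in zip(weights, top))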
>>
>>101348920
You're probably not differentiating the user's message clearly enough from the instructions. If you're using Gemma-2 try something like this. It's not the "approved" format but it works. Test instructions one by one to make sure they have the correct effect.

<start_of_turn>user
Last user message here.<end_of_turn>
# Instructions for your next response
- Inst 1
- Inst 2
- Inst n
<start_of_turn>model
...
>>
>>101349540
This raises questions for me, but I'll save them. I'll go read some more docs.
All this thinking is headache inducing.
>>
>>101349197
Wait for the flood of 32GB V100s brother
>>
>>101349596
https://archive.is/8r7t9
Good explainer from hf
>>
Anyone happen to have HF files for gemma 27b-it?
>>
>>101349547
>>101349470
thanks, i'll try the list thing. I already use custom "headers" inside messages like [instruction]\n but the rest of the message does look similar to normal messages, especially since everything is in first person.
>>
File: prompt.png (59 KB, 1464x371)
>>101346493
>>
File: Gemma.png (1.69 MB, 1024x1024)
>>101349740
>>
>>101349698
https://huggingface.co/unsloth/gemma-2-27b-it
>>
>>101349752
>repo also has the files included
I'm fucking blind. Thanks, anon.
>>
Is gemma 27B worth using now or is it still kinda broken?
>>
>>101349769
LLMs are a meme in general, so no.
>>
>>101349636
Ah good this is the one I was looking at.
>>
Hey /lmg/. What about [insert current subject that has been talked to death]? Are there any updates? I can't be fucking bothered to scroll up, let alone check previous threads in the op.
>>
>>101349810
2 weeks
>>
What's the verdict on gemma2? I'm still using llama2 btw
>>
>>101349627
two more weeks
>>
>>101349727
One thing that I found works really well for character cards and general, non-character specific instructions, is to not try to address the model behind the character.
You'll often see things like
>You are {{char}} with this and that characteristics
in the character card and
>You will write {{char}}'s next message in such and so way
Try rewriting those as definitions instead of direct instructions to an abstract narrator, like
>{{char}} is so and so and has this and that characteristics
in the character card and
>Write {{char}}'s next message in such and so way
or
>{{char}}'s next message will be such and such
That kind of thing.
Prompting is not a meme as it turns out. You can get even dumb models to focus and do some really impressive things, although the general lack of "intelligence" is just something one has to contend with; it might not matter for most ERP, though.
>>
>>101349794
Why are you zoomers like this?
>>
>>101349836
4 months is the window I've heard about (at least for microsoft's). V100s at this point aren't worth their place in the datacenters since they're actually capacity and not gpu constrained. Doubt you care though since you're using some stale ass meme that only zoomers still latch onto
>>
>>101349846
Boomers regurgitate whatever talking points newsman says, zoomers regurgitate whatever talking points their favorite hugbox youtuber says. Millennials regurgitate whatever talking points their favorite DC faggot league super says. Nothing has changed, really.
>>
>>101349878
You're a prime example of why I generally stop talking to people as soon as I find out they are circumcised.
>>
gemma sucks balls, back to midnight miqu
>>
Got em
>>
>>101349888
What did he do wrong?
>>
>>101349888
what do you have against americans?
>>
>>101349965
blabbling miqu
>>
>>101349888
>circumcised
Are gentiles still mutilating their sons?
>>
ST gemma's templates? anyone?
>>
>>101350141
Americans are evil imperialists
>>
why is it that when i connect sillytavern to oobabooga the streaming does not work.

OS: Endeavour OS
What I did:
>install SillyTavern
>Install Oobabooga
>run oobabooga with --api
>download Qwen/Qwen2-0.5B
>load model
>go to sillytavern, select Text Completion, put in the url, do not click Legacy API
>it works, and shows "qwen2 0.5b"
>select the default card
>type "test"
>nothing happens for 10 secs
>response comes all at once


also tried:
loading different model, loading in gguf

What i did seems reasonable, and it should work, but it don't.
>>
>glm-4
>constantly fucks up basic shit
>constantly becomes retarded and spams "!!!!!!!!!!"
I dunno if llama.cpp is broken or if this model is just garbage.
>>
>>101350308
It should just work if you do that. Nothing else needed. Do you get streaming inside ooba UI?
>>
>>101350345
Yeah, it behaves really, really weirdly.
I'll give it another try today, maybe I'm fucking something up in the context or instruct templates, but it might just be that the model is that bad.
>>
>>101350358
yep, streaming inside ooba works fine.
>>
>>101350386
Try koboldcpp, if nothing else, to isolate whether it's a backend or frontend problem.
>>
>>101350345
buy an ad
>>
>>101349888
>zoomer immediately starts thinking about my cock
kek what a generation
>>
All new models are shit. We must return.
>>
>>101350296
>America, send us financial aid!
>America, send us medicines that your own people can't afford to get!
>America, fight our oppressors!
>America, let us invade your nation!
>America, let us rely on your currency in the world marketplace!

>America, stop touching us with your way of doing things!
>>
>>101350308
Did you actually check the box for streaming in sillytavern?
>>
Dry sampler in Llama.cpp when?
>>
File: file.png (51 KB, 1007x410)
>>101348965
Not good, the model is quite retarded.
>>
>>101350676
You, not we. Just run older models then if they are better, nothing is stopping you.
>>
>>101350676
return to what
>>
>>101350905
GPT-J obviously.
>>
>>101346458
Yep that's pretty much the universal observation of WLM. Very smart, but slopped to the brim
>>
>>101347362
>cat ears and tails are deranged fetishes
>>
>>101349748
gemmy-chan...
>>
>>101349627
>Wait for the flood of 32GB V100s brother
Why so hung up on V100? Yes it has a decent tensor core count and 32GB, but it's nowhere near a 3090, and it if has an issue with something, it's going to be at the back of the line for fixes since it's such a corner case.
Also delusional ebay sellers are just going to continue to be delusional.
>>
>>101347330
>where_do_you_think_you_are.jpg
>>
>>101350988
Hbm2
>>
>>101350857
>To address this, we dedicate 35,000 A100-SXM4-80GB GPU hours in conducting extensive multilingual continual pre-training on the LLaMA series models, enabling translation support across more than 100 languages
Rip
>>
>>101350961
nta, those are shit taste indicators.
>>
File: ThisFuckingGuy.png (64 KB, 1287x591)
>>101348965
>>
>>101350676
https://huggingface.co/EleutherAI/gpt-j-6b
>>
File: 1716470176720287.png (1.9 MB, 1024x1536)
>>101346493
lel
>>
File: VOOOOOOOTE.png (21 KB, 1505x190)
voting matters
/pol/ btfo
>>
>>101351143
rent free
>>
File: Designer (1).jpg (238 KB, 1024x1024)
>>101350996
P100 is HBM2, it's not magic, it doesn't necessarily give it way more bandwidth over GDDR6. It really only helps for training. Are you training models?
>>
>>101351143
Based.
>>
>>101349188
https://huggingface.co/collections/Lewdiculous/useful-65e6a91d5fbfe6b32586d265
lead me to
https://huggingface.co/Virt-io/SillyTavern-Presets
>>
https://www.techpowerup.com/324319/amd-to-acquire-silo-ai-to-expand-enterprise-ai-solutions-globally
Anyone hear of them?
>Silo AI team consists of world-class AI scientists and engineers with extensive experience developing tailored AI models, platforms and solutions for leading enterprises spanning cloud, embedded and endpoint computing markets.
>>
>>101351168
4xv100 32GB sxm will be the play in 4-6 months. Believe it! Local audiogen will break through.
>>
>>101351249
>Local audiogen will breakthrough.
already has
>rentry.org/stableaudio
>>
>>101351266
Go fuck another goat, petra the algerian
>>
File: belief.png (592 KB, 747x800)
>>101351249
>Believe it!
>>
File: file.png (153 KB, 773x987)
>>101351308
mental illness
>>
>>101351217
lol. xAI probably has better people than this literally who company.
>>
>>101349627
>>101351249
Delusional.
>>
Why is everyone in this field talking about "infinite context soon!" when we can barely achieve 64k of coherent context in sota corpo models with hundreds of gigabytes of vram?
>>
>>101351812
Maybe if we rotate the rotation or shove ten embeddings into each context slot or....
>>
>>101346948
Romance is a constant struggle of both partners trying to deceive each other into thinking they are more attractive than they actually are. It is the natural state in the animal kingdom. Therefore "that person doesn't exist" is a natural state of romance.
>>
File: Phi-ATMa-nala.png (95 KB, 928x340)
Interesting result.
Spatial awareness leaves something to be desired and it's a bit schizo at times even on t=0.81
One more epoch of training and we'll see the final result.
>>
>>101351812
It's Pascal's Mugging.
If you promise VCs a 2x return on their investment they will only do it if you can convince them that you have at least a 50% chance of actually being able to do it.
But if you promise them a 10000x return on investment they only need to think that you have a 0.01% chance of being able to do it.
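The expected-value arithmetic, spelled out:

print(0.50   * 2)      # 2x promise at 50% credibility      -> 1.0 (break-even)
print(0.0001 * 10000)  # 10000x promise at 0.01% credibility -> 1.0 (same bet)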
>>
is it possible to uncuck gemma-2 27b somehow
>>
>>101352122
yes, but performance drops
>>
>>101350393
tried it, and the streaming still doesn't work. Seems to be a frontend issue.

>>101350747
I looked, only thing i found was this.
"Smooth Streaming" suggests that it just splits up the tokens into letters, and dispenses them one by one.
I tried it anyways, and it didn't fix the issue. Are you referring to a different "box for streaming"?
>>
>>101352772
There's also Streaming FPS. Mine is 30.

Also make sure stream: true is in the console.
>>
File: Untitled.jpg (27 KB, 330x267)
>>101352772
nta but for me the streaming button is near the token options at the top using llamacpp
>>
>tfw change to a faster MoE model that's now 2 t/s
I don't need more.
>use it more, feeling the limits of 2 t/s
I don't need more.
...
I NEED MORE AHHHHHHHHHHHHHHHHHHH
>>
>>101352801
that fixed it. :)

But it raises the question: why isn't this option enabled by default?
>>
>>101352969
Meanwhile,
>Total:238.05s (1.50T/s)
Nice, this model is really cookin'!
>>
>The powerful LI-DiT-10B will be available through the online platform and API after further optimization and security checks.
its over
>>
>>101353197
>ledit
Of course.
>>
>>101353197
>LI-DiT-10B
Whats this? Chinese diffusion model?
>>
>>101352122
use any context at all / use a tiny bit of a prefill.

Don't feel like reposting shit so just go back a few threads for one of many examples.
>>
>>101345759
>(07/10) Anole, based on Chameleon, for interleaved image-text generation
Did anyone try this?
>>
>>101353834
be the first one
>>
>>101353834
None of the backends normal people use support it, so no.
>>
DeepSeekV2 quality x price is unbeatable, don't miss out on it, anons.
>>
>>101354012
If only I could local the new Coder, but it's too thicc.
>>
>>101353834
I'll wait for hentai finetunes
>>
It's so hard, Anons.
Gemma 2 is nice but too stupid.
LLaMa 3 is smart but can't write for shit.

localfags regressing in context, stuck in 8K token hell. Proxyfags and Corpo shills eating so good with 200K tokens. But don't you dare to be deviant on a paid API...
>>
>>101354571
the shortcomings of anything would always be apparent, don't delude yourself into thinking you could ever be content
>>
Is there *any* way to make gemma2's context longer?
>>
>>101354571
>eating so good with 200K tokens
according to even aicg, claude massively degrades after 16-24k tokens...
>>
>>101354614
It works at 16K just roped. I don't notice any loss in performance.
>>
>>101354716
How do you do this with llama.cpp? I tried yarn and it couldn't even write sentences.
>>
>>101353834
>(07/10) Anole, based on Chameleon, for interleaved image-text generation

Setting this up now, I'll post some gens here. I'll also try to run prompts anyone posts because I'm not very creative.
>>
>>101354741
don't use yarn, it makes it retarded
>>
Has anybody tested flipping the headers around when interacting with some of these "censored" models? Basically you have the model complete the user's message and you write the assistant's message. In principle, only the assistant's responses are filtered, right?
>>
>>101354898
That would likely make it retarded. Just use prefills like normal people.
>>
>>101354826
Oh I see. What frequency are you using? I saw 16k somewhere.
>>
>>101354614
No.
>>
You know what's crazy about Anole? 30 minutes of training, 40m parameters, a dataset of fewer than 6000 images.

Imagine what a more dedicated effort will be able to do.
>>
>>101355115
have you used it?
>>
File: other1.jpg (2 MB, 4309x3456)
>>101355115
the images look like complete shit though
note that these are cherry picked
>>
>>101354823
I hope you have a GPU with over 24gb of vram anon, because that did not work on my 24gb card.
>>
>>101355464
They look like images from a couple of years ago, which is probably the level of training that was SOTA then but is proof of concept today.
>>
>>101354898
The voice answering doesn't seem to matter, it's just looking for some "harmful" direction(s) in the embedding space after which it starts answering in the "refuse" direction.
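For anyone who hasn't seen it, the "direction" trick (what the abliterated models do) is just projecting one vector out of the hidden state; toy version:

import numpy as np

def ablate_direction(hidden_state, refusal_dir):
    # remove the component of the hidden state along the estimated "refusal"
    # direction; that direction is typically estimated by contrasting mean
    # activations on harmful vs. harmless prompts
    d = refusal_dir / np.linalg.norm(refusal_dir)
    return hidden_state - np.dot(hidden_state, d) * d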
>>
>>101355464
>>101355568
note that it was trained on less than 6000 images
>>
File: 1716329112755149.png (674 KB, 1792x1024)
Daily reminder
>>
>>101355464
It looks great anon. What did you expect? SD3 quality from an experimental multimodal model that generates pics without the help of clip?
>>
>>101355464
See >>101355821
>>
File: 1715943594084913.png (54 KB, 628x784)
>>101355821
trvke.
>>
>>101355821
[5 Sam Coins were transferred to your account]
>>
I've implemented conditional prompts and sequential replies in my frontend.

{
"id": "g5ny3qoe",
"reply_after": "user",
"reply_if": "else"
},
{
"id": "g5ny3qoe.e0",
"reply_after": "user",
"reply_if": "**command** | **order** | **must**"
},
{
"id": "g5ny3qoe.e1",
"reply_after": "g5ny3qoe.e0",
}
>>
>>101356130
>>
File: 1709544719912216.jpg (103 KB, 515x793)
It's obvious multimodals are the future. Which backends will support them? Are multimodals able to be quanted? Do you think the quality of text generation will be worse than current local models for first generation multimodals? What kind of hardware requirements do you expect to be able to run these models?
>>
>>101356430
>Its obvious multimodals are the future.
why would you want to be stuck with how one model does everything when you can pick specific ones you like and get a much better result
>>
>>101356430
>Its obvious multimodals are the future.
Nah, they don't bring any performance benefit to the table and they are more expensive to train. I think they will just be an alternative, not the future.
>Which backends will support them? Are multimodals able to be quanted? Do you think the quality of text generation will be worse than current local models for first generation multimodals? What kind of hardware requirements do you expect to be able to run these models?
Dunno.
>>
>>101356130
based...
>>
>>101356430
>Are multimodals able to be quanted?
obviously theres *going to be* a quant
>>
>>101356430
Yes. We're reaching the limits of training with written data, the next thing is adding images and at some point audio.
>>
>>101356478
Vision is kind of nice because you can write everything offline and then just dump it into the computer.
>>
>>101356643
>implying it can read my handwriting
>>
>>101356643
you get a text model and a vision model then. you don't need 1 multimodal model to do both and in pretty much any instance, multiple models you prefer will be better overall
>>
>>101356685
Older vision transformers and gpt-v can read mine well and it’s not great. Clip can’t the way it’s used in llava but it sounds like llava-next might be better about that.
>>
>>101356716
local models?
>>
>>101356633

You mean... AI Agents? Are you guys retarded?
>>
File: file.png (3.4 MB, 1090x1548)
>ITT
>>
>>101356704
There is too much delay between multiple single-modal models chained together. Wouldn't it be more efficient to just use a multimodal?
>>101356633
It seems like the logical conclusion. I don't doubt that text based LLMs can get way better and smarter, but for everyday usage a multimodal just seems like a step up from what we have.
>Can talk to your model like a person and have it talk back with minimal delay.
>Can have your model understand images, including your environment around you.
>>
>>101356761
Yes I have a local one that can although it’s not multimodal.
I'd really love a model I can just submit handwriting to and have it figure out if it needs to go to the shell or vim or compile a report or whatever. That would be cool.
>>
>>101356874
>delay
i don't know for sure but i'm guessing they still process things separately like text first, image second. i'm mostly thinking that specific tunes of any model are still going to be better and preferred vs one model, so you'd end up using a separate model anyways, say for images. and if you do that at all, you're wasting resources on the base model even having that data in it in the first place
>>
>>101356943
I tried to get STT + TTS to work with my preferred model. There's many different implementations, but the common issue is that there is an inherent delay between all of the moving parts which makes speaking with your model very annoying. After watching what GPT4o, Claude Sonnet, and Moshi can do I am convinced that multimodal is the future. Unless a framework or some other technology comes out that allows seamless integration of singlemodal models I don't really see multimodal not becoming the norm.
>>
File: Phi-ATMa-nala.png (137 KB, 932x507)
>>101351922
That extra epoch really did some good.
>>
>>101356995
online stuff might have the advantage of being able to process multiple pieces at once. on your local computer though if the image or voice was processing at the same time as the text, its going to cause all of it to slow down to the speed of your system. whether it processes all at once or in a queue doesn't really matter since its going to take the same amount of time overall.
what i was talking about is if you have a text model, say 70b, then chop part of that off to add in text to image and image to text, voice to text and text to voice, you've dumbed down the text part of the model to allow the rest to fit. so if you like the text and voice of the model, but then want to use another for image, you've got the image portion of the multi model being added into the mix. maybe it won't be a big deal in the future with better hardware or models get smaller (pls bitnet), but right now you want to maximize all the resources you have
>>
what if D&D, but hookers
>>
>>101356478
>get a much better result
no. multimodal is the future and the results will be better.
>>
>>101357223
'safety and alignment' alone ensures this will never be the case
>>
File: hs61gjk1h56d1.png (815 KB, 1024x1024)
>>101357314
picrel
>>
>>101356876
what model exactly? all OCR models i tried are utter trash at recognizing printed text, let alone handwritten
llava was ok
>>
>>101357314
this. uncensoring LLMs is already impossible (see that abliterated meme), prompting makes it dumber or schizo, now imagine a multimodal model, all the parts raped with kosher brainwashing.
image-gen is easier to uncensor because you work with pixels and diffusion there.
tldr: better ai model architecture -> better censorship & (((safety))) methods.
>>
>>101357473
>uncensoring LLMs is already impossible
What's your endgame?
>>
Hey, I'm reading your guides to not be an annoying newfag but I've got one question. There are various places that specify how much ram they need; is this VRAM, system memory RAM, or does either work? I have 32 gb of RAM but only 10 of VRAM.
>>
>>101357814
VRAM: Models up to about 90% of your VRAM will run super fast.
RAM: Models up to 85% of your system RAM will run slowly, but fast enough to be useful if you have other things to do while it processes.
Models larger than that are out of reach.
>>
File: offload_x_performance.png (96 KB, 1536x1152)
>>101357814
With llamacpp and its derivatives (koboldcpp, ollama, etc.) you can split the AI's model between RAM and VRAM.
You want to have as much of the model in VRAM as possible in order to have the fastest prompt processing and inference speeds. Do keep in mind that it's not just the model's weights that occupy space, there's prompt caches, buffers, and all kinds of other things.
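If you want a rough feel for the -ngl number, napkin math like this gets you close (per-layer size is just the file size divided by layer count; reserve_gb is a guess for the KV cache and buffers mentioned above):

def layers_that_fit(model_file_gb, n_layers, vram_gb, reserve_gb=2.0):
    per_layer_gb = model_file_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable / per_layer_gb))

# e.g. a ~16 GB GGUF with 46 layers on a 10 GB card:
print(layers_that_fit(16, 46, 10))   # -> roughly 23 layers for -ngl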
>>
File: 1691106522629463.jpg (2.44 MB, 3012x3580)
>>101345759
>>
>>101357898
1
>>
What's the best LoRA scaling factor and why is it 1:4?
>>
>>101354012
>Price
kek
>>101354571
wizard 8x22 doesnt have this problem
>>101357814
>does either work
yes, except ram is slower
>I have 32 gb of RAM but only 10 of VRAM.
download gemma q4 k s and run it in ram or offload a few layers with llama.cpp / koboldcpp
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF

you're not gonna get a better model for your specs
>>
>>101357776
spreading truth about AI meme is always morally correct.
>>
>>101354571
Jewish hands typed this post.
>>
>>101357861
>>101357869
>>101357916
I see, so it's just slower going but doesn't affect the quality. I've got stuff to multitask with so I don't mind that much at all. Also thanks for the recommendation anon, I'll give that model a shot.
>>
>>101357008
>Are you ready?... (to embark on this bonding journey)
>>
File: file.png (13 KB, 306x192)
>>101356130
why is gemma always inserting extra line breaks where there shouldn't be any. There are examples and all of them have one line break, but gemma ALWAYS inserts two here, and this is the first message
>>
>>101358000
Quality is a function of the model itself and the quantization level.

Above all, the Q number matters, and every digit down compounds the loss of quality.
Q8: As high as it goes. You'll see _0 and _0_L versions, either is fine, with _0_L being experimental but perhaps *slightly* better in metrics.
Q7: Legendary Pokemon that may be hidden beneath a truck.
Q6: Also fine. Available in _0 (old style) and _K (new style).
Q5 and 4: Economy quants, things haven't gotten horrible yet but beyond here be dragons. A few of us think that K_S is better at information retrieval (being right about factual details) than K_M, which would be better for creative writing.

Q3 and down, lone Q quants are too stupid to live. So we go into two things that help.

iMatrix (iMat or i1) makes lower quants "know" what information can be sacrificed.
IQ quants: Designed for low Q numbers, they introduce XS and XXS varieties.

Don't Q under 4 unless it's got IQ quants and imatrix, or it's hopeless. And even then it gets rough fast.
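Since the Q number is roughly bits per weight, you can also estimate file size (and whether it fits your card) before downloading anything; the bpw values below are ballpark figures:

BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "IQ3_XXS": 3.1}

def gguf_size_gb(params_billion, quant):
    return params_billion * BPW[quant] / 8   # approximate GGUF size in GB

for q in BPW:
    print(f"27B at {q}: ~{gguf_size_gb(27, q):.1f} GB")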
>>
>>101355464
I'm learning rubiks cube algos and that would be real handy
>>
>>101358137
it just some weird watermark, don't think about it too much.
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101358139
>>
>>101358157
there's no way it could draw an accurate 3x3 and track the changes, plus you'd have to relay your scramble to it, it wouldn't work
>>
>>101358160
i think extra lines, tabs and spaces are all concatenated into one token anyway, so it shouldn't affect output quality
>>
Is there a method for using AI to improve your own writing quality with a toaster? I only have 48 GB of RAM and an old GPU that only has 8 GB of VRAM. Models are hilariously slow and have poor quality..
>>
>>101358369
>using AI to improve your own writing quality
Wrong direction.
AI is a font of cliche, repetition, and mostly bad writing styles harvested from the Internet commons.
>>
>>101358369
>method for using AI to improve your own writing quality with a toaster
give it text and ask what can be improved, anything else is cope, you shouldnt let it write anything for you

>48 GB of RAM and an old GPU that only has 8 GB of VRAM
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
>>
>>101358429
Okay, thanks!
>>
File: 1696297761013032.webm (1.64 MB, 460x558)
>>
>/lmg/ - local models general
>>
>>101354571
That's why I just use openrouter and switch between models. I use Claude Sonnet for most gens but Euryale takes over during sex. Whenever euryale gets too retarded I use CR. Then back to Sonnet for anything else.

You do have to be a richfag like me for this to make sense tho
>>
>>101358583
I'm a model near you.
>>
>>101358611
You must have pictures to prove your statements
>>
>>101358583
Right, the naming means nothing.
>>
File: logh bitten bullshit.jpg (75 KB, 640x480)
>>101358630
All I have is a picture of Fritz Joseph Bittenfeld.
>>
I realized that novelists will now start to use chatgpt to assist them in their process. "rewrite this sentence for me", "give me a metaphor for x" etc. The resulting slop will make it into books, which will be used in further training. The slopocalypse is inevitable, bros.
>>
>>101358137
proprietary reddit spacing
>>
>>101358481
the uncanny infinity vortex hurts my brain
>>
File: Wiz-32k-test-iq4-xs.jpg (54 KB, 1269x507)
>>101354571
Why do anons still act like the 8k limit is something you have to live with?
Wizard 8x22b can recall information perfectly around 32k, works well with quantized cache too (picrel uses 8-bit cache)
>>101343344
If you were capable of running anything higher than a lobotomized 2-bit quant before writing it off you'd know that it actually works perfectly fine.
Full log: https://files.catbox.moe/y17y6c.txt
>>
>>101358970
sounds really fucking sketchy
>>
>>101358186
For two newlines, yes. For "extra lines, tabs and spaces", no.

['Hello', 'Anon', '.', '\t', '', '\n\n', 'Lucy', 'says', '.']
>>
>>101345759
Running MMLU Pro against Gemma2-9b-it. It's really shit at following instructions. It keeps inserting unasked for formatting and despite telling it to write the answer in a specific way, it deviates multiple times. I patched the code multiple times to allow it to say e.g. "The answer is **(A)**" whereas the code initially would fail to extract the answer from this due to the extra **()** shit.
>>
>>101359128
>anon can't into regex or simple string parsing
Come on, anon. Show the instruction. Let's fix it.
>>
>>101359308
The thing is, MMLU Pro has as part of its test the ability to follow instructions. I don't want to spoon feed this fucker too much. Examples of it failing to format the answer correctly are:
1. Would need to add optional 'closest to', and another formatting alternative '( without **')
**The answer is closest to (A) $838.75.**
The answer is closest to (E).
2. Another variant: **Answer:** The closest answer choice is **(C) 34 hours**.

etc.

I can fix it easily enough. But if the model is asked to respond in a certiain way and it fucking ignores it, should I?
>>
>convinced by anons shilling for gemma2 9b
>it's just as bad as stheno
>mfw
>>
>>101359442
yup, zoomers gonna zoom
>>
>>101359442
skill issue, learn to prompt and write bots
>>
gemma style rigged lmsys arena
>>
>>101359522
Gemma is the best. Everyone says so besides retards who are in ultra-cope mode after spending thousands on expensive hardware they don't need.
>>
>>101359410
What's the MMLU test you're running? Do you have a link? And i still want to see your instruction to answer the questions. My suspicion is that it's more verbose than it needs to be. And for the love of anything you believe, never mention the word "formatting" to the model. Something simple like "You'll be asked some multiple-choice questions. Only show the letter of the correct answer."
My little LLM machine just went offline while i was trying to run a test. I know... i don't believe the timing either...
>>
Gemma cured my cancer and brought my dog back to life.
>>
>>101359554
Lana told me she loved me and offered to suck me off
>>
>>101359554
it also gave you a brain cancer it seems
>>
>>101359552
The instruction was the MMLU Pro default instruction, initially. I added the part in parens at the end:
"The following are multiple choice questions (with answers) about {subject}. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice (note: you MUST use the exact phrasing 'the answer is [A-J]' where [A-J] is the correct letter answer)."
The MMLU test is https://github.com/chigkim/Ollama-MMLU-Pro
>>
>>101359475
nice meme
>>
>>101359475
model issue, transformers issue, etc. the only ones with real skill are the ones who censor LLMs, it cannot be surpassed, its not like you zoomers can comprehend whole implications of this.
>>
>>101359544
Thank you for sharing your opinion about Gemma. It's great to hear that she has such a positive reputation! It's important to remember that everyone has different needs and budgets when it comes to hardware, and what might be unnecessary for one could be quite important for another. Let's try to keep the conversation inclusive and supportive for all preferences and choices.
>>
>>101359599 (me)
I ripped out the paren note as it didn't seem to make a diff.
This is where I'm at. I'm being way too nice to this retard:
import re

def extract_answer(text):  # the benchmark's extraction hook I'm patching
    # common case: "the answer is (A)", with optional ** and ( around the letter
    pattern = r"answer is ([\*]*)([\(]?)([A-J])"
    match = re.search(pattern, text)
    if match:
        return match.group(3)
    # fallbacks for gemma's creative phrasings
    pattern = r"answer is closest to \*\*\(([A-J])\)"
    match = re.search(pattern, text)
    if not match:
        pattern = r"most accurate description is \*\*\(([A-J])\)"
        match = re.search(pattern, text)
    if match:
        return match.group(1)

And it still manages to fail: The answer that best encompasses these challenges is (E).
>>
>>101359554
Gemma restored my foreskin.
>>
File: file.png (818 KB, 768x768)
>>
>>101359759 (me)
... Answer: The best answer here is **(B)**. Here's why:

Also, gemma2-9b is being asked explicitly to do CoT by the instructions, and yet it will very often start off as above, and then start chattering about the problem.
>>
>>101359554
And it did that even when all the loaders are still bugged. I can't even imagine what is gonna happen once the loaders are fixed.
>>
>>101359795
I don't dislike it.
>>
gemma 9b sucks at writing. I even told it to be descriptive, verbose, use award winning prose, but it just droned on and on without ever getting to the point
>>
>>101359884
did you ask it to write something explicit?
it will endlessly filibuster if it doesn't want to comply
>>
>>101359808
The instruction seems concise enough, so disregard my doubts about that.
If (X) is the only thing it keeps consistent, i'd consider that enough to match it.
    pattern = r"(\([A-J]\))"

I assume you're playing with extract_answer()...
But if the model goes on rants, maybe it IS dumb. The potential problem i see is that the test itself asks it to rant about the question. I understand why they do it. but it may confuse chatty models.
>>
>>101359884
>I even told it to be descriptive, verbose
>it just droned on and on

wtf how could this have happened
>>
>>101359884
>without ever getting to the point
That looks exactly like award reading prose. Just today i was reading this:
>https://www.gutenberg.org/cache/epub/32037/pg32037.txt
>Title: Eureka: A Prose Poem
>Author: Edgar Allan Poe
>>
File: s-l400-3408884227.jpg (52 KB, 400x373)
>9b
>>
>>101359912
No, it's all over the place. Sometimes **(A)**. Sometimes (A). Sometimes **A**. Sometimes A. Or, hey, how about this new variant:

**Therefore, the options that are NOT evidence that DNA is the genetic material are B, D, E, F, and H.**

(i.e. lets just exclude what we think is the answer entirely, despite being asked to write it out)
>>
>>101359976 (me)
Actually, just realized it's excluding "GIJ" as well, so I guess it didn't know which was the answer there.
>>
File: a.jpg (69 KB, 784x579)
>>101274994
>>101274094
>>101274250
>>101274496
any sourcecode? llama/silly are kind of ugly messes beyond saving.

>>101274094
>Post your custom frontends anons.
lmfao I use a .env file to set the URLs. Posted this last year on my trip
I can set any IP I want but tkinter is a bit unwieldy, eg the 2 ui settings boxes don't need to be used for every ai. But chatgpt preferred tkinter when I made it and I've only written 1 or 2 functions myself.
Why are you loading the models in this anyway? Is it really that hard to ctrl+c a cli on your mikubox from SSH?
>>101274496
i do something similar. I will care about context again when its infinite and includes video.
========
If I get 5 (you)'s ill cbf to post the relevant part of Forbin's sourcecode because >>101274250 reminded me of the flutter class I used to make the JSON to send over POST
>>
>>101357898
for me, it's migu (male)
>>
go back
>>
>>101360016
code models are good enough now that if you show it the api and tell it what you want, even codestral could come up with a basic ui
>>
>>101359976
I wonder what question it's trying to answer. I searched for 'DNA' and got 3 unrelated questions (recombinant DNA, paroviruses and control of cell division). Searched for 'material' and there's only one question regarding male and female catheters... nothing for 'evidence'.
I think just matching for (X) is enough. Any further and it's not going to be fair. You could also end up with false positives.
>>
>>101360094
it's not searching for anything because it's a 9b and it's fucking retarded
>>
File: a.jpg (86 KB, 1253x388)
>>101345838
>having a normal convo with miku
>need to open SD
>swap model to save VRAM
picrel
>https://files.catbox.moe/3zymc8.mp3
mfw
>>
>>101360099
>it's not searching for anything because it's a 9b and it's fucking retarded
And yet, it has better reading comprehension than you. I'm talking about the MMLU-Pro questions. Is that clear now?
>>
>>101360132
>Running MMLU Pro against Gemma2-9b-it
test is irrelevant when the model is retarded to begin with
>>
>>101360079
im too last for this. new or even old opus could probably do better than tkinter, but i just want to show it a picture and it sends back a zip file of code, just opens an nginx based on a websearch for the API, and i can start chatting.
When will agents be a thing again? Did all development just... <stop>?
>>
lazy* even
>>
stats take up too much of gemma's 4k context.
sad.
>>
>>101360144
The anon running the test wants to know where it lands on the retardedness scale. I don't see a problem with that. Why are you angry? Did the LLM not let you touch it?
>>
>>101350308
are you running ooba behind a reverse proxy?
>>
>>101360148
>show it a picture and it sends back a zip file of code just opens a ngix based on a websearch for the API and i can start chatting
have you tried any of it? the 'based on a picture' part sounds like the hardest to solve, since the rest is automation and using the API
>>
>>101360172
>retardedness scale
all small models are retarded though. it's common knowledge they hallucinate, have no spatial awareness, and can't remember what happened a message ago. if you want to retrieve data or search stuff at least use mixtral 8x7b or command-r. there is no 9b or smaller that is going to do it
>>
>>101345759
is
 export CUDA_VISIBLE DEVICES=0

the same as making a .env file and putting
CUDA_VISIBLE DEVICES=0

???
>>
>>101360094
I don't really mind not being fair, as I plan to use this internally to ensure that models I train do not get dumber than their parents. Kind of disheartening seeing how bad it is at sticking to such a simple instruction though.
>>
>>101360219
>all small models are retarded though
ALL models are retarded. Period.
>no spatial awareness
That's the least of their problems. Are you one of those expert roleplayers?
>can't remember what happened a message ago
Neither can you if you cannot follow the thread.
>if you want to retrieve data or search stuff
I'M NOT THE ONE RUNNING THE TESTS. GET IT NOW? Anon was trying to run the test, I doubted the prompt, he proved that the prompt was simple and that the problem was that the model wasn't following the expected format required by the testing script. I suggested a less rigorous regex while, hopefully, not giving it an unfair advantage.
>>
>>101360280
Don't be too harsh, anon is a heavily quantized 7B model.
>>
>>101360280
>the 9b is retarded
yeah, got it
>That's the least of their problems. Are you one of those expert roleplayers?
i'm a wizard
>>
>>101360265
Yeah. A shame. Can you do the same test with a proven retard model like phi3-mini or something? Here's a crazy idea: it doesn't follow the response format to a T because it wasn't trained on benchmarks. If phi is significantly better, I'd be suspicious. Or gemma2 is, after all, kind of dumb and unruly. Also, I'm not sure if the regex needs .* at the beginning and end to match the rest of the string if there's extra noise in the output.
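For what it's worth, if the script uses re.search (rather than re.match) it shouldn't need the .* padding. A quick throwaway check, not from the benchmark code:

import re

noisy = "Blah blah reasoning... the answer is (B), probably."
# re.search scans the whole string, so no leading/trailing .* is needed:
print(re.search(r"\(([A-J])\)", noisy).group(1))    # -> B
# re.match only anchors at the start, so it WOULD need a leading .*:
print(re.match(r"\(([A-J])\)", noisy))              # -> None
print(re.match(r".*\(([A-J])\)", noisy).group(1))   # -> B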
>>
>>101360280
I patched the code to cache all LLM responses, so I can rerun it with the original (strict) patterns as well, in case people wanna see the results.
I did remove the 'randomize the response and give it a score if it ends up correct' logic. If it can't even produce a response, it gets a 0 score, period.
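The caching itself is nothing fancy, roughly along these lines (file name and helper names are mine, not the benchmark's):

import json, os

CACHE_PATH = "responses_cache.json"   # made-up filename, not from the benchmark script

def load_cache():
    # Reload previously stored completions so reruns skip the model entirely.
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}

def cached_response(cache, question_id, prompt, model_call):
    # Run the model once per question and keep the raw completion;
    # re-scoring it later with a stricter or looser regex is then free.
    key = str(question_id)
    if key not in cache:
        cache[key] = model_call(prompt)
        with open(CACHE_PATH, "w") as f:
            json.dump(cache, f)
    return cache[key]

def score(extracted, correct):
    # No random-guess fallback: if nothing could be extracted, it's simply wrong.
    return 1 if extracted == correct else 0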
>>
>>101360317
Yeah, I could. There are thousands of questions though. It's taking quite a while. Like, half an hour per subject, and there are 14 of them...
>>
>>101360325
>I did remove the 'randomize the response and give it a score if it ends up correct' logic. If it can't even produce a response, it gets a 0 score period.
Seems fair. If anything it seems to be testing the model's 'luck'. It's a weird methodology.

>>101360359
I suggested phi3-mini because it's tiny and seems to do very well in benchmarks. More than a 4B has any right to. Run it on a single subject. It will, at least, give you a baseline of what a 'well behaved' model's output looks like.
>>
>>101360094
>I wonder what question it's trying to answer.

{'question_id': 3361,
 'question': 'Discuss how the quantitative measurements of the dioxy-ribonucleic acid content of cells is evidence that DNA is the genetic material.',
 'options': ['The increase in DNA content following cell division indicates that DNA is the genetic material.',
  'The presence of DNA in mitochondria and chloroplasts, but not in the cell nucleus, is evidence that DNA is the genetic material.',
  'The constant amount of DNA in all body cells and half the amount in germ cells is evidence that DNA is the genetic material.',
  'The varying amount of RNA in cells indicates that DNA is the genetic material.',
  'The ratio of adenine to thymine varies greatly between different types of cells, suggesting DNA is the genetic material.',
  'The presence of histones in cells proves that DNA is the genetic material.',
  'The correlation between the complexity of an organism and the amount of DNA in its cells points to DNA as the genetic material.',
  'The ability to synthesize proteins directly from RNA without DNA involvement demonstrates that DNA is the genetic material.',
  'The consistency of DNA sequences across different species suggests that DNA is the genetic material.',
  'Polyploid tissues have multiple sets of chromosomes, which shows DNA is the genetic material.'],
 'answer': 'C',
 'answer_index': 2,
 'cot_content': '',
 'category': 'biology',
 'src': 'stemez-Biology'}
>>
File: 1536720366445.png (55 KB, 278x248)
>try out a "strong waman who don't need no man" card with the goal of seggs
>first few responses get pretty bad reactions even on swipes
>iterate on the strategy, trying out other different possibilities for my responses
>eventually get into a flow of using good humor and retorts to her seriousness and sass, that also don't step over the line of rudeness
>in the end, break into her shell, getting her laughing and smiling
Huh, did I just get groomed?
>>
>>101358481
Huh, is that Sora or a different model? I don't think I've seen any Luma gens of that level.
>>
>>101360401
Ah. The filter sucks. Thanks.
At least the model didn't reject C :).
Anyway. Give phi3-mini a go just to get a baseline. That one is well known to be trained on textbook-like data, so it should understand multiple choice better, without being actually smarter. Judging a model's "intelligence" is still difficult. Try the more permissive regex to see if you get more actual positives. Gotta split.
Best of luck with your finetune.
>>
>>101360229
Yes, if you run the following:
set -a
source file.env
set +a
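A quick way to check it worked (throwaway Python run as a child process; assumes file.env contains CUDA_VISIBLE_DEVICES=0 and was sourced with set -a as above):

import os

# If `set -a; source file.env; set +a` ran first, this child process sees the
# variable, exactly as it would after `export CUDA_VISIBLE_DEVICES=0`.
# A plain `source file.env` without set -a (or an explicit export) only creates
# a shell-local variable, and this prints None.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))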
>>
I want to do lewd RP on a gaymer PC, so only 8 gigs of VRAM. Do I go for Lunaris, Stheno or Gemma?
>>
>>101360487
That's the strat. Now go fuck her silly and make her reject feminism.
>>
>>101360930
mythomax
>>
>>101361021
>>101361021
>>101361021
>>
File: output.png (1.13 MB, 1024x1024)
>>101355464
It's a crude prototype, but I can see the flickers of something great.
>>
>>101358592
This is the most retarded thing that I've ever read.


