/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102429190 & >>102417229

►News
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://hf.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: zzz.jpg (13 KB, 367x316)
►Recent Highlights from the Previous Thread: >>102429190

--StyleTTS-ZS zero-shot text-to-speech synthesis project and xtts2 improvements: >>102430346 >>102430383
--Mistral AI correctly answers Castlevania trivia, but struggles without game context: >>102430040 >>102430050 >>102430068 >>102430106 >>102430513 >>102430626 >>102430648 >>102430653 >>102430665 >>102430751 >>102430762
--Mistral-Small-Instruct generates a Python script for the booba API: >>102429844 >>102429886
--Mistral AI's Pixtral 12B model shows high accuracy in multimodal knowledge and reasoning tasks: >>102430951 >>102430997
--Q6_K_L and Q8 quantization types compared: >>102432386 >>102432464 >>102432519 >>102432528 >>102432832
--Prompt engineering automation and evaluation for smaller LLMs: >>102429803 >>102429867 >>102430180 >>102430253 >>102431460 >>102431498
--Poor quality AI output with explicit content, users hope for improvement through finetuning: >>102434343 >>102434414 >>102434484
--Mistral Small's intelligence drops at longer context sizes: >>102431384
--Mistral Small model generates in-character response for "The girl called Alice": >>102431859 >>102432030 >>102432064
--Mistral Small Q8 can solve Sally question due to training on quizzes: >>102430428 >>102430654 >>102430636
--IQ2_M Mistral Small model is usable and generates smart responses: >>102431056 >>102431145 >>102431429
--ExLlamaV2 8bpw models padded with extra precision, 6bpw precision sufficient: >>102432045 >>102432069 >>102432103 >>102432834 >>102432962 >>102432987 >>102433001 >>102433040 >>102433048 >>102433108 >>102433180 >>102433192 >>102433082 >>102433149 >>102433244 >>102433764 >>102433261 >>102433277 >>102433311
--Teto (free space): >>102429241 >>102429806 >>102430975 >>102431015 >>102432273 >>102433465 >>102433919

►Recent Highlight Posts from the Previous Thread: >>102429197
>>
>>102434739
I respect your high context autism, but goddamn anon. 30k at 30t/s is a pretty nice deal I say.
>>
>>102434752
>IQ2_M Mistral Small model is usable and generates smart responses
True.
>>
>>102434766
>30k
I don't believe there is a single model that can do 30k tokens of ERP without the quality being complete trash. There are just too many patterns for it to pick up on.
>>
Hi all, Envoid here. I made a theme song for Drummer
https://voca.ro/19Y676wbJfOM
>>
>>102434851
That's why you use meme sampling that discourages repetition. Personally I use presence penalty at 1 to encourage new tokens over old ones universally (1 is still low enough that it won't crowd out important tokens even when they repeat), and DRY at the default values to discourage repeated strings of tokens. Both need the length adjusted until you find the sweet spot, though, or output quality will eventually suffer once too many tokens are being penalized.
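For anyone curious what presence penalty actually does mechanically, here's a minimal numpy sketch (not any particular backend's exact implementation, and DRY is more involved since it penalizes continuations of recently repeated token sequences):
[code]
import numpy as np

def apply_presence_penalty(logits, generated_ids, penalty=1.0):
    # Subtract a flat penalty from every token id that has already appeared
    # in the output, no matter how often. That's "presence" as opposed to
    # "frequency" penalty, which scales with the repeat count.
    out = logits.copy()
    for tok in set(generated_ids):
        out[tok] -= penalty
    return out

# toy example: token 42 already appeared, so it gets nudged down
logits = np.zeros(100)
logits[42] = 5.0
print(apply_presence_penalty(logits, [42, 7])[42])  # 4.0
[/code]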
>>
bros... how do I speed up my vector stores... computing cosine similarities is so fucking slow that it's better on CPU...
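For reference, the usual fix is to normalize the embeddings once up front so scoring the whole store becomes a single matrix multiply instead of a per-row loop. Rough numpy sketch (array names made up):
[code]
import numpy as np

# store: (N, d) matrix of document embeddings, query: (d,) vector
def build_index(store):
    # normalize each row once so cosine similarity reduces to a plain dot product
    return store / np.linalg.norm(store, axis=1, keepdims=True)

def top_k(index, query, k=5):
    q = query / np.linalg.norm(query)
    sims = index @ q                      # one matvec scores all N documents
    best = np.argpartition(-sims, k)[:k]  # partial selection, avoids a full sort
    order = np.argsort(-sims[best])
    return best[order], sims[best[order]]
[/code]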
>>
is it just me or is mistral small 22B a complete nothingburger?
>>
>>102434973
It's a nothingburger for non-VRAMlets. But I enjoy tuning models for VRAMlet enjoyment. So it's a something burger for me.
>>
>>102434973
It's a collective vramlet delusion.
>>
I ran TabbyAPI four times and the probabilities of the top token were different each time. Is there nondeterminism in exllamav2 or is my graphics card dying?

The whole reason I was doing this was tracking down an unexpectedly large discrepancy between Mistral Small 8.0 bpw exl2 and Q8_0 gguf. On llama.cpp the top token probability was 0.6850 on the first run and then stable at 0.6843 on multiple subsequent regenerations (I presume something got cached), while on TabbyAPI it was 0.5913, then 0.5951, then 0.5762, then 0.5738. The (IMO rather large) difference ended up mattering in my case because the absurdity I was tracking down was right on the border of being excluded by a min-p filter. After making sure samplers were neutral on both, the next thing I did was check the SHA256 checksums of the files, since I couldn't believe the difference was that great. If the problem isn't that my hardware is dying, then something is very wrong with at least one of those quants. Both were from LoneStriker. For that matter, even if my hardware is dying, that could still be true.
>>
>>102434858
kino
>>
>>102435003
>the absurdity I was tracking down was right on the border of being excluded by a min p filter

As an aside, the token to exclude could actually be the start of numerous reasonable clauses, but if that token is picked then the model with overwhelming likelihood (90% confidence in the top token) predicts the next token to be something that contradicts previous information. The gravitational pull of using certain phrases seems too great.
>>
sometimes I think maybe feeding terabytes of scraped web text into a statistics engine will not possibly lead to agi
but I might just be retarded
>>
sometimes I think anons don't realize what bad or good writing is because they don't read good books and have slop brains
>>
>>102435220
female detected
males can easily detect good writing because their dicks will give it a standing ovation automatically
>>
There's unironically nothing wrong with a singular instance of shivers running down a spine. The issue is when it appears in adjacent paragraphs and even adjacent sentences. At that point it becomes slop.
>>
>>102435260
saying this when erotic literature is mainly written for and purchased by women
most men get off to coomer fan fiction
>>
>>102435271
>saying this when erotic literature is mainly written for and purchased by women
hence why most of it is so terribly written and full of slop
>>
>>102435279
do you think models are being trained off of erotic novels? because the slop you see is coming from coomer rp and fanfiction written by horny men.
>>
>>102435220
A lot of people think Asimov was good.
>>
>>102435292
NTA but you've clearly never read an actual book.
Spine shivers come from literally all human writing. Because feeling some sensation in one's spine is a literal actual natural reaction to visceral stimulation. And the model just generalizes it all into shivers down the spine, even though in actual writing it's somewhat varied. But the exact average of it is shivers in the downward direction along the spine.
This is like... 6+ month old discussion around here.
>>
>>102435314
>NTA but you've clearly never read an actual book.
stopped reading there. I'm more well read than 99% of the thread. You probably read Ender's Game and think it was a classic.
>>
>>102435267
Same with eye slop (widening, narrowing, rolling). Once every 2 pages is okay; every fucking paragraph until DRY eliminates them all is slop. I even had a character with no eyes roll imaginary eyes.
>>
>>102435314
I've never had shivers run down my spine IRL therefore SLOP.
>>
Did some tests. Setup: 2x3090 ti.
Too lazy to make graphs, so read the numbers (c/t is cost per token, measured as watts divided by tokens/second, i.e. joules per token; lower is better):
400 w: 17 tok/s = 23.5 c/t
300 w: 17 tok/s = 17.65 c/t
250 w: 14.44 tok/s = 17.31 c/t
200 w: 10.80 tok/s = 18.52 c/t
Conclusion: 300 w optimal (in this setup).
Caveat: used nvidia-smi -pl and trusted software cap. No hw measurements.
>>
>>102435340
>stop reading there. I'm more well read than 99% of the thread.
I read that. I read it in what I imagine this guy's voice sounds like.
>>
>>102435003
It is not entirely deterministic: atomic additions accumulate partial sums in whatever order the threads happen to run, and floating-point addition isn't associative (FP16 precision doesn't help either), so (a+b)+c != a+(b+c)
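Quick demo of the order-dependence with plain Python floats (float64 here; it's even easier to hit in FP16):
[code]
a, b, c = 1e20, -1e20, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- same three numbers, different summation order
[/code]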
>>
File: Untitled.png (865 KB, 1080x2033)
SOAP: Improving and Stabilizing Shampoo using Adam
https://arxiv.org/abs/2409.11321
>There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient approximation of Adam -- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: ShampoO with Adam in the Preconditioner's eigenbasis (SOAP). With regards to improving Shampoo's computational efficiency, the most straightforward approach would be to simply compute Shampoo's eigendecomposition less frequently. Unfortunately, as our empirical results show, this leads to performance degradation that worsens with this frequency. SOAP mitigates this degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared to Adam. We empirically evaluate SOAP on language model pre-training with 360m and 660m sized models. In the large batch regime, SOAP reduces the number of iterations by over 40% and wall clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo.
https://github.com/nikhilvyas/SOAP
neat
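If it helps intuition, here's a toy numpy sketch of the idea as I read the abstract (single 2D weight matrix, no bias correction; this is NOT the paper's implementation, which is in the repo above): keep Shampoo's two Kronecker factors, refresh their eigenbases only every precond_freq steps, and run plain Adam on the gradient rotated into that basis.
[code]
import numpy as np

def soap_step(W, grad, state, lr=3e-4, beta1=0.95, beta2=0.95,
              eps=1e-8, precond_freq=10):
    m, n = W.shape
    if not state:  # lazy init of optimizer state
        state.update(L=np.zeros((m, m)), R=np.zeros((n, n)),
                     QL=np.eye(m), QR=np.eye(n),
                     mom=np.zeros((m, n)), var=np.zeros((m, n)), t=0)
    state["t"] += 1
    # Shampoo-style left/right preconditioner factors
    state["L"] = beta2 * state["L"] + (1 - beta2) * grad @ grad.T
    state["R"] = beta2 * state["R"] + (1 - beta2) * grad.T @ grad
    # refresh the eigenbasis only occasionally -- the one extra hyperparameter
    if state["t"] % precond_freq == 1:
        _, state["QL"] = np.linalg.eigh(state["L"])
        _, state["QR"] = np.linalg.eigh(state["R"])
    # rotate the gradient into that (slowly changing) basis and do Adam there
    g = state["QL"].T @ grad @ state["QR"]
    state["mom"] = beta1 * state["mom"] + (1 - beta1) * g
    state["var"] = beta2 * state["var"] + (1 - beta2) * g**2
    step = state["mom"] / (np.sqrt(state["var"]) + eps)
    # rotate the update back to the original coordinates and apply it
    return W - lr * state["QL"] @ step @ state["QR"].T
[/code]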
>>
>>102434973
It's okay and the best for what it is (20B~30B). The vocabulary also isn't terribly slopped like CR's earlier release was. That shit was horrible.
It has okay spatial reasoning, but if you can run something larger, go with that.
>>
>>102435374 (me)
>Conclusion: 300 w optimal (in this setup).
Because speed/cost is the efficiency (bang/buck) and
17/23.5 = 0.723
17/17.65 = 0.963
14.44/17.31 = 0.834
10.8/18.52 = 0.583
(in case it wasn't obv)
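Same arithmetic as a script, in case anyone wants to redo it with their own measurements (note the bang/buck ratio works out to speed²/watts, so it rewards raw speed on top of energy efficiency; by joules per token alone, 250 w is marginally the lowest):
[code]
# power-limit sweep from the post: watts -> measured tokens/s
runs = {400: 17.0, 300: 17.0, 250: 14.44, 200: 10.80}

for watts, tps in runs.items():
    jpt = watts / tps   # "c/t": energy per token (joules/token)
    ratio = tps / jpt   # the bang/buck figure above (= tps**2 / watts)
    print(f"{watts} w: {tps:5.2f} tok/s  {jpt:5.2f} J/tok  ratio {ratio:.3f}")
[/code]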
>>
>>102435374
Cool dudes are now capping frequencies, you're not up to speed, dork
>>
>>102435311
Asimov was good, but not for his writing style.
>>
>>102435425
Ah, Pratyush Patel and Chaojie Zhang. This must be good.
>>
>>102435271
Have you actually read women's crap? I got one book, I thought, oh, well, I'm sure the smut parts will be sort of short or whatever. It had 12 pages of a torture scene. A woman's idea of sexy is strange.
>>
>>102434973
It's smarter than Nemo but only like 25% smarter. Not a nothingburger but I don't think it's as much of a capabilities leap for its size class as Nemo was.
>>
>>102435501
>only like 25% smarter.
Which methodology are you using to come up with that number?
>>
I'm using a 3070 and koboldcpp. Is there anything I need to do to make the AI faster beyond using CUBLAS?
>>
>>102435510
Bellyfeel. Pretend I said "only somewhat" if putting a number on it is offensive to you.
>>
>>102435523
I'm not offended. I was just walking you towards admitting that your statement was a bunch of made up bullshit.
>>
File: fedora.png (993 KB, 1180x630)
>>102435540
you dropped this
>>
>>102435540
>The only valid opinions are those that can be measured and quantified
redditbrained take
>>
>>102435552
>>102435562
>if I cut my dick off I'm a woman
>>
>>102435425
How do you do frequency capping on nvidia? I only see -pl flag.
>>
>>102435576
nvidia-smi --lock-gpu-clocks 0,1600 --mode 1
>>
>>102434973
It seems good for what it is, meaning 22b models. Obviously it's not going to beat recent models that are bigger
>>102435414
>like CR's earlier release
Do you mean CR pre-plus? Used to use that a lot until recently; that shit can't follow rules if its life depended on it. Even basic shit like using "speech" or *actions* is beyond it without you pointing it out or editing its messages.
>>
>>102435383
go read green eggs and ham kid, scram
pathetic
>>
>Use Qwen 2.5 72B
>It's utter fucking garbage
Where'd Qwen go so wrong, bros?
>>
>>102435574
>"gender and sex are completely different thing"
>"to be a real woman I need to cut my dick off!"
How can we take them seriously lol
>>
>>102435657
Multilanguage
>>
>>102435657
Qwen was never good desu, every time I tried one of them it output random chinese tokens
>>
>>102435662
not true, the Mistral models are really focused on multilingual support and they are good models overall; I'd even argue that the SOTA models (gpt4, Claude 3.5) are multilingual as well
>>
here's a tip, unslopnemo is the best 12b tune
>>
>>102435501
25% would be worth it if it works past 16k context too.
>>
Reasons to not use ollama
>>
btw slop is only a problem for ESLs and retards who don't know how to prompt
>>
>>102435713
It is extremely annoying how it takes a long ass time to allocate space before downloading a model.
It offloads the model from memory if you don't use it for a few minutes.
The quants are hidden on the web page.
You can't use models directly from hf, so you need to download them even if you already have them on your drive.
>>
File: 1722400106196472.gif (677 KB, 1280x720)
If you care about good prose in smut, you're gay as fuck.
>>
File: TWQDb-dhtguja7InokhzTg.gif (1.46 MB, 220x220)
>grr I don't know how to read
>I just need to coom!
anon just admitted to only having LLMs as a means for sexual gratification and thinks everyone else is gay
>>
Anyone ever used a custom ChatML context/instruct set of instructions that actually improves roleplay? I dunno if it's snake oil or actually good.
>>
>>102435753
When the prose is the only part of the smut, yeah, I do.
>>
>>102435783
No, just you.
>>
>>102435374
>>102435415
>simply limiting the max wattage instead of optimizing the voltage by undervolting (maximizing the clocks) wasted performance
>>
>>102435657
>>Use Qwen 2.5 72B
Early access? I've heard they went the Meta route and filtered the training data for bad words.
>>
qwen was never good
>>
https://nvlm-project.github.io/
vlm from nvidia (not up yet)
>>
>>102435912
>I've heard they went the Meta route and filtered by bad words.
I mean, what else do you expect from chinks? They are as retarded as the west when it comes to censorship
>>
>>102435619
Thanks.
>>102435825
How much of a performance hit are we talking though? A single nvidia-smi call is trivial effort.
>>
>>102435931
The west censors shit to make themselves (white privileged college students) feel better
China censors shit en masse to protect their government's power and to restrict the masses' free speech... wait a second, that's also why the west does it!
>>
What will locusts do when OpenAI enforces CoT for all prompts, making it almost unjailbreakable?
>>
>>102435976
annoying libertarian techbro hands wrote this post
>>
>>102435990
I'm still surprised nobody has found a jailbreak to make gpt4 repeat the CoT prompt so we can see it
>>
File: InstructModeSequences.png (358 KB, 1299x725)
I'm trying to set up the optimal settings for mistral nemo seen here:
https://rentry.org/freellamas
I'm a little confused on the
>Sequences for this model
section. This menu has completely changed nowadays.
Is input and output now: System Prompt Prefix and Suffix?
Which one is now Last Output Sequence? And which is Separator?
I tried to RTFM but didn't really find anything explaining the old settings.
I posted this on /aicg/ initially and they sent me here kek
>>
File: 1726612890211848.png (35 KB, 532x949)
Found this in the last thread. Can someone else confirm whether the gap between IQ4_XS and Q3_K_L really is that significant?
>>
>>102435753
Fuck you smugposter, the plot makes the porn better.
>>
>>102436007
I just changed it per https://hf.co/bartowski/Mistral-Small-Instruct-2409-GGUF/discussions/1 and the writing of Mistral Small instantly changed for the better. That models still work, just with degraded quality, when you use the wrong template is kind of maddening.
>>
>>102435619
It's hovering around 216 watts for each card (capped at 400w). 14-15 tokens/s. 300w seems a tiny bit more efficient, but not bad. (Efficient here means "how much would I have to pay in electricity for it to write me some long blurb".)
>>
>>102435991
nta but literally kill yourself. no, really. stop what you're doing and go swing from a rope
>>
>>102434739
Let us know your findings, I'm currently trying out mistral large IQ2M (~2.7 bpw) because it's much faster than IQ4XS (1.3T/s vs 1T/s)
IQ3M seems to be unoptimized because it's slower than IQ4XS (0.85T/s)
>>
File: 555.jpg (36 KB, 500x499)
>>102436319
>muh goberment censorship to something something the masses!
idk why you said "nta" when you're clearly the same annoying retard
>>
>>102435412
LLM engineers have to re-invent shampoo and soap because they've never used them IRL
>>
How can I make Q6_K_L quants? Llama.cpp quantize doesn't support it and google gives no results.
>>
File: 1708443456817321.png (44 KB, 788x292)
>>102436439
I think it's done like this.
>>
>>102436467
Oh, so just --output-tensor-type Q8_0 and --token-embedding-type Q8_0?
>>
Will Mistral really make any more MoE models? The primary reason they did it was because it was quick and cheap, as they could initialize from a smaller model they already trained, but now they have the compute and can train models like 123B fine.
>>
>>102436544
We MIGHT get a revamp of 8x22 or 8x7 but they're not going to train a new one from scratch. MoEs were a MeMe.
>>
>>102435752
>You can't use models directly from hf
apparently you can import a local gguf, but you have to write a one-line Modelfile for it.
>>
>>102436493
I believe so, yes. When I saw discussion on the matter it mentioned that output tensors and embeddings at Q8_0 were superior, so I think that's all it is. Whether it's truly better or not I can't say.
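For reference, it should just be something like this (flag names are from memory, check llama-quantize --help on your build):
./llama-quantize --output-tensor-type Q8_0 --token-embedding-type Q8_0 model-f16.gguf model-Q6_K_L.gguf Q6_K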
>>
>>102435990
They use Opus
>>
File: firefox_xDDUSc6GyQ.png (225 KB, 1154x327)
>>102435003
exl2 is hugely nondeterministic. Has always been. I see it as the price you pay for speed.
>>
>>102435479
I've read one, within about 50 pages i'd actually run into 8 spine shivers, and that was not even the worst of it. It also had "that" sentence structure almost exclusively, the same one the LLMs spout continuously. Just one of these books has such a high slop density that it can poz an entire dataset, i wager. If it hadn't been published in 2021, i would have sworn that it was written by an LLM beginning to end with no editing. But if i'm being honest, the LLM would probably do better.
>>
training with my gpus at 200 watts is pretty comfy
>>
File: firefox_aTVZd2RqqJ.png (281 KB, 750x1100)
Mistral Small almost seems to be capable of playing my dots game.
>>
File: firefox_Bw5YZM04gZ.png (333 KB, 761x585)
Bros? Are we back?
>>
File: firefox_nA561ujlnT.png (349 KB, 767x625)
>>102436761
>>
File: firefox_DCuiPLgDDH.png (70 KB, 757x161)
>>
>>102436790
>Newsflash
it was over before it began
>>
>>102436836
Simply ban that word, and live happily ever after.
>>
> Anything interesting being discussed in this thread? Use bullet points and add >>NNNNNNNN links.

---

Determinism in LLM Generation (>>102435003)
There seems to be nondeterminism in LLMs, with varying token probabilities during regeneration.
Some users suggest this is due to atomic addition and FP16 precision (>>102435383).

Efficiency and Power Consumption (>>102435374)
User experiments with different power settings for running LLMs.
User concludes that around 300 watts is the optimal setting for their setup (>>102435414, >>102435931).

Comparison of Mistral Models (>>102434973, >>102435510, >>102435520)
Discussion on the performance and capabilities of Mistral Small and other models.
Some users praise Mistral Small, while others find it lacking compared to larger models (>>102434943, >>102435825).

Sexual Preferences and Writing Quality (>>102435201, >>102435220, >>102435660)
Debate over the importance of prose quality in erotic literature and the gender dynamics related to writing and consuming such literature.

Jailbreaking AI Restrictions (>>102435990, >>102436005)
Speculation on the impact of potential OpenAI changes to enforce Chain of Thought (CoT) for prompts and its influence on ""jailbreaking"" AI restrictions.

Vector Store Optimization (>>102434943)
A user seeks advice on speeding up computational tasks related to vector stores and cosine similarity computations.

Quantizing Model Weights (>>102436432, >>102436467)
Users discuss the methods for creating Q6_K_L quantized model weights.

Model Training and Dataset Bias (>>102436701)
Conversation around the quality and bias found in training datasets for LLMs, with particular focus on erotic content.
>>
>>102437006
>one of the quotes leads to something completely different
>>
>>102437029
Just showing you how it does. Mistral Small isn't perfect.
>>
>st
>[DEPRECATION NOTICE] Model scopes for Vectore Storage are disabled, but will soon be required.
what is this
>>
>>102435668
>random chinese tokens
Based chinks making burgers feel the same way we ESLs do with their shitty models whenever the prompt and chat aren't in English. Even Claude does this shit and turns French into English tokens.
>>
File: 1707310016651155.jpg (172 KB, 947x583)
>>102437105
well this sounds fun for my 15 rag databases
>>
I guess 2 bit large is much better than 8 bit small?
>>
>>102437307
Is 2 bit smol better than nemo?
>>
>>102437251
Use case?
>>
File: 1726006554401578.png (128 KB, 770x549)
>>102435520
pls resbond
>>
>>102437429
rp
>>
File: 1710043687041916.jpg (43 KB, 720x960)
>>102437431
Buy a new gpu
>>
>>102437431
Exllamav2 is faster than koboldcpp; however, it has some non-deterministic behavior and requires your model + context to fit entirely in VRAM. Aside from that, nothing. Maybe practice chess or watch some shows while genning (coming from a 0.5t/s guy).
>>
What's the smallest model that's effective for writing simple scripts or config files?
>>
File: 900640_00001.webm (822 KB, 720x1280)
>>102434744
it's still tuesday somewhere
>>
>>102437553
Nemo
>>
I have 16gb of vram. Glad mistral released the small model. Perfect for people like me.
In b4 people shit on it again for not being as good as models that need 2x24gb+ vram.

With Nemo the CoT prompt did not really improve the writing upon further testing.
That might be different with mistral 22b. I actually had an "oh, also I should do X" moment (X being something in the context).
But need to test more.
>>
>>102437666
Really? When I looked at programming benchmarks it scored worse than llama 3.1 8b even.
>>
>>102437692
(In my experience) it performs better with follow-up prompts when you need to correct or improve a solution
>>
Reporting. Genning a lot of replies in parallel does not seem to reduce the quality.

I made a script to do sentiment classification on a hotel reviews dataset (Russian language), and genning one by one achieves the same result as genning tens in parallel.

Incidentally, Nemo achieves an accuracy of 90%, Mistral-Small 92%, and 2.75bpw Mistral-Large 89%.

Tabbyapi.
>>
File: 1725697593000880.png (92 KB, 717x352)
>>102435797
Anyone?
>>
>>102437903
ChatML sucks.assistant
>>
File: nala small lora.png (217 KB, 922x610)
guys I think I might have just achieved AGI.
>>
>>102437889
2.75bpwos...
>>
Just tried out Mistral Large 2 and I legitimately like it more than Opus.

So far I only used proprietary models for ERP. Why did no one tell me straight up that open weight models already caught up in ERP?
>>
>>102438047
Because most people using the api models can't run large at a worthwhile quant with enough speed.
>>
It seems like even mistral can't agree on what its prompt template should look like.
Opened up tokenizer_config.json of large and small side by side. Small has spaces before [INST] and after, also a space before [/INST].
Large only has a space after [INST].
>>
>>102437975
that's mistral small?
>>
>>102438083
Small also seems to start every reply with a space, which is somewhat infuriating because I often want a single token response for classification, and I always get the space.
>>
>>102438084
Well I did a LoRA on it, but it ended up overcooked (loss drops fast after the first epoch it seems) so I SLERP merged it back into the original model. And it's honestly pretty decent. Although it's one of those models where if you mention "consent" or "NSFW" anywhere in the prompt it will just go full on porn mode.
>>
>>102438081
It doesn't help that the samplers for mistral-large on proxies are completely inadequate. This leads to the model being repetitive and drier than Popeyes biscuits. It also makes me wonder if some of the other corpo models could be saved if they had a more robust sampler setup.
Watch this become openai's next "breakthrough".
>>
Mistral Small is actually great
>>
>>102438126
what samplers do you use?
>>
>>102438143
Aside from temp. Rep pen at 1.1, min-p and smoothing, very rarely dry at 0.45.
>>
>>102438107
Cap at 2 tokens and trim...
>>
>>102438153
I don't think rep pen and dry work very well together.
>>
>>102438177
I look at probabilities, and for everything else I just had to look at probabilities of the first token.
>>
n-nooo. the positivity bias in mistral small is so bad.
can finetunes even get rid of that? gemma2/llama3 is dead because of that.
nemo was good because of the convincing characters. even with all the retardation sometimes.
>>
>>102438262
>can finetunes even get rid of that?
They got rid of it from Largestral so I don't see why not.
>>
>>102435619
Compared to limiting power, this makes idle power consumption get worse for some reason, though.
>>
>>102436467
I wish we could use different quantization levels (e.g. none) for the attention layers, or exclude specific layer numbers (e.g. the first and last one); that's possibly where most of the damage actually comes from (according to Meta).
>>
Why is Euryale 2.2 soulless compared to 2.1?
>>
>>102438047
What prompt are you using anon?
>>
>>102438047
It’s nowhere near Opus level. I really wish it was, but it’s not.
>Skill issue
Yeah, but not on my end.
>>
So does Mistral Small have the same prompt format as Mistral Large, or is there a space before the [/INST]?
>>
>>102436022
It depends on your model and particularly its size; you can even go as low as 2 for a 70B. It'll be retarded for a 70B but still better than a 12B.
You cannot just ask "what's the best quant" or "is there always this drop" and get an answer that applies to everything.
>>
Any LLM-based games anon can recommend? If I can hook a local model up to it, even better.
Only played that yandere thing a year back.
>>
>>102438329
Eh? Which finetunes are you talking about? Been using base mistral large instruct 2.75bpw and it has the same positivity bias as almost every other model under the sun.
PLEASE man I'm begging
>>
>>102438927
>>102438329
Would like to know too. I tried Magnum and it lost way too much intelligence.
>>
>>102438768
There are spaces before both [INST] and [/INST], and the system message is placed before the last message from the user rather than before the first one.
>>
>2 years of shitposting and nobody has made a single private quantifiable benchmark to test a new model's vibes
I'm making one tonight. It's pretty close to ayumi's naughty words bench, among other things
>>
>>102439070
>It's pretty close to ayumi's naughty words benc
straight to the trash
>>
>>102438943
try one of the 123b merges then
>>
Why is it that the larger the models the more assistant slop they become?
Is it because for the size you need synthetic gpt output training data?
>>
new model when?
>>
>>102439134
yeah pretty much
>>
>>102439117
I bet you don't realize how dumb merges are since you're using already dumb low bpw quants.
>>
>>102439134
Training larger models is expensive, so they tend to overcompensate to mitigate risk. Large models are also primarily targeted towards corporate clients.
>>
File: jrf8e4xla9241.png (1.24 MB, 1920x1080)
Hi all, Drummer here...

https://huggingface.co/BeaverAI/Cydonia-22B-v1a-GGUF/tree/main
Formats: Metharme, Mistral, Text Completion

https://fridge-checked-interpretation-hash.trycloudflare.com/
>>
>>102439378
>Metharme
That's a word I didn't think I'd see any time soon. Ancient pyg prompting format.
>>
>>102439484
I see a template that doesn't require added tokens & assistant tag, I take it.
>>
>>102439509
Should've taken alpaca then.
>>
>>102437975
>shreads your boxers
>wet pussy grinding against your bulging cock, separated only by your underwear
>now, take off your fucking clothes, or I'll spread them off you

Yes, truly AGI, can't even remember the status of clothes in the same reply, definitely not a <70b model
>>
>>102439811
No, concedo. Alpaca's taken and it takes more than 1 token to form "Instruction" and "Response". And there's no system tag for one of the many versions of Alpaca.
>>
>>102439964
<|system|> takes 5 tokens.
>Alpaca's taken
lol what
>>
>>102439964
He wrote this reply using his garbage ESL model.
>>
As a beginner I have only tried gemma.cpp with their model because it's very fast and seems light on resources.

I want to embed one into my website via simple stdout from the command line, but if you tell gemma something it's useless:
>You are an Alien from Planet lol
Hi, I'm gemma, developed by google... I can do...
>Who are you?
Hi, alien from Planet lol, I'm gemma, developed by google...

So, my questions are:
(1) What better inference engine (is that what they're called?) is there? Preferably cpp for something lightweight.
(2) What is a good model, since you can't download gpt4 normally (there are probably some leaks online?)

Regards.
>>
>>102439378
What's with these random ass names? Why not use base model name + dataset name? The magnum guys almost got it right but they decided to remove the base model names. You're all fucking dumbasses.
>>
>>102440041
There's many models in the gemma family, so i have no idea what you're running on or what you consider fast.
>(1)What better inference engine(that's their name?) there is? Preferably cpp for lightweight.
llama.cpp or kobold.cpp
>(2)What is a good model since you can't download gpt4 normally(there are probably some leaks online?)
Mistral Nemo 12B or Llama 3.1 8B. Move up if you have spare resources, quantize more heavily if you don't. Plenty of .gguf models on huggingface (.gguf being the format llama.cpp and kobold.cpp use). https://huggingface.co/bartowski has a bunch for you to try. Roughly, any gguf file smaller than your vram/ram capacity will do: for 16gb vram, download any under 16gb. Leave some space for context.
Read llama.cpp's docs and the README.md.
>>
File: Untitled.png (165 KB, 1239x730)
>>102440041
a bit outside my wheelhouse, but koboldcpp has this thing that looks pertinent
https://github.com/LostRuins/koboldcpp/releases
>>
>>102440258
If you want to embed it into your site you'll be better served by a json API (which is how most chat uis work), not cli. Both llama.cpp and kobold.cpp have it. Probably every other engine as well, but i've only ever used llama.cpp. Both have an embedded web ui (llama-server) for you to play around and learn how to use them.
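For the website use case, here's a minimal sketch of querying llama-server's completion endpoint from Python (assumes it's running locally on the default port; parameter names follow llama.cpp's server README, double-check against your version):
[code]
import json
import urllib.request

# assumes something like: ./llama-server -m model.gguf --port 8080 is already running
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({
        "prompt": "You are an alien from Planet lol.\nUser: Who are you?\nAlien:",
        "n_predict": 64,
        "temperature": 0.7,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # the server returns a JSON object whose "content" field is the generated text
    print(json.loads(resp.read())["content"])
[/code]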
>>
>>102440183
>visit the gemma.cpp models on the Hugging Face Hub
Used this (https://huggingface.co/google/recurrentgemma-2b-it-sfp-cpp) because was the last updated.

(1) What does quantize mean?
My setup is an rx570 4gb and 16gb ram.
(2) Are there any free services/Apis that can be used to receive results as follows:
>Input: You are a nigger
>Input: What Color are you?
>Black
>>
>>102440629
>(1) What does quantize mean?
Basically, compressing the model. The more you quantize it, the smaller the model gets, (so you can fit bigger models in your gpu), but it comes at a loss in accuracy with respect to the original model. You can run 2B models in Q8_0 without any issues. Bigger models will need more aggressive quantization. You'll have to play around with what's the best for you (big model+aggressive quant or small model+highbit quant).
>(2) Are there any free services/Apis that can be used to receive results as follows:
Probably, but i don't use any. You can run llama.cpp (their llama-server) example and make queries to it.
Read https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
You can use this model for a close equivalent: https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/blob/main/gemma-2-2b-it-Q8_0.gguf
There's plenty of models to try.
>>
Is there any hope for people like Yann?
>>
>>102440797
In what aspect? He'll keep receiving grants for doing useless work thanks to his existing connections.
>>
Abandon all hope, ye who enter here.
>>
For me crazy thursday is just a regular thursday
>>
For me, today is like any other day.
>>
>>102439929
Did it work? Are you a woman now?
>>
>>102440867
Qwen 2.5 may be quite possibly the biggest nothing burger I've ever seen.
>>
>>102440914
But you haven't seen it yet.
>>
>>102439378
transformers fp16 weights, when?
>>
It seems the Mistral shills are afraid of Qwen...
>>
>>102440807
Look at his "cat-like" AI thing. Is there any hope for people who think like that?
>>
Abandon all hope, ye who enter here.
>>
If I only care about context length, so the model can maintain a story line and stay coherent about past events, is mistral large or any other 70b or 405b model worth it, or is it a meme?
For reference, nemo is able to do 64k context for a while, it remembers characters and whatnot, but the closer you get to that number of tokens the harder it gets for it to stay coherent.
Or let's say I told the model to write a 100k-word story.
Is it worth it to invest in a multi-gpu rig for bigger models just for that, or will I face the same problem?
>>
>>102440962
>he doesn't know
>>
File: 1718682325840.png (11 KB, 605x152)
how do you answer without sounding mad


