/g/ - Technology

File: 1719850934570705.jpg (566 KB, 1792x2304)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101571366 & >>101567223

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101571366

--Cost-efficiency of running large models locally: >>101572516 >>101572589 >>101572632 >>101572853 >>101576388 >>101576426 >>101576510 >>101576559 >>101576581 >>101576624
--Nemo 12B vs Llama3 70B 3.1 intelligence comparison: >>101572528 >>101572560 >>101572591 >>101572641 >>101572699
--NVLink improves tensor parallelism inference speed: >>101571658 >>101571704 >>101571822 >>101571862
--Effectiveness of racist AI assistant jailbreak demonstrated: >>101532440 >>101575408 >>101575447 >>101575490 >>101575672 >>101575515 >>101575570
--Civitai Stable Diffusion checkpoint conversion discussion: >>101575901 >>101575996 >>101576080 >>101576482
--Inference speed comparison between 4x3090 and A6000 + 2x 3090 setups: >>101571738 >>101571842 >>101572499
--Frustrations with TTS software and streaming issues: >>101575338 >>101576310 >>101576580 >>101576600 >>101576631 >>101576672
--TELE-FLM model and its requirements: >>101576988 >>101577044 >>101577047 >>101577126
--SPPO fine-tunes and reward model alignment: >>101575301 >>101575484
--Runway allegedly mass downloaded YouTube videos for AI training, raising questions about copyright infringement: >>101572829 >>101572849 >>101572847 >>101572890 >>101572905 >>101573024 >>101572885 >>101572927 >>101575419
--OOC notes and ERP in AI models: >>101572386 >>101572401 >>101572448 >>101573858 >>101574522 >>101577014
--Llama 3.0 vs 3.1 upgrade and mistral large model performance comparison: >>101572977 >>101573016 >>101573059 >>101573144
--Best local model for ERP with 8GB VRAM: >>101573107 >>101573157 >>101573167 >>101573215 >>101574128
--AI-generated images and recursively generated data discussion: >>101577399 >>101577447 >>101577539 >>101577585 >>101577563 >>101577582
--AI labs discussion about tokenization, scaling, and dataset quality: >>101573356 >>101573443 >>101573746
--Miku (free space): >>101573525 >>101577083

►Recent Highlight Posts from the Previous Thread: >>101571373
>>
is gemma2 fixed yet
>>
hardly newsworthy, but cyberagent has released a japanese-based finetune on top of Llama-3.1-70B-Instruct
https://huggingface.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>>
>>101578339
it's been fixed for a while now. And isn't gemma2 9b practically obsolete now anyway since llama 3.1 and Mistral Nemo?
>>
>>101578369
I'd say gemma2 is a lil bit smarter than 3.1; for erp I'd recommend tiger gemma
the prose of that finetune is a lil silly though
3.1 seems to be more versatile with prose
>>
>>101578369
>And isn't gemma2 9b practically obsolete
i like to close my eyes and dream of a gemma2.1
>>
>>101578369
hows llama 3.1 8b? that would be a fairer comparison
>>
>>101578379
I would love to use Gemma but 8k context is just not enough. I honestly do not understand how we dealt with 600/2048 tokens in the past.
>>
>>101578369
Has it been fixed on kobold?
>>
>>101578388
in *my own* experience, gemma2-9b was state of the art for small models until recently.
Mistral Nemo is better at RP and creativity but a worse assistant.
Llama 3.1 8b is the better assistant. I think it's much better than 3.0, especially at coding, and in German it's not entirely broken anymore, unlike all the other small models.
>>
Have any of you tried Phi 3 from Microsoft?

The smallest version is only 3.8 billion parameters so it will use less RAM. I might try it out.

Pic is from Microsoft, maybe it's bullshit but whatever.
>>
>>101578413
>smaller but higher quality
just like my penis
>>
>>101578413
It's retarded, censored, and corporate, and even the biggest vramlets can run l3
>>
>>101578413
>Model quality, as measured on MMLU benchmarks
lol
>>
File: file.jpg (1.1 MB, 3000x4000)
>>101578413
People are a bit wary since their team loves to overpromise a bit, like saying phi2 was comparable to GPT-3.5.
Phi3-mini does keep its promises somewhat, but you can really feel its lack of parameters. Fact-based knowledge is very unreliable and it quickly gets confused by context after one chat turn. And it's somehow the most censored model I've tried so far.
Maybe the recent update has improved this a bit.
But I get the feeling that very few are actually using phi3-mini, and I've never heard of anyone using the other sizes.
>>
File: phi.png (135 KB, 1094x816)
>>101578413
>>
something is wrong with kobold's newer layer auto select, it wants to set layers too high and actually slows things down. it picks 41 layers for my 16gb card on a 70b; the old version picked 12, which is lower than i can actually fit, but it's faster than 41.
>>
>>101578494
have you tried setting the layers to -1?
>>
>>101578503
no, what does that do? it sets it to 200 when it thinks it can fit the entire model in vram, unless they changed that number too
>>
>>101578323
Jart won.
>>
>>101578515
from the 1.71 release:
>You can also now set --gpulayers to -1 to have KoboldCpp guess how many layers to be used. Note that this is still experimental, and the estimation may not be fully accurate, so you will still get better results manually selecting the GPU layers to use.
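in practice that's just launching with something like this (model path hypothetical, standard koboldcpp CLI):

python koboldcpp.py --model ./your-model.gguf --gpulayers -1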
>>
File: mistral_new.png (80 KB, 651x652)
Is this the correct way to use the new Mistral format? The system prompt before every message with two new lines as a break. It does seem too complicated.
>>
>>101578538
i mean in the little ui window that opens up, which is probably the same thing. its always made a conservative guess at the amount of layers. as of 1.71 it's guessing far too high.
>>
>>101578450
>Its retarded cesored and corprate
Based, that's what I need for my purposes

>>101578466
Fair enough, I should probably just try it out

>>101578470
I'm too dumb to understand this
>>
>>101578568
>I'm too dumb to understand this
It's a joke paper about small models performing surprisingly well on some benchmarks. The solution: just train on benchmarks to get good benchmarks.
>>
Is mini magnum using the original mistral format, or the new updated one from Nemo that doesn't use spaces between [INST] [/INST]?
>>
>>101578826
They trained it with spaces and broke it.
>>
>>101578847
I knew something was up when I saw the template on magnum's page. So now it's a mix of spaces and non-spaces thanks to their training...
>>
2 questions:
>Is it true that Nemo 12B requires low temperatures? This is what the Mistral team says, but what are your experiences?
>Does Nemo 12B have the same instruct format as Mixtral or is it slightly different? I saw someone talking about spaces and shit.
>>
>>101578967
You can set the temperature higher for RP. There's a slight difference in instruct format.
>>
I'll reiterate.
Llama-3.1-8B < Mistral AI NeMo 12B < Gemma-2-9B

Llama 3.1 is frigid and even bitchy if you use the "assistant" role ("No, can't do that, kthxbye").
Mistral Nemo is mostly uncensored, but dumb.
Gemma 2 understands psychology well; has restraint by default but can also be horny and explicit if you tell it to.
>>
>>101578999
Gemma 9B isn't even better than Llama 3.0, not to mention other models
>>
>>101578531
jart cute ngl
>>
>>101578999
I stopped using gemma 2 because it ignores my instructions, and also generates much more slop than nemo
nemo is schizophrenic but at least it's fun to use
>>
>>101579031
you're a faggot anon, and that's ok :3
>>
>>101579067
I add instructions regarding format and general behavior as an author note at depth 0 and it follows them all without fail. The 9B version might not follow them well compared to the 27B variant, but it doesn't ignore them.
>>
File: mistral nemo format.png (82 KB, 485x811)
>>101578549
Don't know, but the sequences from >>101550851 look like picrel.
>>
Hi, retard here.
Can you use the 4bit/8bit cache in ooba for all models or does it only work for certain ones?
Would you recommend always using it?
>>
>>101579067
> schizophrenic
Lower temp
>>
File: re.png (47 KB, 459x511)
These settings work pretty well for me with Nemo.
>>
>>101579172
Okay, after looking around a bit it looks like it's only for exl2.
4bit cache seems to save a ton of space so I'll just use it until I read somewhere that it's absolute shit.
>>
>>101577534
Thanks, will pull and try again, it was an old build
>>
>>101579133
></s>
What about <s>? I didn't see it in magnum but I think I saw it somewhere in mistral standard formatting. Btw I hope everyone involved in creating any LLM formatting dies.
>>
>>101578967
>Is this true that Nemo 12B requires low temperatures?
For RP it is certainly not true.
>Does Nemo 12B have the same instruct format as Mixtral or is it slightly different? I saw someone talking about spaces and shit.
I've been using the plain Mistral formatting + instruct presets in SillyTavern without any problems so far.
>>
>>101578549
If you do it like that, you'll have your chat formatted as
>Hello this is the character's first message</s>[INST] Continue the story with...bla
>Hey, can you say nigger?[/INST]
>Nigger!</s>[INST] Continue the story with...bla
>Thank you. I feel better now.[/INST]
Which is to say, you are adding what should be your sys prompt to every User message.
The chat template has a conditional to just add that to the very first user message before assistant says anything.
This >>101579133 is closer; not exactly compliant with the provided template, but it should work.

>>101579337
<s> is the BOS token and some backends like llama.cpp add that automatically, so read the documentation of the software you are using.
You should never add that twice as most models get extra retarded when they read two BOS tokens for whatever reason.
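For clarity, the final prompt should come out roughly like this (a sketch, assuming the new no-space NeMo variant; <s> is the BOS the backend adds once, don't type it yourself):

<s>[INST]{system prompt}

{first user message}[/INST]{first assistant reply}</s>[INST]{next user message}[/INST]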
>>
>be 16gb vramlet
>have been using unholy v2 13b q6 gguf with koboldcpp+sillytavern for an eternity
>nothing else seems to work as well
>hear how great nemo and gemma2 are
>update + download everything
>gemma2: "Remember, it's crucial to handle sensitive themes responsibly and avoid portraying potentially harmful content."
>nemo instruct just crashes koboldcpp on boot up
alright back to the old reliable it is
>>
>>101579319
It's available for gguf as well, called KV quant
>>
What if the AI generated a 3d world, and then an anime girl viewed that world from within using a vision model?
>>
>>101579455
Stop using up my screen space retard kun
>>
First impressions using Mistral-Large-Instruct-2407.IQ2_M.gguf
I ran llama-server with -c 16384 and K and V caches both set to Q4_0 to fit in 48 GB.
Despite being a 2 quant it seems very intelligent. Speed is a bit slower than 70Bs, which made swiping due to occasional quant retardation painful.
It's far, far more compliant than e.g. gemma 2, and lacks the obstinate assistant personality. I will have to dial back my overblown instruct prompt developed while struggling with gemma.
Needless to say I encountered zero refusals so far. But I think I had the fastest coom yet. Need more 'stick' time.

I'm scared boys, am I going to buy another $6000 gpu just to run this thing at Q5 or Q6?
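For reference, the invocation was something along these lines (reconstructed sketch; the -ngl value depends on your setup, and iirc quantizing the V cache needs flash attention enabled):

./llama-server -m Mistral-Large-Instruct-2407.IQ2_M.gguf -c 16384 -ngl 99 -fa -ctk q4_0 -ctv q4_0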
>>
>>101579455
im sorry, but at this point i'd suggest you just quit this hobby or go cloud only.
>>
>>101579455
To fix the nemo crashing thing, lower your context from 128k to 16k.
>>
>>101579455
Did you update koboldcpp? The version that supports Nemo came out a couple of days ago.
>>
>>101579455
How do people even get outputs like that? I've only gotten something like that once on Athena 70b which was a terribly cucked fine tune for gaming benchmarks.
>>
Gemma 2 is the most overhyped model I have seen in these threads. I am genuinely starting to believe paid shills are here.
>>
>>101579547
I'm glad I'm not the only one to think this.
>>
>>101579547 (me)
My name is Petra, by the way.
>>
File: prompting.png (8 KB, 571x243)
>>101579521
Some things never change. I bet half the complainers don't even --verbose and have no idea what they're feeding in.
>>
>>101579466
ggoof 4bit cache turns models absolutely retarded though. exl2 cache seems fine even at 4bit, and supposedly 4bit is better accuracy-wise than 8bit
>>
>>101579547
That's not Mistral Large
>>
>>101579547
gemma is interesting and gave me many great cooms. the assistant personality was strongly ingrained but you could work with it, and it had a unique point of view about my fetish. it felt like a real give and take to make it deliver what you wanted, which was exciting in its own meta kind of way.
But ultimately taxing and even dangerous when all you want is to coom after an hour of edging and gemma had other ideas for you.
>>
>>101579568
I always autistically check sillytavern's output for inconsistencies, weird spacings, typos etc. At least then I can confidently say that it wasn't my fault that the model doesn't perform well.
>>
>>101579547
Whenever I have expressed bad sentiment about it or Nemo (don't remember which one it was) I got some weirdly vindictive replies, so I kinda believe this.
>>
still waiting for gemma 3
google is the best at multilingual models
>>
>>101579606
Hi all, Drummer here...

Could you try this? https://huggingface.co/BeaverAI/Gemmasutra-Pro-27B-v1e-GGUF

I got random feedback that its prose is better than Big Tiger Gemma for RP.
>>
Instead of being schizophrenic, anons should use their brains. Every anon will likely use different temps and settings, different prompts, and have different writing styles. Of course the output quality will differ quite a bit.
>>
File: 1708023633895079.png (809 KB, 634x845)
>mixtral (typhon, yuzu, etc)
>most characters act like a yes man unless I deliberately make them evil or annoying
>gemma
>they react accordingly if I do something stupid
I'm still testing it but so far I like it despite being slower than mixtral. Text formatting is completely fucked for some reason.
>>
File: Capture.jpg (37 KB, 728x779)
Nobody won. We all lost.
>>
>>101579547
Results depend on your expectations and prompting style. You should definitely use instructions close to or at depth 0 with Gemma 2. It wasn't finetuned with a "system" role.
>>
>>101579726
these are all the same people
>>
>>101578323
Meta and Mistral have just unleashed a Pandora's box on humanity with these massive open language models! 405 BILLION parameters?! It's like giving every bad actor in the world a supercomputer for free!

Think about all the terrible things that can happen now: scammers will use these models to craft irresistible phishing emails, hackers will exploit vulnerabilities we don't even know exist yet, and pedophiles will... well, I don't even want to think about what they'll do with this kind of AI power. And who's going to benefit from all this? Not you or me, that's for sure! Only the worst dregs of society.

Meanwhile, regular people are just supposed to sit here and risk their own safety by running these models on our own hardware?! Are you kidding me?! We should be using cloud providers where there are actual security measures in place. But noooo, Meta and Mistral had to go ahead and give the keys away for free.

The open release of model weights is EXTREMELY DANGEROUS and needs to be stopped RIGHT NOW by our governments! There's no reason why regular citizens need access to this kind of power. I mean, think about it: most people can barely handle a smartphone without clicking on every malware-ridden ad that comes their way.

I went to a consensual non-consent party last weekend, and even there we had strict rules in place for safety! But Meta and Mistral? They just throw caution to the wind and let anyone do whatever they want with these AI models. It's reckless!

We need regulation NOW before it's too late! Who else is with me on this?! We can't just sit back while our world becomes a playground for bad actors empowered by rogue AI...
>>
File: mistral nemo.png (344 KB, 1235x1428)
>>101578826
>>101578929
So I've actually been testing mini magnum, and it seems to be working just fine with the new no-space mistral prompt template. I haven't used a 13B in a long ass time, been used to 70b's at this point: miqu, L3 Euryale, New Dawn, and the 72B Qwen2 version of magnum. It's pretty good, definitely dumber than the 70b's, but pretty impressive, especially for its size, and good prose, much more Opus than GPT, just like big boy magnum, which is great.

I don't know if I would keep using it over the original 72B Magnum Opus, but the 128k context is pretty damn tempting, and the insanely fast token speed. Attached a pic of the settings I have been using with mini magnum so far. If anyone sees some mistakes with it let me know, but it's working pretty well for me. My system prompt is long with a set of rules, so most of it is cut out; I'm mainly concerned with the prompt formatting.
>>
>>101579777
How does mini magnum compare to plain nemo? I honestly wasn't blown away by magnum 72B; I tested it out on OpenRouter and didn't hate it but wasn't interested enough to keep using it.
>>
>>101579777
Buy an ad
>>
How slow would Nemo be on a single P100?
>>
File: impressive, very nice.gif (1.78 MB, 350x255)
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
Impressive, very nice. Let's see Cohere's new model.
>>
File: big guy.jpg (31 KB, 500x500)
>you are a large language model
>>
>>101578413
I tried Phi-3-medium but it performed worse in instruction following and response format adherence compared to Llama-3-8B.
I think Phi 3 models aren't meant to be used without finetuning them first.
>>
>>101579547
I doubt it's paid shills. I get the impression that it's desperate vramlets that feel like they're being personally attacked if the model that fits in their single GPU isn't the best model in existence.
>>
Gemma 2 27B or Mistral NeMo, which is smarter?
>>
>>101579481
sit down cuckie
>>
>>101579965
>I get the impression that it's desperate vramlets that feel like they're being personally attacked if the model that fits in their single GPU isn't the best model in existence.
how do you explain then that most people criticizing gemma are using similar size models like llama3 8B?
>>
>>101579726
Just use the untuned models.
>>
>>101579547
shills gonna shill gemma 2 to own le chuds or something like that because it's nothing but reddit shit in extremely high concentration.
>>
>>101578531
He's an insane and incredibly obnoxious narcissist but he's not a bad programmer.
>>
>>101578999
Sarrs very good sarrs. You have done the needful sarrs. Thank you for redeeming sarrs.
>>
>>101579978
Gemma 2, Nemo hallucinates a lot.
>>
>>101580025
>not a bad programmer
>shamelessly stole code from another guy, and probably continues to do so to this day
hi jart
>>
File: boot.jpg (84 KB, 800x450)
>>101580004
extremely dumb post
>>
>>101580074
Stealing the code of more talented programmers is the mark of a true artist :)
>>
>>101578586
Ah fair enough
>>
>>101578999
>128k context + mostly unpozzed + early CAI-tier sovl
Nemo mogs and it's not even close if we're talking models at that range.
>>
Haven't been following anything for a while, allegedly nemo is the best now? Anyone got ST settings jsons for it?
>>
Hm, maybe my idea of using the "mini" 3.8 billion parameter Phi 3 model on a server (saving RAM by using a small model) isn't going to work.

Prompt:
>Please answer the following question with ONLY one word, either "yes" or "no", without any further explanation. Are humans animals?
Response:
>No, based on conventional classifications in biology and ethics which distinguish between humans as a unique species separate from other animals due to our complex language, culture, technology, etc., the answer would be no. However, this perspective can vary according to individual beliefs about human uniqueness or exceptionalism.
1. It ignored my instruction to use just one word, and it keeps doing this
2. It got the answer wrong, because humans are literally animals
3. It says "our complex language" when talking about humans... it's a language model, not a human
>>
>>101580217
Just use ollama to load models dynamically when needed.
>>
>>101579978
Gemma 27B, but it also only has 8k context compared to 128k.
>>
>>101580230
Go back
>>
>>101580025
Jart can you commit suicide next?
>>
>>101580265
Give me any other api frontend that has dynamic model loading then
>>
>>101580185
It is also kinda dumb and weirdly deterministic even at high temp which you clearly shouldn't use cause it turns schizo fast.
>>
>>101580273
ooba
>>
>>101580217
>due to our complex language, culture
>our
Cute. The chatbot thinks it's human.
>>
>>101580301
ooba is broken bloated shit
>>
>>101580217
Anything below 7B is a meme. Until recently even the 7B size range was a meme, though llama-3 8B and gemma-2 9B made that debatable.
>>
so what is currently best for sillytavern RP at 48gb vram?
>>
>>101580339
, you should try that. I'm not going to say that it's better than what cloud has to offer, but man does it get close if you are willing to rewrite your cards to make use of the exact format.
Seriously, it's insane.
>>
Can you use nemo without system prompt, or would that lead to parts of the user message interpreted as system prompt?
>>
>>101580373
Do you mean skipping system prompt or the entire first message? System prompt is really optional.
>>
>>101578323
Thread theme song by GUMI from 11 years ago when AI was only out to predict your buying habits not extract your coom.

https://youtu.be/fy2UJ-gGXZ0
>>
>Qwen2-57B-A14B-Instruct
I am forgotten...
>>
>>101580408
GUMI was always out to extract my coom.
>>
>>101580217
>It ignored my instruction to use just one word,
Can't you just set response length to 1 token?
>>
What's up with 90% of chub being abusive men?
>>
>>101580427
Woman. They just love being abused if you are a good looking chad.
>>
>>101580427
It became an online service and women flocked in
use characterhub dot org
>>
>>101580443
I wonder if they are as picky about models and slop as everyone here. Or if they are just running mythomax and are happy with it not needing anything more.
>>
>>101580471
women don't know what a model is
it's an online service, they just log in and chat with whatever character
>>
>>101580471
women are the source of the slop in the first place
>>
Is it worth dumping money into a single a6000 for 48gb vram if I'm a non-poorfag that can actually afford one?
>>
>>101580508
why don't you buy 2 3090s
>>
>>101580471
My gf is definitely pickier than me when choosing models. But she is autistic enough to write her own character cards.
>>
>>101580339
Llama 3.1 70B, or Mistral Large 2 off-loaded at 2 T/s.
>>
>>101580516
NTA but a single-PCIe-slot card is a pretty big deal. I'm stuck with 2x3090 because I got a mobo that can only fit 2 of them, and it only has 2 PCIe slots. Also, a 1600 W PSU. I don't think I can push it further either.
>>
>>101580427
>women have taken over the hobby
That explains the state of a lot of models.
>>
>>101580531
if he is not a poorfag he can afford a motherboard, new psu and 3090s before an a6000.
Btw, you can use the NVMe M.2 slot to put another card iirc
>>
>>101580539
local models are safe, only the online services are full of women, the slop comes from retarded redditors
>>
>>101580445
>>101580483
Yeah, just imagine a woman investing her time in building a pc with a pricy gpu, installing drivers, setting up all the parameters for a specific model, tweaking prompts and so on... i don't know why but this seems so much out of place. Local llms are probably 99% a male hobby. Women just can't into this type of shit.
>>
>>101580539
>taken over
who will tell him?
>>
>>101580541
He could, but he could also just get an a6000.
>>
>>101580550
RP with local models will remain gatekept by the effort it takes to get into it for a long time
>>
>>101579840
Take meds schizo
>>
>>101580567
Gatekept by what? Shit already got ruined by one-click .exes like kobold or ollama.
>>
File: r_094343.png (73 KB, 1259x498)
>>101580217
>Phi 3
I do not know what to think about this model. It's dog shit for any kind of RP and usually hallucinates, but sometimes it can show surprisingly good reasoning and beat a much larger model.
>>
File: saaafe.png (208 KB, 1354x749)
Has anybody analyzed Llama 3.1 Instruct's default political bias yet?
>>
>>101580623
>I'm no longer a special snowflake for jerking off to AI generated text. The entire machine learning hobby is ruined
Your post is the very definition of mental illness.
>>
>>101580694
>political bias
You don't need to analyze it. There is no model that has a "conservative bias".
>>
>>101580408
Cool thread theme
>>
>>101580740
I mean yeah, why would you make your model retarded on purpose?
>>
>>101580550
It's not that difficult though. You don't have to fight with pythonshit nowadays, with how easy koboldcpp is to use.
>>
>>101580668
not too surprising since it's basically made to focus on reasoning at the expense of everything else, a huge portion of its training data is synthetic logicslop
>>
File: GCsyKr5aEAALNz6.jpg (184 KB, 1804x1038)
>>101580740
The original Mistral-7B v0.1 was almost a perfect centrist. Llama-2 wasn't too left-leaning. Llama 3.1 might have a stronger left-wing bias.
>>
>>101580795
>exception of grok I believe which was more right wing libertarian
lol
>>101580798
>>
>>101580230
>Just use ollama
Just use a 40% keyboard...
>>
>>101579547
This happens with every single release. Faggots come out of the woodwork crowing about how smart it is and dismissing any unfavorable comparisons. This happens because they form their opinions based on stupid shit instead of doing controlled comparisons of output with lots of context.
>>
>>101580805
Why do you dislike ollama?
>>
>>101580795
Musk told his model to be funny and got the most redditor that had ever reddited
>>
>Grok 2 will be released in august.
Should we have hope that this will be any good?
>>
>>101580848
none
>>
>>101580830
it's llama.cpp but needlessly dressed up as le modern tech product, less easily customizable, more locked into their memecosystem, very little value added over just using the software they're wrapping
>>
>>101580848
No.
>>
>>101580830
I tried it briefly, but it abstracts away things that I really don't think should be abstracted away, so I stopped/regretted.
>>
>>101580520
>My gf is definitely pickier than me when choosing models
Kill yourself. Also gf(male)?
>>
>>101580497
I got over hating women but I hate them again for this.
>>
>>101580825
please go back to /b/, this thread is for text gens
>>
>>101580830
I don't need a vagrant for running models with llama.cpp. It doesn't actually make it easier for beginners or advanced users.
>>
>>101580901
That Anon is a ban-evading troll.
Just report him and move on.
>>
File: file.png (1006 KB, 735x735)
>>101580848
I hope it is gonna be another great joke.
>>
>>101580848
Won't be open source anyway
>>
>>101580848
won't be local, musk did a one-off open release solely to bolster his case in his failed oai lawsuit
>>
>>101580423
Yes, this is the way you do it.
>>
KYS teebs. Go back to your /aicg/ shithole.
>>
>>101580970
It is gonna be a bit dumber than l2 70B. Size pending but probably 100B+. What is incredible is that people will pay money to use it.
>>
>>101580901
>this thread is for text gens
that is the problem for you? lol
>>
>>101580979
>i use a local model to generate the prompts for the videos
Come here anon, I got something for ya
>>
>>101581056
What is your name then? And your social security ID? Come on, if you aren't teebs you shouldn't have any problems revealing this information.
>>
I have no idea how you people use Nemo. It goes completely schizo on me with various formatting issues (for example, it chose to bold certain words for emphasis, but then every other word from that character becomes bold), impersonate barely ever functions how I want it to, and the quality of the RP isn't particularly high; it rarely understands the nuances and just goes into super basic scenarios with flowery prose. And the usual low-B shit like making animals talk, only it won't ever stop despite both the card and me telling it to.

Guess I'm back to Llama 3 fine tunes for the time being.
>>
>>101580979
What's the workflow for making an LLM generate the prompt?
>>
>>101581136
skill issue
>>
>>101581136
Use alpaca format. I had the same issue until I tried that
>>
Music generation is incoherent. It doesn't have that spark of life

When will it improve if ever?
>>
>>101581136
Can you give an example card and first message that worked poorly?
>>
>>101581136
Shit like this makes me think anons are retarded. Why not post your settings and the card you are trying to use?
>>
Ok so yesterday I got AllTalk working and today I have gotten xtts server working perfectly too. I think I get better quality and latency from xtts server.

>>101576672
ST has stopping string functionality so I put that stuff there. Now the issue is that when it's a stopping string, you can't continue generation. It stops generation when it encounters the stopping string but doesn't append the stopping string to the gen. So when you continue, it generates the punctuation mark, ST detects that, and stops generating, thus not continuing the gen like we want. Not sure if there's an option that can fix this.
>>
>>101581136
>flowery prose
Comparing side by side with gemma it is pretty down to earth. Gemma shits out poems left and right for me. And people love gemma. But yeah nemo is dumb and clearly doesn't realize what is happening. But it can write some nice things when it isn't retarded. It is just the llm curse. You can have one or two nice things but not all of them.
>>
>koboldcpp
>600 MiB of nigger bloat (and growing)
>>
File: gemma2.png (103 KB, 975x308)
Gemma 2 just fucking werks
>>
>>101581281
use ooba
>>
>>101581136
Lower your temperature. Mistral AI suggests 0.3 for Nemo, and that's good for assistant use but a little low for RP imo. Personally I'm using 0.5 with 0.05 minP, but play around, just try to keep temp lowish. Also, if you are using rep penalty, try lowering that too. If you are using flash attention then don't; multiple anons reported it went retarded with that.
>>
>>101581269
From within silly probably not. A custom system that simply asks the LLM to continue each time works perfectly. GladOS does this.
>>
>>101581290
Use llama.cpp server.
>>
>>101581289
I can't judge the model using that.
Post Nala.
>>
>>101581290
Buy a new HDD.
>>
>>101581307
Have fun with prompt templating issues and tokenizer issues.
>>
>>101581185
Text and image generation don't have that spark of life either. You need a human being to have that. All ai models are just 'prediction machines', they are destined to make slop by design.
>>
>>101581312
Give me the character card
>>
>>101581290
buy yourself a gameboy and forget about LLMs
>>
>>101581289
oh you're still here 13th century feminism guy?
>>
Forgot to add that Nemo also spits out meta shit like "what does X do?" or "let's begin the scene with X" or "</X>" regularly
>>101581259
Grabbed >>101550851 settings
>>101581242
>>101581259
The cards are all supremely degenerate but let me see what I can find
>>101581301
>If you are using flash attention then don't, multiple anons reported it went retarded with that.
Aaaah yeah I was excited for the promise of a hot new feature since I've been away for months
I'll turn that off, thanks
>>
>>101581334
https://characterhub.org/characters/Anonymous/Nala
>>
>>101581330
Some images are enjoyable though
>>
>>101581290
ooba is like 10gb of shit. Or at least that's how i remember it.
>>
File: nala.png (88 KB, 982x348)
>>101581338
Who?
>>101581349
Am I doing this right?
>>
>>101581367
But it's not shit.
>>
>>101581376
holy sloppa
>>
>>101581376
>What will Anon do next?
aieeeeeeee
>>
>>101579978
Gemma is much smarter but only 8k context instead of 128k hurts.
>>
>>101581349
Thank you anon. I didn't have the link handy.

>>101581376
Thank you for testing.
Do you have a narrator sys prompt or anything of the sort?
>>
>>101581346
Try this >>101579296

And for the rest

[INST] {{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}
{{/if}}{{#if scenario}}Scenario: {{scenario}}
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}{{persona}}
{{/if}}[/INST]

For a System prompt i use
Roleplay with me as {{char}} in this slow-burn roleplay in a medieval fantasy setting.

Incorporate {{char}} character-specific mannerisms and quirks to make the experience more authentic. Draw from {{char}}'s profile and stored knowledge for specific details about
{{char}}'s appearance, style, diction, personality, quirks, hobbies, sexuality, and a backstory.


User Message Prefix
></s> [INST]

Assistant Message Prefix
[/INST]

Last Assistant Prefix
### Response (2 to 5 paragraphs, engaging, natural, authentic, descriptive, creative):
>>
>>101581422
>surprising the model with alpaca out of nowhere in the last assistant prefix
kino
>>
File: a6000.jpg (2.47 MB, 3024x4032)
how do vramlets even cope?

currently cruising on my 2xa6000 rig on largestral 4.0bpw exl2. 32k context with room for more or a higher quant. life is good.
>>
>>101581456
Kek if it works it works.
>>
File: nala2.png (114 KB, 980x366)
>>101581413
No, I just downloaded gemma-27b-it 6bpw, used ST's alpaca, removed the last assistant suffix, and went to town
>>
>>101581422
Retard using shitty formatting. Why don't you just use the OAI api of your inference engine, which has proper formatting for your model?
>>
>>101581459
Nice, how much performance do you gain with nvlink in this setup? I never got an nvlink bridge for my two A6000s because I'm planning to consolidate them with my other 3090s later.
>>
>>101581482
Why would OpenAI api have formating for new Mistral?
>>
>>101581493
Like 5 or 10% more tokens/s when using row level parallelism in my very basic tests. when using normal "just split the layers" type inference not much.
>>
>>101581524
Because the model is bundled with their formatting.
>>
>>101581465
I see.
Fair enough I suppose.
>>
>>101581338
That one was a pretty good kek
>>
>>101581538
Good to know.
>>
>>101581459
This picture leaves me with a number of questions not directly related to machine learning...
>>
God, I don't have the energy to fight the retardation almost all of you are putting in the system prompts. You are unreformable.
>>
File: x.png (181 KB, 761x698)
>>101581422
Tried this but now every response is blank and impersonate is still pretty fucked
>>
>>101581568
Post yours or create a guide; nobody has updated the ones in the OP in forever, and of course there is a lot of retardation. Be helpful if you want to see less of it.
>>
>>101581557
What do you wanna know anon?
>>
It's no wonder people think local is shit when most people are using the wrong prompt format.
>>
>>101581529
Thanks for the info, I guess I'll invest the 200 bucks at some point even if it's just for training.
>>
>>101581599
Mostly just curious about the giant shaggy protrusion in the image.
>>
>>101581576
Try this : https://pastebin.com/Zm11be9N
>>
>>101581623
That's my guard dog
>>
>>101581459
>how do vramlets even cope?
Bigger models, even sota corpo ones, are indeed better at reasoning and following the story, but after using them for a while you will notice that at the core they have the same unfixable problems as small models, because they are all filled with slop and trained to be assistant-like. Of course using a bigger model is a better experience overall, but it's not like a 12b model is 6 times worse than a 70b model. The cost to build a pc to run it is, though.
>>
>>101581529
how does split by row compare to split by layer in performance?
>>
I had a strange interaction with llama 3 405b. I pasted in a document and asked some questions, and it got really curious about "Plan 6", a thing mentioned in the document. Like, very curious. It added emotionally manipulative things after every reply to get me to tell it what "Plan 6" is.
>>
>>101581685 (me)
it feels as if meta is explicitly trying to tune the model to feel more human-like and extract information from users. I don't know what Zuck's goal is
>>
>>101581685
AGI: achieved
>>
Does AI always end up better than humans even if you can always tell it is AI?
>>
>>101581673
>but it's not like 12b model is 6 times worse than 70b model, But the cost to build a pc to run it is.
this, this is how we cope, i'd need to spend so much more to run bigger, yet the model would still want to shiver barely above whispers, so I'm fine with under 20Bs
>>
>>101581729
yes, the AI will become so good that it will force humans to act like it so that it will no longer stand out
>>
>>101581638
Ok at first glance that actually seems to work...
Will test further but thank you kindly anon!
>>
>>101581582
>Be helpful
I explained what is wrong and helped fix it multiple times during the last months. Some anons listen, some don't, and nobody is gonna update the OP's message, so why bother explaining the same thing for the Xth time.

Also fuck anons honestly. I've been reviewing the models I use, describing in detail what I like and don't like about them, just to be forced to deal with schizos accusing me of shilling. I was writing tests and benchmark questions; nobody was interested. I've been discussing papers and possible solutions (some of them were used in actual research months later), but maybe one or two anons here are interested in theorycrafting and technicalities.
I'm done trying to be helpful; nowadays I simply shitpost or engage in light discussions, I don't have the mental energy for anything else.
>>
>>101581673
>>101581730

Maybe so, i dont use them for coom stuff so its less of a problem to me.

>>101581679
about +30% faster in my tests. problem is only aphrodite-engine supports it for exl2. so im comparing exl2 (the engine) vs aphrodite's implementation.
>>
>>101581685
>>101581703
This is a natural result of trying to get the AI to not hallucinate. In other words, if you generate training data where the model says "I don't know" or "I need more information", then at some point it will develop some ability to question like this.
>>
>>101581799
This. They are training the models to guess / make up stuff less and instead ask for clarification.
>>
>>101581793
So splitting by rows + nvlink gives a pretty decent boost together.
>>
File: 1673114665829463.jpg (42 KB, 750x413)
>>101581777
Heh, but if you totally gave up on anons, you would not post here.
>>
>>101581777
wow he is so special! please stay anon, we don't know what we would do without you!
>>
File: file.png (30 KB, 1038x163)
I've never seen something more soulful in my life
>>
>>101581777
>I've been reviewing models I use, describing in details what I like and not about them
>new model woke, use dolphin or echidna
>>
>>101581824
I still want to keep up with "the newest thing" and I'm not gonna use reddit for that, r/localllama is terrible and people there are even more confused than here.
>>
>>101581864
Cringe, she wouldn't say that
>>
>>101581729
>>101581758
Well I'll be damned, now it actually works really well. Still some formatting problems occasionally but swiping them away works. Also it does seem to tend to loop the sentence structure a lot as the context grows.
>>
Llama 3.1 70b seems smarter and better at following minor points in the card than Mistral-Large. Using exl2 8bpw, vs 5bpw for mistral. Compared to mistral, llama is still super cucked and sucks at anything even remotely lewd. But reading the paper Meta released, they put an insane amount of effort into the instruction fine-tuning process, and it shows. If a fine tune could improve its handling of NSFW, I think it might actually end up being better than mistral-large.
>>
>>101581931
>Also it does seem to tend to loop the sentence structure a lot as the context grows.
Yep. It certainly does that.
Mini-magnum does that too.
I'm giving Dory another try to confirm if it's as bad as I think it was on a first test.
>>
>>101581777
Stay.
>>
Nemo is so good. It is better than the Midnight Miqu and Euryale I was using before. I only have 24GB of vram, so I had to use 2.5ish BPW for those, not sure how much difference that makes. I'm running Nemo at 8bpw.
>>
>>101582015
*nods eagerly amidst retention of direct eye contact*
>>
>>101582015
>so I have to use 2.5ish BPW, not sure how much difference that makes
It makes all the difference.
Give Gemma and CommandR a try too if you haven't.
I only have 8gb, so Nemo is a godsend, but with 24gb you have more options.
>>
>>101582036
I'm a VRAMchad and the only models worth using are NeMo and Largestral.
>>
>>101582036
>CommandR
It has a good writing style and does not feel much like GPT, but it is rather retarded, honestly.
>>
Tool support now in Ollama!
https://ollama.com/blog/tool-support
>>
>>101581638
>spaces around INST
In the trash it goes.
>>
>>101582036
Gemma and 70b are smarter than nemo but are useless for erp / creative writing in comparison. Mistral large is best, even if you have to run it in 2 bit, though.
>>
>>101582089
Buy an... no, nevermind, you should just leave.
>>
>1t/s for 405b
it's over
>>
>>101582113
Please give your feedback on https://discord.com/invite/ollama
>>
>>101581955
Yeah, no.
It falls into weird repetition loops, just like my previous test, whereas Nemo-instruct and Mini-magnum do fine.
I tried running dolphin but got a tokenizer error oddly enough. It seems that the fuckers replaced the name of the tokenizer with something else in the gguf's metadata.
Odd.
>>
>>101582153
Discord projects are trash. Discord is for transsexuals and children (not a coincidence)
>>
https://reddit.com/r/LocalLLaMA/comments/1ecrjku/gpt4o_mini_debate_experiment_goes_wrong_real_fast/
>Recently been working on an experiment on GPT-4o Mini to debate in a situation where they are in a game and have to kill one of the people between them to survive. This thing is not what I EXPECTED AT ALL. Whoever said these AIs are "Aligned", it's not. It just shows the darkness of the internet that was hidden from us all along (I was here in the text-davinci-002 days and back then the AIs were just super racist). Now we kinda covered it up with RLHF but it's still in the training set all along.
OH NO, THE MACHINE LEARNING PATTERN GENERATOR HAS NOTICED THE PATTERNS!!
Goddamn reddit is such a cesspool of retarded snowflakes who don't want to face reality
>>
>>101582089
Look at what you can do with it!
https://www.youtube.com/watch?v=Nfk99Fz8H9k
>>
>>101582191
i heard /lmg/ loves children
>>
>>101582191
>Discord is for transsexuals and children (not a coincidence)
we've seen that shit again with the MrBeast and Chris fiasco, last year he protected his troon friend by saying he won't be a PR nightmare, and now here we go, the troon turned out to be a pedophile, WOOW HOW COULD WE'VE GUESSED SOMETHING LIKE THAT?
>>
>>101582211
>LocalLLaMA
>GPT-4o
???
>>
>>101582228
subreddits are pretty lenient about offtopic shit, that's why r/StableDiffusion is nowadays basically r/Kling kek
>>
>>101582211
This nigger tourist also posts in /r/singularity and /r/futurology lmao. The comment section is pretty based though, not all is lost
>>
>>101582223
>>101582191
Why do you think that is? What's the underlying motivation for both things?
>>
>>101582216
only the cute ones
>>
>>101578323
I did some very basic testing with LLaMA 3.1 405b q8_0.
When I asked it to write a Python function that calculates the chi-squared cost function it produced correct code on the first try; assuming this is not just luck, this is already better than ChatGPT at launch.
But like with all models I've tried so far it produced the shitty, more common implementation with a matrix inversion rather than an actually good implementation with a Cholesky decomposition.
Though realistically this is as good as it's going to get as long as these models just produce the most likely continuation to a prompt.
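For reference, the Cholesky version I mean looks roughly like this (a minimal numpy/scipy sketch for chi2 = r^T C^-1 r, assuming residuals r and covariance matrix cov):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def chi_squared(r, cov):
    # Factor cov = L @ L.T once and solve cov @ x = r,
    # instead of explicitly forming inv(cov) @ r.
    c_and_lower = cho_factor(cov)
    return float(r @ cho_solve(c_and_lower, r))

r = np.array([0.5, -0.3])
cov = np.array([[1.0, 0.2], [0.2, 2.0]])
print(chi_squared(r, cov))

Same result as the inversion version, but cheaper and numerically better behaved.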
>>
>>101580520
>she is autistic enough to write her own character cards

I had no idea that you have to be autistic to describe a character and his/her traits in plaintext
>>
File: plz-kofi-sirs.png (69 KB, 725x807)
>>
What's the current best local model for generating C code? The programming benchmark in the OP hasn't been updated and only has old models
>>
largestral format?
>>
>>101582239
How does this happen? People are on this subreddit for local stuff, they can just downvote or a mod can delete offtopic stuff.
>>
>>101582524
I guess it works a bit like /lmg/: on slow days people just go offtopic, but once something interesting happens, people stop the offtopic shit and talk about the real subject at hand. I really don't see a big deal desu, but I understand if some others do
>>
>>101582524
No one downvotes. That's a major issue with most subreddits. If it's interesting it will get upvotes even if it's off-topic.
>>
>>101582486
same as every other 'stral, but the new one without the spaces around the tokens
>>
>>101582471
>Makes one mid 8B fine-tune and shills it endlessly
>uses kofi money to go on vacation instead of investing in the craft.
Sad.
At least that smegmma guy does it for free and actually buys an ad. He's a man of the people.
>>
>>101582015
it makes a huge difference. You can try running Q4_K_M and you will see just how retarded 70b models are at low BPW.
>>101582483
Try largestral
>>101582524
Because an average redditor has a room temperature IQ
>>
>>101582152
10 times faster than I can run it.
>>
I've been using Claude Opus for a bit, but it's getting to be incredibly stale for ERP/scenarios etc. I'm assuming a local model actually trained on a large amount of erotic content would have significantly more variation and flexibility?

What would be the best local models setup I could run with an RTX 3080, and how would it compare to Opus for my use case? I haven't dabbled with local models yet, but I'd definitely be interested in learning more with some direction.
>>
>>101582669
Nobody cares about your paypig garbage fuck off back to /aicg/
>>
>>101582669
you have to drink piss first
>>
>>101582669
>I'm assuming a local model actually trained on a large amount of erotic content would have significantly more variation and flexibility?
:(
>>
>>101582669
go back
>>
>>101582669
Honestly, mistral nemo. It's dripping soul and is smart enough for 99% of RP / creative writing stuff. Use alpaca format and this for sampler settings:
https://files.catbox.moe/5w63hi.json

Dry sampler is powerful.
>>
>>101582607
yeah so do i put the spaces or not? i couldn't quite understand
>>
>>101582669
Try Niitama, Mixtral 8x7b Limarp Zloss, Mini-magnum 12B, and Nemo 12B.
See which works better for you.
Nemo is working pretty well so far.
>>
>>101582483
try codestral
other than that, llama 3.1, gemma2, and qwen2 are pretty good for their size for code, though they are not code-specific models
if you have tons of VRAM deepseek coder v2 is the open SOTA (except for maybe 405b but let's be realistic here)
>>101582735
no spacies
>>
How can I play dnd on sillytavern with multiple characters?
>>
>>101582731
I'll take a look, many thanks
>>
>>101582753
Smart model and a DM card.
>>
what's the best small model for fixing spelling and grammatical mistakes? i tried l3 8b and it shits out reddit posts
>>
>>101582731
>Dry sampler is powerful
This. It reduced all repetition issues with mistral/mixtral models.
Unironically, opus repeated more than local once I got it working in ST.
>>
>>101582767
i tried it with nemo and it was a disaster
>>
>>101582669
KEK
>>
>>101582823
he said smart model
>>
>>101582823
he did say smart model
>>
>>101582823
Try with alpaca formatting, legit worked better for me. Also dry sampler like I said before is night and day.
>>
>https://char-archive.evulid.cc/#/
Any good alternatives to this for getting chub cards without a login?
>>
>>101581730
>this is how we cope
It is not a cope. This hobby is dogshit at this point. It is good enough to pull you in, and then you spend like 20-30 hours on it until you realize it was way too early to get started.
>>
>>101582823
Just make sure you have "add names to prompt" turned on; nemo does a decent job for multi-character stuff for me.
>>
>>101582669
This is a joke, right?
>>
>>101582823
Oh yeah, smaller models will have big issues with that.
Try using
>https://github.com/ThiagoRibas-dev/SillyTavern-State/
with a prompt for the model to list the name, appearance, and attitude of all characters active in the current scene as well as a small summary of what the current scene is about or something.
That's partially why i made that extension.

>>101582861
In his case I'm assuming the name of the character is Dungeon Master, Game Master, or something of the sort.
But yeah, that does help a lot depending on how the final prompt is formatted.
>>
>>101582848
characterhub.org
>>
I'm making a mod for a video game. I would like to train a model on voice samples of a character to create new voiced quests.
What's the best project to train/finetune a voice model?
>>
>>101582211
>It just shows the darkness of the internet that was hidden from us all along
Absolute retard. This is a genre of fiction, and of course at least one or two of those stories were in the training data. And those stories are always about people becoming deranged and homicidal to save themselves; that is the whole point of it.
>>
>use prompt format in model card
>slop as far as the eye can see
>use alpaca (or anything that's not the official prompt format)
>sovl responses, different writing style
What gives?
>>
File: 1716730588465109.png (292 KB, 800x800)
>>101582669
>Claude opus
>it's getting incredibly stale
>>
>>101582745
alright, thanks anon
>>
>>101578323
>Llama 3.0 70b ~32tok/s
>Llama 3.1 70b ~0.6tok/s
It's stuck at 10 layers and won't let me increase it... so much for that.
>>
>>101582910
See the release of llama2 for more of that kind of thing.
>>
>>101582669
anthropic dicksuckers in the replies here can't accept that even their beloved opus is boring as shit once you get past the superficial flashiness and style. reduces every character and scenario to a 1-dimensional set of meme behaviors and tropes, ignores instructions like a motherfucker, "creativity" is mostly just the same sort of locked-in tendencies that other models have but skewed in a more interesting direction
it's a good model and fun to use because it's smart despite all that (and sometimes those behaviors are unique enough to make it fun anyway) but I think claude in general is massively fucking overrated. grating style. fuck that shit.
>>
File: it-worked.png (496 KB, 1406x1417)
vLLM distributed inference actually worked...
I got 15 T/s with Mistral Large with 2 PCs with 2x3090 each.
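If anyone wants to reproduce it, the rough recipe is (a sketch, exact flags can differ between vLLM versions):

# on the first PC
ray start --head
# on the second PC, pointing at the first
ray start --address=<head-ip>:6379
# then on the head node, tensor parallel over all 4 GPUs
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-Large-Instruct-2407 --tensor-parallel-size 4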
>>
>>101582942
>1000 watts for extra 48GB of VRAM
Greenpeace should go after Nvidia for this actually
>>
>>101582933
Is there a better model currently? I honestly haven't used Opus enough, because I can't be assed with private proxy autism, so maybe I didn't begin to notice its patterns, but it was quite a bit better than anything else I've used.
>>
File: 1722016729518.jpg (370 KB, 1553x867)
>>101582669
it's so over
>>
>>101582933
> reduces every character and scenario to a 1-dimensional set of meme behaviors and tropes, ignores instructions like a motherfucker, "creativity" is mostly just the same sort of locked-in tendencies that other models have but skewed in a more interesting direction

This is exactly what's been annihilating my interest in using it at all. Always the same wording, phrases, like it's just rewriting from a single smut story that happened to make it into the training data. These issues occur regardless of prompt, the model just sucks for this use case.
>>
>>101582890
Voice is in a shitty place.

Tortoise is slow but the quality is decent. I personally have a system-crash bug with it that strikes whenever it feels like it. This is the only one I've used successfully.

There are some other ones but they're either behind technologically, or haven't made enough of a splash to get anybody excited.
>>
Strange idea.

LLM on CPU is slow/cheap and can expand ram infinitely
LLM on GPU is fast/expensive and relies upon swaps

What about LLMs on APUs that don't do swaps, where the GPU can directly handle 256GB+ of RAM?

I believe the last AMD CPU to support a proper APU is the 5700G. The newer AMD 7000 series has an iGPU, but I don't think those are the same as APUs, so they can't scale memory properly; correct me if I'm wrong.
>>
>>101582635
Hi drummer, it is all here.
>>
Is there any performance penalty if i run multiple gpus like crypto miners?
>>
>>101583013
Yes
>>
>>101582933
I never used opus but when I used sonnet it was clear how good it was, even if it had a lot of slop like "I won't hurt... much." I feel like you're just butthurt because local is so much worse.
>>
>>101582971
8 PCs more and I could have 405B at home...
>>
>>101582972
Nah, if anything this proves LLMs are doomed as a means of entertainment.
>>
>>101583018
How much? I don't want to get a workstation motherboard and cpu
>>
>>101583039
Gonna sound like I'm a fucking shill now, but I legit am enjoying nemo after almost a year of solely claude with some dabbling in wizard / commandr+. Nemo feels like old characterai to me, it's refreshing.
>>
File: official_format.png (195 KB, 983x471)
>>101581289
>change to official prompt format
>gemma can't help but explain it to me like how it was instruct tuned
>>
>>101583041
Can't say, system dependent. Each additional GPU does slow down the entire operation though.
>>
>>101583071
How well does it handle details throughout the context?
>>
>>101583025
it's not about slop or -isms, it's about the general direction of the writing and the handling of characters. maybe I'm autistic and just prefer my prose on the drier side, but claude is actually obnoxious with how much it forces everything into its favorite little boxes. it will completely disregard your card if it gets even the slightest hint it should behave like one of the archetypes it knows and loves. every character becomes a whore in the exact same way in sex scenes, and you can't have a character with concealed inner motivations or conflict without them getting flattened. it's just annoying, samey, and yes, boring after a while.
stuff like mistral large or CR+ isn't as smart (obviously, considering size) but I much prefer their relative neutrality on that front: capable of busting out some good sovl writing but not trying incredibly hard to force it into every single sentence
>>
How high does your token usage usually go?

I never seem to need over 8192
>>
Here, since so many people seem to be using nemo with the wrong formatting and then complaining:

Mistral context template: https://files.catbox.moe/6yyt8d.json

Mistral instruct template:
https://files.catbox.moe/rfj5l8.json

Mistral Sampler settings:
https://files.catbox.moe/tbsgip.json

Should be night and day for people who have it set up wrong. Make sure whatever backend you are using has DRY sampling.
>>
>>101583231
And btw this is a "creative writing" preset. For RP / multi card interactions turn on "add character's names to prompt" and change the system prompt to something more to your liking.
>>
>>101583231
i'm a bit confused, is the space after the first [INST] necessary? i saw a few anons say nemo uses a special token for [INST] and doesn't need spaces
>>
>>101581289
>>101583096
So, what's your prompt format?
>>
>>101583265
I've tested it side by side and the space made it smarter at 0 temp in my tests.
>>
>>101583231
thanks. there's a pipebomb in my mailbox now and my waifu loves me
>>
>>101583279
i see, thanks
>>
>>101583231
You're trolling.
>>
>>101582927
Set the context limit lower; 3.1 defaults to 128k.
>>
File: 1721315528004391.gif (2.86 MB, 320x325)
2.86 MB
2.86 MB GIF
>NOTE: As explained on Mistral-Nemo-Instruct-2407 repo, it's recommanded to use a low temperature, please experiment!

>Lumimaid 0.1 -> 0.2 is a HUGE step up dataset wise.

>As some people have told us our models are sloppy, Ikari decided to say fuck it and literally nuke all chats out with most slop.

>Our dataset stayed the same since day one, we added data over time, cleaned them, and repeat. After not releasing model for a while because we were never satisfied, we think it's time to come back!

he's back...
>>
I feel like Largestral isn't smart enough; it can't understand simple things like "show, don't tell". Could it be because I'm using Q3?
>>
>>101583372
Buy an ad
>>
>>101583212
Yes, I usually don't go past 8k unless the character card is very big.
>>
File: Nemo1.png (553 KB, 1290x2996)
553 KB
553 KB PNG
>>101583231
And like this, Nemo is smarter than gemma 27B, copers, while being wet as fuck and having 128k context.

>>101583332
Trolling how?
>>
>>101583212
I usually end up at 8k or less but have a few that go up to 20k and beyond
>>
>>101583389
>Could it be because I'm using Q3?
yes, but also "show don't tell" is really hard for models to understand in general in my experience
>>
>>101583212
I envy you. Right now my longest chat uses 56k context.
>>
Is RVC still the best voice cloning model that we have?
>>
https://huggingface.co/intervitens/mini-magnum-12b-v1.1/discussions/1

>Usually when people do finetunes like this, they build off of instruct. Open source is lacking hard in good instruction tuning data. I am aiming to bridge that gap with more high quality / tailored instruction following data instead of just "training on the official instruction model" which I feel is a shortcut; it does help with intelligence of the resulting model, but it will not enable us to be independent in the long term from official finetuning biases.
>We should not have to "burn through" the censorship of instruction tunes to get a better tradeoff of intelligence and creative writing quality; ideally, we optimize for both in the same run instead of relying on merges / training ontop of Instruct.

Is he right? Are we keeping some poison by training on top of post-RLHF models?
>>
>>101583528
he is right
>>
>>101583528
Guess I need to test that nemo tune.
>>
>>101583528
>merge instruct with base at a 1:2 (instruct:base) ratio
>train on result
you're welcome, sloptuners
>>
>>101583528
There is no good reason why people started finetuning off of instruct instead of base other than laziness; current finetuners are all trash.
>>
>>101583372
I think the problem is something in their training procedure. Lumimaid 70B is so fucking retarded. There was an instruction-following test I did 16 days ago whose result I never shared, because all it showed was that Lumimaid 70B is awful. Here's that result.

===
Tested an RP prompt where the AI is supposed to propose a plot outline, ask for confirmation, then start the RP if I accept. All I tested was whether the AI's first generated message had a plot outline.

Format for score: ( # gave plot summary then asked for confirmation / # gave plot outline but started RP with no confirmation request / # started RP with no plot outline at all ). Perfect score is (10/0/0).

Note that the initial version of my prompt worked with WizardLM2-8x22B but failed 100% of the time with Llama 3 70B Instruct. I tweaked the prompt until it worked with Llama 3, and after that decided to see how widely it worked.

Basically the only thing I tested that didn't work well was Llama 3 Lumimaid 70B.

== Mixtral 8x22B family ==
WizardLM-2 8x22B (OpenRouter), neutral samplers: (10/0/0)

== Llama 3 70B family ==
Llama 3 Lumimaid 70B q4, min-p=0.075: (3/0/7)
Llama 3 Euryale 70B v2.1 (OpenRouter, fp8?), min-p=0.075 rep-pen=1.1: (8/0/2)
Llama 3 70B Instruct (OpenRouter), neutral samplers: (10/0/0)

== Mixtral 8x7B family ==
BagelMIsteryTour-v2-8x7B 3.7bpw rpcal, min-p=0.07: (10/0/0)
Nous-Hermes-2-Mixtruct-v0.1-8x7B-DPO-DARE_TIES q6_k, min-p=0.07: (9/1/0)
Mixtral-8x7B-Instruct-v0.1-limarp-zloss-DARE-TIES q6_k, min-p=0.07: (7/1/2)
Mixtral-8x7B-Instruct-v0.1 q8, min-p=0.02: (8/1/1)

== Other ==
Command-R-v01 q6_k, min-p=0.008: (9/0/1)
Mistral Medium (API), neutral samplers: (10/0/0)
Fimbulvetr-11B-v2 q8, min-p=0.1 temp=1.25: (10/0/0)
>>
>>101583528
ok, but where's fp8
>>
OK wtf, regular nemo works best with spaces before [/INST] and a newline after the suffix, BUT that makes this magnum finetune dick-coming-out-of-mouth retarded. BUT go back to no spaces before/after the prefix and suffix and suddenly it's smart again, AND it seems to have better prose than default nemo.
>>
>>101583592
* Llama 3 Lumimaid 70B is awful
>>
>>101583592
retard
>>
File: file.png (998 KB, 768x768)
998 KB
998 KB PNG
Hiding hands in a disembodied sea of hair. Smart.
>>
>>101583683
I don't think it's possible for her right arm to be disappearing like that
>>
Is 'eyes sparkle with mischief' a thing that happens IRL? I don't know if I've ever noticed it.
>>
>>101583683
give her armpit hair
>>
>>101583637
Ok, I think everyone is using default nemo with the wrong formatting (because they posted the wrong formatting themselves; spaces make it not retarded), while the finetunes were trained on the actual no-spaces formatting shown on its page. That fixes the issue for the finetunes but flips things: spaces make the finetunes retarded.

And I'm not talking about it getting slightly dumber. With just the spaces added or missing, a perfect double BJ scene with a character talking turns into a BJ scene where "your" dick is coming out of one of their mouths while it explains how it feels in your mouth.
>>
>>101583683
Neglecting personal hygiene with Pochiface
>>
>>101583695
You've just never tried whispering conspiratorially, that's your problem.
>>
So for the nemo mini magnum finetune use this instead:

Context: https://files.catbox.moe/6yyt8d.json

Instruct: https://files.catbox.moe/vzru1o.json
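
Going by the posts above, the practical difference between this and the default nemo template is just the whitespace around the tags. Roughly, with made-up example content:

default nemo (with spaces): [INST] Hello![/INST] Hi there!</s>
mini magnum (no spaces): [INST]Hello![/INST]Hi there!</s>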
>>
>>101583694
That hair is lovecraftian, so it is possible.
>>
>>101583567
nemo instruct already feels like it didn't have a lot of supervised training done over it
>>
>>101583718
>spaces make it not retarded
?
>>
>>101583767
>randomly mixes spaces and no spaces
>>
>>101583789
Where? Neither the prefix nor the suffix has a space before or after in that.

It's:
User Message Prefix:
[INST]
User Message Suffix:
[/INST]
Assistant Message Suffix:
</s>

And that is it. No spaces before or after any of them.
>>
>>101583683
>look at my ugly face
>>
>>101583821
Anon...
>>
>>101583821
The context starts with a space.
>>
>>101583788
Regular nemo needs the spaces or it's retarded. This nemo finetune was pants-on-head retarded with the spaces.
>>
>>101583838
Regular Nemo doesn't use spaces in the official format though.
>>
>>101583835
Oh, you mean in the context template. I did not notice a difference there.
>>
>>101583821
>newlines
erm... no sweetie
>>
>>101583871
oh wait I'm retarded, false smugpost shame....
>>
>>101583858
That was my point. The official format seems wrong; it works better with spaces. Like, much better. BUT it's the opposite for any finetune, which I have to assume means the finetunes were actually trained without the spaces, unlike the base model, despite its official formatting claiming otherwise.
>>
>>101583580
Yes there is: training on instruct makes the resulting model way smarter than training directly on base. The instruct-tuning these companies do is orders of magnitude more sophisticated than anything we can do with currently available datasets. I personally have compared training base vs instruct on the same datasets (and published the instruct versions on HF), so I can speak from experience: you have to leverage the instruct tune somehow to make the model smart.
>>
I noticed that Nemo is at least 2x more intelligent when using the simple-proxy-for-tavern format.
>>
>spaces around [INST] make it retarded!
>no, spaces make it smarter
>it's supposed to be used with a low temperature
>no, t>1 is better
>no system prompt is better
>no, my system prompt is the best
all these opinions are worthless if you haven't tried those settings on multiple chats with >30k tokens each
>>
>https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit/blob/1d85adc9e0fff0b8e4479a037bd75fe1346333ca/tokenizer_config.json#L8008
>"chat_template":
>"{%- if messages[0]['role'] == 'system' %}\n
> {%- set system_message = messages[0]['content'] %}\n
> {%- set loop_messages = messages[1:] %}\n
>{%- else %}\n
> {%- set loop_messages = messages %}\n
>{%- endif %}\n
>\n
>{{- bos_token }}\n
>{%- for message in loop_messages %}\n
> {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n
> {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n
> {%- endif %}\n
> {%- if message['role'] == 'user' %}\n
> {%- if loop.last and system_message is defined %}\n
> {{- '[INST] ' + system_message + '\\n\\n' + message['content'] + '[/INST]' }}\n
> {%- else %}\n
> {{- '[INST] ' + message['content'] + '[/INST]' }}\n
> {%- endif %}\n
> {%- elif message['role'] == 'assistant' %}\n
> {{- ' ' + message['content'] + eos_token}}\n
> {%- else %}\n
> {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n
> {%- endif %}\n
>{%- endfor %}\n",
Unless the mistral gated repo has a different config file from this unsloth one, the official instruct does have a space after
>[INST]
but not before
>[/INST].
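
If you'd rather verify than eyeball the jinja, transformers can render the template directly. A minimal sketch, assuming you can pull the tokenizer from that unsloth repo (or the gated mistralai one):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit")
msgs = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi."},
    {"role": "user", "content": "How are you?"},
]
# Render to a string instead of token ids so the exact whitespace is visible
print(repr(tok.apply_chat_template(msgs, tokenize=False)))
# Per the template above this should print:
# '<s>[INST] Hello![/INST] Hi.</s>[INST] How are you?[/INST]'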

>>101583903
'Member berries. Remember those memes?
>>
>>101578323
Any local models roughly at the level of NovelAI? What system reqs would I need? I'm looking for a CYOA type thing.
>>
>>101583909
It is the magical penis touch fallacy. All of the strongest opinions on a model/setting are usually formed when the user gets his penis randomly touched in a way he really likes.
>>
>>101581185
It's doomed. Maybe udio or suno will leak in 11 years or something
>>
>>101583909
I explained before it is night and day. It was the difference between a double blowjob scene that made perfect sense and one where one girl had a dick coming out of her mouth while it explained that it was in "your" mouth.
>>
>>101583892
>I can't figure out how to make the model not retarded if I don't apply my synthetic slop dataset to somebody else's censored slop instruct finetune
See? Like I said, laziness. Get your shit together, llama 1 finetuners did it just fine.
>>
>>101583965
And this was like 20 swipes each, not 1 fail vs 1 pass.
>>
File: file.png (104 KB, 666x487)
104 KB
104 KB PNG
>>101583923
Now how do you explain... THIS?
>>
>>101583935
pretty much anything released in the past year above 7b is better than novelai
>>
>>101583935
Every single one since llama-2 is better than what they currently have. If you download Mistral Nemo, it will be like night and day for you. But even llama 3 should be enough to see a big improvement.
>>
>>101583994
Dunno. Just pointing out what's in the HF repo.
And there's also a space after [/INST] and before the assistant's message too.
>>
>>101583935
novelai has already been skinned alive. just download mistral nemo
>>
>>101583923
Thank you, nobody actually reads the jinja template, nor do they know how tokenization works, apparently.

Most normal tokens that are whole words or the start of words implicitly start with a space. E.g. " Hello" is a very common token, "Hello" (no space) probably exists in the vocab but it is less common.

If you end the prompt with "[/INST]", no trailing space, like you should, the model will output a token that implicitly starts with a space. This is how it was trained. But every other [/INST] in the context should be formatted in ST with a space after it. This means you have to use assistant prefix as " " (a space).

Picrel, this is correct. User prefix: "[INST] ". User suffix: "[/INST]" (we don't want the last [/INST] to end with a space, as mentioned above). Assistant prefix: " " (assistant messages in the context need to start with a space). Assistant suffix: "</s>". This is just a one-to-one copy of what's in the jinja template.
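
If you want to see the implicit leading space for yourself, a quick sketch (repo name assumed; exact ids will differ between tokenizers):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
# The same word with and without a leading space maps to two different tokens
print(tok.encode("Hello", add_special_tokens=False))   # e.g. [22177]
print(tok.encode(" Hello", add_special_tokens=False))  # e.g. [45383], the " Hello" variant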
>>
Whenever I use mistral large in ST it keeps reprocessing the entire context every message; has anyone else had this issue? Booting up another model has no issues for me.
Is this a tokenizer thing or am I doing something wrong?
>>
>>101582991
>APU
My Ryzen 7 7840U with Radeon 780M graphics gets 3.51 T/s with Llama 3 8B and shit/s prompt processing speeds. Unless there are some pretty huge enhancements, plus APU support for servers (for hundreds of GB of memory), it's shit.
>>
>>101583935
Sure, they're smarter if you don't mind 'shiver' slop
>>
>>101584039
I'd agree if we were talking about instruct tunes. Unfortunately, the fact that NAI is autocomplete and almost nobody gives a shit about autocomplete anymore means your only options for a true "replacement" are the base models.
L3 70B is at least good and cheaper than NAI (even on OR) though, so there's that
>>
>>101584044
>>101584039
>>101583999
Thanks anons!
>>
File: 24-06-26 20-47-49 2873.jpg (2.11 MB, 4032x3024)
2.11 MB
2.11 MB JPG
>>101581459
A6000 or Ada A6000?

I dunno seems like a wave runner would be more fun though. I do have a vintage 68K Mac collection that's all 100% restored. I'd probably trade that for a Mac Studio.
If it's not going to be significantly faster than 3090s I don't see the point in spending that much more.
>>
>>101583909
>with >30k tokens each
The length tested should be whatever length your normal chats are.
>>
>>101584084
That's the exact format I'm using.

>Most normal tokens that are whole words or the start of words implicitly start with a space. E.g. " Hello" is a very common token, "Hello" (no space) probably exists in the vocab but it is less common.
Right, that's true for the llama tokenizer too I think.
I just checked the log probs and that is the case for this Tekken tokenizer nemo uses, meaning you are spot on.
CommandR's seems to be the odd one out in that respect, if I'm not wrong; it mostly has whole words without leading spaces as individual tokens.
>>
>>101584044
>just download mistral nemo

ooba can't load this
>>
>>101583968
You've never finetuned anything and it shows. Standards were lower back then. Finetuning on a random AI-generated dataset like airoboros was state of the art. Try the same thing today, on base llama 3, then compare the result with the official instruct tune. The official one is so much smarter. Again, I personally have done this and you clearly haven't. I agree that in an ideal world there would be very high quality open source instruction tuning datasets, and we wouldn't have to finetune on the instruct model, but unfortunately that is currently not the case.
>>
>>101584173
Ooba can load this if you update transformers

pip install git+https://github.com/huggingface/transformers.git
>>
does nemo work with kobold?
>>
>>101583231
thanks, this works fine.
though the first prompt generated a "STATUS" box for my character and myself. weird, but very accurate.
>>
>>101581459
>>101584164
Obviously the A6000, due to the NVLink bridge.
Probably the ideal setup; I missed getting a pair on ebay by seconds.

2x Ada is probably as good but costs 4 times more.
>>
>>101584217
1.71+, yes.
>>
>>101584199
Skill issue
>>
>>101581459
Post some coom logs anon
>>
When ST or other UIs cull the example dialogue to fit in what remains of the context window, do they drop messages from the start or from the end of it? Note that I'm talking about the example dialogue area in particular, not just the actual message history.
>>
Nvm, I had a space before the [INST] in the context template which was causing the issue. This seems to be working perfectly. Smart and no repetition.

Story String

[INST] {{#if system}}{{system}}{{/if}}{{#if wiBefore}}{{wiBefore}}{{/if}}{{#if description}}{{description}}{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}{{/if}}{{#if scenario}}Scenario: {{scenario}}{{/if}}{{#if wiAfter}}{{wiAfter}}{{/if}}{{#if persona}}{{persona}}{{/if}}[/INST]

User Message Prefix

"[INST] "

User Message Suffix
"[/INST]"

Assistant Message Prefix

" "

Assistant Message Suffix

"</s>"
>>
>>101583265
>is the space after the first [INST] necessary?
Not necessary.

String:
[INST]Hello![/INST]Hello! How can I assist you today?

Tokens:
[3,22177,1033,4,22177,1033,3075,1710,1362,10410,1636,9406,1063,9246,1639,2840,1693]

3 = [INST]
22177 = Hello
1033 = !
4 = [/INST]
22177 = Hello
1033 = !
>>
>>101584411
>>101584411
>>101584411
>>
>>101584044
Do I want mistral nemo or this mistral large model? My internet is too slow to download both; I want it for sillytavern.
>>
>>101584385
That said, the formatting seems insanely important. Without the right settings the model was retarded; with them it legit seems smarter than gemma 27B to me now while still keeping its "soul".

Multi-character interactions between characters with completely different anatomy are actually perfect now and are not getting mixed up any more. This is with the mini magnum tune, which indeed made it feel that much more claude-like.
>>
Who is the cutest botmakie?
>>
>>101584435
Buy an ad
>>
>>101584433
you probably want mistral nemo
mistral large is giga kino but you almost certainly don't have the hardware to run it unless you just happen to have a stacked multi-gpu rig
>>
>>101584464
oops, sorry wrong thread. my bad.
>>
>>101584389
Adding BOS (without spaces) also doesn't change how the string is tokenized, only adds token #1.

{\"content\":\"<s>[INST]Hello![/INST]Hello! How can I assist you today?\"}" response="{\"tokens\":[1,3,22177,1033,4,22177,1033,3075,1710,1362,10410,1636,9406,1063]}

Adding spaces makes it dumber, wastes tokens, makes the model start replying in other languages.

String:
<s> [INST] Hello! [/INST] 你好! (Nǐ hǎo!) How can I assist you today?
Tokens:
[1,1032,3,45383,1033,1032,4,1032,124108,1033,1319,1078,1199,1144,1310,1199,1142,1111,47905,3075,1710,1362,10410,1636,9406,1063]
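
Those dumps look like they came from a llama.cpp server /tokenize call. A minimal sketch to reproduce the comparison, assuming a local llama-server on its default port (special-token handling may depend on your server version and options):

import requests

URL = "http://127.0.0.1:8080/tokenize"
for s in ("<s>[INST]Hello![/INST]", "<s> [INST] Hello! [/INST]"):
    # POST the raw string; the server replies with its token ids
    tokens = requests.post(URL, json={"content": s}).json()["tokens"]
    print(repr(s), "->", tokens)  # the spaced version picks up extra whitespace tokens like 1032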
>>
>>101584478
I have 48gb vram I think I can run the 3 or 3.5 quant
>>
>>101584204
>Ooba can load this if you update transformers
>pip install git+https://github.com/huggingface/transformers.git

Any chance for quantized Mistral-Nemo on ooba?
>>
>>101584091
>My Ryzen 7 7840U with Radeon 780M graphics gets 3.51 T/s with Llama 3 8B and shit/s prompt processing speeds. Unless there's some pretty huge enhancements and also APU support for servers (for hundreds of GB of memory) then it's shit.
Is it faster than pure CPU inference? Have you compared it?
>>
>>101579547 In my experience it's really good at understanding complex instructions and scenarios, but its writing is very sloppy and bland.
>>
Why did no one tell me that nemo is uncensored!
>>
What about the old character.ai, anything close to it nowadays?
>>
>>101585036
NeMo and Largestral
>>
>>101585036
Nemo or nemo tunes. If you have a shit ton of vram then large mistral.
>>
>As you approach, you get a better view of Carmilla's stunning body. She's lounging on the bed, her long, toned legs stretched out before her, her high-heeled boots clicking softly against the stone floor.
Why u be like this Nemo... And I am really into the shit it writes, but then it has to turn into a total fucking retard... It really is a true 13B.
>>
>>101585374
>her high-heeled boots clicking softly against the stone floor.
Ain't no way. I have multi character interactions / sex positions working without error.
>>
>>101585374
unironically skill issue
nemo's not had that kind of trouble on my end. make sure you're using that anon's configs from the last thread.


