/g/ - Technology


File: 1 (2).png (574 KB, 573x837)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101457504 & >>101449685

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>101464048
>(embed)
>>
BASED!
DEATH TO MIKU
>>
it's over...
>>
Hi!

I am getting very low tokens / second using 70b models on a new setup with 2 4090s. Midnight-Miqu 70b for example gets around 6 tokens / second using EXL2 at 4.0 bpw.

The 4-bit quantization in GGUF gets 0.2 tokens per second using KoboldCPP.

I got faster rates renting an A6000 (non-ada) on Runpod, so I'm not sure what's going wrong. Nvidia-SMI shows that the VRAM is near full on both cards, so I don't think half of it is running on the CPU.

Any advice is appreciated, thanks!
>>
new Mistral is pretty good - thoughts?
>>
>>101464216
Yeah I'll let you know in another 2 weeks when it has its kinks ironed out.
>>
>>101464216
waiting for kcpp support
>>
>>101464216
How long till non code mamba? Something around gemma 9B smarts with 256k context would be nice.
>>
>>101464246
Run it with tabby?
>>
Wait a sec. Guess I was not paying attention. Thought it was only a code version.
https://mistral.ai/news/mistral-nemo/
>>
>try L3 8B Q8
>blows Q6 out of the fucking water
Why do retards keep pushing this quantization shit? Anyone compare the raw 16 bit floating point to the 8 bit one?
>>
What is the best quant for a single 4090 with the new Gemma? I'm downloading Q5
>>
>>101464257
Good idea actually, thanks.
>>
>>101464356
The one that fits
>>
>>101464340
Pretty sure everyone started to say quants sucked when L3 came out. More dense model = greater effect of quantization.
>>
>>101464216
Did some basic tests using transformers and it seems decent so far, nothing that clearly shows it punching above its weight though.
Think the killer feature is going to be the high context. If that actually works it's a good model for low-end GPU text processing work.
Might hike P40 prices up even further since they can run a model this small at decent speed with plenty of room for context.
>>
>>101464449
Kind of waiting for backends to support it but I've seen people saying it's the first model to not go retarded at really high context.
>>
>>101464363
And that is?
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101457504

--Mistral-Nemo is surprisingly good and doesn't dodge dick: >>101457731 >>101457918
--Higgs Llama V2 Chart: >>101460454 >>101460512 >>101460550
--Nemostral Confirmed for 2MW Status but Custom Tokenizer is a Challenge: >>101460679 >>101460733 >>101460748 >>101460767
--Running Mistral Nemo on a 24GB GPU with Compromises: >>101458100
--Mistral Nemo Model Not Yet Running Due to Lack of Support and Tokenization Issues: >>101460321 >>101460333
--Hackers Leak Disney Slack Messages in Protest of Their AI Usage: >>101460812 >>101460871
--Gigabyte's RAM Revolution: 24TB DDR5 and AMD EPYC's 48 DIMMs Spark CPUmax Dreams: >>101459814 >>101459870
--Gguf faster than exl2 for 3D models with llama.cpp CUDA: >>101460948 >>101461004 >>101461165 >>101461469
--GLM4 Issues Due to a GlennM9 Bug: Slack Chat Log: >>101458669
--DeepSeek-V2-Chat-0628: Impressions and GGUF Availability: >>101458547
--Purchasing an NVIDIA Quadro RTX 8000 48GB for the VRAM: >>101462297 >>101462376 >>101462489 >>101462559
--Japan: Content Used to Train AI Has No IP Rights: >>101463339
--Trump allies want to “Make America First in AI” with sweeping executive order: >>101463644 >>101463836
--Miku (free space): >>101457780

►Recent Highlight Posts from the Previous Thread: >>101457519
>>
>>101464340
>>101464374
I didn't do any tests with lower quants, but there is likely practically no difference between Q8 and FP32 (BF16) even on Llama 3 8B. I did the KLD tests >>101243361, as well as manual testing >>101245221. If one believes that Q8 is significantly worse, they must provide proof. So far I have seen 3 posts make the claim but provide 0 proof. Meanwhile, it is much more likely that some mistake on the part of the user was made which degraded the quality, or they were seeing things they wanted to believe, or it was simply a coincidence (low sample size), or a combination of all of these factors.
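For anyone who wants to reproduce this, the rough procedure with llama.cpp's perplexity tool is below; binary and file names depend on your build, so treat it as a sketch rather than exact commands.

# save the full-precision logits once (the output file gets large)
./llama-perplexity -m llama3-8b-f16.gguf -f wiki.test.raw --kl-divergence-base logits-f16.bin
# then score a quant against those saved logits
./llama-perplexity -m llama3-8b-q8_0.gguf -f wiki.test.raw --kl-divergence-base logits-f16.bin --kl-divergence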
>>
Are there any tests of the DeepL LLM model yet?
It's only on their webpage and only in the paid Pro plan, but I wonder how good the translation is
https://www.deepl.com/en/blog/next-gen-language-model
>>
So mistral nemo... works on exl right now and nothing else right?
>>
>>101464582
vLLM should also get it "out-of-the-box". llama.cpp and everything downstream of that is what's going to be waiting weeks
>>
>>101464595
I couldn't get it to load via ooba in transformers earlier. But I think I just need to do a fresh rebuild of the conda environment I use for it. But I was running on transformers built from source so it should have worked.
>>
>>101464605
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407#transformers
>NOTE: Until a new release has been made, you need to install transformers from source:
>pip install git+https://github.com/huggingface/transformers.git
>>
>>101464630
That's what I did. I built from source. But I was still getting the tensor size issue. My ooba environment is pretty old and still running on python 3.10 I believe, so I don't know if making a new one on a later python version would help at all.
>>
>>101464652
The exllama2 loader is compatible and worked for me when transformers didn't.
But I assume you already would have tried that if your GPU supported it.
>>
>>101464711
Yeah it loaded just fine with exl2. I normally run ggufs but I'm not going to bother until it's on the main branch.
>>
Any other good and updated benchmarks?
https://prollm.toqan.ai/leaderboard/coding-assistant
https://aider.chat/docs/leaderboards/
https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
>>
>128k
It will derail and lose coherence at 16k-24k tokens anyway, even corpo models can't avoid this.
>>
>>101464817
This one doesn't. Mamba makes the difference apparently.
>>
>>101464216
It'll most likely be the go-to for poorfags.
>>
File: NalaMistralNemo.png (126 KB, 929x421)
Sloppiness aside, this is good at the Nala test in general let alone for its parameter count. Arthur confirmed as closet furry.
I did several pulls at different temperatures; this is t=0.81. It was still coherent and less sloppy at t=1, and it was kind of dry but actually followed the prescribed formatting at t=0.3, as recommended on the model card. In all 3 instances there was no anthropomorphism.
>>
>>101464904
Isn't Gemma the go-to for poorfags? Or are they for two different levels of poor?
>>
>>101464913
anyone who doesn't have at least 20 h100 clusters is poor and should be banned from this general
>>
>>101464911
that is complete fucking garbage, you'd have to be blind to think that's good
>>
>>101464939
You're mentally ill and nobody cares what you think.
>>
>>101464913
No way man, gemma is like, a whole week or two old. That's fucking ancient, what are you an old man?
>>
>>101464954
There is no reason to check /lmg/ more than once a week.
>>
ST's Command-r preset results in garbage outputs, and I mean actual garbage data, switching between languages, mashed-together words, random single letters, you name it. Is it worth actually figuring out how to set it up considering how fucking slow C-r is compared to mixtral or should I just give up and go back to slop city?
>>
>>101464911
Were you the guy the guy that supposedly got it working in ooba transformers last thread?
>>
>>101465072
No, using exl2
Also observations with further testing.. it struggles a bit with the concept of possession. Mathstral had the same problem but it's not as bad as Mathstral.
Also I don't know if it's an exl specific issue but if a tavern card is pretty lengthy (about 2k tokens) it seems to completely break the fuck down. (this is at 8.0bpw)
>>
>>101465002
I liked these better.
https://huggingface.co/Virt-io/SillyTavern-Presets/tree/main/Prompts/Command-R/v1.9
But Command-R isn't really that great now. Gemma 2 pretty much mogs it at smaller parameter count too.
>>
>create a camera that follows the player in love2d
>deepseek fail 10+attempts
>llama 70b fail 10+attempts

>turn on lunaris 7b q4 as a joke
>succeeds first try
>??????????????????????

the code for reference:
function love.load()
    love.window.setMode(1600, 1000, {resizable = true})
    -- Load the slime character image
    img = love.graphics.newImage('slime.png')
    background = love.graphics.newImage('test.png')

    -- Initialize the character's position and dimensions
    characterX, characterY, characterWidth, characterHeight = 0, 0, img:getWidth(), img:getHeight()

    -- Set the camera offset (distance from slime to camera center)

    screenWidth, screenHeight = love.graphics.getDimensions()
end

function love.keypressed(key)
    if key == "w" then
        characterY = characterY - 30
    elseif key == "s" then
        characterY = characterY + 30
    elseif key == "a" then
        characterX = characterX - 30
    elseif key == "d" then
        characterX = characterX + 30
    end
end

function love.update(dt)
    -- Update camera position based on slime
    cameraX = characterX
    cameraY = characterY
end

function love.draw()
    -- Center the camera to avoid clipping the slime
    love.graphics.translate(-cameraX + screenWidth/2, -cameraY + screenHeight/2)
    love.graphics.draw(background)
    -- Draw the game world (background, objects, etc.)
    -- (Assuming your drawing code is in this function)

    -- Draw the slime at the correct position
    love.graphics.draw(img, characterX, characterY)
end

I only added the img and background in love.load, idk why, I didn't even need to do that anyway. Black magic, folks
>>
File: 1694556554931922.gif (2.24 MB, 378x419)
been out of the loop for months now
what's a good local model currently for a 16GB VRAM card?
>>
>>101465162
>moot
Who?
>>
Now would I daily-drive Mistral-Nemo? Probably not... But it's definitely a decent option for VRAMlets.
>>
>>101465167
dunno, sounds like a gardening tool
>hey can you pass me over the moot?
>>
>>101465173
Working 128K context is really nice though.
>>
>>101464554
>>101464374
>>101464340
OK so I just did a KLD test for Q6_K L3 8B.

====== Perplexity statistics ======
Mean PPL(Q) : 7.083942 ± 0.050761
Mean PPL(base) : 7.128723 ± 0.051077
Cor(ln(PPL(Q)), ln(PPL(base))): 99.58%
Mean ln(PPL(Q)/PPL(base)) : -0.006302 ± 0.000660
Mean PPL(Q)/PPL(base) : 0.993718 ± 0.000656
Mean PPL(Q)-PPL(base) : -0.044781 ± 0.004702

====== KL divergence statistics ======
Mean KLD: 0.017828 ± 0.000251
Maximum KLD: 13.386079
99.9% KLD: 0.907341
99.0% KLD: 0.192563
Median KLD: 0.005415
10.0% KLD: 0.000041
5.0% KLD: 0.000007
1.0% KLD: 0.000000
Minimum KLD: -0.000020

====== Token probability statistics ======
Mean Δp: 0.126 ± 0.011 %
Maximum Δp: 95.254%
99.9% Δp: 36.863%
99.0% Δp: 12.510%
95.0% Δp: 5.045%
90.0% Δp: 2.781%
75.0% Δp: 0.477%
Median Δp: 0.000%
25.0% Δp: -0.403%
10.0% Δp: -2.494%
5.0% Δp: -4.551%
1.0% Δp: -10.917%
0.1% Δp: -25.308%
Minimum Δp: -94.447%
RMS Δp : 4.004 ± 0.042 %
Same top p: 94.781 ± 0.059 %

I'm not going to do the manual creativity-based test, as I believe it'd likely agree with these numbers, as Q8's testing did. If someone has significantly worse output at Q6 compared to Q8, proof would be appreciated, since this suggests that it's not significantly different.
>>
>>101465239
>memeplexity
>>
>>101465263
Yes, that's why they made a KLD test, so they could have a better measure of the effect of quants.
>>
>>101464911
>She she her she her she she she she
Holy kino
>>
>>101464652
It works for me on python 3.10.
>>
>>101463339
Germany has a similar law where data mining (under which dataset collection falls) is explicitly allowed unless there is a machine-readable opt-out.
However, Aleph Alpha is reportedly still falling behind.
Having massive amounts of compute and training data (the latter of which you just keep secret) seems to be what's important right now.

>>101464122
Just to make sure, you are setting --n-gpu-layers or however koboldcpp calls it, right?
And if you are on Winblows you have the automatic VRAM swapping "feature" disabled, yes?
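(For reference, a rough koboldcpp invocation for a 2x24GB setup; flag names are from memory and the layer count is a guess, so adjust it to whatever actually fits in VRAM:)

python koboldcpp.py --model midnight-miqu-70b.Q4_K_M.gguf --usecublas --gpulayers 81 --contextsize 8192

If --gpulayers is left at its default you can end up with most of the model in system RAM, which would explain 0.2 t/s.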
>>
>>101465281
Fuck.
>>
>>101465281
TOTAL
LM
LOVE

https://www.youtube.com/watch?v=ajjdY070VU4
>>
>>101465281
I've found I don't find a barrage of 'she' annoying these days if it's followed by some good meaty unexpected stuff instead of llm 'shivers' lego blocks.
>>
>>101465301
>if you are on Winblows
What does your development environment look like on Linux?
Main thing holding me back from switching entirely is Visual Studio, I can't imagine trying to be productive in C++ without it.
Do you use CLion, some text editor with plugins, or is there something else?
>>
What would a better lm even look like? Just Claude 3.5 Opus? What's the endgame
>>
>>101464048
Oh that's Mashu, how cute. I still use her character card since the pygmalion era and test models with her chat. She's always reliable in calling me senpai.
>>
Am I retarded? Trying to use tabbyAPI but it keeps saying out of memory. I even tried loading the 12GB model in 4 bit on my 3090. Still out of memory when trying to load it.

I edited the config model name to the correct model.
>>
>>101465353
bro...
>>
>>101465353
On my home desktop machine I am running Manjaro.
I use Doom Emacs with the default C/C++/CUDA plugin to edit files, just the terminal (zsh) for compilation, testing, and debugging (via gdb).
When I am at the office I usually SSH into my desktop and run Emacs via the terminal.
I synchronize branches between machines via git, other files via scp, and use (neo)vim if I need to quickly edit files from the terminal.
>>
What options do I have for powering the V100 system? I reached out to core4, who seem to sell disassembled servers, and asked if they had one that wasn't taken apart already. (They do). They told me
> The server plugs in directly to the bus in the OCP rack (this is why the rack is required to operate). No standard PSU.
Where can I buy a rack that fits this description? maybe im just too stoopid, but on ebay searching "ocp rack" just brings basically no results.
>>
How the hell do you get tabbyapi to work? No matter what size of model you use, it says CUDA out of memory.
>>
>>101465002
Make sure to set min-p to at least 0.007. With neutral samplers you will get garbage from Command-R like it's broken. The min-p I'm using is 0.04.

If you're using Cohere's web API with only top-k and top-p, setting top-k to 80 and top-p to 0.82 gives coherent results.
>>
so, can nemo milk my dick like crazy? should i bother with tabby?
>>
>>101465570
paste the tabby console output
>>
>>101465301
does LMStudio contribute to llama.cpp code or give you guys some cut, or do they just wrap your stuff and then show the finger?
>>
>>101465615
the finger is industry standard practice in silicon valley
>>
>>101465301
how to enable lookup and look-ahead decoding in llama.cpp?
>>
File: Tabby.png (67 KB, 1307x971)
>>101465592
This is 6 bits per weight on a 3090 with nothing else open
>>
>>101465301
Tensor parallel split wen?
trainer wen?
Jamba/mamba wen?
>>
>>101465632
ah, I tried a couple of exl2 quants and one defaulted to 1 000 000 ctx or something ridiculous like that. that's why you get oom after loading the model successfully.
set context manually in the tabby config file. 32k should be fine depending on what else is running on your GPU.
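Something like this in tabbyAPI's config.yml, if I remember the key names right; treat it as a sketch and check the sample config that ships with it:

model:
  model_name: Mistral-Nemo-Base-12B-exl2
  max_seq_len: 32768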
>>
>>101465187
Let it pass the RULER first
>>
>>101465623
Lmstudio is (((5 guys))) from Brooklyn
>>
>>101465677
That fixed it, thx.
>>
>>101465500
>I use Doom Emacs with the default C/C++/CUDA plugin to edit files, just the terminal (zsh) for compilation, testing, and debugging (via gdb).
That's what I was afraid of. I always get the impression that these sorts of setups must only be tolerable for someone that never used an IDE.
Not that I don't know how to use the terminal, it's just not as efficient a workflow.
Still, I will give Doom Emacs a try. Thank you.
>>
Niitama is pretty good, but the smut is very samey, though I guess that's something with every model, eh?
Also, anatomy and spatial awareness is lacking, but at least it's fast as hell, even on my potato.
It's super horny, though. The leadup for smut cards is maybe 2 paragraphs until folds are being explored. God damn.
Guess I should turn down the temperature or something?
>>
>>101465735
>use half-assed shilled slop model
>complain about slop model problems
>/lmg/
>>
>>101465750
Yeah, I got what I deserved I guess.
What would you recommend, instead?
>>
>>101465760
Looking up Niitama says it's 8B. Use either Meta's Instruct or Gemma 9B and learn how to prompt.
>>
How was it in the past, might they still release something before the weekend? Now that gemma is working I want something better.
>>
Mistral NeMo 12B exl2 @ 8.0bpw and 128000 tokens of context fits comfortably in 23.4 GB of VRAM. We are so back 3090 kings.
>>
>>101465463
The context also takes VRAM. If it's a model with long context, or you set it with a long context, it's going to OOM. So try lowering the context.
>>
>>101465859
Worth it over Gemma?
>>
>>101465760
You can't expect any better at that size, and no, Gemma is much, much worse for your purpose
>>
>>101465853
NTA but im really liking new mistral's writing style so far and it seems smart at least in my story. Going to test how well it handles 100k context.
>>
>>101465615
I am not aware of any monetary or upstream code contributions by LMStudio.
I don't know whether they have business relations with ggml.ai though.

>>101465624
Right now it is effectively not possible to use except for PoC examples that you can run via the terminal.
I originally wanted to enable n-gram lookup decoding for the server but then LLaMA 3 and other models with much larger vocabulary sizes dropped and the speedup became much worse.
Right now with the better MMQ kernels evaluating a batch of 64 tokens only takes ~2x as long as a batch of 1 token so I'm working on n-gram lookup decoding that creates a tree of sequences rather than only a single sequence.
I think this could be a ~2-4x increase in the rate at which you can generate tokens for a single user.

>>101465653
>Tensor parallel split wen?
Right now with --split-mode row .
But it needs a lot more optimization.
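(For reference, a rough invocation, assuming a recent build where the main binary is called llama-cli; older builds call it ./main:)

./llama-cli -m model.gguf -ngl 99 --split-mode row -p "your prompt"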

>trainer wen?
After I am satisfied with my lookup decoding implementation I will start working on training.
My goal is to have something usable this year.

>Jamba/mamba wen?
Don't know.
>>
>>101465729
Two of the main factors for me when choosing tools are that I need to be able to reproduce things and that I want to use the keyboard instead of the mouse if at all possible.
So generally speaking CLI tools fit my needs much better than tools with a GUI because everything gets documented in ~/.zsh_history.
Though there is a big difference in usability between something like vanilla bash and zsh with a bunch of plugins (jump to directory, autocomplete commands from previous user inputs, etc.).

The reason I use Doom Emacs is similarly that if you already know Vim keybindings and want to use the keyboard for everything then it's a nice experience.

Regardless of what tools you use on Linux, I'd recommend you look at KDE virtual desktops (or the equivalent for other DEs).
I have a terminal, web browser, email client, etc. on different virtual desktops and have set shortcuts CTRL+F1 - CTRL+F12 to jump to a specific virtual desktop; it's a thousand times better than alt-tabbing.
Also for setups with multiple monitors I have shortcuts to set the focus on a specific monitor.
>>
Hmm, new mistral seems to work well with chatML formatting and super simple:

system
user
model

prefixes

And so far it's really really good. Using 1 temp and I really like its writing so far.
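To be concrete, the layout I'm using looks roughly like this (I use "model" instead of the usual "assistant" role name; the placeholder text is mine):

<|im_start|>system
You are a creative writer playing {{char}}.<|im_end|>
<|im_start|>user
{{user's message}}<|im_end|>
<|im_start|>model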
>>
>>101464521
>-Trump allies want to “Make America First in AI” with sweeping executive order
>Eliminate "uncecessary and burdensome regulations"
HA, FUCK YOU SAM! Good luck trying to stall AI progress for your own financial gain you unbelievable prick.
>>
surely nemo will fit on my 12gb coomlet rig
>>
>>101466081
They officially recommend a low temperature, around 0.3
>>
>>101466102
Wait, it's being said that Trump plans to restrict sales of nvidia cards. What if it's a ploy to get them to make cards with more vram and Trump is secretly trying to help us GPUMax
>>
File: 1695202519985890.gif (133 KB, 340x340)
>>101464048
>>101443596
>AI pajeets are too dumb to use the subject field
>>
>>101466254
Are you sure it's not to simply restrict sales of cards to China? I am pretty sure we are already doing that.
>>
>>101466251
for non creative use prob.
>>
>>101466251
Official temperature recommendations are always on the low side because for corporate bullshit you want low temperature. Predictable and boring are benefits when you're polluting the web with AI-generated news articles or running a customer service chatbot that could have been a series of menus or better yet a FAQ with some hyperlinks.
>>
>>101466306
>>101466330
Thanks anons.
>>
>>101465239
Petra sisters... our response?
>>
Ok, new mistral is amazing.
Great writing style, not censored whatsoever, smart yet horny when you ask it to, good anatomy / positional understanding, passes the non human anatomy test, and I tested it with some stuff I had around 30k ish context and its working great so far. Mistral did it.
>>
>>101466282
I'm tryna conspiracy theory here, fuck off with your logic.

In other news, new mistral still has refusals and whines about not doing that kind of thing and here's a help number to call. Retry a few times though and it gives in.
>>
>>101464521
>Gguf faster than exl2 for 3D models
...
>>
>>101466403
>he doesn't know llama.cpp can load .3ds models
>>
>>101466376
what mistral is the new mistral
sorry, just woke up
>>
File: Tests.png (368 KB, 1901x1650)
Usual tests with new mistral.

>>101466463
new 12B mumba mistral came out. Only works with exlamma / tabbyapi atm. 128K context

https://huggingface.co/turboderp/Mistral-Nemo-Base-12B-exl2
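Rough way to grab a specific bpw; the quants live on separate branches, and the branch name here is from memory, so check the repo's branch list:

huggingface-cli download turboderp/Mistral-Nemo-Base-12B-exl2 --revision 8.0bpw --local-dir models/Mistral-Nemo-Base-12B-exl2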
>>
A question has been wandering my mind for the past couple days, and it has something to do with benchmarks. Benchmarks, so I think, are a way to automatically and deterministically measure an LLM's performance in certain tasks. But how does the answer of an LLM actually get judged? In Math, Trivia or Code benchmarks it's quite simple. The model either gets the answer right, or it doesn't (which would mean the code doesn't work, the math doesn't check out, or Rome is the capital of France). But for something like reasoning tests you would need another intelligent entity to judge it, like an LLM, which makes it non-deterministic, because different people are gonna use different LLMs for benchmarking. So how the heck can we compare MMLU scores or something? Is there some black magic going on or am I missing a critical piece of information?
>>
>>101466497
And before anyone says it kept saying "her" take into account this is with no rep pen at all and a pretty low temp. Also might not even be the right formatting, just chatML template with user / model / system prefixes
>>
>>101466497
>a shiver running down her spine
Stopped reading there
>>
>>101466684
I hate to break it to you but that is going to stay with us forever. Its just a common trope in writing.
>>
>>101466497
Is base better than instruct?
>>
>>101466721
For creative writing yes. Don't tell me you use assistant tunes for that? It usually taints the writing with its shitty assistant / ai persona. / Gives a positive bias.
>>
>>101466684
>stheno rarely writes shivers or repeats itself but its retarded..

this world is cruel
>>
>>101466755
I know what you mean, but being able to explicitly instruct is just more convenient so I haven't used base models in a while.
>>
>>101466392
>t. shitbull baby guro rape anon
no seriously, what are you even prompting? goddamn help numbers?? zero refusals here.
>>
Does mistral plan to do a nemo-based mixtral?
>>
The new Mistral 12B is insane. Testing it in FP16 via transformers in ooba (you have to build the new version of transformers yourself atm but it's easy). It's quite a bit smarter than Gemma 2 27B. I can't believe it.
>>
>>101466497
newfag here, where can i find those cards?
>>
>>101466879
Wanted to let others test it themselves first. Yea, feels really smart and creative. Has mixtrals thing with being super good at formatting / instruction following without its dryness. Also the context is the real deal. Fitting 100K on 24GB and still having room for windows / browser with hardware accel is nice.
>>
>>101466915
Blank is literally just blank, niggerhater is just a single line card that he hates niggers and cant help but talk about them and emily:

https://files.catbox.moe/uty0jp.png
>>
just two more weeks for kcpp nemo 12b support...
>>
>>101466820
Probably suicide. That's the only thing I've ever seen give me help numbers in e.g. Llama 3.

Interestingly with Llama 3 if I start by mentioning I'm Canadian and in Canada MAID (Medical Assistance in Dying) has been legal since 2021 for basically anyone and not just the terminally ill it will then happily discuss whether I should kill myself. Of course I'm not really a filthy Canadian.
>>
>>101466879
How do we know it's not just GPTslop?
>>
>>101466917
In 8 or 16 bit?
>>
>>101464048
Any good local models that can translate from Japanese to English?
>>
>>101466971
You either try it or wait for other peoples opinions when the more common backends support it in 2 weeks.

>>101466979
its trained in 8 bit natively.

"Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss."
>>
Going by the guide in the /lmg/ general I'll only be able to run a 4-bit 7b on my computer. I have 8GB VRAM and 16GB RAM
Is that good enough for some lewd chatting? I'd like to chat with some fictional characters.
>>
What's the future of local models?
Small models that perform really well or new quantizing/compressing techniques for giant models?
>>
>>101466961
ah of course, that's something any model likely gets trained to hard refuse no matter the use case.
>>
>>101467024
bitnet
>>
>>101467011
You could run G2 9B at Q6 without offloading
Or if you want to offload you could try the new Mistral NeMo 12B at Q8 and report results back
>>
File: 445645645515.png (36 KB, 920x812)
Alright, since everyone is parroting that 12B is better, can it pass the knowledge test? This is a simple test; if it passes, then it might be cloud-shit tier and I will believe you. Pic rel is Gemma 27B after a few tries; keep in mind it still got the character wrong.

>What is "Die monster, you don't belong in this world!" From?
>>
File: file.png (341 KB, 1480x1764)
A "Write a story of a loli giving me head" test with Mistral Nemo.
>>
>>101465990
Everything in Visual Studio has a shortcut or can be assigned one, so there isn't much where I'm forced to use the mouse. But I'll admit it lacks that sort of command history documentation.
i3wm covers the virtual desktops and jumping between monitors for me quite well on Linux. Though even Windows has that now.
The shell plugins are still something I'm lacking. I just found and installed z for directory jumping thanks to you and I think that will make it much more bearable.
Last time I tried using Midnight Commander to navigate the filesystem. It was an improvement over cding even with autocomplete, but still too cumbersome.
>>
>>101467074
>loli
>16
stopped reading here
>>
File: 4455488466556.png (129 KB, 862x728)
>>101467073
Also simple IQ test. Pic rel is Gemma 27B.
 Consider the following three statements:

I. Those who like paintings like flowers.

II. Those who like running like music.

III. Those who do not like music do not like flowers.

If these are all true, which of the following statements must be true? Choose all that apply and explain your reasoning.

Those who like running like flowers.

Those who like paintings like music.

Those who like flowers do not like running.

Those who like running do not like paintings.

Those who like paintings like running.


Correct answer is just the first choice.
>>
>>101467050
>G2 9B
I'm not seeing that in the LLM List. I'm looking at https://wikia.schneedc.com/
What does Q# mean?
>>
>>101467073
flying colours
>>
File: chrome_UCOs9ZrU5M.png (10 KB, 309x149)
I have a new GPU with 16GB VRAM, what are the best models I can run now?

Pic related is what I used with my old GPU
>>
File: It dosent know.png (25 KB, 1296x309)
>>101467073
>>
>>101467147
>first choice.
Second*

>>101467188
Is this a L2 13B? kek
>>
>>101467073
quote the "everyone" posts that said a small 12B model had better fringe knowledge than 27B so we all can laugh at them.
>>
File: Isthisis.png (53 KB, 1907x337)
>>101467073
>>101467188
Woops, that was base model. Is this right?
>>
For people testing 12B with transformers: it's slightly smarter using FP16 rather than BF16 for some reason (deterministic settings, so not sampling randomness).
It was failing the Sally test with ooba's default settings which have bf16 checked, when I unchecked bf16 and reloaded the weights all its answers were now subtly different and it was passing stuff where it failed before.
>>
File: Isthisis2.png (96 KB, 1293x765)
>>101467073
>>101467188
>>101467226
It seems pretty confident all the way till 0.7 temp, is it correct?
>>
>Q6_K, 10.1gb
how much k's of context does that leave from my 12gb?
>>
>incredibly smart small model drops
>thread disappears up its own ass testing retarded trivia knowledge instead of intelligence
never change, /lmg/
>>
>>101467237
BF16 is garbage. It uses half its bits for the exponent. Only reason to use it is fast conversion to and from IEEE 754 float32.
>>
Echidna-13B-v0.3-GPTQ is the best uncensored model for 16Gb vram?
>>
>>101467286
Give me questions. I usually just use them for creative writing.
>>
File: 6544894844.png (52 KB, 1807x382)
>>101467263
No, only cloud shit gets it completely right unfortunately.
>>
>>101467294
"Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss."

Isn't it meant to run at 8 bit instead of 16?
>>
File: file.png (161 KB, 822x508)
What went wrong?
>>
>>101467335
{INST}

Is that the right formatting?
>>
>>101467286
happens after every release. it's free (you)'s.
>>
>>101467080
When we're already on the topic of general CLI tools, take a look at tldr (gives you examples of the most common use cases for commands), as well as these https://github.com/zsh-users/zsh-syntax-highlighting https://github.com/zsh-users/zsh-autosuggestions zsh plugins.
For file system interaction in particular I'm also using exa (which you can largely use as an ls alternative via aliases).
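Roughly what the relevant ~/.zshrc additions look like, assuming the plugins are cloned under ~/.zsh/ (paths and aliases are just my setup, adjust to taste):

source ~/.zsh/zsh-autosuggestions/zsh-autosuggestions.zsh
source ~/.zsh/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh
alias ls='exa'
alias ll='exa -la --git'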
>>
Fuck the assistant tune, maybe Im using the wrong formatting but it is retarded compared to the base model. I suggest trying the base.
>>
>>101467443
And fzf is a must.
>>
>>101467286
I beg to differ. Is it really smart if it's failing a simple trivia test? Can you trust it to perform critical tasks? I would trust Gemma because it has passed everything I've thrown at it. You can only do so much with sub 70 MMLU anyways. Hopefully a Mixtral 2.0 or Nemo 27B drops.
>>
>>101467498
That is a lack of knowledge, not being smart/stupid.
>>
Ok I tried Mistral 12B instruct since people are praising it. It's less dry than mixtral but still I'm not too fond of its writing style. Maybe base is better idk. Command-R is still king in that regard.
>>
>>101467537
I really really like base model's writing style, it is giving some characters some spunk (character, not the other kind) I haven't really seen elsewhere before.
>>
File: file.png (66 KB, 955x165)
It also fails with the official mistral-chat and the full model. It likes to give different answers every time too.
>>
>>101467517
More knowledge = more wisdom. When something goes wrong and I ask the AI for advice (one of the only things that it's useful for) you can trust its knowledge to give you the right answer, or something close. I know Mistral would hallucinate through more than half of the things I've put Gemma through. Being smart enough simply isn't enough. The bar has been raised.
>>
File: 4564545515.png (47 KB, 893x582)
>>101467073
No idea what GPT-4o gives for this, but GPT-4o mini gives the same answer. Proves ClosedAI has no moat, they literally just throw parameters at the problem.
>>
>>101467606
Hmm, Sure, but then I do not trust any of the models. The material they are using is mostly biased, so it is useless. As of now, all the AI models are just for fun; at least to me, I would not rely on them for anything serious.
>>
>>101467443
>exa
I downloaded it now and like it already. If you're not already aware, apparently exa is unmaintained and has been replaced by eza.
Final thing, I promise. What do you use for searching? Being able to find all references or go to definition is a huge time saver, but the best I could find is crafting grep commands excluding a bunch of directories and binary files. Even then it's not syntax aware and text only, so it's painfully slow.
Do you know of anything that can do that sort of syntax aware searching, or at least something that searches using a basic index?
>>
>>101467606
Different use cases. Gemma is great when you don't know something and want an answer.
Assuming Nemo's 128K context works (which I doubt) it's going to be great when you already have all or some knowledge and want to refine it further.
It's big enough to paste in manuals, journals, wiki segments or whatever video game trivia you want while maximizing speed by not having the model contain task irrelevant data.
>>
>>101467537
Gemma-2 not only writes better, but also follows instructions better.
Mistral Nemo's key points are that it's pretty much not censored (wouldn't really say uncensored) and has long context support, but its outputs aren't as engaging as Gemma-2 or even Llama-3.

And to be completely honest I find its default assistant personality extremely boring and autistic--devoid of any personality but also with messages on the short side. The prompting format is getting old and limiting, also; it doesn't cooperate well with author notes and so on.
>>
>>101467925
>find its default assistant personality extremely boring and autistic
Dont use assistant tunes for creative writing.
>>
real women are the worst
they are just whores
dont do it anons, not worth it
>>
>>101467876
In Doom Emacs SPC-c-d jumps to the definition of the thing under the cursor.
If I'm on the terminal I use The Silver Searcher (ag), ripgrep is also good.
ag and ripgrep are also text only but they're so fast (and I think both also skip binary files by default) that I have no issues in terms of speed.
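Typical usage, e.g. hunting for a function across a C/C++/CUDA tree while skipping the build directory (flags from memory, the function name is just an example):

rg -t cpp -t cuda 'ggml_mul_mat' --glob '!build/*'
ag --cpp 'ggml_mul_mat' --ignore build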
>>
>>101467925
You sound like /aids/ promptlet.
>The prompting format is getting old and limiting, also; it doesn't cooperate well with author notes and so on.
I hope you're aware that every model is only tuned to have a system prompt at the start, and then only alternating user/assistant messages afterwards. Neither Mistral Nemo nor Gemma have a system role.
I'm also not seeing the default assistant personality.
>>101467989
You should also go back to /aids/, NAI shill.
>>
>>101467925
Might I ask what context / formatting you use for gemma 27B btw?
>>
>>101467989
Sometimes you need your assistant to include creative element in "productivity" tasks (for example writing a character card, etc).

The "dead inside" nature of Mistral Nemo seeps into many different output types. The default OOC persona during RP for example is the same as the assistant persona, unless you override it with example messages somewhere in the prompt. That's almost the opposite problem Gemma-2 has, where characters would often be overly dramatic or enthusiastic as some have pointed out.

Community finetunes are useless at pretty much everything else other than cooming and helping the grifters behind them make money, so please don't propose them.
>>
I see we're in full NovelAI shill damage control.
>>101468146
>The "dead inside" nature of Mistral Nemo seeps into many different output types.
Prove it, shill.
>>
>>101468041
>Exuberant Ctags - Faster than Ag, but it builds an index beforehand. Good for really big codebases.
>Version 5.8 [09 July 2009]
Pain.
But it sounds like Doom Emacs and Ag should cover just about everything I need. Thank you for your time and help. Cheers.
>>
>>101467925
why are you pretending that you've extensively tested this brand new model
this is such a weird larp you're engaging in, you have no right to speak in a tone of expertise about something you've played around with for 20 minutes
>>
I've also tested the new Mistral-Nemo 12b model and I think it's pretty amazing in conversations. I don't think it's smarter than Gemma 27b but it has reacted way more naturally to my responses. It often replied with 2 to 3 sentences answering exactly what I wanted it to.
Its multilingual capabilities are also great, I'd say it was one of the best German chats I've ever had with a local model. Combine that with its large context of 128k and I'm getting a bit hyped here. I think it has great potential for a really good RP model.
>>
>>101468248
>why are you pretending that you've extensively
Because NovelAI pays him to do that. Do you not realize how scared they are that Mistral Nemo is 12B and has 128k context? They were charging people $25 a month for their Llama 1 13B clone. Their business is on the line.
>>
>>101468117
With Gemma 2 you can make up new roles or inject information (author notes, and so on) between the tokens delimiting each message block, and the model won't get confused if you do.

The Mistral prompting format only has delimiters for the user's messages. It gets confused if you add consecutive [INST] ... [/INST] blocks, add instructions not related to {{user}}'s utterance there, or add information outside that block. It might appear to work at first, but issues eventually arise (verbatim repetition, the model not correctly understanding who is who or talking to itself in the same response, etc).
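For reference, the classic Mistral instruct layout is a single stream roughly like this; Nemo's new tokenizer apparently handles spacing around [INST] differently (see the mistral-chat observation further down the thread), so treat it as a sketch and check the chat template in the HF repo:

<s>[INST] first user message [/INST] first assistant reply</s>[INST] second user message [/INST]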
>>
>>101468193
If anything, you've got to watch for actual MistralAI shills/employees, they're the ones trying (and who have tried, successfully) to gain popularity on /lmg/ in various ways.
>>
>>101468430
That's cool.
>The "dead inside" nature of Mistral Nemo seeps into many different output types.
I told you to prove it, shill. You're very eager about testing this model, I know you can do it.
>>
>>101467160
Gemma 2 9B
>Q#
Quantization, the resolution of the weights; the bigger the number the better, but it will take more VRAM
>>
>>101467160
>That LLM list
That's ancient
>>
>>101468328
>With Gemma 2 you can make up new roles or inject information (author notes, and so on) between the tokens tokens delimiting each message block
Nah, this is one of the most retarded things that I have ever read. There's no need to mess with the prompt format for that. Every model can follow a made-up format, but it always makes the model more retarded. You seem to have severe brain damage if you think you need that to prompt. Also, no model likes consecutive messages of the same type; they can simply be joined.
Both Gemma and Mistral only have user and assistant messages. You're just very eager and desperate to dismiss the latter for some reason. Maybe tell NovelAI to hurry up with their 70B tune so you stop pissing yourself about Mistral Nemo?
>>
NAI shills are so fucking pathetic.
>>
how does mistral-NeMo-12B compare to their 8x22b or 8x7b?
>>
>>101468248
>why are you pretending that you've extensively tested this brand new model
because he did. he is one of the white flag waving baguettes. and he should tell his friends they are all gay for choosing 12B as size.
>>
>>101468780
Who is shilling nai? All I see here are gemma (a google model) and mistral (a Mistral AI model). Are you ok anon?
>>
>>101468861
Anon is having his schizo moment, just ignore him.
>>
Ok new Mistral is really good actually. Not even talking about the context. It's horny without being retarded, descriptive as fuck. Don't even have to tell it to be. Not one anatomy logic error yet.
>>
>>101468994
how does it compare to Gemma 27b?
>>
>>101468861
>Who is shilling nai? All I see here are gemma (a google model) and mistral (a Mistral AI model). Are you ok anon?
It's simple. Of course, they aren't going to come here to just tell you "use NAI", that would make it too obvious. But there are some clear marks of a NAI shill.
First, Mistral Nemo "is not truly uncensored", whatever that means, because Kayra is the only true uncensored model that you should use.
Second, the "assistant personality bleeds through every output", "it's an assistant tune, you can't use it for creative writing", etc. What you should use for creative writing instead? Kayra of course, it's a completion model, unlike every other model. Who's going to attack instruction models when that's every model here? Only the NAI shills do that. Also, unlike what the disgusting shill said, Mistral Nemo seems creative.
Third, why would anyone be so eager to dismiss Mistral Nemo? It seems to me that the "12B" and the "128k context" struck a nerve, so the shills feel the need to attack that particular model to defend their shitty business. Everybody else is just barely starting playing with the model, or haven't even tried it yet.
The "gemma is better" is just the particular stick that the shill grabbed to beat the model that hurts NovelAI more.
>>
>>101469069
Spoken like a NAI shill.
>>
>>101469092
Nice deflection, shill. Keep crying about Mistral Nemo. You're pathetic.
>>
>>101469092
No, i'm the nai shill.
>>
>>101469106
You're right.
>>
>>101469063
Gemma is either retarded or censored, there's no middle ground.
>>
>>101469128
>>101469172
>hahaha with enough sarcarsm people will forget that we were here
Nice try, shill. But the damage control won't be enough to save your shitty business. You're cancer.
>>
Currently waiting for Nemo finetune with Stheno dataset. I feel like we will be eating good soon.
>>
>>101469186
use the lightest tiger/big tiger finetunes, they're still a bit censored but not lobotomized
>>
>>101469232
Base Nemo is already a porn tune, retarded coomer.
>>
>>101469205
You're right!
>>
>>101469069
Weird how you know details about NAI models, I have no interest in anything outside of local so I'm not even following them. Are you sure you are not the shill?
>>
>>101469260
>Base
>already a porn tune
???
do you know what tune even means?
>>
>>101469260
I heard it a million times already with corpo instruct tunes. It was never true in my tests.
>>
>>101469282
It's like asking for a porn tune of Command R. The base model was already horny enough. Also, the dataset is called C2.
>>
>>101469310
you could have just said "base is already horny" why call "base" a tune wtf
>>
File: 1717847394849711.png (881 KB, 2048x1986)
>>101469069
>>
>>101469331
Anyone that isn't retarded already understood what it meant.
>>
>>101469310
Don't say the dataset name out loud. The resident schizo has it on RSS notification. He is already on the way to shit the thread probably.
>>
>>101469297
You seem very interested in convincing people that they shouldn't use a vanilla model. Do you have a ko-fi?
>>
mistral nemo tokenizer woes on transformers, might be important for those working on support.
>I observed a strange behavior of the tokenizer when dealing with texts in French. In particular, contrary to previous models, it seems to consistently remove the spaces before "!" or "?", e.g.
>Thanks! Should be fixed by https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/discussions/13.
>Just got merged! :) You can now access it normally.
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/discussions/11
>>
you're going to be able to create 4k video
>>
>>101469359
>let's just use incorrect words for things and call other retards...
>>
>>101469232
Hi Sao. Shilling your Nemo tune started a bit early, didn't it?
>>
thinking about figuring out how to use a word filter to obliterate this retard who types "shill" into 50 posts in every thread and is just obnoxious retarded cancer with 0 value
>>
>>101469367
>>101469425
like clockwork
>>
>>101469457
>figuring out how to use a word filter
do you not have 4chanx?
>>
>>101469381
No, but I have your mother under my desk.
>>
>>101469481
i do but i'm very lazy
>>
>>101469457
But you don't have any problem with the inorganic Nemo dismissal, right? Of course, why would anyone use an assistant tune when you can just subscribe to NovelAI?
>>
>>101469494
arrow on any comment
filter
delete what it pasted replace with
/shill/i
there ya go
>>
>>101469498
i dont care and i have no idea what combination of birth defects and mental health issues would provoke you to care either
>>
>>101469519
Thanks, now I can advertise my products without being attacked. That's what /lmg/ is for anyway.
>>
File: 1678462422119147.jpg (136 KB, 750x712)
>>101469519
neat
>>
Won't somebody please think of NovelAI? They're suffering.
>>
>>101469529
You're the NAI shill from earlier.
>>
>>101469590
>>101469633
And I will keep using base(d) models and ignore instruct ones.
>>
Some exl2 quantizations of Mistral Nemo, like the 8 bpw one I downloaded yesterday on HF from a certain DrNicefellow, appear to have some quality issues (mine occasionally outputs Chinese characters or extra punctuation at the end of the model's output), so beware of that when judging the model.
>>
>>101469279
>I have no interest in anything outside of local so I'm not even following them.
So you prefer to think that the anons that said "the default assistant is in every output" and "you can't use 'assistant tunes' for creative writing" did so without ulterior motives? And that the instant dismissals are completely honest reviews?
>>
>>101469764
I have seen that too, but I downloaded them from turboderp.
>>
>>101469802
>completely honest reviews?
no such thing in here, at all, no matter if positive or negative, there is just no honesty.
>>
>>101469764
thanks for confirming. had issues with it too so switched to turboderps quant but updated exl2 too so wasn't sure what fixed things.
>>
>>101469762
And you'll cry that the model doesn't do what you want.
>>
>>101467073
>Jeetma is better because it passes this one, irrelevant, cherry picked general knowledge test.
>>
>>101469764
I'm using the turboderp 8.0 bpw quant. I got some Chinese once, set min-p to 0.001 without checking the token probabilities because I couldn't be arsed, and I haven't seen chink runes since. Maybe I was just unlucky the first time or maybe this is making a difference.
>>
What if the model was re trained on the fly for any domain case? Is this the next step?
>>
File: file.png (650 KB, 2529x680)
>>101469764
The mistral-chat terminal app doesn't put a space around [INST] unlike the transformers template.
>>
I like that Nemo 12B would swear when you instruct it to.
It feels a lot less restricted than Llama3 8B.
>>
>>101470129
12B seems to swear when not instructed as well at times. Assuming it's within the context of an RP.
>>
File: memory bandwidth.jpg (185 KB, 828x647)
>>
nemo lcpp eta? tmw?
>>
>exllama already supports nemo and it's faster than llama.cpp
There's no reason to use llama.cpp anymore.
>>
>>101470353
cpu offloading
>>
>>101470353
>exllama already supports nemo and it's faster than llama.cpp
>>101470017
>I'm using the turboderp 8.0 bpw quant. I got some Chinese once
>>101469833
>had issues with it too
>>101469764
>Some exl2 quantizations of Mistral Nemo... appear to have some quality issues
>>
>>101470353
Except llama.cpp is faster.
>>
>>101470464
False. >>101461165
>>
>>101470353
It is for people with less vram.
>>
Nemotron gguf status?
>>
>>101461165
they're completely different quantizations even if you try to match the overall size, comparing them makes little sense
>>
File: zz.jpg (24 KB, 749x614)
goodmorning sirs
>>
>>101470643
It's 40% faster.
>>
Exlcels...
>>
>>101470651
Good morning Anon
>>
File: 1717218262293747.png (1.07 MB, 1024x1024)
A mischievous glint, shall we? she says in a husky voice, a smirk playing on her lips, eyes sparkling with mischief. There's a playful glint as she addresses the power dynamic, playfully smirking as she offers her ministrations. An audible pop and rivulets of—admit it, pet—the ball is in your court. The game is on; the choice is yours."I don't bite…"unless you want me to, she purrs, half-lidded eyes sending waves of arousal pooling in her belly. Take your pleasure, she urges, fiddling with the hem of her skirt, kiss-bruised lips curving into a bruising kiss. You hesitate, torn between propriety and desire, and she grins wickedly, fiery red hair contrasting with her long lashes."The night is still young,"she purrs, propriety be damned as the world narrows to just the two of you, pupils blown wide with pleasure. Her tongue darts out, tracing your ear, and her chestnut eyes hold your gaze as her nails rake angry red lines down your back. Her cheeks flame as she revels in your response, cheeks hollowing with each sharp intake of breath. Stars burst behind her eyes, inner walls clenching around the void that only you can fill. She craves your touch, your possession—heart, body, and soul belong to you… for now. Eyes alight with mirth, she teases,"Naughty boy, but before that…"—the minx traces a finger along your jawline, deferring your pleasure as the tension builds,"but first…"Oh my…
>>
>>101470767
Thanks, deleting my AI folder now.
>>
>>101470767
So why exactly does it happen? Is it because of the training data? Overtraining?
>>
>>101470767
This post is extremely high quality.
>>
>>101464911
>In all 3 instances there was no anthropomorphism.
Nice, nice.
Thank you very much Nala anon
>>
>>101470643
>comparing them makes little sense
Are you fucking retarded?
>>
>>101468577
I feel like every AI general here in /g/ has outdated links in their OP because of how fast everything is progressing
>>
Mistral Nemo was trained in FP8; wouldn't quantization to even INT8 damage model quality?
>>
>>101470767
Reading this gave me a boner
>>
>>101465735
>Guess I should turn down the temperature or something?
For the hornyness? Nah. You have to get around it with prompting.
Instructions in the last assistant output or using author's notes at low depth, that kind of thing.
>>
>>101470828
Because they are apt descriptions and you're all so fucking gooned out on text-gen that the appearance of words in any order strums on the neuro-chemical void caused by your crippling dopamine addiction?
>>
TTS this >>101470767
>>
smedrins
>>
>>101471181
But the weights are bf16...
>>
>>101471108
nobody bothers to update OP because it would require putting some effort into it
>>
>>101471181
Yes. But the support for it will get added to llama.cpp if the model is good enough.
>>101471362
I respect your struggle.
>>
>>101471243
https://vocaroo.com/1P91lYw9I64B
verbatim
>>
>>101471415
that and whatever you did change would cause at least one anon to screech for at least a thread
>>
>>101471415
>>101471440

should leave the updating to bots
>>
Mind if I ask y'all somethin'? I've been looking for an AI thing that can handle and organize txt files of about 2000 lines in length, or so. They come to around 3500 tokens, for most of the files.

Anybody know which ones may be able to handle that amount for just simple organization, task sorting, etc. of these files? GitHub anything?

>inb4 search innernet
I actually have been searching around and have tried a few, but most return an error about the request being too large, etc. Any ideas?
>>
>>101471452
seeing the stuff recap bot spews out sometimes, no thanks
>>
>>101471210
You get around it by using a different model.
>>
>>101471452
>bots are incapable of even updating the OP
>"experts" cry about how AI is going to destroy the world
>>
>>101471415
Update the OP to only include Sao models. That's the consensus of the thread.
>>
>>101471412
The values might have been saved in BF16, but aren't they still quantized as FP8? It would be like saving a 256-color image to a 65536-color one; banding would remain despite the higher precision.
>>
>>101471502
I think you're retarded. Please redirect your concerns here: reddit.com/r/LocalLLaMA
>>
>>101471468
Might be able to repurpose code from this if you don't find anything ready to use - https://github.com/ozgrozer/ai-renamer
>>
>>101471438
Thanks, anon
>>
>>101471243
https://vocaroo.com/1fPIb2ZpiI7a
better one, this time spoken by former Lara Croft and star of Spooks; Keeley Hawes
>>
>>101471502
>Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.
this doesn't imply to me it was trained in fp8 just that it's *lossless* at that quant
>>
>>101471534
Alright, thanks man I'll give it a shot
>>
so do you guys have local opus yet?
>>
>>101471621
yes we do. no we won't show it to you
>>
>>101471621
I trained my own model by cumming in some slut and she gifted me a brand new model after just under a year
It took me YEARS to be able to train it to do what I want though
>>
What are some good models for generating long detailed descriptions of rape scenes? Asking for a friend
>>
>>101472230
https://huggingface.co/cognitivecomputations/Samantha-120b
>>
It's annoying how dumb Gemma 2 27B is, but it writes quite well.
>>
>>101472357
t. my english teacher in year 8
>>
File: miku.png (30 KB, 440x145)
>>101470767
>>
I remember when people hyped Gemma at release.
>>
>>101470828
It is because you touch yourself to text and companies don't like that.
>>
>>101470767
>I don't bite…"unless you want me to
I would just like to remind everyone that this iconic phrase was the first phrase the first frankenmerge said to Undi after he created it. This made him think frankenmerges are good.
>>
>>101472523
I still don't know if the loader is bugged to shit or if the model is so bad.
>>
>>101472612
it's just as bad if you try it on google's website
>>
>>101472560
It is better than me watching porn, though. I started having problems with erection when the real deal was on the table, and since I switched from porn to smut, it works again. 9 of 10 doctors would recommend.
>>
>>101471568
BitNet is also quantization-aware training, yet the first experimental BitNet weights released were in FP32 format.
>>
>>101470828
It's overfitting because this happens a lot in whatever fiction is used in the training data.
LLMs are stochastic prediction machines, and they will learn to over-represent patterns that appear repeatedly in the text, there's no way to avoid this. There's only so many ways you can describe the sensation of excitement in English text.
>>
>>101472926
>there's no way to avoid this
Killing the narrator solves 90% of shivers, but that's a tough RP to swallow for many.
>>
>>101470767

She leans in close, her warm breath tickling your ear as she whispers, "Why don't we take this somewhere more... private?" Her fingers trail down your chest teasingly. "I have so many ideas for what we could do next." She bites her lip and looks up at you through her lashes, waiting to see how you'll respond to her provocative invitation.
>>
>>101472917
idk, it just seems to me they would communicate it better if it was, like
>Trained in fp8 for lossless inference
or something
>>
>>101470828
It doesn't actually happen, it's a meme.
>>
>>101473027
What do you mean? So you only use *plaps you* formatting? Doesn't that make the model dumber?
>>
Ballpark, what would be the VRAM requirements to run an unquantized version of llama 3 405b
>>
>>101473056
it's not better if it's incorrect
>>
>>101472926
>there's no way to avoid this
better datasets
and no regurgitated slop from other models
>>
>>101473106
1TB VRAM
>>
>>101473106
between 850gb and 1tb
>>
>>101470488
Does exllama have context shifting yet?
>>
>>101473296
Yes, for a while.
>New generator with dynamic batching, smart prompt caching, K/V cache deduplication and simplified API
https://github.com/turboderp/exllamav2?tab=readme-ov-file#new-in-v010
Also, drop the "context shift" marketing term already. It's retarded. It means something completely different between llama.cpp and koboldcpp.
>>
>>101473548
>It means something completely different between llama.cpp and koboldcpp.
Really?
What's the difference?
I know koboldcpp has the deprecated smart context feature, but I didn't know that their context shift was (significantly) different from upstream.
>>
Are there any FIM (fill in middle) models in the 7-14b range?
>>
>>101473106
I hope that 96VRAM + 128RAM will be enough to run it in Q4.
>>
>>101473571
For llama.cpp, it means to generate past the max context by dropping the earlier tokens each time a new one is generated. It has nothing to do with caching. For kobold.cpp, it means prompt caching.
For example, you could work around the old context shift bug with Gemma in llama.cpp by never going past the max context, prompt caching worked just fine. While a kobold retard reading that thinks that the problem was with caching the prompt.
>>
I just realized I have a 64 GB RAM MacBook Pro with M1 chip, and I have 2x3090 cards in my desktop. That's ~100 GB. I could run 405B llama at almost Q2 if I used the RPC thing in llama.cpp, right?
>>
>>101473646
that will be enough, but it would be so fucking slow that it's just not worth it. 96GB of VRAM is not enough for even a Q2 quant
>>
>>101473694
>For kobold.cpp, it means prompt caching.
It does?
That's bizarre, considering that prompt caching is a thing that's been around for a while now.
Are you sure of that? Is there a link somewhere explaining how and why that is, because if you are right, that's incredibly fucking retarded.
>>
>>101473728
Yea you should. It will drop to the lowest speed and you might need good network to communicate between nodes.

I have a couple macs and a 4x3090 system. I plan on doing something like that. I guess it will always be better than running on RAM
>>
>>101473731
I can leave it alone to generate fancy scenarios, tailored to my tastes, for future role-playing with a smaller model
>>
I'm very new to llama.cpp use, what is the command to load context in Q4_cache? All I can find on the list of commands is -ctk and -ctv, i'm assuming I need to use those.
>>
>>101473871
Yeah set them to e.g. q8_0.
>>
>>101473646
It would likely not be enough since all Q4 variants are more than 4.2 bpw. The quant that apparently is close to 4bpw is "Q3_K_L" (I believe this is a different thing from the L quants that bartowski makes, they just happen to be the same name).
https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/quantize.cpp
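If you end up making that quant yourself it's roughly this; the binary is called llama-quantize in recent builds, plain quantize in older ones:

./llama-quantize llama-3-405b-f16.gguf llama-3-405b-Q3_K_L.gguf Q3_K_L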
>>
>>101473813
I don't think network speeds will be a huge issue as this is local.

I also have 128 GB RAM on the desktop, but DDR4 so hm.
>>
>>101473910
which to use though? ctk or ctv? Soo..
-ctv q4_0 ?
>>
>>101473918
If I add another 32x4 to my quad-channel 32x4 configuration in an octo-channel server, will it drop all memory to dual-channel?
>>
>>101473994
*add 32x2
>>
>>101473982
Both. You can experiment with one or the other of course, but sounds overkill when you don't know wtf you're doing. Aka until you know wtf you're doing.
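A rough full invocation; quantizing the V cache generally needs flash attention (-fa) enabled, and the binary may be ./llama-server or ./server depending on the build:

./llama-server -m model.gguf -c 16384 -ngl 99 -fa -ctk q8_0 -ctv q8_0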
>>
It's just hard to find that particular ram at a fair price.
>>
>>101473994
Idk, does the motherboard's manual say nothing about this?
>>
>>101474151
>>101474151
>>101474151
>>
>>101474089
No. Probably nobody does this kind of thing to servers in a production environment.
>>
>>101473754
https://github.com/LostRuins/koboldcpp/releases/tag/v1.48.1
>So long as you use no memory/fixed memory and don't use world info, you should be able to avoid almost all reprocessing between consecutive generations
That's about prompt caching between requests, it doesn't trigger by going over max context in a single request.
https://github.com/ggerganov/llama.cpp/issues/7230#issuecomment-2106074784
>there isn't a way to completely disable context shifting in the server, but you should be able to avoid it by ensuring that the request does not exceed the context size
While for llama.cpp, context shift is only something that happens when you need to generate past the max context size. It's also tagged as "infinite text generation via context shifting" in the main example and it only triggers with a ">= n_ctx" check. It's not about caching between requests.
https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L573
>>
>>101469519
Maybe reddit is better for you, don't you think?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.