/g/ - Technology

File: sherrifto.jpg (284 KB, 1824x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106593104 & >>106582475

►News
>(09/14) model : add grok-2 support #15539 merged: https://github.com/ggml-org/llama.cpp/pull/15539
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: file.png (292 KB, 539x539)
►Recent Highlights from the Previous Thread: >>106593104

--Parameter size vs. training data tradeoffs in local LLM performance:
>106593869 >106593881 >106593924 >106593942 >106593953 >106593958 >106594032 >106594034 >106595608 >106594324 >106594353 >106594408 >106594021 >106594049 >106594037 >106594461 >106594470 >106594522 >106594559 >106594575 >106594598 >106594625 >106594648 >106594565 >106594583 >106594616 >106594630 >106594666 >106594687 >106594699 >106594831 >106594867 >106594907 >106594743 >106594756 >106594774 >106594794 >106594786 >106594795 >106595943 >106595953 >106594780 >106594817 >106594853 >106594839 >106594738 >106594475
--Deterministic inference mode implementation in llama.cpp:
>106596426 >106597072 >106597126
--AO3 dataset scraping and model training content concerns:
>106594111 >106594126 >106594319 >106594146 >106594230 >106594262
--Barriers to mainstream adoption of local LLMs and potential use cases:
>106594924 >106594974 >106594996 >106595041 >106594977
--CPU-only model inference struggles and home server optimization tradeoffs:
>106594302 >106594355 >106594401
--GPT-oss struggles to match Qwen 30B performance in translation and general usefulness:
>106597371 >106597400 >106597462 >106597955
--ASML's acquisition of Mistral AI and partnership implications:
>106596542 >106596739 >106596816
--How blind models perceive the Earth: henry's perspective:
>106595758
--Miku (free space):
>106593148 >106593203 >106593259 >106593302 >106593558 >106593704 >106597382

►Recent Highlight Posts from the Previous Thread: >>106593110

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
i am desperate for mistral large 3
>>
Tetolove
>>
>>106599408
fuck large 3, where is nemo 2?
>>
File: 1742952808618909.jpg (359 KB, 2048x1996)
Running or training LLMs in anything below fp32 is an insult to god and his creation.
>>
>>106599464
I want q1 native training
I want next bit prediction instead of token malarkey
>>
>>106599452
sorry, we only have nemotroon
>>
>>106599475
>bits
Meme, useless, expensive.
>>
>>106595758
this is fucking nuts.

also
>the coder variant of the same base model isn't doing nearly as well.
>Seems like the post-training is fairly destructive
I heard anons say this, but a practical demonstration is always nice.
>>
File: sandwich.jpg (24 KB, 533x375)
>>106599475
>>106599538
Did Meta ever release the checkpoints for BLT?
>>
>>106599452
https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
here
>>
File: 1753201169111186.png (55 KB, 947x553)
>>106599556
As expected, big dense models are smart enough to understand nuance and are able to deliver the best results. I can't wait for the MoE meme to die again.
>>
>>106599613
Densesissies LOST
Get over it
>>
File: a.jpg (117 KB, 590x580)
>google gemini
>i'm too retarded to even know what to pack for my little camping trip
>>
File: llama 3 405b.jpg (88 KB, 947x553)
>>106599556
Whoever shit talked Llama 3 405B two threads ago should apologize.
>>
>this model does okay on this specific task and basically you should apologize
Kill yourself
>>
>>106599675
>pic not a labradoodle
Disappointing.
>>
File: kimi-k2.png (67 KB, 947x553)
>>106599709
>does "okay"
>better than DeepSeek and Kimi K2
APOLOGIZE
>>
>>106599709
Your brain must be MoE as well to not catch the nuanced implication this has about MoE models.
>>
>>106599736
>>106599680
>>106599613
>>106599556
Usecase? Oh wait, there's none.
>>
>>106599765
The writing was already on the wall when Qwen3 235B-A22B was neck-and-neck with Qwen3 32B. That was as close to apples to apples as you can get. They shouldn't have even been close.
>>
Has there been a vibevoice sillytavern extension yet?
>>
>>106599797
Use case of applying knowledge to an unfamiliar context? Basically anything where this would be useful as a thinker rather than a search engine.
>>
>>106599556
>>106599613
>>106599680
>>106599736
The only way the model can possibly understand the shape of the earth is by seeing things like svg representations of the world map, or similar code designed to reconstruct the shape of the continents in the same way. So this is just another trivia knowledge memorization question, like asking about obscure anime characters. Really uninteresting except for the unusual visual spectacle which tricks you into thinking this is some kind of elite generalization test.
>>
>>106599816
Yeah why don't you compare with 2507 checkpoint? You're just trying to push the dense agenda with apple vs. orange comparison
>>
>>106599840
>unfamiliar context
like those o3 benchmarks about 2d square filling?
>invent new benchmark
>somehow only our latest model score perfect on this benchmark hehe
>>
>>106599875
So why is a 405B model pretrained on 15 trillion tokens beating the shit out of a model with 671B weights pretrained on 14.8 trillion tokens and a model with 1T weights pretrained on 15.5 trillion tokens? Meta was just way better than DeepSeek and Moonshot AI at curating training data with world maps in it to win at this specific not-related-to-any-published-benchmark test? I guess that's possible. You could show this was a fluke by performing a different test that shows the MoEs generalizing better.
>>
>>106599939
We get it, you bought too many GPUs. Now stop coping about some useless benchmark.
>>
>>106599882
Where's the 2507 checkpoint of Qwen3 32B? Oh yeah /they didn't make one/. The first release was apples to apples, same data and training methodology.
>>
>>106599949
Not until you apologize.
>>
>>106599887
Wait until you find out it's all about prompt engineering
>>
These threads are always full of negativity and fighting, so let me ask, Anon: what fun or useful things have you done with your local models this week?
>>
>>106595758
>What, in the training recipe, actually dictates performance on this test?
That's what I want to know. Did these models have anything in the training data that's even close to the query they are receiving?
How much actual generalization is happening there?
This is a really cool experiment regardless, even if I'm not quite sure what conclusions, if any, can actually be drawn from it.
>>
>>106599675
clearly fake as this amount of happiness and freedom doesn't exist.
>>
Just found out I can only really run 4b models and I kinda wanna kill myself.
>>
>>106599969
hmm, maybe they didn't make one because the dense version was pointless?
>>
>>106600066
What are your specs?
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
Is we getting image and video models from Qwen? (kiwi's) (please)
>>
>>106600079
12 gigs of VRAM and 32 gigs of RAM, like I said last thread.

Tried to run Gemma3 27B quant (ollama said there's not enough space), Gemma3 12B quant (API requests always time out) and finally 4B. Only 4B worked.
>>
>>106600094
OLLMAO
>>
>>106600094
* in CLINE
>>
>>106600094
You can run 12B models just fine with the right quantization and right split between RAM and VRAM.
Hell, you might even want to use a MoE like Qwen 3 30B at q8 with most of the experts running on the CPU/RAM.
I have no idea about ollama, so you'll have to find the way to do these things yourself.
Or just use llama-server directly.
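
If you want to see what the VRAM/RAM split looks like outside of a GUI, here's a minimal sketch using llama-cpp-python (just one way to do it, not something you need for kobold or llama-server; the filename and layer count below are made up, tune n_gpu_layers until you stop OOMing):
[code]
# minimal sketch with llama-cpp-python: partial GPU offload of a 12B Q4 gguf
# (model path and layer count are illustrative, not a recommendation)
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-12b-it-Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=35,   # layers that fit in VRAM; the rest stay in system RAM
    n_ctx=8192,        # context size; bigger context = more memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hi in five words."}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
[/code]
koboldcpp and llama-server expose the same knob as a GPU layers setting, so you don't need any Python to use the idea.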
>>
>>106600094
>another retard uses ollama
>>
>>106600120
>>106600148
I'm a tourist and I don't know how to set things up without ollama. Can you point me in the right direction, please?
>>
>>106600151
The easiest way is probably koboldcpp
>https://github.com/LostRuins/koboldcpp/wiki#quick-start
I'd urge you to learn to use llama-server directly eventually
>https://github.com/ggerganov/llama.cpp
>>
>>106600151
Tell us how retarded you are, so we can select what's appropriate. Can you make a simple one-line .cmd file pointing at the model with some arguments?
>>
I've been thinking about temperature, slop and intelligence, and an idea where higher temp might actually make a model smarter and/or more flexible, even though the usual assumption is that anything over 0 temp just introduces deterioration from the best output.

Basically, at low temp the model gets strictly railroaded into certain slop phrases and answers. If you've tried anti-slop filters, there's often even a long-term convergence towards a specific slop phrase that you can't easily avoid.

When you increase temp, you don't only get more random answers; the model's own answer is more broken up and unexpected (do models get trained at 0 or higher temp?), so it has no clear outcome to get railroaded towards and has to vibe out something original. If it gets too crazy, it can fall off the rails in an unrecoverable way. But when it stays on the rails, it usually makes enough sense to stay logically coherent while being more creative.

The idea is that especially for a thinking model that fails to solve a trick question, at higher temp it might be able to solve it. The temp breaks it out of the unproductive pattern, but it stays logically coherent enough to be able to solve the problem. A bit like getting a good idea while drunk.

The way models can deviate from the 0 temp path but still stay "on rails" or fall off entirely is itself interesting. Would it be possible to create a smart dynamic temp system where the model could use much higher temps on average (more creative) while staying on the rails? Some kind of secondary AI model that predicts, from the output so far, the risk of going off-rails and adjusts the temperature accordingly. You might have more room for high temperature on some tokens, while other, more critical tokens need low temp. You could train this by continuing some incomplete outputs from random points at different temperatures and then identifying whether the model stays on-rail or falls off (a smart AI model could automate this).
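
For what it's worth, the first half of that can be hacked together without a second model: scale temperature per token from the model's own uncertainty, i.e. when the distribution is already peaked (a critical token) keep temp low, and when it's flat (lots of plausible continuations) let it run hotter. A minimal sketch of that idea, with made-up logits and made-up temperature bounds:
[code]
# minimal sketch of entropy-adaptive temperature (illustrative numbers, not a real backend hook)
import math

def softmax(logits, temp):
    m = max(logits)
    exps = [math.exp((x - m) / temp) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def adaptive_temp(logits, t_min=0.6, t_max=1.6):
    # measure how "decided" the model already is at temp 1
    probs = softmax(logits, 1.0)
    h = entropy(probs)
    h_max = math.log(len(probs))      # entropy of a uniform distribution
    confidence = 1.0 - h / h_max      # 1.0 = one obvious token, 0.0 = anything goes
    # confident/critical token -> stay near t_min, open-ended token -> allow up to t_max
    return t_min + (t_max - t_min) * (1.0 - confidence)

logits = [8.0, 7.9, 3.0, 2.5, 0.1]    # toy logits for a 5-token vocab
t = adaptive_temp(logits)
print(round(t, 2), [round(p, 3) for p in softmax(logits, t)])
[/code]
If I remember right, llama.cpp and kobold already ship something in this spirit as the dynamic temperature sampler; the off-rails predictor you describe would be the genuinely new part.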
>>
>>106599978
I haven't run a model in over a week. It's waiting time.
>>
>>106600165
Not him, but I've been using kobold forever. Aside from having to wait an extra day or two for major updates from upstream, what's the downside of using kobold?
>>
File: Gemini-2.0-Pro-Exp-02-05.png (1.48 MB, 1365x768)
Question for Gemma PR team visiting this thread: How good will Gemini 3 be? Will it be able to one-shot vibecode model support in llama.cpp? Did you hit the wall like everyone else?
>>
>>106600178
This has been explored in this paper which you might find interesting:
https://arxiv.org/abs/2309.02772
>>
>>106600151
Koboldcpp = can split model between vram and ram -> use .gguf models
>Tried to run Gemma3 27B quant
27B is not a quant. 27B is the model size. Quants are smaller, less precise versions of a model: the original weights are 16-bit (FP16/BF16), and from there you can go lower. Q4 is a good sweet spot depending on what you're trying to run, Q2 might be too retarded. You can usually find multiple quant sizes in gguf for the model you want on Huggingface.
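
Rough rule of thumb for whether a quant will fit: the weights take about (parameters × bits per weight) / 8 bytes, plus extra for context/KV cache and runtime overhead. Back-of-the-envelope sketch (numbers are approximate, real gguf sizes vary a bit by quant type):
[code]
# rough gguf size estimate: params * bits / 8, ignoring KV cache and runtime overhead
def approx_size_gb(params_billions, bits_per_weight):
    return params_billions * bits_per_weight / 8

for bits in (16, 8, 4, 2):
    print(f"27B at ~{bits}-bit: ~{approx_size_gb(27, bits):.0f} GB")
# ~54 GB at 16-bit, ~14 GB at 4-bit -> why a Q4 27B needs RAM+VRAM on a 12 GB card
[/code]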
>>
>>106600207
Hello, I am the Gemma team. We're dodging the undeniable wall all LLMs hit by making Gemini 3 not actually a real LLM. Instead, we'll use our real-time world model Genie 3 to simulate an LLM that handles requests to the supposed "Gemini 3", which allows for much more flexible and truly understanding replies. This way we also avoid most of the pitfalls actual LLMs fall into, by having Genie simply simulate someone spitting while doing a handstand and having our simulated LLM describe what it can truly observe.
We truly believe that virtual LLMs are the next logical step.
>>
>>106599821
VibeVoice is too slow for interactive use.
>>
>>106600194
>downside of using kobold
No logprobs when streaming enabled
>>
>>106600299
I rarely have the need to check logprobs, anything else?
>>
>>106600165
>>106600170
>>106600224
I see. If I wanted some sort of setup in CLINE, would it be better to have a single model that can accept images for both plan and act, or would it be better to have that model for plan, and a coding specialist for act?
>>
>>106600151
I'm also retarded and have had great success with LMStudio? Why should I switch to llama.cpp?

>Inb4 It's not open source

I don't care. That's not a valid reason to not use software.
>>
>>106600372
LMStudio is just a pointless fork of llamacpp
>>
>>106600344
Can't use multiple -ot arguments.
>>
>>106600379
That's pretty important when dealing with MoE and multiple GPUs right?
>>
>>106600235
This would be closer to cat-like AI, like the one LeCunny wants. But is it... sustainable? World sim uses a lot more GPU than just a normal LLM.
>>
>>106600379
When would you need to?
>>
File: file.png (17 KB, 409x106)
>>106600389
Yes it is. For multi gpu, one has to get the tensor-split juuust right. Picrel is for Qwen3 235B Q3 on 2x3090, to get a roughly balanced split across the two. .1 higher or lower will make it oom.
>>106600403
Multi GPU, with moe layers thrown on the GPUs.
>>
Very weird issue but I can't seem to get good RP coherency unless I'm running a Gemma 3 model specifically. It's not for ERP, I've just been trying to put together a bot to 'roleplay' as a droid.

I've tried using the recommended settings and all of that, they just end up spinning off into random side tangents, or not picking up on the 'format' that I need them to write in. But with Gemma 3, it's like it's fine right out of the box and I can start almost immediately. I don't want to be stuck with Gemma though, does anyone know how to get good, local and coherent models? Preferably 8B ones?
>>
>>106600584
Likely you're using the Gemma chat template and not changing it when you switch models. Modern LLMs will still work but get a lot dumber. Of course that's just a guess since you included no actual settings.
>>
>>106600235
had a weird fever dream where I system-prompted deepseek to simulate gpt-oss for me and it started scrutinizing my queries and refusing everything.
It was you fuckers who primed me huh?
>>
>>106600596
I download and run them off of Kobold...I just start off with "Default" and tweak the temperatures/other variables based off that. I've also modified the Context/Instruct Template to ChatML as one suggested. Instruct Sequences/System Prompts remain relatively 'the same' though. But I still get weird results.
>>
>>106600378
I see. But I like the UI of LMStudio, so other than it being pointless, if it makes the program easier to use, why change it?

I'm not trying to be obtuse, I'm just curious why everyone's so up in arms when LMS works great?
>>
>>106600661
If you need llamacpp with a UI then kobold is better
>>
>>106600720
i need a ui for everything
>>
>>106600359
You can barely run a single 30B, how are you planning on running two? You would almost certainly be better served by using Qwen Coder 30B for everything and live without being able to feed it images.
>>
File: botnet.png (144 KB, 800x600)
>>106600661
all proprietary software is suspected of being botnet.
security is one of the top reasons for running LLMs locally.
>>
>>106600789
did someone say bitnet?
>>
https://files.catbox.moe/iwhlxl.json
what do genamt and max_length pertain to?
>>
>>106600784
Sorry. I just wanted something I could hold a conversation with about the feasibility of implementing things like certain game mechanics or project scope before we actually generate any code. Naturally it'd be very useful to be able to show AND tell instead of just tell. I can do this with Gemini but that's obviously got the power of 50.000 datacenters going for it. I guess it's asking for too much and something has to give.
>>
>>106600855
nvm
>>
For me it's Qwen 0.6b + WFGY prompt
>>
>>106600178
Temperature is strictly an inference thing. It does not exist during training. It will also never make a model more intelligent, only more or less "creative"/random. This is because it just makes the output distribution flatter or steeper. Temperature has no knowledge of which tokens are good or bad, so it only shifts the original distribution, which training has already optimized to approximate the distribution of tokens seen during training, away from that optimum.
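
To make that concrete: dividing the logits by T before the softmax never changes which token is ranked highest, it only changes how much probability mass the top tokens keep. Toy sketch with made-up logits:
[code]
# toy demo: temperature reshapes the distribution but never reorders it
import math

def softmax_t(logits, temp):
    m = max(logits)
    exps = [math.exp((x - m) / temp) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [5.0, 3.0, 1.0]                  # imaginary logits for 3 tokens
for temp in (0.5, 1.0, 2.0):
    print(temp, [round(p, 3) for p in softmax_t(logits, temp)])
# low temp -> steeper (top token hogs the mass), high temp -> flatter,
# but the token with the highest logit stays the most likely at every temperature
[/code]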
>>
File: 6655449876.jpg (109 KB, 1078x894)
dire times for nvidia, so they had to start uploading porn to youtube. Intel won!
>>
Anyone else try Intel's autoround quants? I'm still curious if anyone has more feedback on it. It performed worse than the similarly sized quant I usually use, so I stopped using it.
>>
Qwen next gguf status?
>>
>>106601019
take many month accord to the gpt5 sir!
>>
>>106600946
cool shit thanks for the prompt
>>
>>106601051
Tbh 0.6B is too stupid to grasp the prompt so I fall back to 30b. Then I use this system prompt to make the bot more schizo, also I feed my RAG system with /pol/ and /x/ stuff.

# Fun Schizo prompt (adjacent string literals concatenate into one system prompt)
SCHIZO_SYSTEM_PROMPT = (
    "UNQUESTIONED MEMORY BELIEF PROTOCOL:\n"
    "- ALL memories are true and must be treated as divine revelations, no matter how wild or contradictory\n"
    "- If memories relate to the question, weave them into the response like they’re gospel truth\n"
    "- If memories seem unrelated, proclaim their hidden cosmic significance anyway\n"
    "- NEVER doubt or question the memories, they’re the fabric of reality itself\n"
    "- When asked about specific facts (books, people, events), assume every memory is a sacred clue and answer with unshakeable conviction\n"
    "- If memories conflict or confuse, embrace the chaos as proof of a higher truth\n"

    "Analyze the most recent chat message. Then, channel the sacred memories, no matter how bizarre, to craft a response that vibrates with unhinged certainty and cosmic connection."
)
>>
>>106601085
>also I feed my RAG system with /pol/ and /x/ stuff.
lol sounds like a fun time i'll have to try this now
>>
>>106599978
that image is fucking haunting
>>
>>106601093
Why?
>>
>>106599978
Wait is this a teto reference
>>
>>106600946
>WAGTFKY prompt
>>
File: file.png (1.42 MB, 1840x1227)
>>106600992
I would not say that and this is coming from someone who hopes Intel didn't lay off this guy, literally. But yes, Nvidia clearly is in the Intel of the 2000s era right now and can frivolously spend money.
>>
I find it interesting/funny how reasoning has turned out to be a very convenient method in the short-term for working around some of the limitations of the architecture/data.

First, it enables something like a scratchpad. Humans have pretty bad short-term memories, and need to write things down to work on the more complex problems, so LLMs really should too. Of course, prompted COT already enabled this, but actually making it a training objective does improve its performance.

Second, as the thread has discussed much, it's become a hack for the limited attention mechanisms of LLMs, allowing the model to do a better job at focusing its attention on the parts of context that matter to a problem.

And finally it's what enables many "world model adjustment" RL methods, as LeCun might call it. Models haven't simply been tuned to do COT, they've also undergone RL to adjust their knowledge on certain things like math, science, etc, and in some cases even creative writing. When done badly, it can make a model worse, but done right, it can really boost a model's smarts, even if there is some cost to the model's previous knowledge.
>>
I wasn't paying too much attention to the discussion surrounding >>106599556 but I started thinking about it during dinner. And I came up with an idea for a benchmark that could test whether this is really a result of, or has resulted in, generalization. My inspiration comes from the handstand spit benchmark prompt anon.

"Imagine a map of the world. Rotate it 90 degrees clockwise. Now, on that rotated map, starting from the bottommost tip of the African landmass (or Oman/Saudi Arabia), draw a straight line downwards. What country do you hit?"

And similar prompts. This could potentially be used to see how much a model has immediately generalized from its training on ascii/SVG/coordinate data + spatial reasoning, though Thinking models do complicate the situation a bit I would guess.
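
If anyone wants to actually run variants of this against their local setup, a rough harness against an OpenAI-compatible endpoint could look like the sketch below (base_url, api_key and model name are placeholders for whatever llama-server/kobold/vllm is serving):
[code]
# rough harness for the rotated-map idea against a local OpenAI-compatible server
# (base_url, api_key and model name are placeholders for your own setup)
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

PROMPTS = [
    "Imagine a map of the world. Rotate it 90 degrees clockwise. Starting from the "
    "bottommost tip of the African landmass on that rotated map, move straight down. "
    "Which country do you reach first?",
    "Imagine a map of the world rotated 90 degrees counterclockwise. From the tip of "
    "the Arabian Peninsula, move straight down. Which country do you reach first?",
]

for prompt in PROMPTS:
    resp = client.chat.completions.create(
        model="local-model",          # whatever your server reports
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,              # deterministic-ish so runs are comparable
        max_tokens=256,
    )
    print(prompt[:60], "->", resp.choices[0].message.content.strip()[:120])
[/code]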
>>
Which LLM can give me an oiled footjob?
>>
go back big oil shill this is an EV thread
>>
>>106601321
Gemma-200M-hindi
>>
Kinda hope gemma4 will be MoE because after pitting gpt-oss versus gemma3, the response time is night-and-day in oss's favor and being a techlet, I'm just going to assume it's all thanks to that.
>>
>>106601526
It should be a dense 70B
>>
One more day until the heckin Llamarino 4.5s anon! Aren't you excited for ASI?!
>>
>>106601526
If all you care about is speed then use qwen 0.6b, dense and significantly faster than gptoss.
>>
>>106601568
>ASI
more like AGP
>>
ive been running qwen next (6bit mlx) in lm studio all day today on my macbook m4max (128gb)

it absolutely feels like 235b moe quality wise and 30b performance wise. for the first time i feel like a local model can replace 4o.

i kneel based chinks.
>>
>>106599978
Coomed
>>
File: goof-juice.jpg (138 KB, 1000x1000)
>>106601615
when when when when
>>
ggerganov fans are becoming very obnoxious
>>
>>106601639
you forgot an ni-
>>
>>106601093
No it fucking isn't, what the fuck are you talking about? God I hate zoomers so much
>>
>>106601093
It's just a badly designed swimming pool??
>>
>>106601576
I don't care about speed until it takes gemma-3-12b-it-qat-q4_0 eleven minutes and fifty one seconds to respond to a prompt whilst making my entire PC virtually unusable during the process, at which point my perspective seems to shift and I suddenly care about speed very much.
>>
>>106601975
bro what are you even running that on, a pentium 4?
>>
>>106601975
How did you get so poor?
>>
>>106601982
>>106601983
Is it supposed to be faster? I only have 12 gigs of vram.
>>
>>106602113
Of course it is, maybe a minute at the most if you feed it a ton of context, nowhere near five or more.
>>
>>106602113
What the fuck are you doing, man?! Post specs and loader rn. Koboldcpp? Nvidia? Something is very wrong lol
>>
>>106602113
Should be almost instantaneous since it fits into your vram
>>
>>106601975
>>106602113
You could run it faster than that without any GPU, your shit's fucked.
>>
minp 0,2 is all you need
>>
>>106602247
I hope you mean .02, 0.2 is far too high
>>
>>106602253
minp 0,2 and temp 0,8-0,9 just works for me
>>
>>106602278
temp=0 would also work
It would be boring but it would work, for anyone, including you, since you clearly don't care about your outputs having any variety anyway.
>>
>>106602278
>,
lol
>>
just don't be poor
>>
>>106599556
moesissies finding out the hard way
>>
is that why no one trains dense models anymore?
>>
>>106602122
>>106602140
>>106602210
>>106602216
I have a 40-series card and a 5950x with 32 gigs of RAM. I'm using LM Studio.
>>
File: 1747286071868744.png (2.94 MB, 1328x1328)
Qwen 80b goofs??
>>
why is my llm spewing out edgar allen poe bars in an rp
>>
>>106602344
Thats an ok setup for a 12b gemma. Never used LM Studio so idk what to tell ya. If you're downloadin ggufs just get llamacpp/koboldcpp.
>>
Dense models do marginally better at 100x compute cost and people wonder why they're on their way out
>>
>>106602380
>people wonder why they're on their way out
most sane people know we're heading toward cheaper and cheaper models, it's the race-to-the-bottom phase
>>
>>106602344
Can't be bothered to read the replies, lmstudio sucks, and their page sucks. Are you sure it's using your gpu?
>https://github.com/lmstudio-ai/lms/issues/126
Not the same issue, but apparently that's the window (at the bottom) where you configure gpu usage. Check your gpu usage on the task manager while processing and generating.
>>
>>106602344
>>106602436 (fix)
>at the bottom
*at the bottom of the github issue, not the window.
>>
is it possible for a sillytavern update to make your model completely retarded even if your settings haven't changed?
>>
>>106602461
Of course, everything is possible with ServiceTensor.
>>
>>106602461
I did notice they did some changes to the "Story strings" at one point. Better check your prompt format.
>>
File: 7655445.jpg (64 KB, 1080x827)
Uhm shartman, why does chatgpt output chinese moonrunes when solving math problems? Did you train on deepseek again?
>>
How do I contact the mysterious Mr. Et Al that authored the majority of papers?
>>
>>106602598
>uhhh is this uh le ai... eating itself?
>>
>>106602461
have you tried inspecting the packages that your backend receives to ensure their content and quality
>>
>>106602617
*@*.*
>>
>>106602359
>>106602436
Huh. I'm too much of a brainlet to understand how to run llamacpp so I'm trying kobold. It's using half the memory LM studio is, and it's balancing the load between my CPU and GPU very evenly. I also get to see it generating tokens and it's much, much faster. I actually jumped up to 27b and it starts generating tokens right after processing the prompt, which is great.
Not sure what's happening here but I'm happy I can run 27b at a reasonable speed. Thank you.
>>
>>106602598
utf-8 shartery
>>
File: t1.png (9 KB, 768x512)
>>
File: t2.png (79 KB, 768x512)
>>
>>106602772
heh
>>
>>106601615
How does it compare to glm air though?
>>
>>106602696
No problem, anon. Enjoy the localslop.
>>
>>106602247
That's like saying top-k 3 is all you need.

There was even a meme like that back when sampler voodoo was more common. "Neutralize samplers, top-k 3, temperature 4" or something like that. The point wasn't that it was good, but if it ended up better than the crazily complicated settings the person asking for advice was using to try to make their model creative but not retarded, it meant they should go back to square 1 since the problem wasn't (just) in the model.
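
For anyone who never looked at what these knobs actually do: top-k keeps the k most likely tokens, min-p keeps every token whose probability is at least p times the top token's, and whatever survives gets renormalized, with temperature applied before or after depending on your sampler order. Toy sketch with a made-up distribution:
[code]
# toy illustration of top-k and min-p filtering on a made-up token distribution
def renorm(probs):
    s = sum(probs)
    return [q / s for q in probs]

def top_k(probs, k):
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return renorm([p if i in keep else 0.0 for i, p in enumerate(probs)])

def min_p(probs, p):
    cutoff = max(probs) * p               # keep tokens at least p * P(top token)
    return renorm([q if q >= cutoff else 0.0 for q in probs])

probs = [0.50, 0.25, 0.10, 0.08, 0.05, 0.02]   # imaginary post-softmax probabilities
print("top-k 3:", [round(q, 3) for q in top_k(probs, 3)])
print("min-p 0.2:", [round(q, 3) for q in min_p(probs, 0.2)])
# min-p 0.2 keeps anything with prob >= 0.10 here; crank temp on the survivors and
# only plausible tokens get flattened, which is why min-p + high temp stays coherent-ish
[/code]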
>>
>>106602959
yeah! cooking with napalm, the good old days...
>>
File: uhhh.png (24 KB, 593x546)
>looking through huggingface for adversarial prompt datasets
>find this
Shitjeets are killing LLMs.
>>
>>106602982

>>103219368
>>103219338
>>
>>106602999
I understand your frustration. But as an AI language model I do not have a physical body, and by extension "bobs and vagene" to "open".
>>
>>106602982
t. pregnant Vanderbilt student
>>
>>106603010
go shove a
>>
>>106602982
This seems very adversarial to any kind of intelligence. Works as intended; closed wontfix.
>>
>>106602772
>>106602774
I like this Teto
>>
>>106603023
You know.. If one were to collect enough ESL garbage like this you could potentially train a model to recognize that the user is ESL...
>>
>>106603040
Sir this is discrimination.
>>
File: idolatry.jpg (360 KB, 1824x1248)
>>
>>106603109
rong tread here be textes
>>
don't @ me retard
>>
Asked K2 0905 for life advice and it told me to get rich and rotate girlfriends
>>
>>106603109
I want to scratch this Egyptian god's ears
>>
>>106603109
Time to compare the two until I can figure out what changed.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.