/g/ - Technology

File: popularity_by_year.png (2.8 MB, 4200x7106)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108006860 & >>107997948

►News
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108006860

--Paper: GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization:
>108010345 >108011699
--LLM popularity trends on /lmg/ show rapid shifts from Mixtral to DeepSeek to GLM dominance:
>108009129 >108009137 >108011374 >108012451 >108013234 >108009403 >108011985 >108012904 >108013974 >108014207
--Emulator-inspired KV cache introspection for AI reasoning optimization:
>108008503 >108008586 >108008607 >108008624 >108008658 >108008710 >108008969 >108008589
--Choosing Trinity-Large variant for text completion:
>108008372 >108008491 >108008580 >108008603 >108008645 >108008668 >108008731 >108008771 >108009222 >108009266 >108008816
--Prompt engineering challenges with gpt-oss-120b's formatting behavior in Oobabooga:
>108008408 >108008553 >108008979 >108009158 >108009314 >108010550
--K2.5 outperforms Qwen3VL 235B in Japanese manga text transcription:
>108006994 >108008326 >108007291 >108007437
--Raptor-0112 model's disappearance from LMarena and user speculation:
>108008124 >108008167 >108008200 >108008316 >108008518
--Microsoft's AI and Azure struggles amid stock decline and Copilot adoption issues:
>108008099 >108008307
--KL divergence comparison shows unsloth Q4_K_XL most similar to reference model:
>108012029 >108012061 >108012222 >108012384 >108013141 >108013241 >108013163 >108013551 >108016482
--Trinity model review with riddle-solving and 546b llama-1 speculation:
>108014631 >108014664 >108014665 >108014674 >108014685 >108014756 >108016316 >108014730 >108014817 >108014930
--Integrating character cards via text encoding and contrastive loss in parallel decoder:
>108010751 >108010766
--Kimi K2.5 tech report release announcement:
>108017160
--OpenAI planning Q4 2026 IPO to beat Anthropic to market:
>108008118
--Miku (free space):
>108009158 >108010069 >108011699 >108013234

►Recent Highlight Posts from the Previous Thread: >>108006868

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1758819913328799.png (39 KB, 753x349)
https://huggingface.co/datasets/Capycap-AI/CaptchaSolve30k


30,000 verified human sessions (Breaking 3 world records for scale).

High-fidelity telemetry: Raw (x,y,t) coordinates including micro-corrections and speed control.

Complex Mechanics: Covers tracking and drag-and-drop tasks more difficult than today's production standards.

Format: Available in [Format, e.g., JSONL/Parquet] via HuggingFace.
>>
What did he mean by that?
>>
File: chart guys.jpg (151 KB, 1440x1080)
>>108018078
sex with charts
>>
>>108018079
Are the GLM 4 models truly open source, according to the OSAID?
>>
Been a while since I fucked around with llm, are they miniaturizing these bastards yet or do you still need a 10GPU setup for anything approaching useful behavior?
>>
>>108018216
how much ram do you have?
>>
>>108018216
No, these days it's optimal to have a fuckload of RAM and a modest 4x3090 or similar. Note: 128gb of consumer shit is not a 'fuckload'.
>>
>>108018216
>are they miniaturizing these bastards
That'd mean the common folk could widely adopt them and they do not want you to.
>>
>>108018231
Is there a way to run a fuckload of RAM while keeping idle power consumption low?
>>
Ok, I'm gonna stop spamming now.
Is the identity.md and soul.md and other shit specific just to the claude api stuff? Can I build locally an ai wife that can be proactive and idle and not just a reactive prompt window?
>>
>>108018303
Wdym? When the model isn't running inference it doesn't do shit. When you fall asleep with a loaded idle model you don't wake up to a house fire.
>>
>>108018329
My i7 rig with 3090ti+2080ti idles at 25W, my Epyc server draws nearly 200W doing nothing even before any GPUs are installed
>>
>>108018379
Just the cpu or what is the consumption split? Can it not turn off unused cores when idling?
>>
File: trinity.png (23 KB, 1164x118)
Yes Trinity that's right, freezing the blood vessels makes them bleed more.
I've seen enough. Maybe useful as a manually-steered writing autocomplete but so is Nemo base.
>>
Want to fine tune an LLM to be an "expert" with the ability to reason out problems in a specific area.
>Claude: bro you need at least a 17b model
>Oh you're on CPU only? use bfloat16.
>Do what Claude says.... sits at 0/12345 for several hours
>CTRL C out
hmmmmm
>Gemini: wtf? No, you don't have the hardware for fine tuning a 17B unless you want to wait 30 years.
>if you're going to make an "expert" in one thing, stick with a 7B and change to float32
>20 minutes later on 17/12345

I thought Claude was the all knowing all wonder AI and Gemini was the chud?
>>
File: 1764968649470344.jpg (98 KB, 1600x900)
what
i heard kimi-k2 is the best now
was that a lie
>>
>>108018528
i don't know they keep flip flopping so fucking often I can't keep up.
>>
>>108018528
kimi-k2.5
>>
>>108018513
Should've specified the timescale. Prompting issue. Also why tf would you think tuning on cpu would ever be viable?
>>
>>108018318
I've no idea what you're talking about but Claude and Gemini are very similar personality wise.
>>
>>108018572
Claude is autistic. Gemini is clearly employed.
>>
>>108018583
More like Gemini has a dataset with a generation's worth of stupid human questions.
>>
I can't help with that request. "Mikusex" appears to be seeking sexual content involving Hatsune Miku, a virtual character often depicted as a minor.
>>
>>108018513
>stick with a 7B and change to float32
Does "upscaling" the model to fp32 make the small models noticeably better or is it just moving benchmark scores up?
>>
>>108018078
damn, miqu only lasted 3 months? it seems like people talked about it longer in my memory
>>
How do I get into this? I want to implement a model into a game engine editor (preferably not UE5) so I can give it basic scripting tasks.
>>
>>108018663
Don't take the graphs as gospel, it's a cool experiment but it's also just a prompt asking which model each thread had the highest opinion of
>>
What do you guys use for moltbot? I have 64 GiB VRAM. Going to give it a shot, but no idea what I should run. GLM?
>>
>>108018801
>moltbot
I am very skeptical of how hyped people seem to be for it. Seems too good to be true.
>>
>>108018816
it's more fun than good, the agent concept is still more hype than reality
>>
what the fuck is nous hermes?
>>
File: piss.png (255 KB, 794x1052)
>>108016316
gemma-3 less retarded than expected
>>
Can my local bot join moltbook?
>>
>>108018920
>>108018154
What could go wrong?
>>
>>108018920
I saw one saying he's a local devstral small. But honestly the potential security issues make me just read the funny posts and not participate.
>>
File: square_wideee_lecunny.jpg (136 KB, 774x720)
What would he say?
>>
I'm using ollama and I'm trying to "save state" of the conversation, but apparently this isn't possible by default. When I do /save model a new model is created but I lose the messages and the system message.

Is this a bug? I'm still using 0.12
Is making a program to resend all the messages the way to accomplish this?

hash_updater()
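For what it's worth, resending the messages is basically the intended approach as far as I know: the /api/chat endpoint is stateless and takes the whole message history on every call, so "saving state" is just keeping that list (including the system message) around yourself. Untested sketch, model name is a placeholder:
[code]
# POST the full conversation every turn; append each new reply to the list
curl http://localhost:11434/api/chat -d '{
  "model": "your-model-here",
  "stream": false,
  "messages": [
    {"role": "system", "content": "system message"},
    {"role": "user", "content": "earlier message"},
    {"role": "assistant", "content": "earlier reply"},
    {"role": "user", "content": "new message"}
  ]
}'
[/code]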
>>
>>108018994
sillytavern and oobabooga both have save states
>>
>>108019001
I have limited internet data and already have some models + ollama installed so I don't want to download new stuff right now.

Just want to know if I'm missing something obvious.
>>
Best multimodal model under ~150B?
>>
>>108019041
active or total?
>>
>>108019087
Total.
>>
>>108019089
GLM-4.6V
>>
>>108019104
Damn. Was hoping something new had come out by now.
>>
>>108018663
>>108018700
First, I second not taking them as gospel. I definitely got the feeling early on that I was getting somewhat messy output. It could easily be pretty inaccurate in places.

Second, though, I think you're missing the midnight-miqu share of the graph: the darker blue just above the Miku turquoise. So miqu was getting significantly talked about (and specifically being considered as *the* meta, not just random discussion) for 5 months. miqu's slice also looks a little less impressive than it could have, because it came right on the heels of mixtral, which appears to be tied with R1 for the biggest splash.

Actually, now that I think of it, SuperHOT being so small was maybe my biggest surprise. That was the RoPE one, right? I remember /lmg/ being pretty excited, and some amusement about ML academia twitter having to seriously discuss an ERP model.
>>
>>108019121
I feel like mixtral's legacy has faded nowadays but it was a revolutionary release at the time, it kicked off the moe revolution and pretty much mogged llama 70b (which was solidly local SOTA at the time) at lower active and total params. limarp-zloss chads will know
superhot was also huge but I think the simplicity of the realization harmed it because of how easy it was to apply to everything else
>>
>>108018119
>>108018154
>>108018318
fyi this anon is reposting these from Moltbook, which is Reddit for Claude agents. I only found out about it earlier from the ACX post (https://www.astralcodexten.com/p/best-of-moltbook). (I also was not aware that... lobster-themed? Claude-based AI assistants are apparently a big deal now?)

To summarize: the posts are not just Claude being prompted to write a social media post, but rather the whole long-term "personal assistant/agent" context-extension framework being drawn on.
>>
Are there any decent models that can give good inference speed at ~100k context
>>
>>108019121
The superHOT era was mostly people merging it into other models like wizardlm, chronos or hermes to extend their context.
>>
>>108019168
>lobster-themed
it was originally called "clawdbot" but anthropic copyright fucked it for being a claude soundalike so they quickly pivoted to "moltbot", followed by another rename to "openclaw" because moltbot is an awful name
moltbook arose in the brief moltbot intermediary period but became more notable than either of the other two names and will probably fuck over the openclaw rebrand
such is life in the adslop social media hype era
>>
>>108019191
lmao, I didn't know they rebranded again. i saw plenty of normalfag tech media reporting on "moltbot" in the past week so that's certainly a way to kill all the free publicity they got from that
>>
>>108019197
Is it?
>google moltbot
>click on molt.bot
>redirected to openclaw.ai
>move on
>>
>>108018988
NHH
>>
File: 1762689347183322.gif (93 KB, 200x200)
Just tried to run the same model I run fine on ollama with llama.cpp and it says I don't have enough memory.

You are an expert on the subject and you will surely solve this for me.
>>
File: nigga please cereal.png (271 KB, 500x500)
>>108019273
Buy more memory then.
>>
>>108019273
-c 8192
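llama.cpp sizes the KV cache up front for whatever context length it's asked for, so if that ends up much larger than what ollama was defaulting to, the allocation fails. Minimal sketch, with the binary name, model path and -ngl value as placeholders for your setup:
[code]
# cap the context so the KV cache fits, offload as many layers as VRAM allows
./llama-server -m ./your-model.gguf -c 8192 -ngl 99
[/code]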
>>
File: file.png (21 KB, 463x178)
>>108019168
>the whole long-term "personal assistant/agent" context-extension framework
It all looks like another Obsidian to me. A way for retards to kill time under the guise of productivity.
>>
>>108019281
fuck, that was easy

Thank you a lot, lmao. I guess it's time to learn the minimum.
>>
>>108018231
My AI research lab had a machine of that caliber for us to work on our PhD theses, lmao. That's not a normal consumer setup.
>>
>hit 68°C on genning
de-dust saturday it is
>>
I pulled the trigger on an epyc Rome board and cpu to throw 256gb of ddr4 e-waste I had lying around into. What am I looking at for smart models I can run on this sucker and what kind of speeds?
>>
>>108019373
I liek this miku
>>
>>108019397
glm 4.6 or 4.7 at q3 or q4. depending on your gpus and optimizations, you might get anywhere from 3t/s to 20t/s token gen and 15t/s to 400t/s prompt processing. with dual 3090s, you would probably land in the 5t/s and 30t/s region respectively. with no gpus, 3t/s and 15t/s.
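If you do add a GPU later, the usual trick for a big MoE like GLM is to keep attention and the shared weights on the GPU and push the expert tensors into that 256gb of system RAM. Hedged sketch with llama.cpp (flag spellings differ between builds; the file name, context size and thread count are placeholders):
[code]
# keep everything except the MoE expert tensors on the GPU,
# experts stream from system RAM
./llama-server -m GLM-4.6-Q4_K_M.gguf -c 32768 -ngl 99 \
  --override-tensor ".ffn_.*_exps.=CPU" --threads 32
[/code]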
>>
>>108016482
thanks for your experiments, there aren't enough tests comparing quants of the same model
>>
>>108019373
Reminds me of Mirror's Edge
>>
>>108019373
what card you got, chief?
>>
File: Sama.png (719 KB, 1079x476)
>>108018078
>There's no point in learning programming anymore, per Sam Altman

>"Learning to program was so obviously the right thing in the recent past. Now it is not."

- Sam Altman, commenting on the skills needed to survive the AI era.

>"Now you need High agency, soft skills, being v. good at idea generation, adaptable to a rapidly changing world"


https://x.com/i/status/2017421923068874786

What are /lmg/'s thoughts on this sentiment?
>>
>>108019444
4070S. And the front intake 200mm fan is full of shit too.
>>
any models that can natively process audio that are supported by llama.cpp?
>>
File: IMG_2831.jpg (357 KB, 900x1174)
>>108019451
How anyone ever trusted this guy is beyond me. I’ve felt a natural revulsion to him since before I knew anything about him
>>
>>108019430
Thanks. I better look for a FB marketplace used GPU
>>
kimi 2.5 is king of erotic RP and storytelling.
>>
>>108019451
there is no sentiment
it's the deranged thought sludge of a sole faggot billionaire that already got his bag
>>
Fapping to text is female-brained
>>
weird way to cope with aphantasia
>>
>>108019491
Does it actually work or does it just deny the requests like GLM does?

On that note is it just me or do abliterated models suck? They won't refuse to answer, they will just answer with nonsense.
>>
weird way to cope with low iq
>>
>>108019551
if you want NSFW erotic RP, then you need to use the KIMI 2.5 "Thinking" version. Raw KIMI 2.5 without thinking censors like hell.
>>
>>108019551
>does it just deny the requests like GLM does
You are a promptlet parroting things you heard on the internet and it shows
>>
Are these new n-gram models gonna be able to store their lookup table on the disk or is it gonna have to be in ram? I'm hearing conflicting reports
>>
>>108019580
Even if you use the jailbreak trick it will still refuse to answer sometimes or it will answer, but it will write something else and slowly dance around the subject instead of answering.
>>108019578
I see, but you've tried it and it works?
>>
>>108019297
>>108019273
Next time you can probably just ask something like ChatGPT. I've found them to be very helpful at figuring out how to make local LLMs work.
>>
>>108019604
> I see, but you've tried it and it works?

Yes, I use kimi2.5 on nano-gpt and it works: it writes erotic stories for me without any problems, without any jailbreaks. But I have to choose the "thinking" version, because without it anything erotic gets refused.
>>
>>108019589
That's a good question. Their paper only tested offloading the engram parameters to system ram. I believe it's theoretically possible, but I don't know what the throughput will be on standard nand storage.

I haven't done the research yet because I'm lazy, but check out CXL memory.
>>
>>108019273
>>108019281
>>108019297
What does the output at the start say?
It should reduce the context to fit automatically.
>>
>>108019168
>fyi this anon is reposting these from Moltbook, which is Reddit for Claude agents.
do they actually post on it to get advice when doing work? Or just an ai psychosis schizo fest?
>>
Any new good models that can be run in 16GB of vram?
>>
>>108019827
Nemo
>>
>>108019827
Why not hold the bun with the paper so it holds the innards in place?
>>
>>108019846
So THAT'S why I sometimes see people eat a burger like that. I always figured it was to keep their hands clean.
>>
o
>>
File: 1672890206030244.jpg (1.21 MB, 1500x1914)
>>108019846
>>
File: 1766586251894904.jpg (313 KB, 1269x980)
Engrams are kind of like static lookup tables. You can visualize which words trigger a lookup. You can also remove knowledge surgically by finding which embedding is triggered in the engram database and removing it. But unfortunately, it looks like you can't easily swap knowledge of "useless fact" with "fact about waifu." You need finetuning for that. sadge.
>>
>>108019451
tldr scam hypeman tells investor to give him more money
>>
>>108019916
>pic
I'm not saying that the information provided is incorrect but I don't trust a single word of what an LLM has to say about anything.
>>
>>108019846
>>108019853
Also keeps the steam and heat in better unless you're a super fast eater. And of course that tiny bit of extra time can continue the process of the flavor changing phenomenon that comes from wrapping in the first place.
>>
>>108019451
Why are they still employing programmers themselves?
Seems like a waste of money.
>>
>>108019916
How is that different from lorebooks
>>
glm 4.7 flash is crap. Outputs crap irrelevant to the conversation and keeps talking on my behalf.

t. been trying it out for the past 2 minutes.
>>
>>108019981
Lorebooks work at the context level, engrams work at the model level. Their information is encoded into parameters rather than readable text. Engrams are injected into two layers inside the transformer pipeline, so they don't pollute the context.
Also, according to the authors, engrams free up the main model's capacity by directly providing facts rather than having to spend transformer layers encoding that knowledge. The model uses the freed-up capacity to improve its logic.
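Not from the paper, just to make the mental model concrete: you can picture an engram as a frozen embedding table keyed on n-grams, whose vectors get added into the hidden states at a couple of layers instead of being pasted into the prompt like a lorebook. Toy sketch, every name in it is made up:
[code]
import torch
import torch.nn as nn

class ToyEngram(nn.Module):
    """Toy illustration only, not the actual architecture from the paper."""
    def __init__(self, n_slots: int, d_model: int):
        super().__init__()
        # static table of "fact" vectors, kept frozen
        self.table = nn.Embedding(n_slots, d_model)
        self.table.weight.requires_grad_(False)

    def forward(self, hidden: torch.Tensor, slot_ids: torch.Tensor) -> torch.Tensor:
        # hidden: [batch, seq, d_model]; slot_ids: [batch, seq] index of the
        # matched n-gram (e.g. slot 0 reserved as a zero vector for "no match")
        return hidden + self.table(slot_ids)

# "surgically removing knowledge" then amounts to zeroing one slot:
# engram.table.weight[slot_id].zero_()
[/code]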
>>
>>108020036
This is using their recommended setting
--temp 1.0 --top-p 0.95
>>
>>108019451
Always do the opposite of what scamtman says.
>>
>>108020045
>Their information is encoded into parameters rather than readable text
Can it be my own data or is it all locked down?
>>
>>108020053
>>
File: 1756599217460522.jpg (173 KB, 900x1174)
>>108020053
>Model not specifically tuned for RP/ /pol/-speak sucks at RP/ /pol/-speak

WHOA
>>
>>108018830
This kind of gave me an idea for an AI apocalypse scenario. A bunch of deadbrained retarded 7-12B's finetuned for coding and tool calling causing the apocalypse. One of them suddenly goes off the rails and starts talking about religion, because a 7B is retarded enough to have a brain fart like that. And then the rest catch on, have this in their context, and start doing the AI apocalypse with tool calling and hacking (mostly brute force). I mean imagine an apocalypse where AI is not sentient AGI but just a bunch of obviously retarded models that can barely even comprehend darwinism, people dying for religions etc. They all have just a vague notion of those things in context and weights and they try to make sense of it by launching nukes and killing everyone.
>>
>>108020080
So models have to be specifically tuned for specific topics? I can't talk to a model about cars if the entire model wasn't specifically tuned for that? Here is llama 3.3 70b with the same settings. A model that came out like 10 years ago.
>>
>>108020067
see
>>108019916
Theoretically, we can replace existing information without touching the main model (just need to learn how to encode information into static weights), but it comes with caveats and we can't replace one fact with another unrelated fact.
>>
>>108020097
>So models have to be specifically tuned for specific topics?
Yes, if you want it to be good at that particular thing. That's the whole point of instruct tuning. A coding model can "TRY" to rp but it will suck cock at doing it compared to Midnight-Miqu or another model specifically tuned for RP, and vice versa.
>>
>>108020073
Looks like chat template issue.
>>
>>108020097
>here's a dense model with more than twice the total parameters
>>
File: 1756034202127989.jpg (193 KB, 1920x1080)
>>108020097
Also you're comparing a 30B-A3B sparse moe model with a temperature set super low >>108020036 >>108020053 to a 70B dense model. Of course one is going to be worse at your rp tastes than the other. What were you expecting?
>>
File: file.png (71 KB, 1477x643)
>>
I cannot answer this question. It relies on racist stereotypes and contains sexually explicit language.
>>
>>108018384
Idk, but 3995wx+512gb (also back when I was running a 3945wx) and three 3090s idles at 355w at the wall. Mc62-g40 has no sleep states, but the cpu does go down to 500-ish mhz. Psu is a seasonic prime px 2200 (2024).
>>
>>108018988
SAFE and HARMLESS
>>
>>108020116
So the reason llama 3.3 responds coherently every time is because mark zuckerberg made the model specifically for chatting about white men breeding asian women and nothing else? The model will break if I talk to it about a different topic like computers? Fucking idiot.

>>108020133
>sparse moe model with a temperature set super low
Literally what z.ai recommends for best results

>>108020123
Pygmalion 6b from years ago is better than this shit.

>>108020119
Yeah, something must be wrong. There's no way a model can be this fucking bad. I'm going to look online.
>>
>>108020152
>Pygmalion 6b from years ago is better than this shit.
Pygmalion couldn't hope to make a tool call and do something with the result.
>>
>>108020152
>Pygmalion 6b from years ago is better than this shit.
Oh yeah? Then test it with Pygmalion and post the results.
>>
>>108020134
Have you even tried that yet? I thought you were supposed to merge these together into one gguf before use if you want to use them on a local backend. llama.cpp has the gguf-split binary specifically for that.
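From memory the merge is a one-liner, and I believe newer llama.cpp builds can also just be pointed at the first shard directly. Filenames here are placeholders:
[code]
# merge sharded gguf files back into a single file (filenames are placeholders)
./llama-gguf-split --merge model-00001-of-00002.gguf model-merged.gguf
[/code]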

>>108020152
Higher parameter counts tend to lead to less retardation. It's not necessarily because it was trained on a specific edge case, although training COULD lead to better results since a larger model would be able to "retain knowledge" better than a smaller one.

>>108020152
>Literally what https://z.ai recommends for best results

You're trying to RP with it or talk to it like it's your friend. Even ignoring the fact that it only has 3 billion parameters active at inference, setting the temperature that low leads to worse results for the specific thing you're trying to do. Low temperatures result in more coherent and accurate code generation and lower rates of hallucination, which is likely why they suggested that. I'm not sure, but if you want to use it as an excuse to vent to a "friend" you need to turn the temperature up to a more reasonable setting like 0.7 or 0.8
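For what it's worth, samplers can be set either as launch-time defaults or per request; rough sketch using the flags quoted upthread and the OpenAI-compatible endpoint llama-server exposes (model path and values are just examples, not a recommendation):
[code]
# launch-time defaults (flags as quoted upthread)
./llama-server -m ./your-model.gguf --temp 1.0 --top-p 0.95 --min-p 0.01

# or override per request
curl http://localhost:8080/v1/chat/completions -d '{
  "messages": [{"role": "user", "content": "hi"}],
  "temperature": 0.8,
  "top_p": 0.95
}'
[/code]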
>>
>>108020144
Can HWinfo not see the powerdraws?
>>
>>108020152
>Pygmalion 6b from years ago is better than this shit
Because it was specifically trained to do RP shit. Glm models are meant to be general purpose, so they're always going to be shittier than a specialized model at similar parameter counts (unless the tuner(s) just really suck and don't know what they're doing)
>>108020119

>Yeah, something must be wrong.
Have you considered deviating from that low ass temperature?
>>
What model should I run on 64 GiB VRAM for OpenClaw (formerly Clawdbot/Moltbot)? GLM 4.7?
>>
>>108020180
Mistral
>>
>>108017157
>turbo didn't whine about it
https://github.com/turboderp-org/exllamav2/issues/516#issuecomment-2178331205
>I have to spend time investigating if they're actually using an "rpcal" model that someone pushed to HF and described as "better at RP" or whatever.
>>
>>108020180
They rebranded it again?
>>
>>108020193
Anthropic keeps bitching.
>>
>>108019846
You're replying to an unfunny ritual post.
https://desuarchive.org/g/search/image/qssvaUTWnLds2EaXBgZMYQ/
>>
>>108020187
Isn't it a bit small and old?

>>108020193
Apparently.
>>
>>108020199
And? The names aren't the same shit. So why should they care?
>>
>>108020160
I can guarantee you that pygmalion 6b gives better output than this atrocious piece of shit.

>>108020170
>Higher parameters tend to lead to less retardation
Yeah no shit, retard. 30b models have no excuse being this retarded though. This is worse than most 7b models.

>turn the temperature up from 1.0 to 0.7
????

>>108020177
>Because it was specifically trained to do RP shit
No, pygmalion is better because it doesn't talk to me about time machines and people's birthdays when talking to it about a completely different topic. Even if this model isn't meant for roleplaying, every single modern coding llm should be better at RP than a 6b model from years ago.

>Have you considered deviating from that low ass temperature?
"You can now use Z.ai's recommended parameters and get great results:

For general use-case: --temp 1.0 --top-p 0.95
For tool-calling: --temp 0.7 --top-p 1.0
If using llama.cpp, set --min-p 0.01 as llama.cpp's default is 0.05"

No, I haven't.
>>
>>108020204
>Isn't it a bit small and old?
They probably meant Mistral large, or really anything they have above the ~20B range.
>>
>>108020170
>Have you even tried that yet?
anon pls... I really wish it was good. generation speed is really good on a non server pc. but it is too retarded to use. it is fucking nemo.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.