/g/ - Technology


File: lmg mood.jpg (139 KB, 1216x832)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103256272 & >>103248793

►News
>(11/21) Tülu3: Instruct finetunes on top of Llama 3.1 base: https://hf.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
>(11/20) LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: GcLLp06aIAAEBJU.jpg (368 KB, 2048x2048)
►Recent Highlights from the Previous Thread: >>103256272

--Paper: Hymba: A Hybrid-head Architecture for Small Language Models:
>103264040 >103265024
--Papers:
>103264142 >103264271
--Debate on hybrid models vs separate models for AI tasks:
>103264086 >103264117 >103264132 >103264382 >103264396 >103264458 >103264550
--Critique of quantization benchmark chart and discussion of optimal quantization levels:
>103260714 >103260772 >103260883 >103260927 >103260881 >103262523 >103262630
--Unsloth adds vision model support with reduced VRAM usage:
>103261067
--R1 finds serialization problem in large codebase:
>103259678
--OpenAI's deleted evidence in copyright lawsuit sparks skepticism and negligence concerns:
>103257257 >103258483 >103258547 >103258594
--NVIDIA kvpress: 80% compression ratio without significant losses:
>103261925 >103261982 >103262008 >103262600 >103262698
--Local AI transcription tools for English speech:
>103256528 >103256545 >103257215
--Local AI girlfriend setup and conversation limitations:
>103257768 >103258014 >103258042 >103258110 >103258065 >103258157 >103258450
--Anons discuss Tulu 3 Models, a new instruct finetune series:
>103259680 >103259735 >103260672 >103262111 >103262312 >103262391
--Anon tries to adjust Dell 3090 fan speed:
>103259508 >103259624 >103259677 >103259766 >103259810 >103259900 >103259898
--Anon struggles to prevent Nemotron 70B from misusing ellipses, finds solution in token banning:
>103259994 >103260008 >103260047 >103260176 >103263273
--Anon asks about LS3 and Nvidia GPU fan control issues:
>103259803 >103259840 >103259885 >103259915 >103259958 >103260001 >103260019
--AI model responses to a question about making Sharo squirt:
>103256682 >103256751 >103256761 >103256872 >103256989 >103262833 >103257397
--Miku (free space):
>103259181 >103259966 >103260220 >103260446 >103261147 >103265119

►Recent Highlight Posts from the Previous Thread: >>103256368

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103265207
>UOH
ToT
>>
thoughts on the sana release ?
i haven't tried it, but looking at the license i think the whole promise of efficient inference is a lie, or it got completely gimped last minute. the model also uses more memory than it should: the 0.6b model uses 8gb and the 1.6b 16gb of vram. though they did say it will go down with quanting, so here's hoping they trained the model in fp64 or fp128
>>
>>103265207
So are you just shilling these tulu models or what
>>
what does it mean when your model keeps repeating the same thing again and again regardless of your prompts? How do you fix that?
>>
>>103265656
It means you touch yourself at night.
>>
>>103265656
it means your setup is completely broken and you're not actually passing your inputs to the model
>>
>>103265958
I see, thanks
>>
fuck is tulu?
>>
ai isn't real, it's just word association and statistics
>>
https://www.reddit.com/r/LocalLLaMA/comments/1gwyuyg/beware_of_broken_tokenizers_learned_of_this_while/
>How can you tell?
>A model's tokenizer is the tokenizer.json file, and you can tell if a tokenizer is borked by transformers by seeing if its size is double the base model's tokenizer size.
>This can happen to any model, I have seen this on many finetunes or merges of Llama, Mistral or Qwen models. So if you are having issues with a model, be sure to check if the tokenizer is broken or not.
>How to fix this?
>Easy. Just copy over the base model's non-broken tokenizer.
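If you want to automate the check, a minimal sketch (the paths are made up, point them at your own downloads):

import os

base = "Mistral-Nemo-Base-2407/tokenizer.json"  # hypothetical local paths
tune = "UnslopNemo-v3/tokenizer.json"
base_size = os.path.getsize(base)
tune_size = os.path.getsize(tune)
print(f"base {base_size / 1e6:.1f}MB vs tune {tune_size / 1e6:.1f}MB")
# per the post above, roughly double the base size means borked
if tune_size > 1.8 * base_size:
    print("tokenizer looks borked, copy the base one over")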
>>
File: trans-case-1.jpg (433 KB, 1008x1538)
Holy f**k level 2 reasoner strawberry o1
>>
>>103266637
Mistral-Nemo-Base-2407: correct 9,3MB
Mistral-Nemo-Instruct-2407: correct 9,3MB

Rocinante-v1.1: correct 9,3MB
UnslopNemo-v1 & V2: correct 9,3MB
Nemomix-Unleashed: correct 9,3MB
MN-12B-Mag-Mell-R1: correct 9,3MB
Crestf411_nemo-sunfall-v0.6.1: correct 9,3MB

UnslopNemo-v3, 4, 4.1: INCORRECT 17,1MB
Crestf411_MN-Slush: INCORRECT 17,1MB
Results from a few Mistral Nemo tunes whose weights I had downloaded.
>>
Any worthwhile model I can run on my M4 Pro with 48GBs?
>>
>>103266637
How does this affect me as a regular llamacpp user? I never download anything other than the gguf file(s), do they already contain the tokenizer?
>>
https://huggingface.co/AIDC-AI/Marco-o1
https://arxiv.org/pdf/2411.14405
>>
>>103266853
>do they already contain the tokenizer?
Yes, and possibly the broken ones

>Ollama uses GGUF file type. So it depends on which tokenizer was used when the model was converted to GGUF.
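If you want to check what's actually baked into your gguf, the gguf python package from the llama.cpp repo can read the metadata. Rough sketch, and I'm assuming the usual tokenizer.ggml.* key names and field layout here:

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("model.gguf")  # hypothetical filename
tokens = reader.fields.get("tokenizer.ggml.tokens")
if tokens is not None:
    # for array fields, data holds one index per element
    print("embedded vocab size:", len(tokens.data))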
>>
>>103266857
Just read that as well, this ain't good
Is there a way to merge a gguf with a fixed tokenizer?
>>
>>103266885
I think the easiest is to redo the gguf with a "fixed" base tokenizer. I don't know if you can edit the tokenizer metadata in the gguf to the level you'd need.
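Something like this should be the path of least resistance; a sketch assuming you have the HF-format finetune on disk and a llama.cpp checkout (the script name matches recent llama.cpp, but check your version):

import shutil, subprocess

# hypothetical paths: finetune with the bloated tokenizer, base with the good one
shutil.copy("Mistral-Nemo-Base-2407/tokenizer.json", "UnslopNemo-v3/tokenizer.json")
# redo the conversion so the new gguf picks up the fixed tokenizer
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", "UnslopNemo-v3",
                "--outfile", "unslopnemo-v3-fixed.gguf"], check=True)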
>>
Why is it that every few months a major (and in hindsight quite obvious) bug in the AI ecosystem gets unearthed? And why is it usually the tokenizer?
>>
>>103267004
>usually
More like every time
>>
>>103266637
>>103266757
I just checked Mistral Large
>2407: Tokenizer 1.96MB
>2411: Tokenizer 3.96MB
aren't these supposed to be the same when it's just a minor refresh?
>>
>>103267063
Nah, New large has the instruct tags, right?
>>
>>103267004
AI developers are bad programmers, that's why most of them use python
>>
>>103267080
>AI developers are bad programmers, that's why most of them use python
as a data scientist, I confirm
>>
>>103267074
>>103267063
>instruct tags
That and the 2411 tokenizer is called v7
>https://github.com/LostRuins/koboldcpp/pull/1224
>Create Mistral-V7.json #1224
So I'd say it makes sense they're very different.
>>
File: 1707085483936539.png (36 KB, 798x957)
>>103266637
What exactly is it that 'breaks' inside a tokenizer? The json is just a long textfile that lists all tokens + some other stuff. I don't see what can go wrong here.
>>103267074
>>103267098
I just checked both of them. Somehow the 2411 tokenizer has about three times the lines (90k vs 280k or so) because the "merges" section now looks very different, with a lot more spacing. Left is the 2407 one and right is the 2411 one. Both are basically the same up until the "merges" section. No idea what bloating it up like that would accomplish though.
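If anyone wants to reproduce, a quick sketch; my guess (unconfirmed) is that a newer tokenizers library writes merges as ["a", "b"] pairs instead of "a b" strings, which would triple the pretty-printed line count without changing anything functionally:

import json

# hypothetical filenames for the two downloaded tokenizers
old = json.load(open("tokenizer-2407.json", encoding="utf-8"))
new = json.load(open("tokenizer-2411.json", encoding="utf-8"))
for name, tok in (("2407", old), ("2411", new)):
    merges = tok["model"]["merges"]
    # entries are either "a b" strings or ["a", "b"] lists depending on
    # which tokenizers version wrote the file
    print(name, len(merges), "merges, first entry:", repr(merges[0]))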
>>
File: heyheyhey2.png (1.06 MB, 1024x1024)
HEY HEY HEY! WASO WASO WASO WASO WASO WASUP BITCONNEET
>>
https://www.reddit.com/r/LocalLLaMA/comments/1gx6qyh/open_source_llm_intellect1_finished_training/

Kek at comments
>The first ever OPEN SOURCE model, not open weights but OPEN SOURCE!
>>
>>103267331
>not open weights but OPEN SOURCE
it shouldn't be something to be celebrated, you have to hide your dataset so that you can train your model on good quality data, and not on some copyright free slop, those fucking ledditors...
>>
>>103267346
More importantly, it's far from the first open source model, there was K2
>LLM360 has released K2 65b, a fully reproducible open source LLM matching Llama 2 70b
And quite a few older ones as well that showed their data.
>>
>>103267359
>K2
i remember the cope like "you can uncuck it" and stuff, and literally nothing came of it lmao
>>
>>103266732
woah, this is very interesting, will they release the weights?
>>
>>103267442
it's already here? >>103266856
>>
>>103266856
>by fine-tuning Qwen2-7B-Instruct with a combination of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1 Instruction dataset, Marco-o1 improved its handling of complex tasks.
>Qwen2-7B-Instruct
...
>>
>>103267450
thanks
>>
>>103267476
>Even the Chinese forget Qwen 2.5 exists
>>
>>103266856
Yeah this is no Deepseek
>>
is quadro rtx 8000 worth it?
>>
File: translation.jpg (822 KB, 2256x2038)
>>103266856
>>
>>103267627
Short answer:
>no
Long answer:
>noooooooooooooooooooo
>>
>>103267685
but I can't find cheap 3090s
>>
File: file.png (72 KB, 772x458)
>>103267331
>The first ever OPEN SOURCE model, not open weights but OPEN SOURCE!
KEK
>>
>>103267701
You heard it here first: if you build your oss software on multiple machines it's even MORE open!
>>
>>103267697
Wait for the 5090 to come out and hope that gaymers will sell their 3090s.
>>
>>103267697
Define cheap?
A Quadro RTX 8000 costs 3 times as much as a 3090.
>>
>>103267697
Fb market has them for 700-900
>>
>>103267751
cheap as in freshly fallen from the delivery truck.
>>
>>103267346
>it shouldn't be something to be celebrated
true, but it's not like they had any other choice in this case, the dataset has to be public for decentralized training
>>
Now that distributed training worked fine it's time to make distributed inference so that we GPU poors can get some gibs
>>
>>103267966
Already exists, for a while too
>>
>>103267331
>All that effort and time
>For a model trained on 1T tokens
It's really over isn't it
>>
>>103267987
I hope somebody pays you to do this all fucking day.
Because if you do it for free you are the most miserable fucking sub-human pile of flesh to ever escape the abortion process.
>>
>>103267978
Kobold horde you mean?
>>
>>103267990
>I hope somebody pays you to do this all fucking day.
>Because if you do it for free you are the most miserable fucking sub-human pile of flesh to ever escape the abortion process.
>>
>>103267993
No
https://github.com/bigscience-workshop/petals
Llama.cpp RPC
>>101582942
>vLLM distributed inference actually worked...
>I got 15 T/s with Mistral Large with 2 PCs with 2x3090 each.
To name a few options.
>>
I'm getting 1.40 tokens/s with Cydonia-22B-v2q-Q3_K_M.gguf on a 3060 with 12gb, are my settings fucked or is this normal?
>>
>>103268060
That seems rather low, yeah
Did you limit the context size to something like 32k? Flash attention? Other programs hogging your gpu?
>>
>>103268024
>Petals
Weren't those the people who made BLOOM
>>
>>103268078
I have flash attention and context length is 8192. I'm retarded, I started using LLMs on my machine less than 24 hours ago and don't know what I'm doing
>>
>>103267990
Anon, if you want to treat 30 people pooling GPUs to train a 10B model on 1T tokens like it's the second coming of Christ, then feel free
But as far as breaking the chain of corpo dependency goes, there's still a ways to go
>>
how do I make macos not send any telemetry so that I can enjoy both high token generation/s and power efficiency with privacy?
>>
>>103268112
Shut the fuck up you retarded piece of shit.
>>
>>103268106
You should get about 5 t/s with this kind of context. Maybe you're offloading too many layers into RAM.
>>
>>103268127
Sorry samsja, didn't mean to offend you
>>
>>103268127
Trvke We Need To Support The First Ever OPEN SOURCE model, Not Open Weights But OPEN SOURCE Y'all!!!!
>>
>>103268060
If you are using Windows: check that the driver setting where VRAM is swapped into RAM is disabled (I forgot what it's called).
If you did not manually set the number of GPU layers, your frontend may be trying to set the value automatically; I know that KoboldCpp and ollama have logic like this, and to my knowledge the estimates tend to be too conservative.
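If you want to take the guesswork out, set the layer count yourself. A sketch with llama-cpp-python; a 22B at Q3_K_M should come close to fitting entirely in 12GB at 8k context, but the exact numbers are something you tune per model:

from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="Cydonia-22B-v2q-Q3_K_M.gguf",
    n_gpu_layers=-1,  # -1 offloads every layer; lower it if you run out of VRAM
    n_ctx=8192,
    flash_attn=True,
)
out = llm("Once upon a time", max_tokens=64)
print(out["choices"][0]["text"])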
>>
>>103268124
>apple
>no telemetry
lol. You could unplug the power cable and encase it in concrete.
>>
Ultra censored LLM from applel is coming btw https://x.com/MacRumors/status/1859707331392757812
>>
File: 142142352157469.png (40 KB, 651x1065)
>>
>>103268362
but it uses emojis more now! great tradeoff
>>
>>103268175
It's called "CUDA - Sysmem Fallback Policy" in the nvidia control panel
>>
>>103268362
>mistral large shits the bed in every category
grim
>>
>>103268362
I mean yeah it's worse but at least they got to top the LMSYS leaderboa-
>>
File: MikuDoesntWantToGetUp.png (1.58 MB, 1232x816)
Morning, /lmg/...
>>
>>103268540
it literally does better than a recent 4o release
>>
>>103268606
Good morning to you, Miku
>>
>Ask Project Euler question to write code to solve the problem
>Get this
Thanks DeepSeek
>>
Holy fuck someone buy this https://www.ebay.com/itm/276743259844
>>
File: 1713635130526489.jpg (306 KB, 1672x854)
>5 minutes to train GPT-2
Are we back?
>>
>>103268750
This is literally benchmaxxing, the condition for a completed training run is just achieving a specific eval score.
>>
>>103268737
Wtf is that real
>>
>>103268727
SOUL
>>
>>103268606
gm betufel
>>
>>103268737
Someone take the plunge
>>
>>103268362
Where is Largestral 2411?
>>
>>103268769
>Must not modify the train or validation data pipelines.
It's the same dataset and parameter size, though.
>>
>>103268784
I got one before they nuked the listing. Any bets as to whether I get it?
>>
>>103268831
Honestly, I doubt it. Looks like a pricing error from the other stuff they sell.
>>
>>103268831
You'll get a box. You'll have some GPU in it if you are lucky.
>>
>>103268831
Interesting. So it was probably legit and the guy accidentally a decimal place.
>>
>>103268831
I don't know how UK law handles it specifically, but under German law, if they mistyped the price they would now have a legal obligation to actually sell you the item at that price (though they may refuse and you'll need to take them to court).
If it was a scam you'll never get anything.
>>
>>103268362
Qwen that high?
So running it at q4 is the reason why I get so many dumb coding errors…
>>103268727
Lmao qwen did something similar yesterday for me, it was too lazy to write a section of code and just made it a “suggestion”.
>>
>>103268926
No, Qwen and other chinkshit just do everything to look good on benchmarks
>>
>>103268866
They offered paypal so it should be pretty easy to get your money back if it's a scam.
>>
>>103268945
Regardless of whether it is better or worse, it is still the best coder model that I can run.
I just want to know how to squeeze more performance out of it.
>>
Expect the first two weeks of December to be crazy for local models.
>>
>>103268866
This is false, that law only applies to retail shops
>>
>>103269118
https://www.rechtsindex.de/internetrecht/4542-bgh-urteil-viii-zr-42-14-ein-fahrzeug-fuer-1-euro-schnaeppchenpreis-bei-einer-ebay-auktion
>>
>>103268362
wtf is OpenAI doing, they're getting destroyed by the competition, they can't even beat themselves anymore lmao
>>
>>103269137
einen gutes offen modell zu dir auch guten herren
>>
>>103268737
holy fucking shit
>Located in: ShenZhen, China
I smell hogwash
>>
>>103269217
All the talent left and they are hitting the wall of diminishing returns while racking up debt. They made a fucking CoT tune and marketed it as innovation. A fucking CoT tune.
>>
File: itsalive.png (342 KB, 748x977)
no one seriously believes these things are alive, do they?
>>
>>103269328
My boomer parents do.
>>
>>103269328
Some people think the earth is flat, others think that you can sustain yourself with just sunlight. Believing that some enhanced text prediction model is sentient is one of the less egregious cases of retardation
>>
>>103269328
jewlywood portraying ai this way is to blame.
>>
>>103269328
I had this illusion during the early days of c.ai, but it was quickly gone.
>>
File: 16.png (75 KB, 920x798)
Looks like INTELLECT-1 is finally done training. From what I can glean, it should be released in a week and is currently going through post-training with something called Arcee AI.
>>
>>103269217
It doesn't exactly help that their talent left (actually it might even make it worse, since all their investors are probably looking to see if they can recover from their exodus lmao)
Still kinda crazy to see how far their lead is slipping. I still remember when OpenAI had GPT-3 and all we plebians had was GPT-fucking-Neo-2.7B
Now DALL-E 3 is mogged by Black Forest Labs, GPT-4o is mogged by Claude in the intelligence department and Gemini in the human preference department, o1 already has a competitor, and Sora is basically MIA
>>
>>103269402
>Arcee AI
That's mergetkit people with Charles O. Goddard
https://github.com/arcee-ai/mergekit
>>
>>103268540
?
Did you misread it thinking they are sorted top to bottom? It beats many top corporate models.
>>
Which local model if I want to try out those new meme IDEs?
>>
>>103269402
Who's going to do the red teaming and rhlf?
>>
>>103269455
Arcee
> Arcee AI empowers businesses to train, deploy, and continuously improve proprietary, specialized, secure, and scalable small language models (SLMs) within their own environments, revolutionizing data privacy and security.

>Their all-in-one system enables pre-training, aligning, and continuous adaptation of small language models.

>This ensures security, compliance, and enhances model relevance and accuracy.
>>
>>103269466
>aligning
kek they're gonna cuck the model, it's ova
>>
>>103269466
>no, goys, you can't have the base model, that's too unsafe for you
>here's (((aligned))) instruct
>>
>>103269402
>alignment
It's DOA.
>>
>>103269466
kek @ whoever paid for an aligned model
>>
>>103269419
>the guy who's responsible for the shitty merge era is now offering alignment services
This guy is a grifter and a net negative for local
>>
>>103269483
Y'all love censored models though, from all the shilling I've seen here.
>>
>>103269466
>>103269402
>1st ever fully open source model
>aligned to fuck and we don't even get the base model
scam
>>
>>103269328
llama 3 7b is sentient
>>
>>103269328
Claude is the closest to have that 'ghost in the shell' feel
>>
>>103269508
>turkish rapebaby tranny balkanoid does his low effort trolling again
hi petr*
>>
>retards freaking out over the word "alignment"
>>
>>103269514
Safety and tolerance are the most basic values of Open Source and its community.
>>
File: file.png (49 KB, 799x531)
>>103269466
>Arcee
They sure got big tho
Others include
>AWS and Intel
>>
>>103269541
Right, like that hasn't meant practically only one thing since the word became commonly used.
>>
>>103269571
Releasing an unsafe and offensive model would only hurt the image of open source.
>>
>>103269550
Yes, xister! Free software should be replaced by ethical software to own le chuds! #RemoveStallman
>>
>>103269550
That's actually true, considering /g/'s love for establishment and queer e-celebs.
>>
File: file.png (55 KB, 903x922)
>>103269586
Correct
>>
>>103269550
2/10 ragebait
>>
>>103269571
you haven't realized that every single big release pays lip service to the concept of alignment regardless of how censored they end up being
>>
>>103269616
>>103269603
But sure, if you want to be hyped for something that'll 100% be ultra-corpo safe go ahead.
>>
AI isn't real and everybody who's making money off this field knows this but pretends otherwise. Get that bag and gtfo before the bubble blows up. It's okay to be a bystander who just wants a local smut autocomplete, a bunch of h100s will be liquidated.
>>
God I hope R1's weights are actually released. This model is legit better than closed source.
>>
>>103269627
Bro you don't understand, sama has made GPT5 smarted than a human, it's fully multimodal AGI or even ASI! Please invest.
>>
>>103269642
It's pretty entertaining to see its thoughts, but it just badly fucked up a coding problem that even free chatgpt solved for me. And its coding knowledge seems to be really outdated.
>>
>>103269666
? It's the only model that got some stuff that only claude 3.5 and qwen2.5 32B coder got before. Maybe you got a bad "reasoning" roll?
>>
>>103269688
lol, no way. reasoning doesn't do shit for coding abilities.
>>
>>103269714
And why wouldn't it?
>>
File: 1708365016268048.png (67 KB, 865x182)
Speaking of alignment, I wonder what this last line is supposed to be about?
I don't have anything about flags or alignment in my prompt or the card description.
>>
>>103269740
I'd say it's the name "Naomi" pulling all kinds of CoTs / jbs in the garbage logs your model was finetuned on.
>>
>>103267649
>feeling of stepping on feces
>>
>>103269804
I'm happy to see that your relationship with tranx qwxxn is going great, keep us updated.
>>
File: upset.jpg (20 KB, 600x600)
>>103269550
Fuck that shit.
>>
https://x.com/ltxstudio/status/1859964100203430280
Local video generation now down to a single 4090
>>
>>103266757
All of Drummer's Small tunes seem to have the bloated tokenizer.
>>
>>103269898
*except Cydonia v1.0
>>
File: 1716129419410379.webm (2.24 MB, 768x512)
>>103269883
yeah, can confirm, this shit is pretty good, and it's really fast
>25 fps, 129 frames (5 seconds), 50 steps
>01:10<00:00, 1.42s/it
>13gb VRAM peak (during the vae decoding)
>RTX 3090
>>
>>103269883
>Local video generation now down to a single 4090
We've had that since Mochi 1
>>
>>103269989
but with mochi you had to wait 30 min to get a single 5 second video; with this one it's only 1 min because they managed to efficiently compress the VAE
>>
>>103269883
>https://github.com/Lightricks/ComfyUI-LTXVideo
>https://github.com/Lightricks/LTX-Video
> first commit 6 hours ago
yeah this is an obvious shill using anons as guinea pigs
>>
File: 1730858495297125.png (578 KB, 512x512)
the chinks will save us all
>>
I wonder what kind of AI models Aliens use.
>>
>Product Security Engineer @ Red Hat- AI Security, Safety and Trustworthiness
>https://huggingface.co/posts/huzaifas-sidhpurwala/601513758334151
>As AI models become more widespread, it is essential to address their potential risks and vulnerabilities. Open-source AI is poised to be a driving force behind tomorrow's innovations in this field. This paper examines the current landscape of security and safety in open-source AI models and outlines concrete measures to monitor and mitigate associated risks effectively.

>https://huggingface.co/papers/2411.12275
We need more of this! Much more!
>>
>>103270001
Near real time on a 4090. First video model worth using because of it. Prepare to start seeing porn finetunes of it.
>>
File: 1704377820835720.webm (1.96 MB, 768x512)
>>103269981
I like that one
>>
>>103270112
https://arxiv.org/abs/2411.12275
>>
File: Base model.png (21 KB, 593x237)
>>103269514
Looks like they actually will be releasing the base model, as well as the post trained model.
>>
>>103270150
sounds like they aren't as retarded as I thought, that's cool
>>
>>103270150
oh wow lmg was dooming over nothing who would have thought
>>
>>103270150
who cares, the chance of this model being better than llama2 7B is slim.
>>
>>103270176
for once /lmg/'s doomerism was wrong, usually we get fucked in the ass pretty hard
>>
Update on sarashina2 8x70b... it's pretty unhinged with good temp/minp. I'd say it's the jap ERP king after doing some completion on existing chats. Super spicy.
The initial release had a busted tokenizer_config.json, but after requanting it works properly.
>>
>>103270185
maybe you do
>>
What is the best oobabooga preset (or parameter values like temperature, min p, etc) to use for the best Roleplay(mainly erotic but I care very much about characters following the scenario and not going OOC) experience in Mistral Nemo 12B finetunes?
>Use DRY
I don't have that yet, will get around to that.
>>
>>103270274
Depends heavily on the model, but i like to start with temp 2.6 and minp 0.008 and then back off until I get an amount of insanity that's appropriate for what I'm trying to achieve.
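If you script your testing, the same knobs can be set per request instead of in the UI. Sketch against KoboldCpp's generate endpoint; the field names are what I remember from its KoboldAI-style API, so double check:

import requests

payload = {
    "prompt": "The tavern door creaks open and",
    "max_length": 200,
    "temperature": 2.6,  # start high...
    "min_p": 0.008,      # ...and let min-p prune the garbage tail
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])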
>>
>>103269714
Kek. Not sure what level of coding you've done, but I have to assume you either: (a) are just starting out and have somehow Dunning-Kruger'd yourself into thinking you're an expert, or (b) the only coding you've done has been via prompting an LLM
Reason I say this is because, unless you're truly doing basic toy shit, you generally don't get very far before you get fucking steamrolled (or, on the off chance it does work, write horrifically inefficient code vomit) if you don't know what you're doing
If you disagree, I invite you to check out TAOCP, Concrete Mathematics, and Algorithm Design by Kleinberg and Tardos
>>103269688
r1 has some pretty heavy variance. It generally ranks below o1 in some of my tests of programming / algorithm problems, though it definitely often comes a lot closer than Claude and Qwen. I don't think it would fully replace o1-preview for the people that use it, but it would make the ridiculous prices OpenAI charges at the moment quite a bit more questionable
>>
>>103270322
Top p 1, top k 0, typical p 1, right?
And repetition penalty at?
>>
>>103270347
>urrr durrr skill issue
Stop being stupid anon, I'm talking about the kind of reasoning that these LLMs do. If you think I'm wrong, why does o1 gets mogged by Claude 3.5?
>>
>>103270434
It genuinely doesn't though, it's just better at the easier stuff. If you disagree, you can test it yourself. Here's the problem: https://atcoder.jp/contests/dp/tasks/dp_j
Pop that into Claude and see what it gives you. Here's the (correct) o1 solution for reference, which was its first attempt
>>
>>103270542
Claude test for reference (got murdered by a division by zero)
>>
repetition penalty should be deprecated
literally just exists as a newfag filter at this point, way too easy to go wrong and use retarded values that turn your output into adjective/adverb spam because every glue word got penalized into nonexistence
>>
what's the current best method to have a chatbot that 1. can "read" images, so if i post an image it can describe it (within current model limits of course), and 2 (optional) i can tell it to prompt and gen an image?
using sillytavern/koboldcpp backend currently, but not sure how i'd go about it there.
in other words, i want to chat with the ai about images, and if i post one it can talk about it and suggest prompts
>>
>>103270696
>remove feature with occasionally niche value because retards don't understand it and use it in the wrong way
No, that's the spirit of proprietary software, not open source
OSS does mean footguns for newfags sometimes but that's a price worth paying. Fuck outta here with your dumbing-down suggestions
>>
>>103270886
Open webui if you don't mind it raping your RAM.
>>
>>103270992
what niche value does it have over presence / frequency penalty (the same thing but with sane scales and less retarded implementations) or more advanced repetition samplers like DRY or w/e? it's just a super primitive and very poor sampler that sucks ass at its job. it's bad. there are NO pros to it. rip that shit out.
backends can keep it for compatibility's sake but frontends should not be putting that garbage in front of a user's face unless they very specifically request it for some deluded reason
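for reference, the difference spelled out: rep pen scales the logit of every seen token, while presence/frequency subtract a flat amount plus a per-occurrence amount, which degrades way more gracefully. minimal sketch of the latter (OpenAI-style formula):

from collections import Counter

def apply_penalties(logits, generated_ids, presence=0.2, frequency=0.1):
    # logits: dict of token id -> raw logit
    # generated_ids: token ids produced so far
    counts = Counter(generated_ids)
    penalized = dict(logits)
    for tok, count in counts.items():
        if tok in penalized:
            # flat hit for appearing at all, plus a hit per occurrence
            penalized[tok] -= presence + frequency * count
    return penalized

# toy example: token 42 appeared 5 times so it gets dinged hardest
print(apply_penalties({42: 3.0, 7: 2.5}, [42] * 5 + [7]))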
>>
>>103271048
NTA but simply updating ST's default presets would solve this.
>>
>>103270365
I keep top p and typical p at 1, but I crank top k all the way up to 200.
I don't use rep-pen. If a model is to repetitious in a way I don't like I just don't use it.
>>
anyone tried out the vision support in exllama2?
>>
>>103271200
exllama supports vision now?
>>
>>103271102
Thanks anon I will see if it works well for me.
>>
i've also given up on rep penalty stuff, xtc, dry. sure they can help reduce overused slop but they also introduce errors when the model wants to say a shirt is red, but can't, so it picks another color which is wrong. so out of the choice of more slop or inaccuracies, i'll deal with the slop. low min p + adjusting temp is all i use these days
>>
>>103271276
I concur with this assessment. At first I thought Largestral wasn't that good, until I realized XTC was making it retarded and never went back.
>>
Where do you reckon the tech will be in five years? ten years?
>>
>>103271236
I saw this in tabby
https://github.com/theroyallab/tabbyAPI/pull/249
>>
>>103271276
I agree on rep pen and especially XTC (that can REALLY make a model retarded...turns out lower probability tokens are lower probability for a good reason)
but I find DRY is basically risk-free regarding the model's intelligence as long as you're not applying it to single tokens (so allowed length of 2 or higher)
>>
>>103271011
i have 128gb ram and 24gb vram, does that work?
>>
>>103271331
By then nvidia will release $300 24gb cards finally, and we will be able to run local o1 on it.
>>
>>103271345
NTA but
>(so allowed length of 2 or higher)
This should be at least 5 or it starts banning uncommon names. Also add {{user}} and {{char}} to the sequence breakers (persona and character names should consist of the first name only).
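for anyone wondering what DRY actually does: it looks for the longest suffix of the context that a candidate token would turn into a repeat of something seen earlier, and penalizes exponentially in that length. naive sketch of the idea, glossing over sequence breakers, and nothing like the optimized real implementation:

def dry_penalty(context, candidate, allowed_length=2, multiplier=0.8, base=1.75):
    # context: list of token ids so far; candidate: token id considered next
    match = 0
    for k in range(1, len(context)):
        pattern = context[-k:] + [candidate]
        found = any(context[i:i + len(pattern)] == pattern
                    for i in range(len(context) - len(pattern) + 1))
        if not found:
            break  # longer suffixes can't match if this one doesn't
        match = k
    if match < allowed_length:
        return 0.0
    return multiplier * base ** (match - allowed_length)

# candidate 3 would extend the repeated [1, 2] sequence, so it gets penalized
print(dry_penalty([1, 2, 3, 4, 1, 2], 3))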
>>
i think the site died
>>
>>103271647
ayy finally, after like 4 attempts

>>103271345
out of the three (rep pen, dry, xtc) i liked xtc the least. it just seems like a horrible idea to chop off the top tokens because that token could be a name, color, any kind of detail. dry seemed ok but could also introduce errors.

>>103271473
is that something you setup already or just speculating on? i'd try it
>>
>>103270176
You think they were using good training data? Only copyrighted data is good training data, and them being open source means they will have zero of that.
>>
>>103271712
nice unrelated pivot
>>
>>103271712
Issue isn't the quality, it's the count
>>
>>103271678
>is that something you setup already or just speculating on? i'd try it
With --debugmode on in KCpp, I saw it trigger a lot on first syllables of non-English names. I guess I'm speculating a bit here: those names would pop up if necessary ("He was born in _") but otherwise not.
>>
Any threestral finetunes yet?
>>
>>103271871
when i was using dry i noticed it fucked up jap names a lot. like it'd get through half a name and then just go nuts. (tsukino becomes tsukAKAK)
i thought it was the model at first since i never saw the same issue with normal english names, but it went away when i turned dry off.
>>
discord.gg/aicg/
although we are chatbot focused, we have many channels meant for prompting and ai art, including dall-e, flux, stable diffusion, pix art; someone even hosts a proxy. come join us!
no lurking
>>
>GPT-4o no longer topping any leaderboard, tried to top Google only to get smacked back down
>China is about to take away what little value o1 had
>OpenAI ran headlong into a "fuck you" sized scaling wall that's turning any further upgrades into side grades (the more recent GPT-4o release meant to top the LMSYS leaderboard is worse)
>Anyone with any competence to save OpenAI from itself has long since left
>Musk has power now and is out for Altman blood
>OpenAI still dealing with a fuckton of lawsuits from NYT and Pajeets, "accidentally" deleted their datasets which makes them look more liable
>Still billions of dollars in the hole with investors getting antsy for a return on their buck
It's like watching a train wreck
>>
>>103272323
>faggot noises
lol no
>>
>>103272363
>Anyone with any competence to save OpenAI from itself has long since left
The inertia is spent. altman is finally paying for his hubris.
>>
>>103272363
was about fucking time, I alaways hated this fucker, his fear mongering of AI has done a lot of damage to the community
>>
>>103272363
i don't even consider openai to be relevant at this point. claude passed them months ago and it's remained the same. now local has gotten so close on benches like coding, which is amazing given the assumed size difference (qwen 32b vs whatever the fuck 8x+ monster gpt/claude is). openai's reign was over months ago, it's just taking people a while to realize it
>>
>>103272417
Have we really started to take Chinese model benchmarks at face value? Come on now.
>>
>>103272363
Trust the plan Altman says AGI is coming in 2025. Strawberry is going to blow us away.
>>
>>103272363
Musk recently said he would make AGI by 2026 and xAI is building a massive server farm at an unprecedented pace.
>>
>>103272363
OpenAI still has branding and first-mover's advantage. For most people AI = ChatGPT.
>>
>>103272435
i wouldn't even mention a benchmark if i didn't use it myself. i know how they game shit, and especially china, they lie and steal everything. but yeah, it's a good model, it's the first one to not shout at me in chinese halfway through a message. not just qwen though, nemotron is also very good. hell, even codestral is amazing for its size. local is eating well and the gap has shrunk insanely in the last year.
>>
>>103272449
America Online also had branding and first mover's advantage
>>
>>103272449
>AI = ChatGPT
That's why I mentioned inertia and referenced the brain-drain quote. You can only keep first-mover advantage if you stay close enough to the state of the art to remain relevant before people discover superior services.
It's a serious advantage, but it must be defended, especially since AI is in its infancy in the public imagination.
>>
File: MGS6.jpg (271 KB, 1529x857)
>>103264382
To some extent the human brain contains dedicated centers, but those centers have extremely wide interfaces pumping a shitload of data between them. Bandwidth is the limiting factor for virtually every interesting computation. So what you describe will never work with English text or tokens or whatever as the shared language for the centers: the interface is too narrow, so it'd at best be ultra inefficient.

It also just won't work well with the Von Neumann architecture.
>>
>103272323
What a sad end
>>
>>103272488
back in high school, when aol was sending billions of cds to everyone (you could even find them at burger king), we used to shove hundreds of them through the vents of lockers so when you'd open it, 500 cds would spill out
great fun until it's you that opens the locker
>>
>>103272499
This. First mover's advantage isn't going to save you if your services are inferior / more expensive than competitors (like, say, charging $0.50 per o1 query). It might delay your fall into obscurity, but eventually people will move on if you don't have something good enough to offer them. Unfortunately for them, OpenAI also happens to be in the position where it's costing a lot more than it's bringing in and it needs a plan to turn a profit fast.
>>
>>103272581
>people will move on if you don't have something good enough to offer them
my co-worker moved to anthropic a while back (understandable since he's in IT), but my kid's friends are already moving to perplexity, so they couldn't give a rat's ass what's on the back end. And this is in a rural area without any kind of tech sector presence.
Also probably a majority of companies are using copilot branding via their MS EA, so the name brand is already severely diluted for knowledge workers.
>>
>>103270886
>sillytavern
Just use the attach button, at least with the custom OpenAI API and using vLLM as a backend, it just works. I suppose now tabbyAPI can be used for that too.
>>
>>103266622
>artificial intelligence is artificial
Oh wow.
Everyone
This guy is so smart
Holy shit
>>
>>103266622
>ai isn't real, it's just word association and statistics
and what are we? our brain just works thanks to a bunch of little electric shocks
>>
>>103272323
Probably a troll honeypot server. So naturally I'm going to join out of morbid curiosity
>>
>>103272807
electricity shocks powered by God
>>
>>103272840
Aww it's a fake link. No friends for me :(
>>
>>103272842
if God created the world, then he also created the AI, CHECKMATE
>>
>not real intelligence
>But it knows about the hallway birds
>>
Is there a model that's free to use commercially? It doesn't have to be gpt level, just needs to string a few sentences of text together.
>>
>>103272976
A ton of openly licensed models can do that.
>>
q2 behemoth or 5km midnight miqu?
>>
>>103272886
those are just government drones that don't fly
>>
>>103272995
Thanks I'm retarded, I'll use T5.
>>
>>103272997
miqu for slop, q2 for tardation
>>
>>103272363
>Musk has power now
My hobby didn't deserve it. It is a good thing it is dead anyway.
>>
>>103272997
Go back to the Kobold Discord.
>>
>>103269586
I'd say fuck your optics, concern troll, but what you say isn't even remotely true. There needs to be space for hobbyists and tinkerers to collaborate on uncensored models. If that is not allowed, then you can be sure you don't live in a free society.
>>
File: 1705186488029659.png (1.9 MB, 1024x1024)
>>103273159
based
>>
>>103267649
Ah so it's completely useless for translating visual novels because it avoids anything offensive or adult in nature.
>>
>>103273094
dunno what that is
>>
I hate Qwen. Largestral is too big. Nemo is too retarded. Nemotron wants to give me lists instead of being normal. What am I supposed to use?
>>
>>103273292
money
>>
>>103273292
Magnum v4 72B
>>
the google colab hag gives me a hard on every time.....
>>
File: 1731529268244600.webm (1.36 MB, 576x566)
>>103268024
>Wanna host? Request access to weights (huggingface login), then run huggingface-cli login in the terminal

iirc this was the issue. It's like saying "Run your media bittorrent-style. Provide your Netflix login to get started!"
>>
>>103272363
He is about to get what he fucking deserves (I will never forgive him for withholding GPT-3 and forcing people to eat shit for 2 fucking years).
>>
>cheap radeon pro v620 32gb on ebay
worth it?
>>
>>103273292
Wait 2 more weeks
>>
>>103274158
More like 2-4 months unless deepseek drops R1.
>>
>>103265207
>https://rentry.org/lmg-lazy-getting-started-guide

I followed this guide and it's still censored.

koboldcpp/Mistral-Nemo-12B-Instruct-2407-Q4_K_M
koboldcpp backend
mistral v3 tekken context and instruct
etc etc, won't do anything uncensored. Should I get a different model, is that what I did wrong?
>>
File: file.png (1.89 MB, 1500x2060)
One day, after the AI goldrush dies completely, some guy at one of the big companies will have too much free compute; he'll just drop some discord rp dataset into the main training, ease off the censoring a little, and we will get a 7B that just gets everything.
>>
>>103274275
Spare compute for sure, but I don't know about that mythical rp dataset, chief
>>
>>103274275
>>103274320
You don't really need it. Poe AI with a good enough prompt on GPT3.5 did some ungodly nasty things with me.
>>
>>103274275
I mean, no that's never ever going to happen.
reasons for this:
1. the dataset when training a model really, really matters. so much so that anthropic created Constitutional AI, which uses another AI to create the dataset. Having a junk dataset really harms the output.
2. Many models have already done this already, take a look at ArliAI, they're not perfect, not by any means.
3. the goldrush will get replaced by something better LLMs are step one, there will likely be better shit in two more weeks.
>>
>>103274366
>ArliAI
>Training Duration: Approximately 3 days on 2x3090Ti
>Epochs: 1 epoch training for minimized repetition sickness
Ah so you just don't know what you are talking about.
>>
>>103274386
yeah and you're the expert clearly, dumbass.
>>
>>103274366
>anthropic's constitutional AI
>clear improvement on the same model every new checkpoint
>meta's SPIN
>benches keep maxxing yet nobody can tell any difference
>>
>>103269328
they literally are. simulating thoughts are thoughts because thoughts are simulation
>>
I am getting a ton of 404s on HF for model cards that were there last week.

Was there a purge or was I just looking in the wrong place?
>>
>>103274623
Yes. That one was removed, but not the other ones. Except that other one.
Just post the fucking links if you want someone checking them for you, retard.
>>
>>103274652
I included links for everything I wanted checked.

ohhhh... I forgot to include links.

Please see below.
>>
>>103274486
>yet nobody can tell any difference
Filtered
>>
https://github.com/danny-avila/LibreChat
has anyone used this?
I'm just looking for a lightweight interface for chatgpt, claude and others
>>
>>103274785
I never heard of it. Most people use SillyTavern.
>>
in ST, how do I set up an author's note that won't force a large portion of my context to be processed again when I either edit it or it gets inserted? currently I've got it at depth 0, insertion frequency 4, and for some reason it's going 6k deep and completely defeating the purpose I'm using it for (summarizing a very long chat that will take over 2 hours to process since I'm on cpu)
>>
File: 58265577_p1.jpg (76 KB, 694x1000)
Hey Magnum anon here?

Thank you for giving me the pointers yesterday. I had problems with Magnum 70b occasionally acting up and sometimes defaulting back to Qwen's assistant behavior, outputting slop or gibberish; then I copied the system prompt exactly from the hugging face page and that changed everything completely.

Apparently having the same system prompt the model was tuned with MASSIVELY reinforces the tuning and makes it abundantly clear to the AI that it is not a helpful and polite AI assistant anymore. Copying and pasting the system prompt from the tuner's page completely changed the model's behavior and erased all traces of the pozzed censorship, bias or purple prose; now the waifus are coherent and wild as fuck.
>>
>>103274902
I think sillytavern only works with API keys
I want to use my regular accounts but with a UI that doesn't take up 700mb of memory for a fresh tab like Claude
to be fair maybe I should look into using APIs directly but I imagine it's more expensive than regular premium for a power user
>>
File: 3625656456.png (2.91 MB, 1280x1418)
>>103275049
Do ask the model tuners to provide the exact prompt and format they were tuning with, it's incredibly powerful.

And if the model tuners are here: attach your system prompts to your model page.
>>
>Qwen2.5
>DeepSeek R1
>Marco-o1
WE BUILD FOR CHINA
>>
>>103275122
China will release them open in order to undermine US companies as long as the US has the lead. The moment China is in the lead they will go closed source.
>>
>>103275139
>The moment China is in the lead they will go closed source.

Then the west would go open. Isn't competition a beautiful thing?

The Cold War was the reason technology developed so quickly back in the 20th century; no competition = no progress.
>>
bfloat16 is a meme
https://arxiv.org/abs//2411.13476
>>
any igpu enjoyer here?
https://www.reddit.com/r/LocalLLaMA/comments/1gheslj/testing_llamacpp_with_intels_xe2_igpu_core_ultra/

should I go for intel or amd?
>>
>>103275056
>regular accounts
At least for Claude, I think people use this:
https://gitgud.io/ahsk/clewd
Which makes a custom proxy that forwards the API calls to the web app. People used a similar method for Slack in the past.
>>
>>103275254
Do the modern integrated GPUs have limited ram allocation?

Can i have a "tpu at home" if i bought a cpu with an igpu and allocated 120 gb of ram to it?
>>
>>103274785
Can it be used with local?
If not then fuck off.
>>
>>103275312
>Can it be used with local?
>If not then fuck off.
>>
>>103275287
afaik ryzen apu can address up to 64gb
https://www.reddit.com/r/LocalLLaMA/comments/1efhqol/comment/lg24yh5/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button

not sure about intel
>>
>>103275287
As far as I know it doesn't work with AMD.
>>
>>103275312
yes, you gigantic retard, it can be used with local models
I tried running it with docker but had errors with mongodb and i cba
doesn't look bad though
>>
>>103275312
Yeah it can connect to any openai-compatible API which most local LLM servers can do.
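e.g. with the openai python client, all you override is the base url (the port and model name here are assumptions, match them to your backend):

from openai import OpenAI

# koboldcpp, llama.cpp server, tabbyAPI, vllm... anything serving /v1
client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local-model",  # most local servers ignore or loosely match this
    messages=[{"role": "user", "content": "say hi"}],
)
print(resp.choices[0].message.content)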
>>
>>103275312
God you're so painfully retarded
Do you even listen to yourself?
>>
The "local models are finally very good, but painfully slow" era is much more annoying than I thought it'd be
>>
>>103275413
Xe can't cuz cloud stuff lives rent free in xis head.
>>
I'm getting extremely similar, near-deterministic outputs on every reroll with mistral large 2411 even with 1.2 temp. I have not touched any other samplers/params, it's all vanilla.

Any idea how to fix it?
>>
>>103275602
are you using an old llamacpp variant you didn't update for a while
iirc there was a brief period where sampling wasn't working properly unless you were using the HF loader, so it would act deterministic with any settings
>>
>>103275627
Latest koboldcpp with sillytavern, with default settings (other than temp).
I have been out for quite a while so everything is freshly downloaded.
>>
what's the best gguf of Midnight-Miqu for 24gb? i'm using 70B-v1.5.i1-IQ4_XS (34.6GB) and it's about one word/sec on a 3090ti
can i go to one of the smaller quants (3M/3S/3X/3XXS) without making it crap out too much? or is there a better alternative? i find this model to be smart enough to keep a casual conversation going for a while
>>
>>103275653
>>103275627
Also Q4_K_S from here:
https://huggingface.co/bartowski/Mistral-Large-Instruct-2411-GGUF
>>
>>103275653
do you get gibberish/word salad if you crank the temp to 2.0 with all other samplers off
that's the easiest way to test if sampling is actually working or not (gibberish means it is working)
>>
>>103275602
Frequency Penalty 0.13 and Presence Penalty 0.2 seem to be working for me. I was just cranking both up to 1/1.5 and halving them until Largestral 2411 became less repetitive.
>>
File: working.mp4 (405 KB, 406x468)
>>103275697
Yes, I am getting word salad at 2.
I guess I'll just have to find some better sampler settings.
>>
>>103275763
temp 5 topK 3
>>
>>103275518
Accelerate
>>
>>103275518
Cloud models are smaller than you'd think. Current local models are just too big to justify their performance levels
>>
>>103276223
I think that's true in some cases, but Claude Opus (which most coomers think is the best coom model) is clearly a genuine behemoth based on its slow token generation rate.
>>
>>103276250
You'll never know with cloud models
>Studies show that users associate a lower token generation rate with a higher perceived intelligence of the model
>>
>>103276361
Yeah but in this case Sonnet 3.5 has been their flagship "smart" model for half a year at this point.
>>
File: NorthKoreanMikuKnockoff.png (1.4 MB, 1248x800)
good night, /lmg/
>>
>>103276557
Good night, Miku and friends
>>
File: 1674516896750610.jpg (56 KB, 800x533)
>>103276557
>>
>How could I possibly relax when my body is still humming from what just happened?
>hum
Bros I want to know what the fuck the context is from the data poisoning source. "Shitty erotica" yeah I know but who what when why how exactly is it used when written by a human?
>>
>>103276727
there's a large market for commissioned smut, particularly among furries (who have notoriously high levels of disposable income), and a lot of 'authors' just mass-produce that slop by copying and pasting chunks together and using find-and-replace to add names and pronouns in afterwards; I'm betting a lot of that made it into datasets, along with all the commercial erotica that's probably produced in a similar manner
we need a model trained exclusively on Ao3, ff.net, and maybe some of the quest forums (sb, sv, qq, etc)
>>
File: 1702576289666593.jpg (67 KB, 640x701)
>Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
> Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects if an entity is one it can recall facts about. Sparse autoencoders uncover meaningful directions in the representation space, these detect whether the model recognizes an entity, e.g. detecting it doesn't know about an athlete or a movie. This suggests that models can have self-knowledge: internal representations about their own capabilities. These directions are causally relevant: capable of steering the model to refuse to answer questions about known entities, or to hallucinate attributes of unknown entities when it would otherwise refuse. We demonstrate that despite the sparse autoencoders being trained on the base model, these directions have a causal effect on the chat model's refusal behavior, suggesting that chat finetuning has repurposed this existing mechanism. Furthermore, we provide an initial exploration into the mechanistic role of these directions in the model, finding that they disrupt the attention of downstream heads that typically move entity attributes to the final token.
https://arxiv.org/abs/2411.14257

>>103276557
>>
>>103274275
7b is coping will need at least a 34b
>>
I got bored with my suno credits and made this with mostly Suno V4 (and some post)
https://voca.ro/1n6LYL5sb8GU
Dedicated to you guys. UwU
>>
File: 1712519454715559.jpg (182 KB, 850x1274)
Poorfag here.

I have a laptop with a Ryzen 5, 8 GB of RAM and no dedicated GPU. I could upgrade the RAM up to 32 GB though. Is that enough to run a local model (for coom reasons) or would it be too slow to be useable?

Pic unrelated.
>>
>none of the local models know about the "bakery" fat ass joke
Does everyone just train on the same CommonCrawl from 2 years ago?
>>
>>103276957
I'm running Cydonia on an R5 5600 and 32GB of DDR4, get about 1t/s until getting really deep into context, would definitely recommend DDR5 if you can get it.
>>
>>103276927
kek
>>
>>103276984
Nothing wrong with that
>>
what's a good, small and performant model (preferably uncensored)?
>>
The age of rasperry starts now. R1-lite is only the first step.
>>
Went back through my recent models, testing each one. I think 70b Hanami is my favorite.
>>
>>103275161
>isn't the competition a beautiful thing?
yes anon, it's really beautiful, without that we wouldn't advance at all
>>
>>103278046
>R1-lite is only the first step
do we know what the size of that thing will be? and are we sure they'll release it locally?
>>
>>103278069
And I think you're piece of shit that only came here to spam Sao's models. Go fuck yourself, asshole.
>>
>>103278167
you think something that's wrong then
I just coomed to it and wanted to share the positivity, schizo
>>
>>103278193
Go buy a fucking ad, asshole. I know you're just about to start spamming that model because you're a fucking shill.
>>
>>103278209
Specifically, I compared it to Nemotron, Magnum, Gemmasutra, and EVA-Qwen. Each of those made frequent errors which demonstrated that they didn't "understand" what was going on. Hanami, on the other hand, would write nice, long progressions of the scene that even made anatomical sense. Not that it was perfect, but I'm definitely going to keep using it for now.
>>
Used CLIP to organize my 4chan and porn folders; the Tkinter GUI and troubleshooting were handled by Qwen-2.5 Coder 32B. I love local models
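for anyone wanting to do the same, the CLIP part is like 20 lines with transformers. rough sketch doing zero-shot labels (the folder names and labels are just examples):

import pathlib, shutil
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["a meme screenshot", "an anime drawing", "a photo of computer hardware"]

for path in pathlib.Path("unsorted").glob("*.jpg"):
    image = Image.open(path).convert("RGB")
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    dest = pathlib.Path("sorted") / labels[int(probs.argmax())]
    dest.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(dest / path.name))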
>>
>>103278249
Made up crap that only serves as an excuse to shill because that's how you make money. When the next thread is 90% filled with your shills we're supposed to think it was organic word of mouth, right? Go fuck yourself.
>>
>>103278268
My previous favorite was Euryale. I'd also been using Magnum v2 and v3 off and on. Magnum v4 just sucks every time I try it.
>>
>>103278280
What? Do you want another excuse to keep shilling? Go ahead. Reply to this post. You're leaving money on the table if you don't.
>>
File: facepalm2.jpg (404 KB, 1022x1080)
>DRY just causes the model to intentionally misspell words so it can keep repeating them
>>
>>103278069
Going to try this model, thanks for sharing :)
>>
>>103278305
Eat it up goy.
>>
>>103278292
I'm getting rich here, yeah. Also, I got noticeably fewer llama-isms. Shivers down my spine, breath hot on my ear, eyes gleaming with ____, voices barely above a whisper.
>>
>>103278316
That's good to know. Reply to this post again to tell me more about it.
>>
>>103278159
>and are we sure they'll release it locally?
I mean, they stated they will, and that's the second most reassuring thing they could do
>>
File: Screenshot_215242354.png (19 KB, 300x136)
>>103278321
I'm really running out of things to say, though. Let's see... pic-related is my current system prompt. The warning part was for when Nemotron started being faggy, but I'm sure I could take it out now.
>>
>>103278350
Why am I supposed to care about your system prompt?
>>
>>103278361
It might affect model outputs? I actually haven't tested changing it with Hanami, so I can't be sure. Mostly I was just looking for things to say, which I already mentioned.
>>
>>103278384
So what you're saying is that you would rather do anything else rather than showing how the model actually writes? That's quite concerning...
>>
>>103278394
yeah, if he doesn't want to show the output that means that the model is ass, that's probably a shill
>>
>>103278394
It's a pain in the ass to show logs. I tend to use my own name, which I'd want to change. I also tend to tweak things to fit my fetishes, so a lot of the final replies aren't pure machine output (more like 95% model, 5% human).
>>
>>103278410
>schizo bambling
yeah definitely a shill
>>
>>103278423
bambling isn't a word
>>
>>103278427
That's your opinion, shill.
>>
>>103278435
kek
>>
>>103278410
This is the way to use LLMs. Stronger models will lift heavier, but in the end there's not a single one that can give you what you want perfectly. Back when I used Claude Opus I had to wrangle pretty hard too
>>
>>103278435
no, that's objective
You seem to have trouble thinking logically. Like tranny-tier in that words don't have meanings except in their use as rhetoric. Are you a tranny by chance?
>>
>>103278462
>You seem to have trouble thinking logically.
says the shill who wants us to try his model based on "trust me bro" evidence
>>
>>103278441
You basically have to narrate at least some of the other character's actions to steer them in the right direction or simply make things make logical sense. You can do so in a hinting, indirect way sometimes and it has the intended effect.
>>
>>103278467
>who want us to try his model based on a "trust me bro" evidence
This is certified /lmg/ hood classic.
>>
>>103278468
A problem I keep running into is female characters gradually becoming more aroused from an activity that does not involve genital stimulation, until they eventually just magically orgasm out of nowhere.
Like, no, it doesn't work that way.
I have to narrate "{{char}}'s hand finds its way into her panties" or something to make the whole thing make sense.
>>
>>103278441
>I don't know how to prompt and I have to cope by writing my own outputs
>>
>>103278503
That's a sloptune issue, or prompt issue, or both desu. The current sloptune datasets are like 70% smut, most of which were generated from sex-mode jailbroken claude
>>
>>103278515
>I don't know how to cope -
Right.
>>
Holy fuck, the buy-an-ad schizo is having a meltie
>>
>>103278497
true, I remember the L2 era with the endless finetunes, downloaded so many models it destroyed my SSD :'(
>>
>>103278525
Mistral Nemo Instruct does it.
>>
>>103278525
>>103278598
Mixtral Instruct also did it.
>>
>>103278503
Also, if it's a femdom character, she'll just order me to cum while my cock is not being stimulated in any way.
Like, no, it doesn't work that way.
>>
>>103278503
You have no idea how women work.
>>
>>103278619
Neither do you.
>>
>>103278441
I have like five roleplays that haven't progressed in months. I pick a model and run it through each of them to see what it says, then focus on one to autistically iterate on until I'm finished. Repeat with the next model.
>>
>>103278619
A great many women factually can't even orgasm WITH genital stimulation, let alone without it.
>>
Fixed my gpu crashing, we're so back... to running quanted garbage because 24gb isn't worth shit in this vram-inflated llm economy
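For perspective on why 24GB feels worthless now, napkin math on GGUF weight sizes. The bits-per-weight numbers are the rough figures usually quoted for llama.cpp quants, and this ignores KV cache and runtime overhead entirely:

[code]
# Approximate bits per weight for common llama.cpp quant types
BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.85, "Q2_K": 2.6}

def weight_gib(params_billions: float, quant: str) -> float:
    """Rough size of the weights alone, in GiB."""
    return params_billions * 1e9 * BPW[quant] / 8 / 2**30

for quant in BPW:
    print(f"70B @ {quant}: {weight_gib(70, quant):5.1f} GiB")
# 70B lands near 40 GiB at Q4_K_M and only dips under 24 GiB
# around Q2_K, i.e. the "quanted garbage" territory. A ~30B model
# at Q4_K_M (~18 GiB) is what actually fits with room for context.
[/code]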
>>
>>103278646
it'll get better during the 5090 era, I'm surprised that Nvidia went for 32GB, that's a lot when you know how stingy they are with their VRAM
>>
>>103278267
>folders
accept hydrus tags as your lord and savior
>>
>>103278467
I ignore people who post logs. It's always some 2-message garbage where the AI character says and does like ten things without any input from the player. They're totally useless as evidence of how it will perform, which speaks to the intellect of those who post them and/or want them.
>>
>>103278658
The devil is in the details. The 5090 will be like 4-5 slots and eat 600W, with maybe the option of power-limiting it to 450W without losing too much performance. All of that at $2k+ most likely.
Even building a small 96GB VRAM rig with three of them is going to be a pain in the ass and scaling them beyond that will be even harder.
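Napkin math on that rig, taking the speculated numbers above at face value (none of this is confirmed spec):

[code]
cards, vram_gb = 3, 32     # 32 GB per card is the rumored 5090 spec
tdp_w, limited_w = 600, 450

print(cards * vram_gb)     # 96   -> total VRAM in GB
print(cards * tdp_w)       # 1800 -> stock draw in W, GPUs alone
print(cards * limited_w)   # 1350 -> power-limited draw in W

# 1800 W of GPUs already equals the full rating of a standard US
# 15 A / 120 V circuit, before counting CPU, drives, or PSU losses,
# which is why power-limiting stops being optional past two cards.
[/code]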
>>
>>103278658
Yeah but just like >>103278703 said, it'll be overpriced, power-hungry garbage
I'm a student, not a consoomer
>>
>>103278810
>>103278810
>>103278810
>>
>>103269445
I'm just looking at mistral large 2407 in the screencap, it almost loses to qwen 32B
>>
>>103278687
I can look for more complex concepts with CLIP, I just wish there was a bigger model (1-2B instead of the 400M OpenAI CLIP)
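If anyone else wants to try it, concept search over an image folder is short with the Hugging Face transformers CLIP classes. A minimal sketch, assuming the standard CLIPModel/CLIPProcessor API; the embed_images/search helper names are my own, and clip-vit-large-patch14 is the ~400M model in question (a larger open_clip checkpoint slots in the same way):

[code]
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

NAME = "openai/clip-vit-large-patch14"  # the ~400M-param OpenAI CLIP
model = CLIPModel.from_pretrained(NAME).eval()
processor = CLIPProcessor.from_pretrained(NAME)

@torch.no_grad()
def embed_images(paths: list[str]) -> torch.Tensor:
    """Embed image files into unit-norm CLIP vectors."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def search(query: str, paths: list[str], feats: torch.Tensor, k: int = 5):
    """Rank images by cosine similarity to a free-text concept."""
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    q = model.get_text_features(**inputs)
    q = q / q.norm(dim=-1, keepdim=True)
    scores = (feats @ q.T).squeeze(-1)
    top = scores.topk(min(k, len(paths)))
    return [(paths[int(i)], float(s)) for s, i in zip(top.values, top.indices)]
[/code]

For a big folder you'd embed once, cache the tensor, and only run the text side per query.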
>>
>>103278867
Yea, qwen2.5 is really smart. Mistral large writes better though.
>>
>>103278658
I really look forward to AMD's next cards.
If they keep putting more VRAM on them, someone is going to make them work for AI eventually.
Nvidia has the advantage for now, but it won't be long.
>>
File: 1731887883476718.png (996 KB, 1760x746)
996 KB
996 KB PNG
>>103279374
>I really look forward to AMD next cards.
anon, AMD was a company made just so that Nvidia wouldn't be sued for being an antitrust monopoly
>>
>>103279411
How the fuck is corporate collusion a sign that we're living in a simulation?
Fuck Twitter for giving double-digit midwits a soapbox to speak on
And fuck you for posting it here and making me read it
>>
>>103279452
keep coping retard, AMD isn't gonna save you, its only role is to save Nvidia
>>
>>103272363
The only thing he needed to do was keep open-sourcing GPT models. That would prevent others from wasting billions on training new models and allow for improvements to the GPT models, guaranteeing a monopoly.
For a jew, he is a massive retard.
>>
>>103279535
Your idea is worse. If he open-sourced GPT-3 and GPT-4, their competitors would just take and finetune their models and provide cheaper alternative platforms since they did not have to invest in training their own models.
>>
>>103279563
this, OpenAI managed to hold a monopoly for almost 2 years because they decided to keep the secret sauce to themselves, but that's over now, other companies can train their models better than them, oh well, RIP in peace bozo you won't be missed
>>
>>103279563
That's the point. His competitors would stay on GPT and wait for OpenAI to release new GPTs.
Effectively murdering any competition.
Today, there would be no Claude or Gemini. A few years of monopoly is nothing in the long run, and they could have licensed them the same way Epic licenses Unreal Engine, making billions easily without even running their models and wasting a shit ton of money on that as they do now.
>>
>>103279601
>That's the point. His competitors would stay on GPT and wait for OpenAI to release new GPTs.
>Effectively murdering any competition.
why? their competitors would continue the pretraining or finetune their GPT models in a way that beats OpenAI; doing that would even make it easier for them
>>
>>103279615
>why? their competitors would continue the pretraining or finetune their GPT models in a way that beats OpenAI; doing that would even make it easier for them
And? They would be forced to open-source their models and pay money to OpenAI after x amount of revenue.
OpenAI's massive losses don't come from training models, they come from running their models.
>>
>>103279631
>They would be forced to open-source their models and pay money to OpenAI after x amount of revenue.
If they make the license too restrictive, it's the same as keeping them closed source. Their competitors will be forced to train their own models. All open-sourcing them would do is make us happy and make it easier for their competition to catch up, because they can just look at what OpenAI did in their latest models and use the same techniques themselves.
>OpenAI's massive losses don't come from training models, they come from running their models.
Bullshit.
>>
>>103279652
>If they make the license too restrictive, it's the same as keeping them closed source
There’s nothing restrictive about requiring people to pay after a certain point. Companies would gladly spend tens of millions of dollars on OpenAI’s GPTs rather than billions to train and operate their own models, which would cost even more.
Why do you think Microsoft or Apple aren’t spending billions to develop their own models? It’s because they essentially own OpenAI’s models. However, if competition overtakes OpenAI, they could easily turn to Claude or Google instead, and that would be the end of OAI.

Dominance over the market should always come first.
>>
>>103279740
>Why do you think Microsoft or Apple aren’t spending billions to develop their own models?
>>103268360
>It’s because they essentially own OpenAI’s models.
Microsoft* essentially owns OpenAI's models. Apple had to rely on OpenAI because they had nothing of their own. They recognize this is a problem, and are planning to train their own by next year.
Which is what everyone would do if OpenAI licensed their model weights to everyone with fees for corporate usage.
They would just be giving their competition a stop-gap until they had their own models ready.
>>
>>103276927
heh
>>
>>103279803
>They recognize this is a problem, and are planning to train their own by next year.
They already trained smaller models and they performed terribly. It will take a few years before they reach anything similar to the current level of OAI. They don't even have the infrastructure for it.
At best, they will use upcoming llamas, and at worst continue using OAI for some time and then switch away.
>>
>>103279535
There's a certain irony to the fact that his antics are likely in part what led to our current era of French and Chinese models and the west basically eating shit
Remember he didn't just close off the weights - he closed off the research after GPT-3 instruct too. There's a lot of shit we could have learned about much earlier than we did. Instead, he decided to burn everyone to try to get a slight lead in a race that was always going to be his to lose anyway
>>
>>103279944
I don't think he cares at this point, he made 40 billion by changing his company's structure kek


