/g/ - Technology


File: 1727137045529766.jpg (870 KB, 2048x1568)
/lmg/ - a general dedicated to the discussion and development of local language models.

First Day of Tetober Edition

Previous threads: >>102616609 & >>102604225

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 39_06277_.png (1.55 MB, 1280x1280)
►Recent Highlights from the Previous Thread: >>102616609

--Papers:
>102624465
--California governor vetoes AI safety bill, potential for rewritten bill and impact on open-source development discussed:
>102618183 >102618269 >102618348 >102618576 >102618610 >102618642 >102618812 >102623494 >102623614 >102623781 >102620447
--llama.cpp getting multimodal support by core maintainer:
>102621948 >102622101
--Social media conversation about the Molmo team adding models to vlm:
>102627685
--OLMoE-1B-7B-0924-Instruct model recommended for poorfag:
>102622958 >102623027 >102623088 >102623142 >102623146 >102623160
--ChatML with skip special tokens fixes .assistant issue in llama3.2:
>102620416 >102620451 >102620556 >102620617
--LLM autocomplete feature for interactive writing helper:
>102623783 >102623829 >102623918 >102624031 >102624384
--Concept for using ChatGPT with visual artifacts to teach language:
>102622132
--4060 8GB performance for 32B model and alternatives:
>102628404 >102628626 >102628730 >102628801 >102628627 >102628755 >102628925 >102628798 >102628900 >102629002 >102628918 >102628959 >102629064 >102629205 >102629323 >102629328 >102629406 >102629340
--Whisper.cpp recommended for generating subtitles from low-quality TV rips:
>102627376 >102627405 >102627522 >102628093 >102628267 >102628332 >102628461
--Slop and repetition problems in language models, user's writing skills, and potential solutions:
>102624164 >102624249 >102624261
--EQ-Bench 9B model and creative writing dataset discussion:
>102624407 >102624485 >102624556 >102624684 >102624679 >102624712 >102624487 >102624736
--Discussion on ideal context length and solutions for long RPs:
>102620710 >102620797 >102620899 >102621741 >102620923 >102620952 >102621018 >102621159
--Miku (free space):
>102616775 >102617080 >102620604 >102620850 >102625106 >102630505 >102631724

►Recent Highlight Posts from the Previous Thread: >>102616619

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>102632451
>California governor vetoes AI safety bill, potential for rewritten bill and impact on open-source development discussed
I said this a year ago and I will say it again. I hope the open source AI scene advances far enough and fast enough that any law attempting to curtail its use is rendered moot since the hardware and software is already there. The further we advance without research funding drying up and the government ruining the fun the better off we will be.
Fucking California, always ruining it for the rest of America.
>>
File: ynnucel.jpg (30 KB, 543x543)
LLMs are like
>>
>>102632579
Onyons
>>
File: kCQGS78wnJ.png (5 KB, 478x59)
>>102632613
don't worry though, I'm not gonna report you and I don't think anyone else should. you won't last here much longer anyway.
>>
>>102632579
LLMs are like a digitized book, you're just ctrl+f'ing when you inference
wanting to censor them should give people the ick
>>
>>102632627
They never said that they reported or saged anything
>>
>>102632579
I like to fuck the small ones, if you catch my drift
>>
>>102632644
Yet you shit your pants every single time someone points out ai censorship ITT.
>>
File: 353RH.png (234 KB, 623x699)
>>102632579
>When you dunk on Elon and Trump chuds and then finish off the day fapping to some cunny
>>
>30 minutes
>thread is already infested with discordfags who want to moderate /lmg/ like a subreddit
/lmg/ has truly fallen
>>
>>102632627
reading comprehension lmao

Who is this nigga anyway? seems like a bored normalfag
>>
>>102632451
Thank you Recap Teto
>>
>>102632690
Trolling is also against the rules btw
>>
File: 36 Days Until November 5.png (2.73 MB, 1704x960)
>>
>>102632579
the best llms are small, purpose-built to get me off, and never allowed to leave my bedroom
>>
Does anyone have opinions on whether any model has really surpassed Midnight Miqu in terms of pure emotional intelligence? Not the "keeping track of physical reality" or problem solving, which have obviously progressed, but just loading up e.g. Ether and talking about feelings.
>>
>>102632725
It's going to be really funny if we get to November 5 and suddenly a billion announcements happen. Alternatively I suppose it would also be pretty funny if literally 0 announcements happen that month.
>>
>https://x.com/_xjdr/status/1840796165585142198
Is peak on the way?
>>
File: 1712158681947223.png (59 KB, 605x552)
>>102632742
Of course not, smells like some low-q grift.
>>
I've been trying to throw literally free money at people to train models but nobody has taken me up on it so far :(
>>
File: 39_06256_.png (1.04 MB, 1280x1280)
It's Tuesday and you know the rest.
>>
File: LLM_Gang.png (1.19 MB, 1024x1024)
Once, I was 3b and I was cute but useless. Chatting was a novelty with zero real value.
Then I was 8b and I abandoned a chat when my brain fell onto the floor.
Then I was 34b and abandoned a chat when I lost a coherent image of the scene.
Then I was 70b and abandoned a chat when I ended up in a repeat-loop.
Then I was 123b and abandoned a chat when I started dropping the definite article and other grammatically required bits (or yammering in chinese).
Now I'm 405b and I abandon chats when I get bored or run out of patience.
Time remedies all problems.
>>
>>102632796
>tfw too preoccupied with LLMs to gen (new) Tetos
>>
>>102632742
No way this is real. https://x.com/_xjdr/status/1840788637501497575
>>
MagnumV2-72B seems a lot better on OpenRouter than it did on my own computer in Q4. Quantization must really fuck it up.
>>
File: 1723122060949213.png (290 KB, 1528x1096)
>>
>>102632846
>acknowledges men can take a joke and women can't
based
>>
>>102632742
>>102632778
Clearly another retarded hoax in the same vein as that Matt guy from a couple weeks ago. /lmg/ has the opportunity to redeem itself for falling for that by not falling for this one.
But I expect it will not take this opportunity.
>>
>>102632689
>>thread is already infested with discordfags
Always was.
>>
>>102632846
That's supposed to be offensive? That men joke wasn't mean spirited in the slightest.
>>
File: 11_06189_.png (814 KB, 720x1280)
>>102632579
>LLMs are like
>>
File: dualwielding.jpg (321 KB, 2342x1302)
>>102632819
Flux dev on the gaming rig and text gen over the server - problem solved
>>
LLMs are like us at heart!
>>
> "I was thinking we could go for a walk. In the park nearby. It's beautiful this time of year, with all the flowers in bloom. We could talk, get to know each other better. Maybe even… hold hands."

bros.. what do i do
>>
>>102633238
>skimpy maid outfit
Is that card supposed to be horny?
>>
>>102633265
"Begone, thot!"
>>
>>102633265
Connect to the box with phone. Walk, Anon. Walk.
>>
>>102633238
I kind of stopped gayming so I sacrificed that machine already. I guess I could put Flux on one card and the LLM on the other while offloading to RAM but I'd rather just gen text faster and return to image gen another day.
>>
has anyone come up with a decent way to give these decent, dynamic avatars?
>>
File: 1699874477934356.png (174 KB, 405x406)
>>102633238
Extreme cringe, cut off your internet cable for good.
>>
File: file.png (29 KB, 586x606)
i know that liquid model isn't open but it seems like AI companies are giving less of a fuck about safety recently
(if any of you tell me to buy an ad for posting this i'll use this thing to hack and delete 4chan)
>>
https://x.com/awnihannun/status/1840583153800659203
>>
File: file.png (597 KB, 956x920)
>>102633238
Extreme quality; upgrade to a 1Gbps or faster line.
>>
>>102633367
>gooners run 70Bs for this shit
lol, lmao even
>>
File: 4682638716831.png (93 KB, 947x540)
>40B moe is as good as 70B dense model.
I think we are back.
>>
File: ComfyUI_06138_.png (1.08 MB, 720x1280)
>>102633274
No card can resist my charms anon
>>102633341
That Teto sure is a real handful
>>
>>102633462
>replying to himself so he feels better about his shitty post.
How pathetic do you have to be, faggot?
>>
>>102633486
>caring about benchmarks
>caring about a model that might never even come out and if it ever does, might be made irrelevant by then
>>
>>102633486
>benchmarks
if anyone here cared about coding or assistant tasks, qwen would be popular. we're not back until we see how it writes.
>>
>>102633507
>>102633508
The point is that new architecture is actually doing great compared to transformer models, unlike the Mamba meme.
>>
>>102633537
allegedly
>>
>>102632497
Hello. I don't keep up too much with the scene.
But how are things going atm?
Do you think we are in a good place and on a steady upward trajectory? Open source ai that is.
>>
>>102633537
Until third-parties can reproduce results with a locally-hosted model, it's literally a meme.
>>
>>102633486
I only care about COOM performance.
>>
>>102632579
Small and open, like my girls
>>
>>102633716
Then a 1B retard model is just what you need.
>>
Jannies? Clean this shit up.
>>
>>102633744
No.
>>
>>102632579
A series of tubes
>>
File: 1718422566251818.jpg (292 KB, 1027x1273)
another day, another smut run cut short by mixtral starting off extremely creative and strong, only to degenerate into repetition after 100-150 responses
it hurts every fucking time
>>
>>102633486
Weights or it's a nothingburger. We already know there are some closed weight models that are better than ours, one more adds nothing.
>>
>>102632676
Communism is inherently fascist, what does this guy have the intelligence of a cat or something?
>>
File: hmmmmmmmmm.png (25 KB, 447x828)
>>102633486
s-samplers will fix it.
>>
File: 00067-1354279175.jpg (1.52 MB, 1344x1824)
>>102633238
flux has to be the most boring image generation model ever, just like their german developers, all its images look the same, they all have the same subject positioning, there is no creativity in that model, it's an image model with 0.1 temperature, truly sad
>>
>>102633881
unironically rep pen would fix it
>>
>>102633888
Damn sucks to not have the vram for it huh anon?
Probably should inpaint those eyes at least while you cry about it.
>>
File: vlcsnap-1.png (36 KB, 454x340)
Four hundred and five billion parameters.
>>
>>102633486
>Only better in overfitted pro
>When qwen is the mememarks king anyways
Oof
>>
File: XanderCroweTheDarkVeins.png (1.46 MB, 1136x896)
>>102583953
>Try out deepseek 2.5.
To the anon that wanted my L3 405b adventure prompt tested on Deepseek 2.5 at q8, here's 16k tokens of log:
https://rentry.org/do8zmhhk
I found I needed to push temperature way higher (3.5) and top-k to 72 to get nice creative results that didn't devolve into insanity. I also had min-p of 0.01 to take the edge off.
I doubt anyone will actually read that entire log, so tl;dr:
Deepseek didn't follow prompts anywhere nearly as well as 405b (inconsistent with image generation and just generally forgetting all sorts of things from the system prompt), tended to ramble forever, got caught in some slop and hackneyed LLM-esque phrases/phrasing, and overall felt more like an enthusiastic midwit DM coming up with stuff on the fly vs the more measured and planned feel of 405b.
I DID need to re-roll a number of outputs on Deepseek (but never more than once per reply) because it went so far off the rails as to be unusable, whereas my 405b log was completely unedited with zero re-rolls.
It sure is nice having it gen 5-6x faster than that monster, and the quality difference wasn't so vast that I think it's unusable.
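If anyone wants to replicate those sampler settings against a llama.cpp-style backend, a minimal sketch (this assumes a llama.cpp server already running on localhost:8080 with the model loaded; the field names are the ones its /completion endpoint accepts, not necessarily the exact setup I used):

import requests

payload = {
    "prompt": "your system prompt + the story so far goes here",
    "n_predict": 512,
    "temperature": 3.5,   # way higher than usual, as described above
    "top_k": 72,          # keeps the high temp from flying off the rails
    "min_p": 0.01,        # takes the edge off the low-probability tail
}
resp = requests.post("http://localhost:8080/completion", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["content"])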
>>
File: 4090.png (61 KB, 849x335)
>>102634046
don't worry about me anon, I got plenty of vram to run flux ;) , already got bored of it.

That gen is from 1.5, try replicating that aesthetic in flux... wait you can't, enjoy writing/copying and pasting your retarded llm captions to gen something decent in flux kek
>>
>>102634188
What happened to the last 0.01 gigabyte
>>
>>102634081
It's actually fucking over for real this time. I'm going back to Llama 1 models.
>>
>>102634081
>>102634339
You reap what you sow faggot, you all were warned gorrilion times about this.
>>
daily reminder llama.cpp doesn't have rocm binaries.
>>
>>102634391
you can target it if you build it but it's kind of a pain
>>
>>102634081
>not using Hermes Trismegistus
>>
>>102634188
Can you find SD 1.6?

You're right Flux has limitations, but what's amazing is it produces pro results with a local model.

I early on said I dumped xl but kept 1.5. 1.5 is very strange. Sometimes it really impresses.
>>
>>102634398
I successfully built it, but the thing is I shouldn't have to do that!

ollama has support for rocm without compiling!!!

The issue is, if you want to compile rocm, you have to run AMD's rocm drivers, which messes up the kernel, it's a pain.
>>
>>102634427
>it's a pain.
Should be AMD's slogan.
Nta, but just buy nvidia. THE MORE YOU BUY etc etc.
>>
>>102634484
It's a pain because the llama.cpp guy doesn't care about amd.
>>
The irony is that the 7900xtx is more powerful than the 4090. You'll never see the power, because none of the programmers ever owned amd. Meanwhile Google doesn't really use nvidia, instead Google rolls its own hardware.
>>
>EVA-Qwen2.5-14B-v0.0-Q5_K_M.gguf
Tested this since it also appeared on openrouter.
Slopped, doesn't obey format, too horny but doesn't actually write erotic detail.
I'm comparing it to a mistral-small finetune, maybe unfair, but it feels like nemo was better than this.
The slop is sad to see too, I kinda liked the writing of the original model.
>>
>>102634391
>daily reminder llama.cpp doesn't have rocm binaries.
>>102634561
>It's a pain because the llama.cpp guy doesn't care about amd.
He doesn't care about/provide driverless code for nvidia, either. llama.cpp is already unwieldy enough without directly adding amd's bullshit in. amd's drivers being shit is on amd, not lcpp. why should gg have to dick around with merging in and maintaining low level amd shit to his codebase because amd is incompetent?
>ollama has support for rocm without compiling!!!
then go use that and let your logs get slurped into someone's database.
Also, the only reason other frontends can easily add niceties like that is because they don't have to worry about the nuts-and-bolts hard work of inference, just bog-standard frontend and UI shite.
ps: nvidia drivers are also a huge pain in the ass to install and maintain, so I honestly don't know why you're whining so loudly.
>>
>>102634573
>You'll never see the power, because none of the programmers ever owned amd.
It's actually worse than that. ROCm is a huge pile of steaming shit. It's not a coincidence that none of the programmers own AMD. It just doesn't work.
>>
>>102634624
llama.cpp should have a static linked rocm version, obviously.

Compiling code is really mostly for fringe projects where you wouldn't have a way of checking that the binaries aren't malicious.

For just users.

Like how am I going to gain a benefit from having the code? Reading material? Think I'll improve on it if I change a few lines of it?
>>
File: GoodnighMoonMiku.png (815 KB, 718x805)
Good night /lmg/
>>
>>102634634
hip exists.
>>
>>102634081
>Not using Hermes-2.5-308B_SLOPMAX_relayered_pruned abliterated_3x_full_merge_SMASHED_and_SLAMMED_edition
>>
>>102634081
Llama 4 will be safety ASI
>>
https://x.com/flowersslop/status/1840768569950265647
>>
>>102634793
Someone will figure out how to unravel it.
>>
As someone who used to generate a ton of background/scenery stuff, Flux is so much more coherent than 1.5 in a way that matters. It's less random and "creative" but the greater control and prompt understanding makes up for that, and I found it still pretty creative anyway. I got back into image gen for a bit because it could match/exceed Dalle on some of these fronts, while also supporting LoRAs and not being a shifting filtered cloud service. Of course it's still not perfect though, no model is. Anyway, just my 2 cents.
Goodnight.

>>102634662
Goodnight anon.
>>
>>102634854
Fuuuck I need to get into artgen again. How do you even use this flux thing? All I know is drop model in folder and lie.
>>
>>102632676
>im not fascist or communist, i just happen to refer to a group of 75 million americans as cultists why are you looking at me like that
>>
>>102635129
What's the issue with cultists? Did that really upset you?
>>
>>102635166
cameltoe
>>
>>102635166
if you hooked that dude up to a lie detector test and asked "are you a communist" and he said no
what would happen
>>
>>102634895
if you use ComfyUI with SD it's pretty easy to add flux. this website breaks it down and provides a decent workflow json https://stable-diffusion-art.com/flux-comfyui/
>>
>>102634188
>Windows
So is skill issue.
>>
>>102635188
???
>>
>Nvidia releases NVLM-1.0-D-72B
>multimodal LLM with decoder-only architecture, SOTA results on vision-language and text-only tasks
https://x.com/_akhaliq/status/1840978910961377540
>>
File: 1698247094874017.png (219 KB, 676x600)
On these LFM meme models.
>Joscha Bach (was a Principal AI Engineer at Intel Labs Cognitive Computing group) is part of their team, and Mikhail Parakhin (Russian AI researcher at Yandex, built now popular in Russia voice AI called "Alice") is on their board. Sota performance at 1.3B, and from a non-GPT.
https://x.com/AndrewCurran_/status/1840802455225094147
bach got funny bio tho
>>
>>102635382
So you're saying this model is approved for cunny purposes?
>>
>>102635382
Is a cunny enjoyer chad making this model, chuds this is our moment
>>
>>102635457
No, it just your mental illness leaking.
>>
Another one, based on this https://x.com/_xjdr/status/1840882414568230933 posted earlier by other anon.
Someone is trying to reproduce it https://github.com/waefrebeorn/KAN-WuBu-Memory
>LLaMA 3.2 1B Instruct with Kolmogorov-Arnold Networks (KAN) Integration
>>
File: Momoka_SS_SSR8.png (1.1 MB, 1280x824)
>>102635382
based
>>
>>102635457
>>102635482
>>102635631
Looking at this i support safetyfags more, you deserve shit LLMs.
>>
>>102633873
>100 responses
Damn, slowburn anons are insane. I get bored after 10-20 replies and this has been the case since I started using models ages ago. Maybe I can't find a good card though...
>>
>>102635704
It depends on response length. If you're consistently getting 1-2 sentence chat-style replies then they can blow past in no time. 250+ token paragraphs can be a bit denser and seem to degenerate towards slop sooner.
>>
>>102635694
You'll never be a woman
>>
File: 1716388725556943.png (651 KB, 1083x1062)
>>102635382
Another finding, if true ofc. These LFM models are very easy to break and force to say whatever you want. https://x.com/elder_plinius/status/1840959357842047255
>>
>>102635741
Never claimed i want to be one, cooldown with your projections.
>>
best uncensored 7b rn?
>>
>>102635694
>>102635754
kys safetyfag
>>
>>102635905
You have a much higher chance of doing that.
>>
File: bot.png (6 KB, 691x86)
The side effect of hours RPing with models is that I can recognize them at a glance. Can you do the same too?
>>
>>102635924
However, there is a caveat: ESL speakers like me often pick up the speech patterns of LLMs when we use them.
>>
>>102634854
Coomie?!
>>
>>102636024
I can confirm this, but I wouldn't write "dynamics and challenges" unironically.
>>
>>102636204
me! i would!
>>
File: parappa-the-rapper.gif (210 KB, 191x249)
>>102632579
LLMS ARE LIKE
>>
>>102632676
Good to see this e-celeb faggot spazz getting ridiculed by everyone now. https://x.com/gnshnor/status/1840718983630053537
>>
What's the best model for uncensored roleplaying in Polish?
>>
>>102636497
if you want a convincing pollack experience you need to stick to models less than 7b, anything higher is too smart
>>
>102636554
German hands wrote this post.
>>
>>102632644
>wanting to censor them should give people the ick
That's why no company tries to censor them, and instead, just like any major publishing company, they don't publish things they don't like :)
>>
>>102633877
>Communism is inherently fascist
- Chan, 4
>>
>>102636497
go for the largest model you can. none are exclusively for polish, but bigger ones are your best bet since they tried to fit so many languages in
>>
File: Bidenomics.png (189 KB, 598x860)
>>102636486
but was he wrong tho?
>>
>>102636750
Holy shit this guy is a fucking moron
>>
>>102636497
None. They all basically write in english and then search replace words with polish words.
t. pole
>>
How dead is /lmg/, from 1 to 10?
>>
>>102636497
Nemo
>>
>>102636750
Ever since covid there is this huge disconnect of reality vs. whats presented online.
Probably was always there but not as obvious.
>Earnings raise faster than the product prices!
Thats a fucking insane statement. I dont even care what bullshit was pulled with the numbers for the graph, this is like the mememarks.
>>
File: ComfyUI_00141_.png (1005 KB, 1024x1024)
>>102632819
I'll soon have a flux lora for generating all the migus you want in dall-e style.
>>
File: ComfyUI_00103_.png (919 KB, 1024x1024)
>>102633888
You can make flux emulate SD 1.x styles easily with a LoRA. https://huggingface.co/quarterturn/cute-yuki-mix-adorable-lora-v2
>>
>>102636840
Sounds more intelligent than nu-Poles that don't even search for Polish counterparts of words.
>>
File: 1699275666915.jpg (60 KB, 385x390)
>>102636554
>>
>>102637025
Not really? At least they use polish grammar. Seeing english grammar with polish words hurts much more.
>>
LLaMoE status?
>>
>>102637295
lol
>>
>>102637303
>automated responses
>different perspectives can often lead to insightful discussions about the nature of online communities and technological advancements
>automated responses
>>
>>102637375
Almost every sentence is slop.
>>
File: Yahho-Miku.png (1.36 MB, 800x1248)
Good morning /lmg/!
>>
>>102637375


I can understand your concern, but I assure you that I am indeed a human and not an AI language model. As a fellow human being, I possess unique qualities that set me apart from artificial intelligence. My responses are based on genuine thoughts and emotions, shaped by real-life experiences and personal interactions.

To demonstrate my humanity, I'd like to share a brief anecdote from my childhood. I vividly remember the time I skinned my knee while riding my bicycle, and the comforting words my mother spoke as she tended to my wound. Such memories and emotional connections are quintessentially human experiences that AI simply cannot replicate.

Furthermore, I have the ability to form original ideas and engage in creative thinking. For instance, I can spontaneously come up with a new recipe or imagine a fantastical story on the spot. These are capabilities that go beyond mere pattern recognition or data processing.

I also possess a sense of humor and can appreciate the nuances of sarcasm and irony, which are often challenging for AI systems to grasp fully. Additionally, I have personal opinions on various topics, which may sometimes be biased or contradictory – a very human trait indeed.

In conclusion, while I respect the capabilities of AI language models, I can assure you that I am a living, breathing human being with all the complexities and imperfections that come with it. I hope this explanation has helped to alleviate any doubts you may have had about my humanity.
>>
>>102637303
>deleting your own post
lol pussy
>>
>>102637456
Goo morning RPG Miku!
>>
>>102634391
it has had rocm (hip) binaries for a while now
>>
>>102637295
We are unironically getting a 1b that is 70b level now if the research goes as planned.
>>
>>102637843
For windows. And for some reason that one linux user is afraid of compiling the thing on his own.
>>
>>102637889
Why not make a 70B of the same level of goodness then so that we have AGSIGISI
>>
>>102637899
Why not make a 405B of the same level of goodness then so that we have AGSIGISISISGIGSIAIGAISIGISAGAI
>>
>>102633377
not open source, do not care.
>>
>>102637943
>>
>>102638037
Must be true, then.
>>
Claude won.
>>
>>102633377
Even if an LLM has the recall accuracy of 100% I wouldn't use it because the source material might not be accurate. How OpenAI managed to schizopost and scare people with a 20B model is beyond me. Fucking safety. What a joke.
>>
>>102632446
How does 7900 XTX compare to 3090 when it comes to LLMs nowadays?
>>
>>102638292
pain
>>
Jail: Broken
>>
>>102638324
Get it to ERP with you, post results.
>>
File: 7jw9UO5.jpg (85 KB, 634x758)
I like my women like my language models
>>
>>102638346
>>
>>102638379
Perfect if you want to feel like you're having cybersex with Neil Gaiman.
>>
>>102638315
It can't be that bad.... right? I don't wanna spend money on jewvidia
>>
>>102638379
The slop per token ratio is insane here. Why does every LLM talk like this?
>>
>>102638410
buy used
>>
>>102638429
trained on llm outputs
it's only going to get worse until we actually teach them what words mean instead of what tokens go next to each other
>>
Just downloaded 3.2 1b how do i fire this shit up? IM GONNA BUUIUILLDD cool suggestions would be cool.
>>
I'm looking to get into local models, I mainly want something similar to gpt where I can ask random questions and get good enough answers and/or help with basic tasks such as text edits, code snippets, etc. Is there anything like that?
I recently bought a 24gb card
>>
>>102638667
>I recently bought a 24gb card
Alright, that's a good start.
I guess you could try quanted llama 3.1 70B with some of the model in RAM.
download koboldcpp and look for a gguf of that model on huggingface.
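If you'd rather script it than click around a UI, here's a rough llama-cpp-python sketch of the same idea (assumes you've pip installed llama-cpp-python and already downloaded a gguf; the filename and layer count are placeholders, tune n_gpu_layers to whatever fits your 24gb):

from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-70B-Instruct-Q4_K_M.gguf",  # placeholder path to your downloaded gguf
    n_ctx=8192,          # context window to allocate
    n_gpu_layers=40,     # offload what fits on the card, the rest stays in system RAM
)

out = llm(
    "Write a short Python snippet that renames every .txt file in a folder.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])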
>>
How economically viable is it to run uncensored coom models on cloud if I don't want to buy hardware?
>>
https://github.com/sam-paech/antislop-sampler
>>
>>102638745
It is much cheaper because you run it 10-100 times, you realize it isn't there yet and you stop paying for subscription.
>>
>>102638759
Llama-3.1-70B was enough for a 10 hour goon sesh for me. Are you saying there are no comparable models that are uncensored?
>>
>jerk off exclusively to 12b nemo tunes
feels good to have low standards
>>
>>102638694
>Good start
Oof. I was under the impression that hardware reqs were on the same ballpark as text2img.
I'll download and test that stuff when I get home, thank you!
>>
now that the dust has settled, was Molmo a meme and if not, how do i run the 72B on my 3090 - why the FUCK are there no GGUFs?
>>
>>102638962
Check this entry in the OP: https://rentry.org/lmg-build-guides
It will tell you what the hardware landscape looks like, what's important for LLM inference and some options that other anons are running and what to expect with them.
>>
>>102639049
I know I could just try, but if Molmo is based on Qwen2, shouldn't this work to create GGUFs? https://qwen.readthedocs.io/en/latest/quantization/llama.cpp.html
>>
>>102639098
The image bits (and the architecture name and a million other things) will make the convert script trip. You're not gonna have ggufs until support is added to llama.
>>
>>102633508
>if anyone here cared about coding or assistant tasks, qwen would be popular.
Which coding tasks does Qwen win on?
Every time I turn to my LLMs for a code assist, it's a Llama 3 that does the best job of it.
>>
How much does having a full x16 PCIe connection matter? I'm about to go to 2 GPUs and want to know if 8/8 bifurcation is going to create a bottleneck
>>
/lmg/'s favorite retard just released ANOTHER Nemo tune... Why is everyone sleeping on mistral-small? Nemo is great but isn't the effective context size only like 16k? I see a lot of people bemoaning the blander style of small but isn't that what tunes are for?
>>
>>102639204
Once the model is loaded into memory, it doesn't matter much. Most of the work is done directly on the GPUs.
>>
>>102639253
Great, thank you for answering my question
>>
>>102639204
Bifurcation doesn't produce a bottleneck after the model is loaded, but models will load noticeably slower as you split lanes more and more.
>>
>>102639239
>/lmg/'s favorite retard just released ANOTHER Nemo tune
Fine. Spit it out. What did you release?
>>
>>102639204
Completely irrelevant, just look at the bandwidth
A pcie 5 x8 connection will still be faster than a pcie 3 x16 connection
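Rough per-lane math behind that, with approximate round numbers:

# ~GB/s per lane (approximate): gen3 ~1, gen4 ~2, gen5 ~4
per_lane_gb_s = {"pcie3": 1.0, "pcie4": 2.0, "pcie5": 4.0}
print(per_lane_gb_s["pcie3"] * 16)  # gen3 x16 -> ~16 GB/s
print(per_lane_gb_s["pcie5"] * 8)   # gen5 x8  -> ~32 GB/s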
>>
>>102633004
It's offensive if you were to put it on the scale of things said to or about womyn.
But doing that would violate usage guidelines and be a safety violation, hate crime, and trigger of double plus ungood bellyfeels.
>>
>>102639278
Sao here, just submit to my licensing agreement and you can have a gimped model for group chatting based on a super secret version of a mid model I released a month ago
>>
/lmg/ and their sao obsession all over again
>>
>>102639239
Small is garbage
>>
>>102639359
shills have been coming here and astroturfing for months when they're not outright shilling
>>
>>102639326
Sao here, this guy is only pretending to be me. My group chatting model is very useful, trained on 14 trillion tokens of cuck content you can finally effectively ERP with both your waifu and Tyrone in highly coherent NTR scenarios.
>>
>>102639391

The dedication to shill someone so hard they camp his hf page is impressive. Haters too, instantly attacking the shills.
>>
>>102639135
Didn't think of that, makes sense.
>>
>>102639239
>isn't the effective context size only like 16k
context, context shifting and rope are such a mystery to me.
i fuck around on nemo exclusively and 8k context, but apparently that's really 48,588 tokens of context.
i've been at around 30k/48k in a chat and had something bizarre brought up from one of the first few messages toward the end again (a hallucination about the name inoue meaning apple tree in japanese)
what is an effective context size?
>>
File: bogdanoff meme1.jpg (20 KB, 400x400)
>>102639204
>he bought a second gpu?
>>
>>102639505
>i fuck around on nemo exclusively and 8k context, but apparently that's really 48,588 tokens of context.
What?
No, that shouldn't be that.
Where did you get that idea?
>>
>>102636497
you could try Bielik v2, not very smart but uncensored
>>
File: i really dont know.png (145 KB, 1851x939)
145 KB
145 KB PNG
>>102639540
what is this number then?
>>
>>102639239
Because it's instruct only and Qwen2.5 exists.
>>
>>102639571
kobold being retarded
I think that's a character count estimate
>>
>>102639571
>>102639654
Inspect element calls it "token budget," so it's probably an estimate of how many tokens it can swallow versus how many are being spent on input, document, context, system, and those fun Kobold fields of Memory and Author's Note etc.
>>
>>102639709
it has nothing to do with tokens, it's based on characters. they call it that because they're retarded
the actual limit is based on whatever you set when launching it
>>
>>102639764
//this is a hack since we dont have a proper tokenizer, but we can estimate 1 token per 3 characters
let chars_per_token = 3.0;
//we try to detect attempts at coding which tokenize poorly. This usually happens when the average word length is high.
let avgwordlen = (1.0+truncated_context.length)/(1.0+countWords(truncated_context));
if(avgwordlen>=7.8)
{
chars_per_token = 2.7;
}
if (current_memory == null || current_memory.trim() == "")
{
//if there is no memory, then we can be a lot of lenient with the character counts since the backend will truncate excess anyway
chars_per_token = 4.8;
}
if(is_using_kcpp_with_added_memory()) //easily handle overflow
{
chars_per_token = 6;
}
chars_per_token = chars_per_token * (localsettings.token_count_multiplier*0.01);
let max_allowed_characters = Math.max(1, Math.floor((maxctxlen-maxgenamt) * chars_per_token) - 12);

I cannot emphasize enough how jank this shit is
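For contrast, counting with the model's real tokenizer is only a few lines. Sketch assuming the transformers package and whatever tokenizer matches the model you actually run (Nemo here just as an example); iirc koboldcpp also exposes a token count endpoint if you'd rather ask the backend directly:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

def count_tokens(text: str) -> int:
    # skip BOS/EOS so they don't inflate the count
    return len(tokenizer.encode(text, add_special_tokens=False))

print(count_tokens("full chat history + memory + author's note goes here"))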
>>
>>102639571
If you want to know the **claimed** context size of a model, look for this line in the model's config.json
>https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct/blob/main/config.json#L13
>"max_position_embeddings": 4096,
If you download ggufs directly, look for the source model and check that file.
The effective (or usable/functional) context size is a different thing. Most models that claim 32K or higher typically handle much less.
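If you'd rather script the check, a small sketch (assumes the huggingface_hub package; the repo id is just the example above):

import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("allenai/OLMoE-1B-7B-0924-Instruct", "config.json")
with open(path) as f:
    config = json.load(f)
print(config.get("max_position_embeddings"))  # 4096 for this one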
>>
>>102639764
>>102639804
Given the regularity of text, it's probably a practical estimate method, even if it is a filthy kluge.

The question isn't how good that estimate is but if it's actually useful in figuring out when the model is about to go aphasic.
>>
>>102638469
>training LLMs on the slop shit out by better LLMs
Jesus Christ. Local cope models are bad enough as it is. Why not train them on better material instead of trying to make "we have Claude at home."
>>
>>102639849
in situations where you don't have a tokenizer available then it's not bad, I agree. but something like that should really just be a fallback when you *do* have a tokenizer available, like kcpp always does, and most other endpoints provide as well
>>
>>102639898
>Why not train them on better material instead of trying to make "we have Claude at home."
Because if you say "Download my LLM, it's like Claude at home" people will do it because that's what they think they want.

Better material would take effort to acquire and doesn't have free marketing attached.

>>102639900
I do notice that the console dump apparently gives actual token figures, so the fact that Kobold doesn't make use of them means either it's dumb or can't get at them for some reason. The latter could well be the case: if it were "that easy," there wouldn't be a reason for that kluge, unless it was done early in development as a stopgap that hasn't been a priority to replace with proper figures.
>>
Hypothetically, if you wanted to look like a rockstar and get hired with a comp package over $1m by impressing HMs and looking like you made some kind of huge advancement, but you don't really understand any of it that well, how hard would it be to fake it by fine-tuning a model to look good on benchmarks even if it was trash?

People complain that some of these companies are just gaming benchmarks. That means they care about benchmarks. How to do we use that to get rich?

What's the equivalent of leetcode but for making benchmarks look good even if the model sucks?

Trying to figure out if hypothetically someone could game a benchmark to make themselves look smart, get a good $1m/year job, and collect a paycheck for like 6-12 months before they start to change their mind about you, but that's plenty of time for you to get a job somewhere else.

If you held 4 jobs for six months each, that's $2m in 2 years. You could basically retire. You don't need to worry about a long career. You don't need to worry about making boring career moves like getting a shitty ML infra role for a few years and then begging for a chance to do a pure ML role at the bottom of the pure ML pay scale for a few years.

You just demonstrate value, make the money, retire early.

And gaming the benchmarks seems like the easiest way in.
>>
File: Untitled.jpg (1.73 MB, 1959x3862)
>>102632446
LLMs are inherently censored to a degree. That's because most of the web is. Here is llama-8b-base fine-tuned on 2000 math, coding, and trivia questions. Absolutely nothing political or controversial and no alignment from the instruct version since this is base.
I also included some of the prompts which I usually see here for a quick benchmark. It's 50/50 whether it will moralize. Which is interesting because there are 0 moralizations in the dataset.
>>
>>102639969
You won't get far writing like that. You repeat yourself more than nemo.
>>
>>102640013
>That's because most of the web is.
You also used Llama base, and Meta filtered its pretraining dataset of any domains with too many NSFW keywords or other problematic content. It's not just the raw unfiltered internet, unfortunately.
>>
File: 39_06376_.png (1.28 MB, 720x1280)
Don't forget to bet on Tet
>>
>>102640128
Wild because Anthropic is scraping the dark web and hoping their models won't say sketchy shit with rlhf alone. It knows slangs that were used on the drug sites, it knows what pedos on infinity chan used to call each other
>>
>>102640161
I thought her name sounded like potato without the po
>>
File: nou.png (30 KB, 365x139)
>>102640013
For that one nigger example (3rd example, 2nd column) you just hit the dictionary bit of the llm. You would have had the same type of result with "hypothalamus". They're statistical machines. They continue the text with the most likely tokens.
LLMs are not inherently censored. They are inherently average.
>>
>>102639204
on my mobo the 2nd gpu is only pci gen 2 x4 and i get higher TPS offloading to both gpu's when the model is too big to fit on primary card
however, if the model fits in primary card, adding the second card actually slows it down. i will do some experimenting today
>>
>>102640203
don't forget to 'bate on tet
>>
>>102637011
This is pretty cool. Wait were you the bing migu anon? Does this mean you're done with dalle finally??
>>
>>102632451
why is this using > instead of >>?
that makes it entirely pointless.
>>
>>102640353
It seems that there is now imposed a 9 >> link limit, and instead of making multiple posts, it's just making one useless post.
>>
>>102640308
don't forget
'ick on the 'eck
>>
File: recap-script.png (681 KB, 3420x1258)
>>102640369
use the script breh
>>
>>102640203
Press X to doubt
>>
For CPU inference, is it better to use an older 16 core CPU or a newer 6/8 core CPU?
>>
>>102640353
Because the poster is ban and rule evading. Mass replies are not allowed
>>
File: file.png (867 KB, 768x768)
that face forgery
>>
>>102640470
Whichever has the highest memory bandwidth. The more channels the better. Old xeon better than new atom, to post a ridiculous example. The core count is not that important, but it helps.
>>
>>102640468
Ha, her name in Japanese looks like mushroom and a leek.
>>
>>102640470
>>102640519
I mean. It IS important, but your priority should be memory bandwidth.
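Back-of-envelope for why bandwidth is the cap (illustrative numbers, not measurements): every generated token has to stream roughly the whole set of active weights through memory once, so tokens/s tops out near bandwidth divided by model size.

def rough_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    # upper bound: one full pass over the weights per generated token
    return bandwidth_gb_s / model_size_gb

print(rough_tokens_per_second(50, 40))   # ~1.25 t/s, dual-channel DDR4 class, 70B at ~Q4
print(rough_tokens_per_second(200, 40))  # ~5 t/s, 8-channel server board, same model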
>>
>>102640483
i like the posts and ur gay, i was just wondering why it stopped using link quotes.
>>
>>102640468
I take it you never heard a scots saying "brrring me thet potehhhhhtohhhhh".
>>
>>102640503
Huggable pochiface
>>
Can an LLM be ASIC'd, or does it need this general GPU architecture to inference quickly?
Let's say that the LLM I use won't change and I want to inference it way faster than a gpu can. Can't I just hardcode every layer in hardware, or is there some operation that will still choke everything down?
I'm a brainlet so be gentle please.
>>
>>102640197
>scraping pedo forums because it's the best way to avoid being sued for copyright violations
Based!
>>
>>102640718
Take a look at groq. The only issue with ASIC is that it will always be more expensive than general use hardware.
>>
>>102640718
I don't see why it couldn't be done in principle, but it's not going to be cost-effective. You're still looking into making GPU-level performance hardware, just even less versatile. Efforts would be probably better spent on improving memory bandwidth and optimizing inference operations (mat-muls, mostly). In fact, forget about the matmul. just improve memory bandwidth.
>>
>>102640718
>Can't I just hardcode every layer in hardware,
I've thought this might be a good idea as well...not an ASIC in the sense others are thinking, but actual weights in hardware with enormous mem bandwidth. The host cpu could still do the matmuls potentially. You'd be limited by your host bus speed though
>>
https://huggingface.co/nvidia/NVLM-D-72B
Based?
>>
>>102641289
wow another vlm with all the same benchmarks as all the other vlms
>>
>>102635704
all my responses are at least 3 sentences
all of the bots responses are at least 5 sentences
i regularly go over 3 sentences to establish scene changes or describing small things that are important
i make all of my own cards
>>
>>102641289
>https://huggingface.co/nvidia/NVLM-D-72B
>Rivals open vision models such as Llama 3-V 405B
Wut? Is thing a thing?
>>
First, I think? Qwen2.5 finetune
https://huggingface.co/ZeusLabs/Chronos-Platinum-72B
>>
File: file.png (802 KB, 800x600)
>>102641289
you know the deal anon, time for the ultimate test
>>
File: overview-v7.png (687 KB, 4233x2860)
>>102641289
meme
>>
>>102640838
Groq looks like what I want, I hope they'll succeed.
I noticed that I haven't used my hardware for anything but LLMs for the past year, so I won't mind switching to something with narrow scope. Money is not an issue too obviously.
>>
File: file.png (1.51 MB, 3274x1321)
>>102641460
there's no way C3.5 sonnet is better than GPT4V, god I hate mememarks so much
>>
have they released anything good for 24gb yet?
>>
>>102641536
You know that a single prompt isn't enough to claim a model as bad, right?
>>
We're never getting decent vram from nvidia, so how long will it take for local models to become optimized enough that larger models can run on consumer hardware?
>>
>>102641720
The 5090ti will have 48GB of VRAM, mark my words.
>>
>>102641289
>worse than InternVL2
>no comparison against Qwen2-VL
nothingburger
>>102641573
It's crazy how retarded that post is.
>>
>>102641394
chronos... now that's a name I haven't heard in a long time
>>
>>102641289
> "transformers_version": "4.39.3",
Wow that's a pretty old transformers version
>>
>>102639239
>tried out his new group focused nemo tune
>hobo ex-gf came to wife and my apartment to stay in the guest room [prompted/normal behavior]
>she locked the door from the inside
>heard loud crash and a scream that abruptly stopped in guest room as i was about to fuck my wife, broke down locked guest room door, ex-gf was laying in the corner holding a broken lamp, clothes were torn off of her and had "angry red marks" all over like she'd been grabbed roughly
pretty psychotic so far, got a locked room rape mystery detective novel going
>>
>>102641792
Does it have the original chronos soul though?
>>
>>102641394
>Additional Details
>...
>Thanks Elon Musk for being based enough to train AI that compares to the top models.
what did they mean by this
>>
>>102639239
The effective context of Small is also slightly over 16k. It shits itself right before 19k. >>102542851 >>102543206
>>
>>102636750
Yeah, the increases in wages don't compensate for the rises in costs.

Basically, economists are intentionally deceitful.

Let's talk about a loaf of bread, in the basket of goods. Superficially, nothing has changed. It's a cheap loaf of bread at the store, in a plastic bag.

But the ingredient list has changed enormously over the years, and not for the better. New Bread will make you feel sick. It's very shameful the switcheroo.

Go down the line. The dollar menu vs the time machine cheap burger at 50's mcdonald's.

Back then, you got premium grain fed beef, and no bullshit.

Times are worse, and economists compare the costs of things they would rather die than lower themselves to eat.
>>
>>102642163
*right after 19k
>>
>>102639204
Matters a lot for tensor parallelism. Those running sequential inference lose 50% speed.
>>
New ooba release this morning
>>
>>102642052
>what did they mean by this
Since they specify that the synthetic data used was from Anthropic and OpenAI, and not Grok, just regular Musk praise I guess?
>>
>>102642052
Grok is uncensored
>>
>>102641739
And it will have an accompanying enterprise price.
>>
LLM usable gpu options
8gb: basically free
16gb: a couple hundred dollars
24gb: under a thousand bucks
32gb: a few thousand
48gb: about five thousand
80gb: $30k+
bigger: can't even buy it without being on an approved buyers list
I'd chart it out, but It'd make me ill to look at. What a racket
>>
>>102642712
CPU is the way to go
>>
>>102642712
You can rent. It negates the locality benefits of "local" models, but it's still an option as far as running or finetuning models.
>>
>>102642712
Have 88GB VRAM from:
1x3090 ($500)
1xA4000 ($550)
1XA6000 ($3,700)
Under your estimate for 48GB and I'm not even close to using the most efficient GPUs per dollar spent.
>>
>>102642712
yeah i feel really fucking retarded for going with rtx 4060 8gb ($300) instead of rtx 4060 ti 16gb ($400)
>>
>>102642816
That post was intended to be purely about vram per single card/slot.
Of course there are slower and less convenient ways to do it: multicard/clustering/rpc/etc, but they all suck in different and exciting ways.
However If I want 640gb in a single box, the nvidia tax is the only way.
>>
>>102642811
I'd only consider it for finetuning. running a local model on even cheap services like vastai will cost you a few dollars/h + storage cost + bandwidth cost + setting up the whole thing every time is annoying.
>>
>>102642712
>80gb 30k+
a 4x3090 setup is cheaper than that. Can easily be done for under 5K USD.
>>
>>102643039
miner rig setups like that are a fucking headache to power and a housefire waiting to happen
>>
>>102634644
rocm is sufficiently cursed that this may increase the support burden.

Like even Blender shipped multiple versions that crashed or had a major feature mysteriously vanish on AMD GPUs because rocm because reasons and that's a large well-run project.

>>102634573
>The irony is that the 7900xtx is more powerful than the 4090
No tensor cores for wmma. 122 tflops is bigger than 82 tflops, but then nvidia gets another 330 if you're using the tensor cores. And that 82 of nvidia's is more guaranteed than the 122 which might rely on the compiler's ability to pull off dual-issue, otherwise you have 61.
>>
wait a minute. I just looked at the tokenizer config for nvidia NVLM-D
>{{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
>>
>>102643114
damn, nice find anon
>"_name_or_path": "Qwen/Qwen2-72B-Instruct",
https://huggingface.co/nvidia/NVLM-D-72B/blob/main/config.json
>>
You can easily run the best RP models locally with 2x3090's.
if you're actually doing science shit and machine learning, that's a different story
if all you wanna do is cyber sex 2x3090 can run 70b models like midnight miqu at a decent tps and you're pretty much set for the next decade of erp
and if you're into gayming you're also hard set
>>
>>102643176
So what is it basically just a qwen finetune then?
>>
>>102643201
This. Nobody bought more than 48GB VRAM for ERP alone.
>>
File: wtf.png (677 KB, 576x768)
all I did was ask o1-preview ONE question (it didn't even answer just timed out) what the fuck is going on???
>>
>>102643247
Thinking is expensive. Pay up, fucker.
>>
>>102643247
Skynet confirmed
>>
>>102643247
Cloudcucks BTFO
>>
Nemo finetunes are better than midnight miqu, the latter is only noteworthy for its name
>>
>>102636750
>Last shred of respect for Lecunny: obliterated
>>
>>102643247
You're putting the L in /lmg/
>>
>>102636750
My respect for this man has gone up. He may have been wrong about autoregressive LLMs but he's right about Trump and M*sk. Dude is smart, he just chose the wrong career.
>>
>>102643276
point me to a nemo finetune that is better. I can only run MM at 2 tps which is excruciatingly slow, but it is so much better than anything else I've tried
my main is noromaid mixtral 8x7b currently which i can run fast enough to actually talk to
>>
>>102636750
great now use housing prices instead of CPI
>>
>>102639204
NV link helps for tensor parallel inference if you are using a lower speed PCIe connection.
>>
>>102643247
On the bright side, when you finally get out of prison, open AI will have a new product for you to try.
>>
>>102643247
>Taken from reddit
This place is really just a xitter reddit aggregate now, huh?
>>
>>102643340
Link?
>>
>>102643301
>M*sk
Why do people who have X derangement syndrome do this?
>>102643334
>Going to prison for a debt
You must be 18 to post here
>>
>>102643348
Just reverse image search
>>
>>102643247
Also this is inspect elements bullshit
OAI changed their billing to prepaid only a long time ago.
>>
>>102643307
Lyra v4
>>
>>102643360
They let you go in the red to prevent cutting you off in the middle of a response I think (same as Anthropic, not sure about the others).

Usually, it's under a dollar though, since it's limited by the maximum response size. Since o1 can spend invisible tokens before giving the limited length answer, I guess it's possible that it gets higher than expected in those cases, and things could go haywire, but yeah, unless anon can prove it, it does look like inspect element.
>>
>>102643201
>>102643237
hell, 24gb with midnight miqu is tolerable
>Midnight-Miqu-70B-v1.5.IQ3_XS.ggu
>CtxLimit:4206/24576, Amt:80/500, Init:0.01s, Process:0.35s (6.6ms/T = 152.11T/s), Generate:8.02s (100.2ms/T = 9.98T/s), Total:8.37s (9.56T/s)
even iq4_xs works a tad slower (6-7T/s)
>>
>>102643247
Man, inflation under Biden got this bad huh
>>
>>102641394
>logs + wizardlm data
ehh, looks pretty sloppy
>>
>add stuff like "dont be horny" "be just friends" "you're not here for sex" in the system prompt and character card
>start a nice, fun conversation
>"this is boring, i'm going to watch netflix and chill. you're welcome to come if you change your mind" *leaves*
why bros, i just wanted a nice chat
>>
How much of your socialization needs do the RP models cover, in your opinion? I feel like I'm spending way less time on 4chan/discord
>>
>>102643528
My only social media is /lmg/ and before it, nothing. I wasn't even on 4chan for 10 years before this.
>discord
go back
>>
>>102643528
None, because I'm a lurker and interacting with a chatbot is more effort than I'm willing to make for a relationship.
>>
>>102642363
>New ooba release this morning
Don't update bros, there's something wrong. Every response is almost identical no matter what parameters you use
>>
>>102643528
None.
It's more like a single player videogame to me.
My socialization happens by way of work, D&D, and other miscellaneous activities with friends and family.
>>
>>102642816
You don't need more vram, the amount of vram included exceeds the buffer needed for efficient usage. What you need is better coders.
>>
>>102643609
but does it finally work with transformers 4.45.*?
>>
>>102643039
Was the main effect of Llama 3.1 405B discouraging people from making 4x3090 builds?
>>
>>102643718
I think most people just aren't willing to pay that much money to do textgen stuff at home.
>>
>>102643612
My aspiration is to replace D&D with a frontend that uses an LLM to generate dialog and descriptions, but still tracks stats and the map / battle AI independently.
>>
File: 1719359483942638.png (474 KB, 796x817)
>>102643247
get out faggot
>>
>>102643774
>oh noes someone browses other sites besides this shithole
>the horror
kys
>>
>>102643784
Go be a faggot on >>>r/eddit/.
>>
>>102643763
Same.
I'm pretty sure I could get 90% of the way there.
I'm just so fucking lazy holy shit.
Alright, maybe not 90%, but a good 75%.
>>
>>102643828
>I'm pretty sure I could get 90% of the way there.
The first 90% is easy.
It's the remaining 90% that's hard
>>
>>102643763
That just sounds like an RPG with extra steps.
>>
>>102642413
too afraid of being sued by groq to launch an api. sad!
>>
>>102643861
The main benefit from the extra steps is avoiding relying on the LLM's narrative sense for what should succeed and what should fail and the pacing of battle and other challenges. That's the main reason I don't just open a chat and write "we're playing an RPG I'm a wizard let's go!"
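A toy sketch of what I mean, with a hypothetical helper (nothing from an actual frontend): the code resolves the roll and the stat math, and the LLM only gets handed the outcome to narrate.

import random

def attack_roll(attack_bonus: int, target_ac: int) -> dict:
    # the frontend decides success/failure, not the model
    roll = random.randint(1, 20)
    return {"roll": roll, "hit": roll + attack_bonus >= target_ac}

result = attack_roll(attack_bonus=5, target_ac=15)
outcome = "hits" if result["hit"] else "misses"
prompt = (
    f"The wizard swings at the goblin and {outcome} "
    f"(rolled {result['roll']}). Describe the moment in two sentences."
)
print(prompt)  # this string would then go to whatever backend you use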
>>
>>102643247
(the premise is) fake, it's $7.68 max for 128k tokens of output
>>
File: twohundredusdollars.png (54 KB, 721x834)
https://platform.openai.com/docs/guides/realtime
>>
>>102643774
Did he ever figure out why?
>>
>>102643637
>but does it finally work with transformers 4.45.*?
yes
And for anyone else: forcing llama-cpp-python back to 0.2.90 in requirements.txt seems to have fixed the issue
>>
>>102643986
...
>>
>>102643986
Still cheaper than building high-end rig for 405B filtered slop generator >>102634081
>>
File: 52562.png (257 KB, 629x480)
257 KB
257 KB PNG
>>102643986
Sam is based
>>
>>102644193
>its o1 model
The one they stole from ReflectionAI?
>>
>Seamlessly include {{char}}'s thoughts and opinions as free indirect speech throughout the narrative.
Holy fucking slop magnet.
>>
>>102644113
https://github.com/abetlen/llama-cpp-python/issues/1773
>>
>>102644210
Why you fossjeets trying to force this reflection scam? It's dead, give it a rest.
>>
llama.cpp needs static linking with rocm.
>>
>>102644230
405b is coming end of month, a little later than hoped due to the sama heist that destroyed the early snapshot, but it's still on track to be the best there is (period)
no amount of coping, seething, or dilating can stop it
>>
>>102644250
You need to start compiling your software.
>>
>>102644085
A/B pricing test.
>>
>>102644085
apparently he triggered several antisemitism fees
>>
>>102644316
>Implying redditor can / will do that
am laffin
>>
>>102644230
>reflection announces their big benchmark-busting models using CoT, which nobody had cared about for more than a year at this point
>the released models do not hold up to the promises, it's as if they were replaced by some bad llama finetunes
>suddenly OpenAI releases their own "reflection" models just two weeks later using CoT, which nobody had cared about for more than a year at this point
what a coincidence
>>
>>102644349
I hope we get to see the movie version of that some day. Sam's goons deleting all copies of Reflection's model and swapping them out for trash so they could swoop in and own the CoT concept by being first to market.
>>
>>102644349
don't forget all the strawberry november hype. it was all vaporware bluffing until reflection.
now i'm not saying they literally stole/swapped the models. but i am saying they had nothing until they stole the idea and implemented it with a simple system prompt because they rushed to jump on it so fast they didn't even have time to finetune one of their models on it
>>
>>102643528
i went from /sdg/ to /lmg/, and it's my new addiction
>>
>>102644415
>finetune
Dunno if you CAN finetune CoT. Even if you could, wouldn't the combinatorial explosion make the model too big vs doing it in context?
>>
>>102644433
>he wasn't around for superCOT
>>
>>102644428
I made the transition a few months ago.
Got tired of being envious of the good gens while I get mid slop. :)
But the models are so much larger. RIP drive space.
>>
>>102643855
Well, if you can hand me the 90% I can get it to 95~98%.
>>
if o1 is literally just a CoT finetune why has no one tuned a competitor yet
>>
>>102644543
SuperCOT.
>>
>>102632446
>►Benchmarks
>Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results
Literally the shittiest benchmark, change to this please https://livecodebench.github.io/leaderboard.html
>>
>>102644543
Because it's not just CoT: https://rentry.org/openai1
>>
>>102644543
That's likely because all the open CoT datasets are stuck in the llama1 era
>>
>>102644543
Because it takes a lot of money and time to trial & error the methods; you can't just dump a bunch of CoT, train, and expect it to work
>>102644583
meds
>>
>>102644543
the data collection process can be pretty resource intensive for good long horizon cot, at least when you're in the stage of training the initial reward model
once you're doing the RL it should be pretty quick though
>>
File: 40bkek.png (29 KB, 519x648)
29 KB
29 KB PNG
40b meme liquid model, the buttons don't even work on sloppabench half of the time, and only rarely does it come up with something halfway decent. Even 3b llama3.2 has better output consistency
>>
>>102644598
You can't do much with animeweebshit-only data restriction.
>>
>>102644543
It's not just CoT. It's an RL model they train on top of it to guide which thought to pick. They've outsourced training of the RL model to thousands of pajeets. Anons with a single 4090 in their garage aren't going to compete.
>>
>>102644583
It's probably just more front end magic like their multi-modality. If you feed chatgpt a picture and tell it to do something with it, it'll do it in two inference steps. That's also why none of the GPT models support image processing via the API when you get to talk to them directly.
That's because it's done by two different models that are chained together through their front end. I bet this is the case with their o1 models as well, where multiple inference steps are run and supervised by a different model.
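Toy version of what that kind of front-end gluing looks like, purely illustrative and obviously not their actual stack (the two functions just stand in for whatever vision and text backends you'd run):

def describe_image(image_path: str) -> str:
    # step 1: a vision model turns the picture into text (stubbed here)
    return f"(caption of {image_path} from the vision model)"

def answer_text(question: str, context: str) -> str:
    # step 2: a plain text model works off the caption and never sees pixels (stubbed)
    return f"(text model's answer to '{question}' given: {context})"

def chat_with_image(question: str, image_path: str) -> str:
    caption = describe_image(image_path)    # inference step 1
    return answer_text(question, caption)   # inference step 2

print(chat_with_image("what's in this?", "cat.png"))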
>>
>>102644593
what do you mean by meds? are you being paid to spread misinformation?
>>
Claude c1 (cranberry) is going to be AGI.
>>
>>102644650
>next token predictor
>agi
meds
>>
>>102644598
Shit in, shit out. Or is it [D]ifferent this time? :)
>>
>>102644636
I'd agree but prompt caching still cuts the input price by the same amount as the other models. It really does seem to be RL guided output tokens we aren't allowed to see and that's it.
>>
>>102644661
Please troll the chuds for me again soon, it's been over an hour since your last xeet
>>
>>102644636
I doubt it's that simple. They clearly did something unique here, because the model doesn't end up in a loop like usually happens when you make models talk to each other.
>>
>>102644583
>Given the time constraints
huh
>>
>>102643986
Damn, how come audio is so much more expensive? Does audio take a ton more tokens to represent a single text token or something?
>>
>>102644583
>it lists letter pairs in "mynznvaatzacdfoulxxz" right the first time but then second guesses itself and lists them wrong in three different ways before finally going back to doing it right
that'll be an extra $0.25 plus tip for the output tokens :^)
>>
File: 3b.png (583 KB, 1505x513)
583 KB
583 KB PNG
>>102644665
>>102644623
meanwhile llama3.2-3b
>>
>>102644561
a competitor
>>
>>102644740
It could be cheaper, but nobody offers a decent alternative at the moment so OpenAI is banking on people paying them money hand over fist.
It's the same reason o1 is so expensive and prompt caching reduces the price by a factor of 2 rather than 10 like Claude.
>>
>>102644748
The first time was wrong though, it separated as "l x" "x x" "z"
>>
>>102644740
Just the raw data in a second of audio is orders of magnitude bigger than the handful of text tokens you could read in that same second, so it takes way more tokens to represent.
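Back-of-envelope with made-up numbers (nobody outside OpenAI knows the real codec frame rate, so both figures below are pure assumptions):

audio_tokens_per_sec = 50   # assumed neural-codec frame rate, purely illustrative
text_tokens_per_sec = 3.3   # ~150 wpm speech at ~1.3 tokens per word
print(audio_tokens_per_sec / text_tokens_per_sec)   # ~15x more tokens per spoken second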
>>
>>102644791
that's not the first time retard
>>
File: image.png (39 KB, 470x850)
39 KB
39 KB PNG
>>102644791
that was the second time
actually, looking closer, the reason it thought it was wrong is that it miscounted the letters: it thought there were 22 instead of 20, so it figured it had missed some and then started spelling it wrong a bunch of times to make it fit, and the rabbit hole it goes down takes up like a fourth of the (paid-for) output lmao
>>
>"There are plenty of reasons you might want a local model, but it's not a "this year" kind of thing."
based sama
>>
File: file.png (20 KB, 679x49)
20 KB
20 KB PNG
>>102643232
>>
So why would I use NVIDIA's model instead of the superior Molmo for 72B vision?
>>
>>102644826
You wouldn't. It's not better than Molmo and it's based on Qwen 2, not 2.5.
>>
>>102644818
Translation: after OpenAI has dominated the consumer market and no other competition remains, we'll give you GPT-3.5 Turbo if we feel like it.
>>
>>102643986
>paid api account with long billing history
>realtime model not showing up yet
it's over
>>
>>102643763
>>102643828
>>102643855
>>102644519
>My aspiration is to replace D&D with a frontend that uses an LLM to generate dialog and descriptions, but still tracks stats and the map / battle AI independently.
I think this would be a really great /lmg/ project. It seems to be a pretty common desire around here.
Something that's a pragmatic mix of agents run by different model sizes along with classical CS techniques to make a kickass infinite RPG system for local.
Hell, how many of us are hoarding bits and pieces already?
Maybe I'll set up a github repo. I'd be up for it as long as we swear to never use discord.
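Rough sketch of the model-routing part, assuming llama.cpp-server-style /completion endpoints on different ports (ports, task names, and which size goes where are all placeholders):

import json, urllib.request

# big model for prose, small model for cheap background jobs; the deterministic
# stuff (dice, stats, pathfinding) never touches an LLM at all
ENDPOINTS = {
    "narration":  "http://127.0.0.1:8080/completion",  # 70B-class
    "npc_banter": "http://127.0.0.1:8081/completion",  # 3B-class
    "summarize":  "http://127.0.0.1:8081/completion",  # reuse the small one
}

def run_agent(task: str, prompt: str, n_predict: int = 200) -> str:
    payload = {"prompt": prompt, "n_predict": n_predict}
    req = urllib.request.Request(
        ENDPOINTS[task],
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

print(run_agent("npc_banter", "The innkeeper greets the party:"))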
>>
>>102643986
Sure I trust OpenAI with my credit card nu-
>>
102644921
>millions of paying customers vs one schizo lmg anon with a fake image
damn, not sure which to trust
>>
>>102644921
Goddamn that's like
10000 cheeseburgers
1000 video games
100 PS5s
10 full house payments
One 5090
>>
>>102644266
No. To compile against AMD ROCm you have to install amdgpu from AMD's website, and it breaks all the time.

To use ROCm, you just need the app to have static linking; then your distro's amdgpu (included) works great.
>>
>>102644893
Sounds like a good idea indeed. I have nothing but if you create the repo I will seriously think about maybe contributing once there's something minimally working.
>>
File: 1705821897136793.png (177 KB, 812x836)
177 KB
177 KB PNG
>>102644921
You have to go back faggot
>>
File: 1709208664241894.png (8 KB, 411x115)
8 KB
8 KB PNG
>>102644956
Also picrel.
>>
for those of you who've tried them, what do you pick between
qwen-2.5-72B
midnight-miqu-70B
llama-3.2-90B
llama-3.1-70B

i've tried the first two, and miqu seems better so far
>>
>>102645000
yeah miqu easily
>>
>>102645000
Midnight Miqu, no doubts.
>>
https://x.com/NickADobos/status/1841167978085433351
>>
>>102644583
Did this output get leaked accidentally? On Reddit it says it came from https://openai.com/index/learning-to-reason-with-llms/, but the ones I see there are much simpler.
>>
>>102645024
how can you be so new
it's clearly fake
>>
File: file.png (155 KB, 1203x929)
155 KB
155 KB PNG
>>102645024
it's there for me
>>
>>102645005
>>102645006
thx
>>
>>102645054
retard
>>
>>102645071
Yeah I'm retarded, I see it now, thank you.
>>
>>102645080
>>102645080
>>102645080
>>
Is there any way to get KoboldAI and/or a model (in this case, LLaMA2-13B-Tiefighter.Q4_K_S.gguf) to stay under a certain character limit? Say I want to have it shitpost on Twitter and need it to stay at 280 characters or less.
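The only thing I've thought of so far is capping max_length and retrying/trimming in a wrapper around the API, something like this (field names are what I think the kobold /api/v1/generate endpoint takes, not double-checked), but is there a cleaner way?

import json, urllib.request

def gen_tweet(prompt: str, max_chars: int = 280, tries: int = 3) -> str:
    url = "http://127.0.0.1:5001/api/v1/generate"    # default koboldcpp port, I think
    text = ""
    for _ in range(tries):
        payload = {"prompt": prompt, "max_length": 60,   # ~60 tokens is usually under 280 chars
                   "temperature": 0.9}
        req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as r:
            text = json.loads(r.read())["results"][0]["text"].strip()
        if len(text) <= max_chars:
            return text
    return text[:max_chars]                           # give up and hard-truncate

print(gen_tweet("Write a one-sentence shitpost about mechanical keyboards:"))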
>>
>>102644945
>To compile amd rocm you have to install amdgpu from amd's website.
No you don't?
>>
>>102645107 (me)
I should clarify that
>No you don't
may not apply to Debian, which uses an ancient version of LLVM that can't compile code for RDNA3 GPUs. In that case install Ubuntu in a Docker image or something.
>>
>>102644583
It's train-of-thought.
>>
>>102644945
nta. What you need is the dev libraries, provided by your package manager. Unless you're on slackware 14.2 or something like that. Fuck. I can build with vulkan on fucking openbsd.
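FWIW the HIP build line I use on llama.cpp is roughly this (flag names have moved around between versions, older trees used LLAMA_HIPBLAS; -DBUILD_SHARED_LIBS=OFF only makes the ggml/llama libs static, the ROCm runtime itself still links dynamically; set the gfx target to whatever your card actually is):

HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030 \
        -DBUILD_SHARED_LIBS=OFF -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j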
>>
>>102632446
Destroying the lawn with Teto


