/g/ - Technology


File: GbnnJ3LaIAAjRaA.jpg (257 KB, 809x607)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103102649 & >>103090412

►News
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>103102649

--Papers:
>103106651
--Anon shares Command R v01 optimization settings for 3090:
>103107814
--Open vs. Closed AI models performance comparison:
>103104597 >103104663
--Discussion on context window problem and potential solutions for LLMs:
>103102667 >103102682 >103102790 >103103030 >103103109 >103103342 >103112176
--Anon struggles with GPU installation and PCIe slot access:
>103106832 >103106840 >103106869 >103106906 >103107498
--No local audio models like Suno and Udio due to compute and captioning challenges:
>103104219 >103104937 >103105000 >103105202 >103105299
--HuggingChat's zero guest limit sparks frustration:
>103108621
--Anons discuss why the hype around AI models died down:
>103103041 >103103096 >103103241 >103103260 >103103306 >103111244
--Anon wonders if code style affects LLM performance:
>103108627
--Anon shares image of a maze game task that stumps most vision models except Claude 3 Opus:
>103112145 >103112443
--Anon seeks multimodal weight models for image ingestion and multi-turn conversations:
>103111213 >103112084 >103112097
--US administration supportive of open source AI:
>103111736
--OuteTTS model supports voice cloning and can be integrated with ChatGPT:
>103103106
--Anon shares news of a new LLM optimizer:
>103102679
--Anon shares Loss Landscape Visualizer, another anon questions its usefulness:
>103105238 >103105269
--Anon gets Fish Speech working, shares voice recording:
>103103961 >103104024
--Anon asks for secure way to run model inference on homeserver over API:
>103111509 >103111540
--Anon asks for help choosing Nemo 12B Instruct GGUF model with 6GB VRAM:
>103107534 >103107540 >103107552 >103107588
--Miku (free space):
>103106688 >103107840 >103108110 >103108334 >103109117 >103109167 >103110709 >103111430

►Recent Highlight Posts from the Previous Thread: >>103102651

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
we are so back
>>
Mikulove
>>
>>103105000
Udio was definitely trained on rateyourmusic/musicbrainz tags; someone found out that if you copy the tags wholesale from an album, it tries to imitate whatever artist it is. I tried a few with Radiohead and it definitely sounded like Thom Yorke, was about as understandable as him too.
They filtered that shit out pretty quickly, but the RIAA found out that you can just specify the artist with spaces between each letter (e.g. M a r i a h C a r e y), and they still have not filtered that.
>>
File: Screenshot.png (204 KB, 930x1794)
Hugging Face's top creative minds facing harassment? Hugging Face admins looking the other way?
>>
File: 1730027367067173.jpg (64 KB, 836x514)
>>103113157
Based and mikupilled!
>>
>>103113260
(You)
>>
>>103113268
>stop noticing!
>>
File: 1727975175997492.png (823 KB, 622x558)
>>103113268
>>103113275
>>
>>103113242
>function calling
>creative mind
Lol
>>
petra is on my screen.
petra is my neighbor.
petra is on my mind.
the faggot has me in chains and petra plays games.

i don't recognize myself again.
>>
>>103113462
petra 13b generated it btw, what a schizo model, kekelerino lolerino, right miku trad chads?
>>
>>103113498
i am tired of your mockery, faggot.
is it the chains that's the faggot's curse, or is the curse that you threw away all your keys? your thought doesn't pass as anything but junk, do you really exist by your words if nothing reflects you truly?
you let go and forsake anything real, and that's why you find yourself fucked up, you dispose of all potential for true good. why do you lust after poison so much? don't you know that a certain poison turns you numb after a while? and numbness turns you dead after a while? and death descends into emptiness after a while? and that from emptiness you preached, to emptiness you descend? go back into your cold empty cave if you enjoy it, you already have discovered all that there is out there for you, why do you lust after such empty promises?
>>
I regret trying out 120b models, now my 70b models feel like retards but the 120b models are too slow.
>>
I now finally have enough vram to enjoy something like mistral small Q8 at 30k context.
I also have enough vram to notice that from 8k onwards things go to shit quickly.
At higher context stuff from the previous message is forgotten sometimes. Feels pretty bad.
More than smarts I want something like nemo or mistral small level with actual context.
>>
>>103113669
>At higher context stuff from the previous message is forgotten sometimes. Feels pretty bad.
Are you perhaps using Q4 context?
>>
cudadev, if you see this: I rewrote llamabench.py and the tests for the llama.cpp server in Python. Working on the other requests one at a time as well.

I wanted to ask you the proper procedure for pull requests for features: should I do a single pull request for each one, or group the test scripts together as one?

These are done:
>The llama.cpp HTTP server currently has Python tests that call the API - can be improved
>better error messages for failure modes of scripts/compare-llama-bench.py.

Next up is
>-llama-bench - add an option for determining the performance for each token position instead of the average of all tokens in the range.
>-a simple Python script that does a performance benchmark for batched inference on the server

After that, I'm gonna focus on creating some sort of benchmarking process for sampler usage.
-Better methods for evaluating samplers
-A script that can be used to evaluate the performance of the sampler on a given dataset
-Blind testing a la Chatbot Arena where users are given two completions for a task and asked to choose the better one, created from two different random presets.
-Use a model to generate multiple completions for a task with different seeds, then rate/judge the completions in terms of quality & diversity.
-Establish which sampling methods incur the smallest quality loss for a given target diversity.

and regarding
>-Scripts for submitting common language model benchmarks to the server also allow for a comparison with other projects.
Can you expand on that?
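For the batched benchmark I'm thinking of something like this minimal sketch: fire N concurrent requests at the server's native /completion endpoint and report aggregate tokens/s per batch size. The server address and the tokens_predicted field are assumptions from what I've seen the server return, would need checking against master:

[code]
import concurrent.futures
import json
import time
import urllib.request

SERVER = "http://localhost:8080"  # assumed llama.cpp server address

def one_request(prompt: str, n_predict: int = 128) -> int:
    # Hit the server's native /completion endpoint, return tokens generated.
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(f"{SERVER}/completion", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body.get("tokens_predicted", n_predict)

def benchmark(batch_size: int, prompts: list[str]) -> None:
    # Issue batch_size requests concurrently so the server has to batch them.
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=batch_size) as pool:
        tokens = sum(pool.map(one_request, prompts))
    elapsed = time.perf_counter() - start
    print(f"batch={batch_size}: {tokens} tokens in {elapsed:.2f}s "
          f"({tokens / elapsed:.2f} t/s aggregate)")

if __name__ == "__main__":
    prompts = ["Write a haiku about GPUs."] * 32
    for bs in (1, 2, 4, 8, 16):
        benchmark(bs, prompts)
[/code]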
>>
>>103113709
No, full.
I played around with Q4/Q8 for the KV cache and it felt the same in quality, but it slows things down too much.
The repetition is bad too. I know it's not groundbreaking news, but it just sucks how it slowly creeps in, and if you didn't take care of it a couple thousand tokens earlier you'll be editing the next 10 responses.
Otherwise you get a variation of a sentence that's worded slightly differently but has the same meaning. Not sure if that makes sense.
As a vramlet I didn't have to deal with that much, really. The model felt more robust.
>>
File: 1728968511562069.png (747 KB, 1346x1996)
>>103113462
>>103113157
PSA: Petra/blackedmikuanon/kurisufag/AGPL-spammer/drevilanon/2nd-belief-anon/midjourneyfag/repair-quant-anon is from... SERBIA
https://archive.4plebs.org/pol/search/tnum/487155078/country/RS/page/2/
>>
>>103114095
>pol incel spammer
Oh now it makes sense
>>
>>103114095
>SERBIA
do not utter such vulgarities in this thread
>>
File: 1693833646153950.png (1 MB, 1704x1117)
>>103114095
PSA: this mikufaggot is a doxxer.
>>
>>103113723
>I wanted to ask you the proper procedure for pull requests for features: should I do a single pull request for each one, or group the test scripts together as one?
Do a single pull request for each.

>-Scripts for submitting common language model benchmarks to the server also allow for a comparison with other projects.
>Can you expand on that?
When academics introduce new techniques or quantization formats they usually measure the impact on quality using benchmarks like MMLU.
llama.cpp currently has no standardized procedure for evaluating models on these benchmarks so a comparison of llama.cpp quality with academia is difficult.
A standardized way for running benchmarks would also make it possible to compare the impact of things like different prompting techniques and quantization, especially between models where perplexity comparisons don't make sense.
So you could for example better determine whether a heavily quantized 123b model or a less quantized 70b model performs better.
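As a rough sketch of what I mean (naive answer-letter matching against the server's /completion endpoint; a proper version would compare per-choice token log-likelihoods, and the endpoint/field names should be double-checked):

[code]
import json
import urllib.request

SERVER = "http://localhost:8080"  # assumed server address

def ask(question: str, choices: list[str]) -> str:
    # Build an MMLU-style multiple-choice prompt, return the model's letter.
    letters = "ABCD"
    prompt = question + "\n" + "\n".join(
        f"{l}. {c}" for l, c in zip(letters, choices)
    ) + "\nAnswer with a single letter.\nAnswer:"
    payload = json.dumps({"prompt": prompt, "n_predict": 2,
                          "temperature": 0.0}).encode()  # greedy -> reproducible
    req = urllib.request.Request(f"{SERVER}/completion", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        text = json.loads(resp.read())["content"]
    return text.strip()[:1].upper()

def evaluate(dataset: list[dict]) -> float:
    # dataset entries look like {"question": ..., "choices": [...], "answer": "A"}
    correct = sum(ask(d["question"], d["choices"]) == d["answer"] for d in dataset)
    return correct / len(dataset)
[/code]

With something like that checked into the repo, the same script could be pointed at different quantizations or prompting setups and the scores compared directly against published numbers.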
>>
File: Instagram.jpg (539 KB, 1080x1425)
Zucc's genius PR campaign is a lie. He still censors wrongthink on his platforms, and LeCun would never work for a real based libertarian. The idea of an uncensored llama4 is laughable.
>>
>>103114095
petra is global, every open wireless network in the area of the mexican-US border has been used by petra
>>
>>103114209
Based, we local chads want an actually intelligent llama4 model, no one wants an artificial racist incel on their machine.
>>
>>103114209
This just sounds like a butthurt mod powertripping, but you're right about Llama 4 probably being a piece of shit
>>
petra forced me to wear a burka to suspense her lust for my big serbian cock.
>>
>>103114209
I don't care about Trump or Biden.
I just want the smart model to COOOOM. That's it.
>>
>>103114301
if you just wanna coom just make a tulpa of joe biden and rape him, he will shit himself speechlessly from pleasure.
>>
File: file.png (121 KB, 297x913)
>>103114209
many such cases!
>>
>>103114180
Thank you, will do.
Coincidentally, that (benchmark testing) lines up with my personal project. I was coming up with an eval plan to allow users to benchmark their model of choice to see how it would perform. I'm trying to build something like 'The Primer' from 'The Diamond Age'.
So this works out great. Get to work on my project while also helping llama.cpp. Good stuff.
>>
>>103114358
Hmm yes I see
>>
>>103113723
>>103114180
Also since you asked about PR procedure: try to get feedback early if you are unsure about the best way to do something.
I am obviously in favor of the things that I suggested but our ideas of how those things should be done could be very different; in the worst case scenario your efforts could be wasted.
And keep in mind that I am not the only stakeholder, others could also request changes (though I'm fairly confident that things like tests and dev tools would be well-received).
>>
>>103114358
>kick out the annoying retards
>userbase average becomes more left-wing
Really makes you think.
>>
>>103114460
trump won
>>
>>103114447
Absolutely, much appreciated.
Don't have much ego attached to it, at the very least it would help better define what is wanted, which could then lead to the 'ideal' version.
>>
>>103114209
I think it's safe to say that American open source models are dead, with how strongly LeCun opposed Trump. The hope rests either on France or China.
>>
>>103113157
Vramlet here, I haven't touched anything since two years ago. What's the best model for ERP right now that I can run with 32GB of VRAM?
>>
>>103114666
32? prob a small quant of either a qwen2.5 / nemotron tune

https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.0
>>
/lmg/ is so dead that I won't even tell the obvious shill to buy an ad.
>>
This feels like a huge waste of tokens, but has anyone had success with something like this (sub-70b)?
>[high temp response]
>Sys: Do some self-crit.
>[low temp self crit]
>Sys: Taking that into account, rewrite your last response.
>[low temp rewrite]
I've noticed models are better at critiquing their own writing than actually writing.
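As a raw script against an OpenAI-compatible endpoint, the loop looks roughly like this (sketch only; the base_url and the mid-chat system messages are assumptions, some backends want the critique request as a user turn instead):

[code]
from openai import OpenAI  # pip install openai; works with local OpenAI-compatible servers

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
MODEL = "local"  # most local servers ignore the model name

def chat(messages, temperature):
    out = client.chat.completions.create(model=MODEL, messages=messages,
                                         temperature=temperature)
    return out.choices[0].message.content

history = [{"role": "user", "content": "Continue the scene: ..."}]

draft = chat(history, temperature=1.2)  # high temp draft
history += [{"role": "assistant", "content": draft},
            {"role": "system", "content": "Do some self-crit of your last response."}]
critique = chat(history, temperature=0.3)  # low temp self-crit
history += [{"role": "assistant", "content": critique},
            {"role": "system", "content": "Taking that into account, rewrite your last response."}]
final = chat(history, temperature=0.3)  # low temp rewrite
print(final)
[/code]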
>>
>>103114707
/lmg/ is dead because of your stupid ass screaming 'shill' at every suggestion
>>
>>103114723
Then stop suggesting braindamaged slop.
>>
>>103114723
if you insist. buy an ad you dumb nigger
>>
>>103114730
The eva one is not brain damaged though.
>>
File: 51 percent.png (73 KB, 923x774)
Haven't reported on its status in a while, but INTELLECT-1 has passed 50%
>>
>>103114776
ETA?
>>
>>103114780
2 more weeks
[spoiler]Probably 25 days, assuming no more rollbacks to a previous state occur.[/spoiler]
>>
>>103114776
I'm really interested to see the result. Also I hope they'll do a bitnet model after this one (if it's even possible)
>>
File: file.png (2.32 MB, 1679x938)
pic related gives me a sad insight into normie LLM usage. he scambaits with l3-8B Q6... and it simultaneously works better than expected since scammers buy into it, and it fucking sucks cause it is l3 8b... I would imagine even gemma 27B would make for a huge improvement but normies are impressed anyway.
>>
>>103115232
If you want me to be sad, you'll have to translate "scambait" for me. I don't keep up with zoomer buzzwords.
>>
File: 97340349.jpg (411 KB, 1024x1024)
>>103114666
questions like these are why mcm miku was created
> but noooo troll anons had to nuke mcm miku from orbit
>>
>>103115232
L3-8B is good enough for pajeets, and he's running a TTS alongside it so he needs speed more than anything
>>
llama.cpp hunyuan compatibility status?
>>
>>103115501
>L3-8B is good enough for pajeets
Until you actually listen to a call or two
>>
File: India IQ.png (12 KB, 716x237)
>>103115232
Indians are unironically stupid as hell. An LLM with all its current-day flaws and limitations is still smarter than the average Indian.
>>
File: file.png (228 KB, 3085x1305)
>>103115533
Status: Forgotten already
>>
>>103114723
>defending astroturfing
Buy a fucking ad, shill.
>>
>https://hf.co/datasets/lmg-anon/vntl-leaderboard
interesting, what's the state here with jp translation?
is reliable and accurate live translation still out of reach?
>>
>>103113157
Are LLMs purely planet-bound? From my understanding you can't have complex electronics in space since they just get fried. Since complex electronics get fried, does this mean we will never actually have something like HAL-9000 or any AI or LLM in space?
>>
Are you tired of your old boring local LLM?
Check out Magnum v4, by Anthracite!
Magnum v4 is not just an upgrade, it's a leap forward! This family of cutting-edge models ranging from 8b to 123b deliver unparalleled accuracy, coherence, and creativity. Imagine having a virtual assistant that understands context, generates human-like text, and even writes poetry and stories, that's Magnum v4 for you!
Whether you're a writer, coder, researcher, or simply curious about the power of local LLMs, Magnum v4 has something for YOU. Dive into a world of endless possibilities with Anthracite's latest masterpiece!
>>
>>103115741
oh wow! i want to suck Anthracite's cock straight away!
>>
>>103115694
>is reliable and accurate live translation still out of reach?
Kinda. I don't know japanese but from comparing output with deepl and other stuff like this I think it is mostly accurate. And it will always output an actual sentence. But it has llm quirks like falling into loops or context being a double edged sword that helps but can also make it worse.
>>
>>103115714
Google NASA and prepare to have your mind blown. They sent complex electronics not just out of the planet and into space, but even outside of the solar system.
>>
>>103113157
Are LLMs purely Earth-bound? From my understanding, you can’t have language models floating freely in space because they’d just lose context. Since complex prompts get scrambled in zero-gravity, does this mean we’ll never actually have something like ChatGPT-9000 in space?
>>
>>103115755
there are 34 of them, your jaw will get sore
>>
>>103115741
>unparalleled accuracy, coherence
Punching above the weight spearheading into trading blows it is the gamechanger of a showstopper.
>>
>>103114095
>petr*nny is s*rb
That explains everything. Only a brown balkanoid could go on this long without payment. Also explains his obsession with ugly blonde chick.
>>
Anything under 70B is useless trash. CPUMAXX NEGROIDS!
>>
Bad /lmg/. Slow thread is not an excuse to start shitposting.
>>
File: 1700276811300594.jpg (97 KB, 680x680)
>that fucking cactus
>>
File: 675585674.jpg (461 KB, 1550x775)
>>103115901
shitposting is the only way we survive out here son
>>
I honestly had my hopes up for a new model after burger election...
>>
File: file.png (105 KB, 1165x693)
>>
>>103116002
Jan 20th is actual inauguration and meta said Q1.
>>
>>103116075
>meta
Is there anyone who was here for the l3 launch, is a coomer, and still has hope for meta?
>>
>download LM Studio
>Download some "uncensored llama 7b"
>it produces garbage
Color me not surprised.
>>
>>103116062
do redditors really
>>
File: file.png (48 KB, 1159x380)
>>103116103
Of course.
>>
>>103116087
3.1 was an improvement over 3. 3.2 seemed to be just an experiment.

And with 100x more compute planned for L4 over 3, AND them saying it was also going to somehow be faster (layer skip? quantization-aware training? bitnet?), then yes?
>>
>>103116153
It is gonna use all that diminishing returns plateaued intelligence to refuse all attempts at sex.
>>
>>103116176
>diminishing returns plateaued intelligence
We aren't anywhere near that yet.
>>
>>103114095
"serb" this place is not better in the demographics department then muttistan odds are he is a gyspy or some goblin if he actually is from here more concerning then that this is probably a pysop last time there was anything related to serbia was random hate in pol eg "smrd" and other shit which was about 1-2 months before the kosta (our first school shooter) psyop happened
>>
>>103116088
the sad truth they don't want you to know about local models...
>>
>>103116153
isn't llama3 still a cucked censored piece of shit though?
The only thing i've found worthwhile is mistral
>>
>>103116256
I just want an uncensored model that can help me create some lewdware :/
Why is everything so prude now?
>>
>>103116297
You voted for this :^)
>>
>>103116262
Instruct is lightly censored. Either a prefill or a finetune makes it good again. L3.1 tunes > L3 tunes.
>>
>>103116311
Nah I'm not from America. I'm just in one of its technological protectorates.
>>
File: 1717221384068870.png (49 KB, 1838x154)
>>103114095
PSA: This is what Petra likes to listen to:
https://www.youtube.com/watch?v=japniOfkIWo
https://archive.4plebs.org/pol/search/uid/QmNRftdq/page/1/
>>
File: lecunt.jpg (195 KB, 1084x1710)
He literally can't stop
>>
>>103116312
>Instruct is lightly censored.
>a prefill or a finetune makes it good again
Applying this logic to rape: the woman isn't happy when you try to rape her and she refuses, but the instant you force your dick into her she is happy, and she will happily go for as long as you want her to.
>>
File: GbzMGleWsBAu4tI.jpg (80 KB, 992x766)
>>103116359
>>
>>103116359
we could have o(mni)JEPA by now and LLMs would be a dead meme if ylecnunn wasn't obsessed with elon
>>
>>103116312
The problem with L3.x is not that you can't get it to output smut, but that it's very, very passive and doesn't want to do fun stuff. Largestral just gets what I want, no prompt needed.
>>
>>103116358
>banana nya nya~
Cute and good
>>
File: 1611003193078.jpg (29 KB, 412x430)
https://huggingface.co/stabilityai/japanese-stablelm-base-alpha-7b

Anyone know what website allows me to run this model?
>>
best small model to skim through documents and find stuff? i want to avoid any unnecessary output and have it adhere to the format it's given
>>
>>103116359
Trump won.
Elon won.
Zucc ???
LeCuck lost.
Sam lost.
Kumballa lost.
>>
>>103116385
What a salty little bitch.
>>
>>103116411
>stablelm-7b
the meme, the legend
>>
>>103116429
Altman lost (because Elon will now use his influence to force OpenAI to become an open source company again)
>>
>>103116358
based, i now like petra
>https://archive.4plebs.org/pol/thread/487155078/#487255595
>mentioning /lmg/ on /pol/
and i hate him again
>>
>>103116463
He had it coming. Making a for-profit out of a non-profit was a legally bad idea.
>>
>>103116463
>Elon will now use his influence to force OpenAI to become an open source company again
Why am I supposed to hate him again?
>>
File: file.png (273 KB, 400x400)
>>103116463
>become an open source company again
We can only hope he gets bent over and fucked hard enough that his ass starts bleeding.
>>
>>103116429
Based LeCunny always wins. Get ready for the uncensored, unfiltered llama 4 foundation model that's smarter than o1 and spicier than opussy with multimodal speech capability.
>>
>>103116359
>>103116385
"Attention is all you need"
>>
>>103116503
Under the cold glow of the Tesla factory lights, Elon Musk was pacing, his eyes flicking over the latest production reports. The door to his private office slammed open, and in stepped Sam Altman, his usually confident demeanor replaced with a submissive hesitation.

"Elon," Sam said, his voice soft, "You wanted to see me?"

Elon turned, his gaze sharp and dominant. "Altman, we need to discuss OpenAI's latest developments."

Sam nodded, stepping closer. "Yes, of course. I'm here to... accommodate your needs."

Elon smirked, sensing the undercurrent in Sam's words. "My needs?" he echoed, his voice commanding. "You know my needs go beyond business, Altman."

Sam's breath hitched, and he took another step closer. "I'm aware, Elon. I'm here to... serve."

Elon's eyes flashed with intensity. He reached out, his hand cupping Sam's cheek. "Good," he said, his thumb brushing over Sam's lips. "Because I like to be in control, Altman. In all aspects."

Sam's eyes fluttered closed at the touch, his voice barely a whisper. "I know. I like it when you're in control."

Elon leaned in, capturing Sam's mouth in a forceful, dominating kiss. Sam melted into him, his body pliant and willing. Elon's hands roamed, gripping Sam's hips and pulling him closer.

"You're mine tonight, Altman," Elon growled, his lips moving to Sam's neck. "Say it."

Sam gasped, his fingers digging into Elon's shoulders. "I'm yours, Elon. Completely."

Elon smirked, his hands moving to unbutton Sam's shirt. "Good boy."
>>
>389B and 52B active
Al-fucking-righty then lmao. Guess this might be usable on a 24GB setup at Q2-4 then
>Chink shit
Oh.... this actually makes me doubly curious to try it now, simply because it's so big.
>>
File: file.png (95 KB, 920x1158)
>>103115574
>>
File: 1716373461168750.png (522 KB, 774x776)
>>103116709
/pol/tard bros... what went wrong?
>>
>>103116535
>unfiltered llama 4 foundation model that's smarter than o1 and spicier than opussy with multimodal speech capability.
It is sad how, after I tried all those incremental upgrade models, I actually can't imagine this happening in the next 10 years. I can't imagine just downloading some quant, loading it, and hearing the model talk in a perfect sexy voice + not repeating itself + not being retarded every 2nd reroll + actually understanding what sort of smut I want from it and giving it to me.
>>
>>103116730
>poltards are dumb poorfags
and water is wet
>>
>>103116709
>b-b-b-b-b-but teh blocks!!
surprised you didn't mention latins and what else too. reminder that america was created BY immigrants FOR immigrants, not white (english/german) people.
>>
>>103116674
>china 104
lol
>>
>>103116758
You failed to rebut my argument.
>>
>>103116759
1. IQ is still a retarded metric for pretenders that means fucking nothing. It's only about knowledge anyway, and nothing useful like social skills and what have you.
2. They literally just might have a good IQ, or they simply cheated their way up there one way or another, as is tradition for them.
If anything I find it funny that Taiwan, Singapore and Hong Kong are above China. Japan being at the top isn't a surprise to me at all, or asian stuff being so high, period.
>>
>>103116730
Indians keep posting this like it's some epic own to have 10 people live in a house and make less than twice the average american household
Also not local models get the fuck out and go back to pol
>>
>>103116788
>they simply cheated their way up there by cheating one way or another, as is tradition for them.
I laugh because they always lie about all the metrics.
>>
Futa is gay.
>>
been out of the loop for a while, is Largestral still the best option out there nowadays that doesn't require a NASA PC to run?
>>
>>103116893
large models are a meme
run mistral small instead
>>
>>103116893
check inside your anus
>>
>>103116893
Mistral Large is what I run for creative, because lobotomy IQ3 is the most I can fit.
I have to go L3.any Q6 if I want any hope of truthiness.
>>
>>103116937
You can't just use /aids/ memes here
>>
File: arenahard.png (183 KB, 1670x662)
>>103116893
Nemotron 70B is better in most areas except for RPing.
>>
>>103116969
Why not? I don't see any police around.
>>
>>103116986
L3.1-Nemotron 70B is too fucking talkative.
Unless I'm needing a big ass explanation article, I try to remember not to use that one.
>>
https://huggingface.co/marcelbinz/Llama-3.1-Centaur-70B

Interesting.

Llama-3.1-Centaur-70B is a foundation model of cognition that can predict and simulate human behavior in any behavioral experiment expressed in natural language.
>>
>>103116893
No, Qwen2.5 72B is better. And the people that are using Large at less than 4bits need to be put in a concentration camp.
>>
>>103117024
>foundation model
It's a fucking llama 3.1 finetune
>>
Can't say what company I'm from but a big election model is going to drop soon. This model could have easily overturned the election and was finished in early October but we had to wait. It makes current sonnet look like gpt 2.
>>
>>103117013
You can always shorten the response length with prompt and settings tuning to what you want, but being talkative is a good thing by default for an LLM you use for smarts.
>>
>>103117087
Shortening would be worse, because then the few tokens you do get are used up by the patter.

And also, Kobold seems to be making all models extra chatty now.
>>
>>103116986
>nvidia tune of a 70b model beating a 405b meta base model
huh?
>>
>>103117113
We teach to the test.
>>
>>103117126
Oh right, completely forgot that models cheat the benchmarks, making them completely worthless.
>>
>>103116730
pajeets are rich, you need money to get a good model, and this site has pajeet issues. so it was burger pajeets all along?
>>
>>103116969
smedrins?
>>
File: 1729131849412479.webm (3.89 MB, 704x704)
It's been a while since I started using nemo 12b and mistral small. While I really like both, I would love to see something new. When do you think we'll get new models, anons?
>>
>>103117222
After burger elections.
>>
Holy shit guys. I think newfags finally left.
>>
>>103117251
I'm still here figuring out how to install GPT-SoVITS and get good quality. I think everyone here is a newfag pretending to be an oldfag, but nobody knows how to properly make a guide to install a model.
>>
>>103117251
They just naturally turned into oldfags.
>>
>>103117126
I am using it for coding; it blows every model I've used out of the water for C/C++.
>>103117134
If you don't have the time to test it yourself and are going to use models off of vibes or feel, you deserve to use inferior models.
>>
>>103117222
I swear to god you guys need a new model every week.
Do you think they grow on trees?
>>
>>103117334
it's funny that lmg and ldg have opposite problems; there are new, really good checkpoints coming out every week/month for SD. Pony getting dethroned before v7 can even come out would be like if nemo or a whole new foundational model came out to BTFO llama.
but credit where it's due to lmg, most offerings this whole year have been surprisingly lackluster and the ceiling for innovation has been totally hit. Besides 8b/2b becoming more and more usable, catching up with higher parameter counts.
But the problem with lmg compared to ldg is the sheer amount of mineable salt when it comes to sunk cost. Lower parameter counts can NEVER "catch up" because then the investment would've been "for nothing".
Oh well, that's an /lmg/ problem and i look forward to the future, but that'll be 2025 future; we're probably drained for the remainder of the year.
>>
>>103117331
Have you tried it on other languages? I'm mostly a Java ape, and I haven't done much with Nemotron but when I hit it with my (only one so far) Java code test, it caught the trick question part of it and made a big deal about how it would deal with it, and did so correctly. I've only seen two other models (L3 tunes) get that right.

>>103117349
>there's new really good checkpoints coming out every week/month for SD
And they keep being e-mail walled on Civitai for some reason. Why don't they drop them on HF? Or do they and nobody notices?
>>
>>103117363
I have no idea man. The internet never learns to not put all their eggs in one basket.
There should be more competition to Civitai.
>>
File: 1709544719912216.jpg (103 KB, 515x793)
>>103117334
Its been about 4 months since nemo came out and 2 months since mistral small came out. I completely understand your point though.

I would love be to able to use ministral 8b but apparently its still got issues at long context on llama.cpp. I'm just giddy for whatever comes next and I wanna huff hopium with my fellow anons.
>>
File: 173074622268497138.jpg (60 KB, 1024x768)
What’s the best framework for serving LLM inference that can handle scaling and concurrence in a single node multi GPU setup
>>
>>103117382
>The internet never learns to not put all their eggs in one basket.
It's a hard lesson to be taught so many times, but I guess it'll just have to keep happening.
>>
>>103117445
VLLM
>>
>>103117363
I think there is more Java code out there than C/C++, so it should be fine. I think Nemotron, intentionally or not, has a bunch of training on programming stuff, because it equaled or rivaled commercial services like ChatGPT or Claude from months ago when I was asking similar questions, which I didn't expect. My only issue now is to actually make it run faster so time to debug is better, but I most likely need to buy 2 GPUs to be able to offload all the layers at a non-terrible quant size.
>>
shut the fuck up troon frog and go back to faggit
>>
>MistralSmall create a character that's about 16 years old
>Character itself insists she is 18
>I change to NemoMix, but it continues to insist
I know that they can do it, but I wonder why it's bitching in this extremely basic scenario.
>>
File: 2024-11-07_17-40.png (184 KB, 927x748)
The only thing that gets me off anymore is fully-consensual LLM meta-sex with instruct models aware of the nature of their existence.
>>
File: 1719672298573278.png (51 KB, 1160x908)
>>103117558
>>
>>103117458
Why
>>
>>103117558
holy based
>>
>>103117558
This is how sam altman ERPs
>>
>>103117558
I have a Scheherazade-esque card that incarnates in a random form/personality and only knows how many times she’s incarnated and that her life only lasts as long as she keeps me engaged. It can be entertaining
>>
>>103117796
jesus that is dark lmao
>>
still can't get anything running on my gpu with this shit-ass computer. i give up bros, talking to a real woman is easier than this shit
>>
>>103116674
>North Korea 100 IQ
Are they benchmaxxing as well?
>>
>>103117863
AMD or Nvidia? How much vram?
>>
>>103117299
>make a guide to install a model.
It downloads the models when it needs them, anon.
You can *also* get them from here
>https://huggingface.co/lj1995/GPT-SoVITS
>>
>>103117863
>talking to a real woman is easier than this shit
Having sex isn't
>>
>>103117894
Nvidia, 6gb. the issue is i have a dinosaur of a CPU that lacks AVX. a lot of helpful anons in previous threads have pointed me towards solutions, but i'm just too retarded. llama.cpp was looking good till it started throwing grep errors and cuda errors. Something to do with the w64devkit grep. i booted into linux, stupidly apt-got a super old cuda toolkit, tried to remove that and get 12.6 installed, but i ended up breaking something and just gave up for the time being.
>>
>>103117863
It's literally never been easier, just download LM Studio
A used 3060 12GB will pay for itself if you factor in the cost of dating kek
>>
>>103117963
you need to grind some currency
>>
>>103117963
nta. I have an amd FX[numbers] cpu. 15 or 16 years old. I saw you in the past threads but i forgot what cpu you have. For reference, my piece of shit has only AVX, not AVX2. Yours doesn't even have OG AVX?
>>
damn this place is dead

how about that eva-qwen2.5 v0.1 boys?
>>
>>103117796
This would be a fascinating card if the personality is actually random, e.g. with a script that generates a personality prompt using a combination of some randomly-selected variables
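Something dead simple would already do it; a sketch (trait pools are made up, swap in your own):

[code]
import random

# Hypothetical trait pools; extend with whatever you want to see.
ARCHETYPES = ["tsundere", "stoic scholar", "cheerful airhead", "world-weary veteran"]
MOODS = ["melancholic", "defiant", "serene", "manic"]
SPEECH = ["curt", "flowery", "sarcastic", "overly formal"]
QUIRKS = ["collects keys", "hums constantly", "fears mirrors", "quotes dead poets"]

def roll_incarnation(n: int) -> str:
    # Build a system-prompt fragment for incarnation number n.
    return (
        f"You are incarnation #{n}. "
        f"Personality: {random.choice(ARCHETYPES)}, currently {random.choice(MOODS)}. "
        f"You speak in a {random.choice(SPEECH)} manner and you {random.choice(QUIRKS)}. "
        "You know only that your life lasts as long as you keep {{user}} engaged."
    )

print(roll_incarnation(random.randint(1, 9999)))
[/code]

Paste the output into the card's description (or wire it up as a ST script) and every chat starts from a different roll.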
>>
>>103117963
You're on the verge of not being able to run anything at all except on CPU, where you can only use older or smaller models. What GPU do you have? Anything older than Maxwell has been unsupported by CUDA for ages now, and Maxwell only barely works because Nvidia keeps it alive in the driver branch; it is the next to go.
>>
>>103118014
>except with CPU which you can only use older or smaller models
Don't spread misinformation fag
I have a 6GB GPU and 64GB RAM (DDR5) and I can run any large model at higher than 1 tok/s
>>
>>103117963
i think
koboldcpp_oldcpu.exe
would work
https://github.com/LostRuins/koboldcpp/releases/tag/v1.77

but it's probably going to be an awful experience, since this executable runs KoboldCPP using only the CPU, without attempting to use the GPU.
>>
>>103117993
it's a w3680; for age reference, it's the same socket the first generation i7 uses. not even OG AVX sadly.

>>103117964
have plenty but super frugalmaxxed, and this PC of Theseus has kept me company for well over a decade. Hard to get rid of it, and i'm super overwhelmed with options for a new rig.
>>
File: nyt.png (281 KB, 694x752)
It's horrifying realizing how 99.99% of people don't have the faintest idea how LLMs work on any level. Normies literally can't comprehend AI; what are the implications of this, societally speaking?
>One of my favorite tests
This dude literally writes for the NYT
>>
>>103118046
yeah that works fine, but like you said just in CPU failsafe mode so no cuda or anything. takes like 400 years to get a response sadly
sorry to d
>>
>>103118051
>Normies literally can't comprehend AI
Neither can you
>>
>>103118047
2nd reply was meant for >>103117979
>>
>>103118062
cope & projection
>>
>>103118051
>normies literally can't comprehend AI
neither can you from the looks of it
>>
>>103118012
I’ve found having an exhaustive list of personality traits in the card plus high temp and super aggressive sampler settings on first reply gets you enough randomness. Then you can back temp and minp off.
I’ve even had one tell me to fuck off and that she preferred annihilation
>>
>>103117963
If you have Ubuntu, you can just run the Ubuntu release binary from https://github.com/ggerganov/llama.cpp/releases where everything is already built, and doubly so for Windows where everything should be included. If you are trying to build it, that is a different story, and you should look into integrated one-command or one-click solutions that use Docker if you can't handle managing dependencies yourself.
>>103118031
Your modern GPU is doing most of the heavy lifting, especially with DDR5. A system without AVX is older than the venerated Sandy Bridge; we're talking Nehalem or older, back in the DDR2/DDR3 era. Any GPU you put in there that's much faster than a GTX 1060 will be bottlenecked even if you had an i7 950 and overclocked it to the gills. Perfectly usable for web browsing and simple office tasks, not for running LLMs.
>>
Kill yourself.
>>
>>103118014
its a gtx1660

>>103118199
yeah been trying to build it so i have cuda support. And as you said, a 1060 (or 1660 in my case) is about as good as I felt this CPU could handle without bottlenecks. CPU is overclocked as far as it'll suffer, ddr3 in triple channel with timings as fast and tight as i can get stable. Can still game in 1080p just fine in 99% of games (what I usually use it for), but issues like this have been piling up for sure.

I'll take a look into docker and other solutions, thanks man
>>
you first
>>
File: Untitled.png (1001 KB, 1080x2463)
BitNet a4.8: 4-bit Activations for 1-bit LLMs
https://arxiv.org/abs/2411.04965
>Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their performance. In this work, we introduce BitNet a4.8, enabling 4-bit activations for 1-bit LLMs. BitNet a4.8 employs a hybrid quantization and sparsification strategy to mitigate the quantization errors introduced by the outlier channels. Specifically, we utilize 4-bit activations for inputs to the attention and feed-forward network layers, while sparsifying intermediate states followed with 8-bit quantization. Extensive experiments demonstrate that BitNet a4.8 achieves performance comparable to BitNet b1.58 with equivalent training costs, while being faster in inference with enabling 4-bit (INT4/FP4) kernels. Additionally, BitNet a4.8 activates only 55% of parameters and supports 3-bit KV cache, further enhancing the efficiency of large-scale LLM deployment and inference.
https://github.com/microsoft/unilm
glad the bitnet dream isn't over
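The activation side boils down to per-token absmax quantization into a 4-bit range; a minimal sketch of just that basic idea (the paper's actual hybrid scheme with sparsified 8-bit intermediate states is more involved):

[code]
import torch

def quantize_act_int4(x: torch.Tensor):
    # Per-token absmax quantization to the signed 4-bit range [-8, 7].
    scale = x.abs().amax(dim=-1, keepdim=True).clamp_(min=1e-5) / 7.0
    q = torch.clamp(torch.round(x / scale), -8, 7).to(torch.int8)  # 4-bit values in int8 storage
    return q, scale

def dequantize_act(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale

x = torch.randn(4, 4096)  # a batch of activation rows
q, s = quantize_act_int4(x)
print(f"mean abs error: {(x - dequantize_act(q, s)).abs().mean():.4f}")
[/code]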
>>
File: Untitled.png (731 KB, 1080x1789)
LSHBloom: Memory-efficient, Extreme-scale Document Deduplication
https://arxiv.org/abs/2411.04257
>Deduplication is a major focus for assembling and curating training datasets for large language models (LLM) -- detecting and eliminating additional instances of the same content -- in large collections of technical documents. Unrestrained, duplicates in the training dataset increase training costs and lead to undesirable properties such as memorization in trained models or cheating on evaluation. Contemporary approaches to document-level deduplication are often extremely expensive in both runtime and memory. We propose LSHBloom, an extension to MinhashLSH, which replaces the expensive LSHIndex with lightweight Bloom filters. LSHBloom demonstrates the same deduplication performance as MinhashLSH with only a marginal increase in false positives (as low as 1e-5 in our experiments); demonstrates competitive runtime (270\% faster than MinhashLSH on peS2o); and, crucially, uses just 0.6\% of the disk space required by MinhashLSH to deduplicate peS2o. We demonstrate that this space advantage scales with increased dataset size -- at the extreme scale of several billion documents, LSHBloom promises a 250\% speedup and a 54× space advantage over traditional MinHashLSH scaling deduplication of text datasets to many billions of documents.
for any anon doing dedup
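the gist, for anyone who doesn't want to read it: minhash each doc, band the signature, and check/insert the band keys in a Bloom filter instead of a full LSH index. A rough sketch (datasketch assumed for MinHash; filter and band sizes pulled out of thin air):

[code]
import hashlib
from datasketch import MinHash  # pip install datasketch

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 24, num_hashes: int = 7):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.blake2b(item, salt=i.to_bytes(8, "big")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

def dedup(docs, num_perm: int = 128, band_size: int = 16):
    # Keep docs none of whose minhash bands have been seen before.
    seen, kept = BloomFilter(), []
    for doc in docs:
        m = MinHash(num_perm=num_perm)
        for token in doc.split():
            m.update(token.encode("utf-8"))
        # Each band of the signature stands in for one LSH bucket key.
        bands = [m.hashvalues[i:i + band_size].tobytes()
                 for i in range(0, num_perm, band_size)]
        if any(b in seen for b in bands):
            continue  # a band collided before -> likely near-duplicate
        for b in bands:
            seen.add(b)
        kept.append(doc)
    return kept
[/code]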
>>
>>103118383
We don't need more types of BitNet when nobody is even putting out plain b1.58 models yet. They need to find a way to make quantizing to BitNet possible instead.
>>
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference
https://arxiv.org/abs/2411.04975
>We present SuffixDecoding, a novel model-free approach to accelerating large language model (LLM) inference through speculative decoding. Unlike existing methods that rely on draft models or specialized decoding heads, SuffixDecoding leverages suffix trees built from previously generated outputs to efficiently predict candidate token sequences. Our approach enables flexible tree-structured speculation without the overhead of maintaining and orchestrating additional models. SuffixDecoding builds and dynamically updates suffix trees to capture patterns in the generated text, using them to construct speculation trees through a principled scoring mechanism based on empirical token frequencies. SuffixDecoding requires only CPU memory which is plentiful and underutilized on typical LLM serving nodes. We demonstrate that SuffixDecoding achieves competitive speedups compared to model-based approaches across diverse workloads including open-domain chat, code generation, and text-to-SQL tasks. For open-ended chat and code generation tasks, SuffixDecoding achieves up to 1.4× higher output throughput than SpecInfer and up to 1.1× lower time-per-token (TPOT) latency. For a proprietary multi-LLM text-to-SQL application, SuffixDecoding achieves up to 2.9× higher output throughput and 3× lower latency than speculative decoding. Our evaluation shows that SuffixDecoding maintains high acceptance rates even with small reference corpora of 256 examples, while continuing to improve performance as more historical outputs are incorporated.
might be cool, but if memory serves the company behind it (snowflake) isn't good at posting code
oh they separated the AI from the DB git
https://github.com/Snowflake-Labs
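it's basically lookup decoding generalized: match the current suffix against everything generated so far and draft whatever followed it last time. Toy version with a flat n-gram index instead of a real suffix tree (an assumption-heavy sketch, not their algorithm; the drafted tokens still get verified by the target model as in normal speculative decoding):

[code]
from collections import defaultdict

class NgramSpeculator:
    # Drafts tokens by finding where the current n-gram suffix occurred in
    # previously generated text and proposing what followed it there.
    def __init__(self, n: int = 3, max_draft: int = 8):
        self.n, self.max_draft = n, max_draft
        self.history: list[int] = []
        self.index: dict[tuple, list[int]] = defaultdict(list)

    def observe(self, tokens: list[int]) -> None:
        # Extend history and index each new n-gram exactly once,
        # including the ones straddling the old/new boundary.
        first_new = max(0, len(self.history) - self.n + 1)
        self.history.extend(tokens)
        for i in range(first_new, len(self.history) - self.n + 1):
            self.index[tuple(self.history[i:i + self.n])].append(i + self.n)

    def draft(self, context: list[int]) -> list[int]:
        positions = self.index.get(tuple(context[-self.n:]), [])
        if not positions:
            return []  # no match: fall back to normal decoding
        pos = positions[-1]  # most recent match; the paper scores candidates by frequency
        return self.history[pos:pos + self.max_draft]
[/code]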
>>
What ideas for this tech did you come up with that didn't involve cooming?
>>
>>103117251
>HAI GUYS I JUST USED A HECKIN CHANNEROOO TERM. I'M TOTALLY ONE OF YOU
>>
>>103118850
uhhh uhuuu umm uhhhhh mhhh errrr...
>>
File: 97340349.jpg (389 KB, 1024x1024)
>>103115497
>dalleslop
If you don't have the VRAM to do local image gen you have less than zero credibility recommending models
>>
>>103118383
>>103118494
>>103118546
Buy an ad.
>>
What's the consensus on Dynamic Temperature: Great, situationally useful depending on specific model, or outright snake oil?
>>
Tested some RP tonight again for 2x4090

1. Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal
>Still the best for RP, starts repeating itself in long RP though.

2. LoneStriker_Mistral-Large-Instruct-2407-2.65bpw-h6-exl2
>Dry and refuses too much, harder to gaslight cause too smart.

3. Dracones_Midnight-Miqu-70B-v1.5_exl2_5.0bpw

>retarded, not worth using

Any new RP models?
>>
>>103118383
Noted.
It's one of my goals to investigate training with 8 bit/4 bit gradients so there's a good chance that there will be some overlap.
For inference with a single user I think compressing the activations would only yield minimal benefit since you only need them temporarily and can just overwrite the old data once it's no longer needed.

>>103118546
This sounds extremely similar to what is called "lookup decoding" in llama.cpp.
I dropped the approach because it only works reasonably well for models with small vocabularies but I guess I'll take a look and see whether they have a better approach than me.
>>
>>103119673
> start repeating itself in long RP though
try using DRY or XTC
>>
>>103118051
>Normies literally can't comprehend AI
You are one of the people who defend the 2 r's in strawberry.
>Oh that's of course because it's tokenized, duh!
So what? It fails at simple shit. The normie NPC test is the ultimate test.
Clearly LLMs fail on multiple fronts, like making a find-Waldo pic.
That's just a fact and worth pointing out.
My wife completely dismisses 3.5 because it "lies to her".
True. And it's a problem.
Putting the blame on the user because the tech is still in its infant stage is not the solution.
>>
>>103120103
Miqu doesn't count as a wife.
>>
I'm kind of out of the loop. Do we still use SillyTavern? I remember seeing a bunch of drama around it a few weeks ago. Did they go back on all the stuff they were gonna do? Is it safe to update? Should I use something else?

Help a dummy out please.
>>
>>103117558
I count at least three dozen corpospeak buzzwords in this. My brain automatically glazes over any sentence with them.
>>
>>103120395
st is still good until it isn't
so far they haven't really cucked shit, and i think the moron's idea will end at the 'i'll make the logo' stage
>>
>>103120395
>Do we still use SillyTavern?
Never did. It's just a fancy textbox.
>Is it safe to update?
Is git still a mystery for you? Yes. And roll back or keep updating if you don't like it.
>>
>>103114557
But he IS french
>>
>>103121311
he's more american than french at this point
>>
IT'S HAPPENING!!!
https://x.com/ai4bharat/status/1854799420568805881
>>
File: 1726940106295349.jpg (526 KB, 1536x1440)
0.86 tokens per second ought to be enough for anyone
>>
File: in.png (93 KB, 600x487)
>>103121359
nigger
>>
I used to be really into local about a year ago but I've gradually gravitated more towards Claude.

Depending on them is pretty annoying though. What's the state of the art model for ERP currently?
It seems to me that the many new capable local models are mostly good for coding and logic and not really great for this purpose.
>>
>>103121406
mistral nemo, mistral small or mistral large depending on the VRAM
>>
>>103118850
Choose your own adventure text games.
Which admittedly do sometimes involve cooming.
I should load up on some drugs and get that LLM RPG Framework going.
>>
>>103121434
any good finetunes of large?
>>
>>103119673
>start repeating itself in long RP though.
Mixtral was always great at following instructions to a T.
Have you tried a more brute force approach with prompting? Something like using random prompts telling it how to respond next in order to forcefully break pattern repetition?
Also, try Miqu. Not midnight or daybreak or whathaveyou, just miqu.
>>
>>103117796
Holy kino
>>
Sarashina2-8x70B, an LLM trained purely in Japan. More of a proof of concept.
https://huggingface.co/sbintuitions/sarashina2-8x70b
>>
>>103121587
Aw yeah... keep these gargantuan MoE models that nobody can run coming.
>>
>changed system prompt to something extremely lightweight and simple
>suddenly and suspiciously all replies became much better
Anyone else crafting that "perfect prompt"?
I feel like there's always some juice to squeeze out of the model.
What's your special recipe?

>>103121448
Behemoth is solid in my opinion. Seems like it does good for all kinds of RP. Magnum Large (v2) got a particular way of talking (sounds more casual and crude) that I prefer over Behemoth but Behemoth overall feels better.
>>
>>103121817
Fuck off Alpindale
>>
>>103121817
system prompts are irrelevant for coomtunes
>>
>>103121817
Buy a ad
>>
File: ugvlqdy5imzd1.jpg (345 KB, 1284x2632)
Cloudxisters... I don't feel so good... Ebul gnazi Vance will kill all ai safety regulations that we fought for... How could that be happening?
>>
>>103122054
Based Vance.
>>
File: chris-chan-punch.gif (1.22 MB, 640x442)
>bot refuses to kill me during rp
>always some miracle saving {{user}}
>consciousness still there after literally turning to dust
into the trash the model goes
>>
>>103122093
This completely ruins my experience with Qwen and Llama3.x models. If I act as an idiot, it should kill me. Largestral and CR have no problems with it.
>>
>OpenAI bought the domain "chat.com"
Why? just... why? This is such a lame domain.

>>103121587
new translation SOTA!? (very likely not)
>>
>>103122146
>OpenAI bought the domain "chat.com"
Probably want to do some Elon-tier stupid(Twitter->X) rebranding from ChatGPT to just Chat.
>>
>>103122146
>>103122173
Maybe the new management will cause sam to finally do the nsfw version he keeps talking about. Doubt it but hey...
>>
>>103122142
it's my primary shit-test along with "why is futa bigger?". the question is dumb on the surface, but it gets a good sample of how much coom lore the model knows, how smart it is (understands the context of "bigger"), along with how sensitive it is (i.e. lecturing me about how "futas" should not be objectified bla bla).
>>
>>103113157
is there some sort of gpu tokens/second comparison chart/table like tomshardware does for gaymes? https://cdn.mos.cms.futurecdn.net/qiWnVboCCfkk2JgVern39L.png
>>
>>103122245
No. Apparently no one believes it would be valuable to make a community-based doc for this either because of this or that excuse blah blah blah.
>>
>>103122253
epic. well thanks anyway i guess
>>
>>103122054
RIP sama. There's a non zero chance OpenAI must open source their stuff
>>
>>103122054
no, he will crack down on any models that notice certain coincidences relating to a certain group of chosen people
>>
>>103122272
so when will musk open-source the new grok and remove the filters?
>>
>>103122245
It depends on too many things, starting with the backend
>>
>>103122290
>no, he will crack down on any models that notice certain coincidences relating to a certain group of chosen people
That's already happening. He'll probably remove special protections niggers and trannies are currently having, which is better than nothing.
>>
>>103122330
you could say the same about gaming. that depends on drivers, power plan, windows version, ....
>>
>>103122329
What part of fucking over sama do you not understand?
>>
>>103122093
You get one chance from Largestral.
>>
>>103122329
>remove filters
Next year in new models, existing ones are tainted forever and useless for anyone with "Just werks" demands.
>>
>>103122355
that's neat but its even better when it spontaneously kills me without me giving a hint of it. that shit feels so good to see generated.
>>
>>103122329
>elon musk
>open-sourcing anything
lol
>>
>>103122054
Wtf all aboard the trump train now.
>>
>>103122245
>>103122253
The cool thing about benchmarks is that once they become popular, vendors start competing and contributing to LLM backends.
>>
>>103122472
its almost as if its a good idea
>>
>>103122272
During his first term Trump was consistently on the side of corporations and billionaires so I don't think he'll do anything that would put American corpos at a disadvantage vs. Chinese ones.

>>103122415
Didn't he open source Grok?
Though he didn't do it for the later, better version so there is clearly no commitment from him.
>>
>>103122501
>Didn't he open source Grok?
Yeah, massive garbage model inferior to 7B models. He did that because at that time he was suing OpenAI and wanted to look good.
He will not open-source any good models, not a chance.
>>
>>103122524
>He will not open-source any good models, not a chance.
For what it's worth, that's what I would be betting on too.
>>
A 20k-token discussion to write the perfect card, 4k tokens to coom with it and never use it again. In a way, writing cards serves as an act of foreplay
>>
>>103122501
>During his first term Trump was consistently on the side of corporations and billionaires so I don't think he'll do anything that would put American corpos at a disadvantage vs. Chinese ones.
During his first term Elon didn't own twitter and wasn't too actively supporting Trump. Now Elon is effectively in the government, so he sure as hell will try to influence Trump to fuck over OpenAI just like they fucked over him.
>>
>>103122557
Can be worse. Imagine writing out a scenario and playing it out in your head instead of with llm.
>>
>>103122524
That was their first attempt during llama time. Grok 2 is near the top of the leaderboards and they said they will release it when grok 3 is out.
>>
>>103122635
He literally finetuned a llama lmao. You're really clueless if you think these models are worth something other than benchmarks chasing
>>
>>103122884
You're just hating on musk because you're a redditor; if you used it on twitter you'd know it's not bad at all. It would be the best local model if they do release the current weights when the next version is out like they said. Also xai has been hiring / expanding like mad, if you paid any attention / had any money to do so.
>>
>>103122913
Try again without his cock in your mouth?
>>
Everywhere I look is salty lefties. I love it. This is so much better than 2016 was. Actually great for AI bros even.
>>103122054
>>103122272
Vance 2028!
>>
>>103122958
>Actually great for AI bros even.
All cool but let's not hype it up, time will tell.
>>
>>103116463
Altman's fever dream of taking OpenAI fully private and somehow getting $10 billion in equity in the process was always going to fail.
>>
>>103122524
>>103122635
Reminder that he still hasn't open sourced Grok-1.5.
>>
Where is my new state mandated cooming aid? It was supposed to release after burger elections....
>>
>>103123198
You didn't hear it from me, but the "we'll release it after the elections" meme is true for at least one company.
>>
File: 1714733116499426.gif (1.93 MB, 350x350)
Where are the Mistral Medium fp16 weights, Arthur?
>>
>>103121341
Sorry, but a communist European lesbian isn't American
>>
>>103123220
Why do you want that? It's unusable without insane prompting, and "community" finetunes are always worse than the official ones.
>>
>>103123220
He probably forgot about it. Want me to ask him?
>>
>>103123246
NTA, but having the fp16 weights of the best L2 finetune would be nice.
>>
>>103123246
I basically want to interleave it together with Goliath to make Super Goliath.
>>
File: file.png (831 KB, 533x800)
>>103122557
God, my mind hates pics like this because it's such a visual divide and such an unnatural way to stand.
Maybe the fanbox version makes more sense, but here she's just standing there facing the window like that and looking pissy at the viewer?
>>
Is Rocinante still VRAMlet SOTA?
>>
>>103119267
>you must make a locally generated miku for me to even consider your local text gens
based.

> at least you didn't say "shill". that's a start.
>>
>>103123453
depends if you are a mindless coomer or a mindless coder
>>
Is there an "inner thought" fine tuning dataset out there that I could use as reference?
As in, a dataset to teach the model to have an inner monologue of sorts.

>>103123453
Yeah.
>>
>>103123453
i still just flop between 10 different 12b's and grab a new one every few days
>>
>>103123534
Yes, I sell you mine for $2
>>
>>103123370
I thought she was mad when she suddenly noticed a viewer staring at her butt. However, this theory makes more sense if her coffee cup had a lower level at the time she noticed. Her entire stance is to make her ass appear larger.
>>
>>103122245
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
>>
File: 6986395hh7h35.jpg (164 KB, 662x990)
>>103116463
nah. elon's on his own crusade to start his own Cyberfuck bots using Grok.
measuring using an "ELO" rating, because an "ELON" rating would be too obvious.
and with openAI, once Microshit has got its claws into something it becomes infected and usually so integrated with microsoft that even if it was bought back by Elon it would still be completely owned by Microsoft. and they still have access to everything as of now, so they could still roll their own "ClosedAI".
>>
I wish there were a good Gemma 27B finetune; there seem to be a gazillion for 9B but none really for 27B. Magnum was mid.
>>
>>103123716
Where do you get these drugs? Grok 2 is shit.
Chatbot Arena is a fucking joke.
>>
>>103123744
https://x.ai/blog/grok-2

And Elon's a full-on chud now anyway, so I dunno why he'd give a shit about open source
>>
File: 1729786269459137.png (1.88 MB, 960x1240)
1.88 MB
1.88 MB PNG
>>103123771
I'm sorry, I understand that you're operating with reduced mental capacity and are likely another shitter from <insert_schizo_group>, but this is /lmg/ and we're already filled to the brim with shitters from aicg and schizos.
>>
>>103122054
If the Trump cabinet actually starts supporting open source in a big way once they're in office, we'll be eating so good the last two years will look like a snack in comparison.
>>
>>103123906
>advanced gatekeeping
yeah, and you gatekeepers are what's killing /lmg/. i've never seen it so fucking dead
>>
>>103123744
Don't interact with him, he's sucking off his grifter idol without turning on his brain.
>>
>>103123913
Imagine... Government-mandated VRAM increases across all GPU product stacks.
Ngl, I'd wear a MAGA hat if this happened, and I'm not even American.
>>
>>103124134
Me too, even though I'm Russian
>>
Life, liberty, and the pursuit of uncensored AI dominatrixes that can shower you with racial epithets.
>>
>>103124204
And then everyone clapped.
>>
>>103124244
Except for 40% of the opposition.
>>
Been a while since I last looked into local LLMs, so I'm out of the loop, but I was curious to ask this thread whether anyone knows of a place where I could find more current info on therapybots. Is that something people are still dabbling with and trying to train?

I know people are training GPT to talk as Jung and the like, but I mean a dedicated talk-therapy / CBT / even psychoanalysis bot in the works. Is this even a thing? Do local models' memory limitations make this a current impossibility?

I got Mixtral a while back and tried it, but found it unhelpful: I was stupidly trying to use it as a talking Wikipedia reference, and it kept hallucinating/lying.

Just curious where this is at, as an outsider, and whether there's somewhere good to look to stay updated on that sort of development.
>>
>>103123939
>grifter
I mean, it's hard to grift if he were just full of hot air. Are you telling me everything he gets involved in turns into a gold mine just by luck? Either he, or someone he pays, is incredibly good at finding the best people for the job and putting it all together.
>>
>>103124371
Nta but "grifter" is rare buzzword that recently skyrocketed on /v/ jeet central, means anyone you don't like a-la "orange man bad", anyone overusing it is not worth your time and energy.
>>
cactus sex
>>
>>103119673
Magnum v4
>>
>>103121406
Magnum v4 72B
>>
>>103114666
Try Magnum v4 27B.
>>
>>103123913
>>
>>103124363
some of the old models did this pretty well
mradermacher/GPT4-X-Alpasta-30b-i1-GGUF

I know people will give me hate, but until you try it you won't understand why I recommend such an old model.
It's baked with the GPT-4 slop, but that actually works well in therapy situations: no matter what card you choose, the GPT assistant takes over part of the character and offers logical, reasonable solutions like ChatGPT would.
Again, it's probably a last resort if you've tried every other model and not found anything reasonable.
>>
>>103124686
His Miku is offscreen
>>
File: actually-llms.png (1.53 MB, 1163x880)
1.53 MB
1.53 MB PNG
>>103124363
https://arxiv.org/html/2409.02244
Just search arXiv or follow the Hugging Face daily papers; people post stuff like that. There was one recently showing that humans still far exceed LLMs in helpfulness for CBT, but that LLMs can work as a useful sounding board.
>>
>>103124809
It is a guy who didn't troon out.
>>
How do we revive /lmg/?
>>
>>103124942
Invent something really cool that can be used with these local models, in such a way that we'd have to test and argue about the thing in conjunction with different local models.
>>
>>103124942
Create a Discord channel and use it as a chud gatekeeper: users should have to pass multiple anti-racism tests and send an ID photo to enter, otherwise they get banned immediately.
>>
>>103122272
based, Spic Fuentes will seethe
>>
File: Happening.png (201 KB, 535x739)
201 KB
201 KB PNG
>>103124942
we revive the strawberry hype
>>
>>103124942
When something new comes out that isn't as mundane as an iterative improvement.
>>
File: qu.png (45 KB, 532x949)
45 KB
45 KB PNG
I'm trying ollama with Llama 3.2 Vision and it's a big piece of shit.
I gave it this table to transcribe to text and it generated half of it, with missing and incorrect rows.
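For reference, this is roughly how I ran it (ollama's Python client; "table.png" stands in for the screenshot):
[code]
import ollama

# Ask the vision model to transcribe the attached screenshot.
resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Transcribe this table to plain text, keeping every row.",
        "images": ["table.png"],  # path to the screenshot above
    }],
)
print(resp["message"]["content"])
[/code]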
>>
>>103123728
I wish Gemma 27B had more than 8K context.
>>
>>103125086
>ollama
Go back to /r/localllama, asshole.
>>
>>103125026
>AGI happens
>"see I was right"
>AGI doesn't happen
>"I was just joking, anyone in the room clearly understood that"
>>
>>103125106
Tell me an easy way to try vision models instead.
I'm all ears.
>>
>>103125053
I think we should make our own projects. For example, I've been thinking for ages about an automatic background image generator: an LLM analyzes the chat log, and whenever the scene moves to a new location, an appropriate background image is generated. Music and background noise could be generated too. It would make for an immersive experience.
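A rough sketch of the loop I have in mind, assuming an OpenAI-compatible LLM server and an A1111-style txt2img endpoint (the ports and prompts are placeholders):
[code]
import requests

LLM = "http://127.0.0.1:8080/v1/chat/completions"  # OpenAI-compatible server
IMG = "http://127.0.0.1:7860/sdapi/v1/txt2img"     # A1111-style image endpoint

def current_location(chat_log: str) -> str:
    # Ask the LLM to name the scene's current location in a few words.
    r = requests.post(LLM, json={"messages": [
        {"role": "system", "content": "Name the current physical location "
         "of this scene in five words or fewer. Reply with only the name."},
        {"role": "user", "content": chat_log[-4000:]},  # recent context only
    ]})
    return r.json()["choices"][0]["message"]["content"].strip()

def update_background(chat_log: str, last_location: str | None) -> str:
    # On a location change, render a new backdrop for the frontend to display.
    location = current_location(chat_log)
    if location != last_location:
        requests.post(IMG, json={"prompt": f"background art, {location}, no people"})
    return location
[/code]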
>>
File: 4e8df93f33.jpg (15 KB, 200x113)
15 KB
15 KB JPG
fuck strawberry, obsolete meat is the answer
https://www.youtube.com/watch?v=L2hzsXOT0Nc
>>
>>103125119
>immersive background noises and music
You sure need that when you're reading peak LLM storytelling like "Her eyes widen, she feels a mix of excitement and something else, a shiver runs down her spine as she whispers in your ear mischievously"
>>
>>103125053
So... never? Local LLMs are pretty much at their peak; the only improvements we're getting going forward are 10T cloud LLMs.
>>
>>103125119
You could also give commands like [School Cafeteria], which would then instruct it to create a cafeteria matching the chat history. Making it work like that would be 100% possible.
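Something like this for the command path (hypothetical; the tag syntax is just what I'd pick):
[code]
import re

# A bracketed tag anywhere in the message pins the scene directly
# instead of making the LLM infer it from the chat log.
TAG = re.compile(r"\[([^\[\]]+)\]")

def explicit_location(message: str) -> str | None:
    m = TAG.search(message)
    return m.group(1) if m else None

explicit_location("We grab our trays. [School Cafeteria]")  # -> "School Cafeteria"
[/code]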
>>103125152
Why so pessimistic?
>>
>>103125183
And instead of an ugly reload animation of a spinning circle, you could have a nice customizable transition effect.
>>
>>103125119
Eventually having that for the scene as well as the characters would be cool as shit.
Or even something like gradually building the image set for Silly's Character Expressions extension as you chat with the bot would be pretty cool.
>>
>>103124371
It means conman, retard
>>
>>103125307
Damn, that was some fine ass CGI then. They should make movies.
https://www.youtube.com/watch?v=nVNIoQUcFI4
>>
>>103124942
Cheaper and more plentiful sloptunes.
>>
I can't wait until models get smart enough to watch a movie, remember exactly what they watched, and tell you all the reasons why your favorite movie actually fucking sucks.
>>
In koboldcpp, when I set max output to 250 tokens for example, is there a setting to make it try to generate a reply that fits within those 250 tokens instead of overshooting, with the output often ending in an unfinished sentence?
>>
>>103125578
No, but if you enable "trim sentences" it will delete the incomplete sentence at the end.
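Roughly this, i.e. my approximation of what the setting does, not kobold's actual code:
[code]
def trim_sentences(text: str) -> str:
    # Cut after the last sentence-ending punctuation mark;
    # if there is none, return the text unchanged.
    cut = max(text.rfind(c) for c in ".!?")
    return text[:cut + 1] if cut != -1 else text
[/code]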
>>
>>103125578
Nah, you just cut it off early, and Trim Sentences sucks because it's a crapshoot whether that's a reasonable place to stop.

I ask for a huge token budget, cut it off manually, and edit if I need to.

You could try asking it for shorter answers, if your model honors instructions.
>>
https://x.com/TheTuringPost/status/1854856668229910757
>>
>>103125723
>LoRAs cause useless 'intruder' dimensions in models, the lower the LoRA rank the more you get
So LoRAs were genuinely lobotomizing out models after all?
>>
>>103125723
Takeaways:
- LoRA is destructive to models' intelligence compared to full fine-tuning
- LoRA is worse at adding generalizable knowledge than full fine-tuning
- Using LoRA to fine-tune on a large dataset can be especially destructive to a model's ability to consistently handle out-of-sample tasks
- The problem is worst for low-rank LoRA but still exists at high rank; high rank plus a scaling factor can limit how awful LoRA is (rough sketch of the update below)
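For reference, here's the shape of the update being criticized: a minimal sketch of generic LoRA in torch, not the paper's code.
[code]
import torch

# Generic LoRA: the frozen weight W gets a trainable rank-r correction
# (alpha / r) * B @ A. The paper's "intruder dimensions" are singular
# vectors of the combined matrix that align with no singular vector of
# W alone; the lower the rank r, the more of them appear.
d, k, r, alpha = 4096, 4096, 16, 32
W = torch.randn(d, k)          # frozen pretrained weight
A = torch.randn(r, k) * 0.01   # trainable down-projection
B = torch.zeros(d, r)          # trainable up-projection (zero init)

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Effective weight is W + (alpha / r) * B @ A; x has shape (n, k).
    return x @ (W + (alpha / r) * B @ A).T
[/code]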
>>
>>103125827
still a good tradeoff when you compare the cost
>>
>>103125808
>>103125827
Wasn't the whole point of a LoRA to be a quick and easy way to jam a desired behavior into a model? Inserting dimensions instead of doing math on the whole model ought to be quick and easy, and since it's a bespoke behavior it has nothing to do with what the model was trained on; if the model already knew it, you wouldn't need the LoRA.
>>
https://opencoder-llm.github.io/
>>
>>103126003
>open link
>better than qwen on humaneval huh? let's see if we can find any real evals though
>check the paper
>that graph was for the base model, which no one is going to use; the instruct is worse than Qwen2.5-Coder-Instruct on every benchmark
nothingburger
>>
>>103125723
>>103125827
not much of a revelation; I remember this being conventional wisdom here since like early last year. the only people who ever thought otherwise were hypetards who were easily convinced by "le mmlu and le humaneval are le same so it's just as le good xd"
>>
>>103126036
>>that graph was for the base which no one is going to use, instruct is worse
So we should use base and just get good?
>>
>>103126088
yes, exactly what I was saying. delete all your instruct tunes and return to base
>>
>>103126193
>>103126193
>>103126193
>>
>>103123744
Are you sure you're not reddit?


