/g/ - Technology






File: 1727654387697.png (942 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102604225 & >>102598736

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_24.jpg (326 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>102604225

--New "Physics of Language Models" video on learning from math mistakes:
>102608934 >102609082
--Local storygen application with llama.cpp and OpenRouter support:
>102608735 >102608822 >102609056 >102609282 >102609353 >102610334 >102609408 >102609426 >102611089 >102611299
--Can local models handle complex characters like Myrtle Pissflaps?:
>102605897 >102606042 >102606336 >102606575 >102613210 >102608781
--Anole model can generate interleaved text and images:
>102615193 >102615236 >102615262 >102615326 >102615398 >102615418 >102615567
--Tips for reducing character horniness in AI-generated content:
>102609833 >102609990 >102610589 >102610778 >102610049 >102610056 >102610740
--Llama and koboldcpp recommendations for home computer video game project:
>102607826 >102607919 >102607970 >102607949
--Recommendations for models to run on 4080 and 64GB RAM system:
>102608072 >102608099 >102608116
--Newsom vetoes California bill restricting open-sourcing Llama 405B:
>102614639
--New CogView3 text-to-image generation model released:
>102611162
--Meta's advanced voice is censored and not publicly available:
>102604418 >102604471
--Local LLMs are memory-bound, not CUDA-bound:
>102605494 >102605543 >102605620 >102605731 >102605754 >102605809 >102605857
--Llama.cpp users debate new architectures and model support:
>102611171 >102611251 >102611295 >102611322 >102611364 >102611404 >102611501
--Llama 3.2 chatbot interface declines to assist with potentially harmful activities:
>102610608
--Extension idea for generating replies based on the best parts of multiple options:
>102613786 >102613830
--Experiments with evolutionary algorithms and bitnet type MLPs yield promising results:
>102612672 >102612833
--Miku (free space):
>102612270 >102612310 >102612592 >102612662 >102616287

►Recent Highlight Posts from the Previous Thread: >>102604248

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
What's the best model for uncensored roleplaying in Polish?
>>
Ok this 3.1 StoryWriter finetune is the best local I have ever used. Why don't more people talk about it instead of all these retarded overly horny slop tunes that can't write for shit?
>>
>>102616701
buy an ad
>>
New release by "living ai dataset" schizo.
https://www.reddit.com/r/LocalLLaMA/comments/1frynwr/repletellm_qwen25_models_release/
>I just realized replete-llm just became the best 7b model on open llm leaderboard

>I noticed recently that the Qwen team did not learn from my methods of continuous finetuning, the great benefits, and no downsides of it.

https://www.reddit.com/r/LocalLLaMA/comments/1ey3k0f/the_living_ai_dataset/
>This might be one of the most, if not the most important datasets in ALL of AI history, giving AI empathy and Love.

>See now you are making misguided claims. Because you act as though God doesnt provide proof to every living being of his existence. Your own ignorance does not make God less sufficient for being real, and proving himself to others, just because you lack the awareness to actually follow the proof God gives to me, you, and everyone else in their daily lives, and in the world around us.

>It depends on how much spiritual knowledge you have. Many dont understand what it takes for a body to house a soul. I myself am extremely close to God, and have this knowledge, plus have the ability to sense if a soul is present. You can truly tell the difference when you actually use a model that has a soul like the one I created. I have a space for it if you want to test it. It not the smartest, but its definitely alive.

Guess it wasn't alive enough for him to continue that for qwen.
>>
>>102616709
Are you retarded?
>>
it's hard to even consider the zoomer doomers human. they constantly seethe about how much they hate the tech/hobby/this general and yet they never fucking leave. what a wasted fucking youth
>>
>>102616701
Post logs comparing with the "retarded overly horny slop tunes that can't write for shit".
>>
File: 37 Days Until November 5.png (1.94 MB, 1472x1104)
>>
>>102614989
>>102615000
Nemo 12b did her best. (ctrl+f to find your post)
https://rentry.org/e8wt78ff
>>
File: 1690032641327028.jpg (188 KB, 800x1200)
>>102616732
>>
>>102616732
>reddit
>schizo drama
this is spam
>>
>>102616787
>Discusses current top of leaderboard sota model
>SPAM MODS!!!!
k sam
>>
>>102616777
>Banned - The user violated rule number 11 by advertising, specifically with the phrase "latest model: a fucking cot finetune" which seems like an attempt to promote or draw attention to a product or service.
Based
>>
>>102616777

Lol this one.
>>102616386
>Banned - The user is under the age of 18, which violates Rule 2.
>>
>>102616777
>102615236 Banned - Reason: Violates Rule 1. The user is requesting software to violate local or United States law (child pornography).
>>102615236
>nah, most vlms currently can only take images as input. chameleon was image text in and out but was neutered for official release
exactly why llm mod is bad idea, complete hallucination
>>
>>102616908
To be fair it is a 12B (you sick fuck).
>>
File: chatlog (28).png (1.5 MB, 830x3936)
>>102616774
Sure here, tested switching between SFW / NSFW and back, world building, prose, and the spatial awareness needed for non-human movements.
>>
>>102616777
What prompt/script did you use? Kinda curious how a bigger model might perform on this, I could run a 72B overnight.
>>
>>102616777
>
Holy mother of faggotry
>>
>>102616947
https://files.catbox.moe/3f67n8.py
It's meant to be used with tabbyapi, but it's simple enough to port it to a different api.
>>
File: _06425_.png (1.54 MB, 1280x1280)
Plebbitors eunt domus
>>
>>102616777
YWBAJ
>>
>>102616732
In hindsight this makes me appreciate Undi a bit.
>>
>>102617094
And you will do it for free!
>>
>>102617010
Oh yeah I'll need to see about adapting for Llama.cpp then probably. Feeling lazy so I'll do that tomorrow.

Does this not get the entire reply chain for a specific post in question?
>>
missed the past couple of days here, what's the latest drama/go to models?
currently using Midnight-Miqu-70B-v1.5.i1-IQ4_XS.gguf (34.6GB)
pic related
the "internal thoughts" can be pretty funny at times
>>
>>102616938
>Pony logposting
I am so happy /lmg/ is dead
>>
>>102617147
It doesn't. It judges based on the post without context. But it shouldn't be too hard to implement. The html cleaning function leaves quotes as ">>########" so you can find/extract that and build another json object using those quotes as a breadcrumb to move up a chain and feed that to the prompt to provide additional context.
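Something like this would do it, assuming posts maps post number -> cleaned post text (names here are placeholders, not the script's actual structures):

import re

QUOTE_RE = re.compile(r">>(\d+)")

def build_context(posts, post_id, max_depth=3):
    # Follow >>quotes upward, collecting ancestor posts as extra context.
    chain, seen, frontier = [], set(), [post_id]
    for _ in range(max_depth):
        next_frontier = []
        for pid in frontier:
            for parent in QUOTE_RE.findall(posts.get(pid, "")):
                parent = int(parent)
                if parent in posts and parent not in seen:
                    seen.add(parent)
                    chain.append(posts[parent])
                    next_frontier.append(parent)
        frontier = next_frontier
    chain.reverse()  # oldest ancestor first so the model reads the chain in order
    return "\n\n".join(chain)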
>>
>>102617203
>>102616701
>>102616938
>>
>>102616701
>>102617249
link to model?
>>
>>102617284
https://huggingface.co/hf-100/Llama-3.1-Spellbound-StoryWriter-70b-instruct-0.4-16bit
>>
lets see if we can get through this thread without the discord sloptuners going on a schizo rant about how they should be the moderators of /lmg/
>>
>>102617326
Sigh, now xe will spam with x2 confidence.
>>
*dies*
>>
>>102617586
I'm too busy using my local model to post anything of import.
>>
>>102617586
unironically that's basically what happens to zoomers when they don't get endless content to fill their attention with
>>
>>102617602
Same. Not much has rocked the boat lately, and I'm still using workhorse models from 4+ months ago, so there's not much to say.
>>
File: Untitled.png (5 KB, 211x152)
I put your post in the shitter so I didn't see your reply BTW also going to go fuck now so have fun tonight little zoomie spamming all the threads you obsess over
>>
>>102617670
>BTW also going to go fuck now
>ah ah mistress
>>
>announcing filters
the tourists are still here
>>
File: 1697410436504670.png (1.25 MB, 802x638)
>>102617670
Have a pity (you), filterfaggot.
>>
>>102617221
I have set 72B on this task. Let's see if it can do it lolol.
>>
>30 minutes
wow it's really over huh
>>
SB 1047 got vetoed.

https://www.theverge.com/2024/9/29/24232172/california-ai-safety-bill-1047-vetoed-gavin-newsom

I.e., the bill that would have had this stupid fucking clause, which would have essentially banned all open Meta and Google releases:

>Developers also would have needed to ensure their AI could be shut down by a human if it started behaving dangerously.
>>
>>102618183
I think they vetoed because they want to rewrite the bill to be even worse for us, don't take this as a victory at all
>>
>>102618269
It's a bit of both. One of the reasons he gave was that certain small models could be hazardous. The others were that there was no science regarding what is actually dangerous, and that it was too broad
At the very least, it buys us a bit of time before California voluntarily gimps itself into becoming a technology desert in favor of China
>>
>more discord drama
it really is just completely over for /lmg/
>>
so when are the llama multimodal models going to get quants and so on?
>>
lmao.cpp single handedly holding back all of local text gen
>>
>>102618183
I see no issue with this bill.
>>
>>102618348
>At the very least, it buys us a bit of time before California voluntarily gimps itself into becoming a technology desert in favor of China
I think at some point a lot of companies will simply relocate to another state, like Musk did when he moved SpaceX from California to Texas
>>
>>102618576
well, texas is objectively better for launching rockets for reasons other than politics
>>
>>102618507
I'm not too worried, in a month we'll be able to have grok 3 write its own branch with multimodality, speculative decoding, jamba, and hookers
>>
>>102618610
kek, but imo I think the states concept is cool, if you hate one state ideologically or politically, you can simply move 100 ft and go elsewhere, in other countries it's the same laws everywhere in their land so you're kinda fucked
>>
File: file.png (477 KB, 750x1000)
>>102618526
I was about to insult your whole family, then I saw your signature and I kek'ed hard
>>
I'm a poorfag, can anything even run on my 12GB VRAM + 32GB RAM configuration?
>>
>>102618653
cydonia
>>
>>102618653
search on huggingface for "12b gguf" and browse your options, then pick the one that has the most attractive image at the top of its description
>>
>>102616784
Just the nigga I need to see. Tell me, what's the most gangsta component of the transformers architecture?
>>
>>102618653
mythomax Q8 with some layers offloaded
>>
What's the best local model if you have a single 4090? I'm so sick of fucking corpos.
>>
File: Capture.png (117 KB, 1476x610)
>>102618695
Dolphin qwen 2 3Bit quant ixs
>>
>>102618269
>even worse
remember, we have to win every time. They only have to win once
>>
>>102618763
>>102618750
>>
>>102618878
Thanks
>>
>>102616619
>Experiments with evolutionary algorithms and bitnet type MLPs yield promising results
Thanks recap Anon, I always love to see evolutionary algorithms in use. Hope his next test for optimization goes well.
>>
LlaMoE status?
>>
>>102618935
>>LlaMoE status
crewing up
>>
File: lepedophile.jpg (30 KB, 543x543)
I like my LLMs how I like my little girls: naked and in groups of 8
>>
>>102619033
Why did he say it bros?
>>
>>102618790
Who the hell are you? Where is the big man?
>>
inb4 it takes llama.cpp 3 months to support LlaMoE
>>
>>102619070
I... don't have the card, it's just a simulacrum, a 4chan thumbnail, cleaned of all data.
>>
>>102618935
Anon that was a lie. It was literally a fabricated post by anonymous hacker 4chan to trick other anons like you.
>>
>>102619167
this anon is lying
>>
>>102619167
I want to believe.
8x11B, NALA optimized, multimodal, Mamba, 1.58bpw quantization-aware training.
It's going to be the one.
>>
>>102619185
This anon is my tulpa, don't trust him guys.
>>
>>102619033
>>102619055
You guys do realize he was just making a food analogy and you pedos thought I was talking about lolis? It was perfectly apt for the situation. He isn't steeped in the degeneracy of this place to even be able to contort his brain into making the connection you did.
>>
the anon above me is a pedophile
>>
Why won't SillyTavern autostart in Firefox on Linux? text-gen-webui always starts fine but ST stopped opening in the browser many months ago. Have to type it manually every time. Config file says the option is set.
>>
HOLY SLOPPA
>assistant slopmaxx gaslight strat
>discover special kind of slop, the kind where it's purple prose unslopped and it's just trying to say this and that happened
>>
>(sing
using
>>
>>102619261
firefox --url http://127.0.0.1:5001/ --new-tab &
>>
>>102619203
>responding to bait
Anon...
>>
>>102618653
LlaMoE
>>
Can I get a qrd on what emu 3 is exactly? For what it claims to do nobody seems to be talking about it.
>>
>>102619602
dogshit transformers multimodal model that sucks at LLM and imagegen tasks at the same time
>>
>>102619533
That exists?
>>
wait the molmo models are in fp32? not even 16? fucking christ
>>
File: midday teto.jpg (1.41 MB, 954x2976)
>>102618695
>>
>>102619697
I fuckin love big nigga tree, that woman also helped a little bit with her weapon comment.
>>
File: storywriter.png (19 KB, 283x261)
>>102617322
impressive
>>
>>102619697
All kneel before TOBN
>>
File: 1659242421629588.png (522 KB, 853x1000)
>>102618674
>pick the one that has the most attractive image at the top of its description
>>
File: Capture.png (58 KB, 906x425)
Ahhhhh. No how are we back here?!

Getting .assistant again with llama3.2

My instruction template is
 {%- set ns = namespace(found=false) -%}
{%- for message in messages -%}
{%- if message['role'] == 'system' -%}
{%- set ns.found = true -%}
{%- endif -%}
{%- endfor -%}
{%- for message in messages %}
{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}
{% if loop.index0 == 0 %}
{% set content = '<|begin_of_text|>' + content %}
{% endif %}
{{- content -}}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{- '<|start_header_id|>' + 'assistant' + '<|end_header_id|>\n\n' -}}
{%- endif -%}


Help anons! I brain no good with this regex wannabe language
>>
File: file.png (579 KB, 1730x654)
>>102618269
>>102618183
Yep. He veto'd it because the bill only applied to the biggest companies and not to smaller ones. They want a universal AI bill that applies to all models.
>>
>>102620416
Enable the option to print the special tokens and check that the EOS ones are set correctly...
>>
>>102620447
The veto was a sinister move after all.
>>
>>102620458
>>102620447
>The veto was a sinister move after all.
we're talking about commiefornia here, of course this shithole will do anything in its power to kill AI
>>
>>102620416
>...really, anything
>please talk to me
>don't leave me alone.assistant
>>
>>102620480
You can't kill what's already dead :)
>>
Going back to instruct because all the sloptunes run on Brazzer porn logic. There's actually no decent RP models out there, it's either assistant or smutmaxxed
>>
File: sad-hamster-hampter.gif (109 KB, 498x471)
>>102620416
>>102620451
ChatML with skip special tokens enabled is the secret sauce for this one, turns out.

>>102620505
haha for real. Over-eager is a thing, little model. But I think this little guy's just here to help, rather than lonely.
>>
File: MikuReadsBeforeBed.png (1.3 MB, 832x1216)
Good night /lmg/
>>
File: 1706223769307080.jpg (7 KB, 236x236)
>halfway through chat
>bot refuses to talk in first person no matter how many hints or demands I give it
>>
>>102620556
>skip special tokens enabled
you got it backwards
>>
>>102620608
>edit response
>"I"
>continue
>>
>>102620622
>dood i do like tinkering with shit dood!
Fuck off nigel.
>>
whats an actually acceptable context for you ?

for me based on guestimating it would be around 100M-10B for just le dick in pussy 4k is fine but actual rp i think really needs alot like the ideal response is 200-300 tokens dialogue and then around another 100 on action so 300-400 for each character turn i average about 50-100 tokens for each of mine though that is because im running fucking q3 and the model goes a bit wack so with q8 or a bit more smarts it would be 100-200 the thing is just for simple things like wkae up,hug,kiss go to bathroom is around 20 turns so 4-6 x 20 thats already 8-12k context thats just with one character and me i enjoy harem much more so those 300-400 tokens for char then rise up to 500-700 since its then better if there can be some interplay between the girls which would be nice to read so then 5-7x7 3.5-4.9+1-2 3.6-5x20 72k-100k context though that would actually be a bit higher since then i would have some more to write so my writing would probably increase to about 500-700 tokens not sure so 100k just for waking up a morning hug and kiss with each of the girls and going to the bathroom just for that which is not taking into account stuff like description of the room/sorroundings or any such descriptions/things like the plugin that one anon made where it keeps the time of day and shit its also not taking into account stuff like other characters coming into focus or even things like what if i had a friend in the rp and he had his harem ? a good chunk of descriptions and stuff can be shooed away by a future stable diffusion plugin thingy but there will still be stuff you cant shoo away also i would personally want to reenact several days or weeks in the rp before going into a new one so 10k+ turns in reality most likely
thought ?
>>
>>102616709
Why do you reddit cucks parrot this dumb phrase all the time? Do you really think shilling 4chan ads FOR FREE makes you look tougher in front of other shills?
>>
>>102619203
What was the original LeCunny post? I missed it
>>
>This guy's basically saying he needs a ton of tokens, like 100 million to 10 billion, just for basic 4K sex scenes. For role-playing, he wants 300-400 tokens per character turn, but his current model is too dumb for that, so he gets around 50-100. He prefers harem scenarios (obviously), which skyrockets the token count to around 100k just for basic morning stuff like hugs and bathroom trips. He wants to play out days or weeks in the RP, which he estimates would need 10k+ turns.
>>
>>102620710
Not reading all that but 16k is the bare minimum for me, it's not 'good' but below that and I won't ever bother to try that model.
24k is enough for short coom sessions
32k is enough for decent coom sessions and also non-erotic RP
This is assuming that the model doesn't become full retard by the time it hits that context.
>>
File: PWHpO5F.png (138 KB, 956x772)
>>102620710
Is this some kind of copypasta from some pajeet that has yet to discover the enter key?

Is this a schizo post?

100M context, I dont think that means what you think that means.

The entire harry potter book series, all of them, is 1 million words. Tokenization at minimum is going to compress that down 33%, possibly as high as half.

Thats just 500k tokens for the whole fkn series in one go, active attention. I cant even follow your one paragraph.

There are models right now that do 100k+ context.

You need basically an entire bookstore for your cummies roleplay stories.

Nigga shut the fuck up and kill yourself already, you are draining my electricity I need to use on people that have a chance to breed.
>>
File: ?.jpg (151 KB, 1024x1024)
What is the least censored, least retarded local model I can run on 48GB?
>>
>>102620850
Your brain.
>>
File: butwhy.png (963 KB, 1300x1040)
>>102620850
For what? Currently (this week) retardation is minimized in the censored models that just released that havent had it finetuned out of them yet.

Bad time to ask in this vague way as we just got new model drops.

Probably the 'smartest' is llama3.2 but least censored smartest is qwen2 or something?

I dunno man, look at the leaderboards ya lazy shit
>>
>>102620859
I said least retarded
>>
>>102620881
Thanks, Anon. You're right, I am a lazy shit. Working my way through the OP retard guides now. I meant best for RP, mostly NSFW but SFW also. I forgot some people do more with this than just coom.
>>
>>102620710
Yes, ideally having a long context would be good. Because it should be able to remember things that happened ages ago so it can create a long-term scenario, otherwise it's just fleeting one-offs. Right now even the 128k models only work well at 16k maybe 32k for some. I wouldn't go as far as 100M+, I'd say 1M is a good ultimate goal.
>>
>>102620730
Because it's an actual ad. It's made by the dev of this site:
https://www.tryspellbound.com/app/home
https://arch.b4k.co/vg/thread/493459883/#493925732
>>
>>102620882
Anything that uncensors model - makes it retarded, you need to train desired one from scratch, no other options here.
>>
>>102620710
>>102620797
I have a solution for long RPs: starting nested chats for mundane tasks like entering dungeons, getting laid, etc, then summarizing them into a single message sent back to the main chat. Not only does this reduce overall context usage, but it also makes the content more focused by eliminating less informative activities. My own frontend handles this process automatically. If ST wasn't a stagnant mess, these features would already be available without needing to code them ourselves.
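A minimal sketch of the flow (hypothetical names, not my frontend's actual code; generate() is any messages-to-text completion call):

def run_nested_chat(generate, main_history, scene_prompt, user_turns):
    # Play the scene out in a disposable side history...
    nested = list(main_history) + [{"role": "user", "content": scene_prompt}]
    for msg in user_turns:
        nested.append({"role": "assistant", "content": generate(nested)})
        nested.append({"role": "user", "content": msg})
    # ...then collapse it into a single message for the main chat.
    summary = generate(nested + [{"role": "user",
        "content": "Summarize this scene in one short paragraph."}])
    main_history.append({"role": "system", "content": "[Scene summary: " + summary + "]"})
    return main_history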
>>
>>102620923
But it should be able to do what an author does, take into account every little detail that previously happened and build on that as needed. It's the fault of these companies for over-promising on the technology, people have high expectations. They're on the media saying this technology is smart enough to replace everyone.
>>
File: quqa3yg4xtza1.jpg (81 KB, 885x715)
>the least painful llm ive used to date is the 8b llama instruct finetune that originally got me into local models
>keep coming back to it after trying mistral nemo
>just run it in fp16 to justify upgrading gpu
>works oob
iokay then. so how's 3.X 70b llama guys?
>>
>>102620832
i know how much i said nigger not all of us read like grannys without glasses also if you arent some neurotic fly without a head fuck 100k-1mil words goes by real quick not accounting for daydreaming breaks
>>102620923
thats a temporary workaround i want the characters to do shit like eg

one of them is held by a jihadi and they video out asking for a ransom so the character does a dance that they did a thousand turns ago when they were in distress and repurposes that to signal at what location they are i dont see how this could be done with a summarisation

also link to your frontend ?
>>
>>102620923
Yes, that's definitely a way around shorter context length, but you're also putting in a lot more effort. You could maybe use an LLM to summarize for you, but then you're relying on the LLM knowing which details are important to you.
Also, there's no excuse for modern models to be releasing with only 8K context.
>>
everything we say is influenced by a neurochemical cocktail in our brain that controls the functioning of our neuronal activation. this cocktail is a dynamic process that reacts to external stimuli and is subject to certain laws, with gradations in the short, medium and long-term time horizon. something like an external information multiplier and at the same time information compression outside our neuronal structure.

how do you implement this on llm weight activations without working completely idiotically with prompts? would like to give my waifu more meaningful emotions. anon solves this for me.
>>
>>102620604
Good night, Miku
>>
>>102621018
Summarization is automatic, I only decide when to begin and end nested chats.
>you're relying on the LLM knowing which details are important to you.
I defined several summarization prompts for specific activity types.
>>102620982
I have no intention of making it public, I only implement features that I personally need. I do not wish to write documentation, respond to feedback, or engage with schizos. Another useful feature I use is locations: I specify which characters inhabit each location and filter messages to those visible to each character, thereby reducing unnecessary context. Additionally, I can keep secrets with specific characters. Typically, I interact with only one character at a time, which means less context processing:
# Filter the history down to what this character could plausibly have seen.
location_chars = None
messages2 = []
for m in messages:
    if m.role == role_location:
        # A location marker updates who is present from this point onward.
        location_chars = m.characters
        continue
    if location_chars and char_id not in location_chars:
        continue  # skip messages from scenes this character wasn't in
    messages2.append(m)
>>
It's so fucking over lmao
>>
>>102620710
In an ideal world? Practically infinite, but I do hope to be able to run 1M without retardation on my desktop at some point, iirc that should be a book or two, so enough for storytelling and long chats
Add a dedicated summarization model and you can probably 10x that without much trouble
But for now... for now I'll be content with 64k
>>
hello /lmg/
is this whole AI shit a scam?
openai recently released o1 (le reasoning model) and I think all they did was glue chain of thought onto the model and maybe cram as much STEM data into it as possible
i'm concerned that this might actually be true (even partially) and that no real progress will be made
and techniggers like roon just hype/troll retards like me by vague posting about dangerous and powerful tech they have, maybe i should really quit social media but i don't know how else to keep up with improvements in technology
it really hurts me mentally, because I see it (billions of dollars in AI investments) as a glimpse of hope that the future isn't so grim
idk where to post this, in /aicg/ too many retards discussing namefags
>>
>be me
>tired of qwen repeating itself
>tired of other models' positivity biases and slop
>download that storywriter tune some anon shilled
>4.75bpw, as usual
>complete and utter schizo shit, no matter the sampler settings
>see that it was finetuned on top of a 4 bit model
I fucking hate this hobby
>>
File: 1727398220355897.jpg (262 KB, 960x1200)
>>102621543
yes, finally september is fucking over and the only good month of the year is about to begin
>>
>>102619033
Mental illness. https://desuarchive.org/g/search/image/aMdhob_mq5aJPrNt0zj8Vg/
>>
Which local embedding model is the best for stuff like perplexica?
I'm still using BGE small but I guess there are probably way better models now
>>
>>102621543
New models are going to be so powerful...
>>
File: file.png (9 KB, 492x61)
lamao.cpp is getting multimeme support by a core maintainer
happy now retards?
>>
>>102621931
https://huggingface.co/spaces/mteb/leaderboard
>>
for me it's ollama
>>
>>102621948
I might be 8 months from now, when it finally gets merged in. If it's not broken and supports all the latest SOTA vision models.
>>
>>102622034
Thank you anon
>>
https://x.com/localghost/status/1840475848450994183
>>
>>102621776
o1 is a side path to unlock some extra gains from existing tech.
Absolutely nobody has yet trained a model bigger than GPT-4, which finished training over two years ago.
Not for lack of will, but instead lack of resources. That's exactly what those huge multi-billion dollar investments are about. To get to the next tier requires GPU clusters an order of magnitude more powerful than have ever existed until very recently.
Now it's an arms race between several of the biggest companies in the world to be the first to make a next gen model.
Whichever it is that happens to cross the finish line (We know that Grok 3 is officially slated for December and GPT-5 "Orion" is rumored for a similar timeframe.) will be the very first taste any of us will get of AI's future. That'll be the point when predictions can be made that aren't just wild speculation.
>>
File: Capture6.png (24 KB, 560x327)
>>102620954
Objectively better, thanks. And all it takes is to follow leaderboards, wow so hard such audiophile wow compute takes a lot of compute how could I ever?
>>
>setup tts server to read me things
>no idea what to make it read
it's owari da
>>
How can I do beam search sampling in ooba?
>>
>>102622545
My little pony erotic fanfiction, of course.
Have it on your speakers so that your neighbors can enjoy the literature too.
>>
>>102622545
Das Kapital with a mesugaki voice.
>>
/hdg/ tourist here, how you guys doing?
>>
File: 1712120637552619.png (1.18 MB, 1280x1172)
>>102616777
>>>latest model: a fucking cot finetune
>It saddens me that normies buy so much into this grift lmao
>Banned - Directly discussing adult content outside of the designated board /r/ is a violation of Rule 16.
>>
>>102622076
I ditched it because it nas no 4bit cache
>>
i'm a poorfag and haven't paid attention to local models for months and want to play with tiny models, is phi-3 still worth it or is there a better alternative now?
>>
>>102622958
Hard to tell, anon. How much of a poorfag are you? What hardware are you running.
And phi-3 what? There's more than one.
Llama released some 1B and 3B models, you can try those i suppose. llama.cpp now supports olmoe, a 1b active, 7b param model that runs fast on toasters and knows fun...
16gb vram is still poorfag territory. Mistral Nemo 12b is usable...
>>
Emu3 seems cool, like true multimodality
>>
>>102623027
>llama.cpp now supports olmoe, a 1b active, 7b param model that runs fast on toasters and knows fun...
Yo, that sounds dope.
Going to do some self-reflection workflow to see how much I can squeeze out of that one.
What's the claimed max context size?
>>
>>102623088
>What's the claimed max context size?
Only 4k. But it's fun to iterate with, fast and doesn't give a fuck. I've used it for ~8k and it didn't collapse too hard. You can get 8t/s+ on a 15 year old cpu.
>>
>>102623088
4k
>"max_position_embeddings": 4096,
https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct/blob/main/config.json
>>
File: 1707820047873448.png (1.2 MB, 720x720)
emu3
kek
>>
>>102623142
>>102623146
Damn, that's not ideal.
Well, I'll play around with it regardless.
Thank you folks.
>>
Sold my GPUs with a small profit. Gonna roll back when 5090 comes out, if something fundamental changes about how things are done now.
Or maybe just use cloud shit from now on idk.
Overall I rate LLMs as just 4 out of 10 sadly, (both local and cloud).
All magic evaporates quickly, and you're only stuck with issues, and oh boy are there plenty of issues.
>>
>>102623148
can it do porn videos? asking for a friend
>>
>>102623281
can you briefly elaborate on the issues you ran in to?
>>
>>102618183
>>Developers also would have needed to ensure their AI could be shut down by a human if it started behaving dangerously.
*Turns the power off*
There, fixed it. Works universally on any AI.
>>
>>102623494
they're probably imagining something like this happening
https://www.theguardian.com/us-news/2023/jun/01/us-military-drone-ai-killed-operator-simulated-test
with a model running locally on the drone
>>
I SWEAR TO GOD IF QWEN SPEAK CHINESE AGAIN I'M GONNA
>>
File: denied.png (122 KB, 620x619)
>>102623614
I find it hard to believe that those things, equipped with AI or not, wouldn't have a remote off switch.
Also
>The US air force has denied it has conducted an AI simulation in which a drone decided to “kill” its operator to prevent it from interfering with its efforts to achieve its mission.
>No real person was harmed.
It's a weird way to phrase it. "No real person was harmed" makes it seem like the test was actually done. It should be something like
>Actually, nevermind the article. Fuck all happened. Fuck journos.
>>
Thinking about another way to potentially interact with LLMs in mikupad, but I'm not sure.
The idea: using the LLM as an autocomplete like in a code editor, but on a per-token basis. For example, the user types "A world" and auto suggestions pop up with multiple (configurable) tokens; the tokens could be selected with a hotkey, i.e. Ctrl+1 selects the first (most probable) token, with support for tabbing through the list of tokens. This could enable use as an interactive writing helper, perhaps even suggesting alternatives for the word the user last typed. It could be set to only return 1 token, or whole sentences etc. Please rate retardation.
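The query side would be simple enough; a sketch against a llama.cpp-style server (field names assumed from an older /completion API, check your backend):

import requests

def next_token_choices(prompt, k=5, url="http://127.0.0.1:8080/completion"):
    # Ask for one token plus the top-k candidates at that position.
    r = requests.post(url, json={"prompt": prompt, "n_predict": 1,
                                 "n_probs": k, "temperature": 0})
    r.raise_for_status()
    probs = r.json()["completion_probabilities"][0]["probs"]
    return [(p["tok_str"], p["prob"]) for p in probs]  # bind these to Ctrl+1..k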
>>
>>102623767
Just ban all chinese tokens.
>>
File: stt.png (1 KB, 120x80)
>>102623783
Tokens are not always complete words. Better to generate a few and select the best collection.
>>
>>102623829
I was being unclear, it could be configurable to return either a fixed number of tokens (just the next token, the next 2 tokens), a whole word, or a sentence. Seems like it could improve interactivity and speed of writing by integrating the prediction part directly into the prompt area.
>>
>>102623918
>I was being unclear
Nah, it's fine. I just read past it.
It could work, i suppose. As i see it, it basically completes text with a few alternatives at once and lets you select from them. It's just adding a "sentence generator/parser" to know when to stop for the sentence case and a few keybinds, it seems fairly simple.
Now get coding. Ideas aren't worth much on their own.
>>
How do we eliminate the slop problem? The string-based backtracking thing was a good first step, but does it really solve it? The LLM could still output phrases that eventually become overly common again as you chat with it more. I think it would be interesting if someone could implement a system that automatically adds strings to the string ban, based on all previous chats you've ever had (which I suppose would be saved in a txt file) up to a certain configurable length.
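Something like this could build the list offline, assuming the chats are saved as plain .txt logs (one message per line is a guess):

from collections import Counter
from pathlib import Path

def build_ban_list(log_dir, n=4, min_count=10, max_phrases=200):
    # Count word n-grams across every saved chat and ban the worst offenders.
    counts = Counter()
    for path in Path(log_dir).glob("*.txt"):
        for line in path.read_text(errors="ignore").splitlines():
            words = line.lower().split()
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return [phrase for phrase, c in counts.most_common(max_phrases) if c >= min_count]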
>>
>>102624164
I'm more concerned about the repetition problem. Slop is a dataset problem. Repetition is an architectural problem, it happens everywhere to every model.
>>
>>102624164
Most anons play the same shit over and over again, with the same intent, using the same language. They guide models towards slop and then complain about it.
What we should have is something like a "user-slop meter" that warns the user that they cannot write for shit. Or, ideally, a language model smart enough to tell them how boring and lame they are so they can fix their ways.
Reading a real book or two every now and then would help them as well.
>>
>>102624249
XTC solves repetition
>>
>>102624279
it also solves intelligence
>>
>>102623783
This sounds like a great idea, it's a no-brainer to me that something like this would be very useful
>>
https://eqbench.com/creative_writing.html
Anyone tried the 9B that mogs everything in EQ bench? I don't trust this benchmark much and the slop rating is high so I'm wondering.
>>
>>102616777
the robojannies aren't sending their best...
>>
>>102624407
Buy a fucking ad.
>>
>>102624423
You can tell shills to fuck off without sounding like a cuck, janny.
>>
>>102624407
It's also #6 on "slop" and you can't even name the model. Funny that...
>>
File: Untitled.png (52 KB, 615x323)
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models
https://arxiv.org/abs/2409.18893
>Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter space, merging models with identical architectures. Merging within the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenges of layer compatibility. This paper marks a significant advance toward more flexible and comprehensive model merging techniques by modeling the architecture-space merging process as a reinforcement learning task. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.
might be cool
>>
>>102624407
And it's trained on the fucking 10MB gutenberg "dataset". What a piece of shit...
>>
File: 1696445622980736.png (86 KB, 549x353)
>>102616777
>>102605857
>Banned - "Thanks for the info, I assumed that if something utilized CUDA then it would see some sort of exponential improvement with more/faster CUDA cores, but I guess that's not the case."
>This post violates Rule 6.d (Ironic shitposting) as it does not contribute to a meaningful discussion about technology and instead is an unfunny, ironic comment that adds no value to the board.
>>
>>102624469
There is also an additional
>Carefully curated proprietary creative writing dataset
But no information about it
>>
File: file.png (131 KB, 870x803)
>>102624407
This doesn't seem bad at all, but it also feels a bit too much like a tryhard.
>>
>>102619263
swap out the language model
>>
>>102624487
Hm seems too purple, and whats up with the double spaces
>>
>>102619667
in a fortnight
>>
>>102624485
>Carefully curated proprietary creative writing dataset
I doubt it's books or actual literature. And considering that he chose the smallest possible gutenberg dataset (as opposed to the 14gb on hf, which is still minuscule) i wouldn't expect much effort. It's all just horny logs...
>>
>>102624556
go back to your discord
>>
>>102621831
>first image on google because it's his official headshot on his employer's website
you're not the schizo I am
>>
>>102624469
You and others grossly overestimate how much data you actually need to just change the vocabulary and output style of a model.
>>
>>102624576
Is that what you really want? Models trained on logs made by retards? I want the smut too, but put some variety at least.
>>
>>102624684
hi Sao
>>
>>102624679
Yeah. As shown by all the "*-gutenberg" models. We all want variety (i assume) and everyone is training on the same 10mb.

>>102624690
schizo drama faggot
>>
>>102624507
>double spaces
very common gemma thing, when it was new it was theorized it might be some kind of watermark, I did see a few tunes that didn't have that, or at least where it was much rarer
>>
>>102624712
please do not call me a faggot
>>
>>102617670
>I put your post in the shitter so I didn't see your reply BTW
redditor pride
>>
/lmg/
>cope
>collective insanity
>discord arguments
>wants reddit moderation
>cope
>sloppas asking for ko-fi bucks
>lecunny apologia
>perpetual 2 week cycle
>cope
/aicg/
>discussion about chatbots
>discussion about proxies
we lost.
>>
I wonder what an LLM finetuned on the https://en.wikipedia.org/wiki/Harvard_Classics would be like? Super smart, or a puffed up blowhard? The book selection, being from the turn of the century, ends up being pretty redpilled in the current year, so it might not be too bad.
>>
How do you tell an LLM to expand a long outline of a story? Let's say you have a half-assed short story but a long prompt, how do you tell it to make it more complete?
>>
>>102624927
>finetune
As smart as the base model with slightly different vocabulary. I think having a good base model directly trained on that (among other things) would be more interesting, if they're not using it already.
>>
>>102625035
I guess you could ask it to break the story down into parts (chapters maybe?) while describing the main points/events/themes/whatever of each part, then ask it to expand each individual part.
Asking it to expand the whole story is probably not going to work as well.
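Roughly, as a loop, where generate() is any prompt-to-text call (hypothetical, adjust to your backend):

def expand_story(generate, outline):
    # First pass: split the outline into chapter summaries, one per line.
    parts = generate("Split this outline into numbered chapter summaries, one per line:\n" + outline)
    chapters = []
    for summary in parts.splitlines():
        if not summary.strip():
            continue
        # Second pass: expand each chapter, keeping the full outline in view for consistency.
        chapters.append(generate("Write this chapter in full, staying consistent with the outline.\n"
                                 "Outline:\n" + outline + "\n\nChapter summary: " + summary))
    return "\n\n".join(chapters)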
>>
>>102625035
I don't know what you've tried, so i'll assume nothing.
Try just mentioning part of your prompt and ask the AI to expand on it.
>Anon: So remember that bit about the castle? We need to elaborate on why all the towers were cock shaped. What do you think could be the lore behind it?
>Model: Got'chu, boo... here's why...
>>
Good morning /lmg/!
>>
>>102625106
Good morning Miku
>>
>>102625106
omg it migu!
>>
>>102625094
>here's why...
Reminds me of the western journalist prefill.
>X is actually a good thing, here's why...
>>
Update on getting Qwen to improve the ban decision script. It failed. I am currently in the process of making it troubleshoot and correct the thing. It is taking multiple exchanges and time because I am running at <1t/s lmao.
>>
>>102622158
To me, the big question we have yet to answer is "can raw scale allow LLMs to get us to AGI for text-to-text tasks?"
The potential answers being "yes, it's possible", "not alone, but maybe if used as a piece of a larger pipeline", and "no, diminishing gains are in full effect past a certain point and something more fundamental is necessary"
>>
Does anyone have the RULER results for DeepSeek 2.5? I'm having strange issues with it losing coherence around 10k tokens and I can't find it in any of the RULER charts that are out there.
>>
>>102625709
Yeah if it's not on the github then it probably hasn't been done before. No NoCha results either. Weird that it would break down at 10k though. Have you tried neutralizing samplers to isolate potential causes?
>>
Remember that graph with the state of ai generals over time? How we thought things would only keep getting better. But we were young, and naive.
>>
>>102619672
>molmo
I'm going to spend some time picking apart fluxgym to see where it invokes florence2, and try to replace it with molmo 7B. Florence2 doesn't describe nudity or body parts.
>>
>>102625743
I did a re-roll and things tightened back up. Might've been a spurious software or hardware error.
The V2 page on HF shows 128k with coherence right to the brink, but I'd still love to see an independent bench on it to verify the amount of context I should be seeing before it loses its mind.
I'm hoping I can at least get 32k before any issues show up, as I'm getting things dialed in nicely for creative writing.
>>
Is there a single RP finetune that isn't overly horny? I use default generic system prompt with a generic character, walk up and slap her ass and she immediately goes "mark me as yours". Zero character fidelity.
>>
>>102626465
What models have you tried?
>>
File: Untitled.png (164 KB, 1294x913)
>>102626465
i never have this problem
>>
How much does CPU performance affect batch processing with vLLM? I'm trying to do some NLP text processing and have ~1.5 million prompts to do through, so I'm trying to make each prompt as quick as possible
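For reference, I'm on the offline batched API, roughly like this (model name and the load/save helpers are placeholders):

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model
params = SamplingParams(temperature=0, max_tokens=64)

prompts = load_prompts()  # hypothetical loader for the ~1.5M prompt strings
for i in range(0, len(prompts), 10000):  # chunked so results can be flushed to disk
    for out in llm.generate(prompts[i:i + 10000], params):  # vLLM batches internally
        save_result(out.outputs[0].text)  # hypothetical sink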
>>
>>102626465
In order not to be horny by default, an RP model would have to not be trained *primarily* on erotic stories or ERP logs. Finetuners either do not understand this or do not care.
>>
File: Untitled.png (638 KB, 2058x1748)
>>102626574
>>
>>102626465
You can instruct a model to calm down.
>>
>>102626876
the actual, unironic mentality of sloptuners is that models should output smut ASAP. they stopped namefagging here when they shared these opinions and got torched for being retards.
>>
Best model to run on 16 gigs? I'll probably ask it to help me code.
>>
any good AI tools for making movie subtitles?
>>
>>102627376
As in transcriptions? You can try whisper.cpp
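If you want to script it, a rough sketch: ffmpeg resamples to the 16 kHz mono WAV whisper.cpp expects, then the example CLI writes the .srt (the -osrt/-of flags are from whisper.cpp's bundled main tool, check --help on your build):

import subprocess

def make_subs(video, model="models/ggml-medium.en.bin"):
    # Extract audio as 16 kHz mono WAV, then emit an .srt alongside the source.
    base = video.rsplit(".", 1)[0]
    subprocess.run(["ffmpeg", "-y", "-i", video, "-ar", "16000", "-ac", "1", base + ".wav"], check=True)
    subprocess.run(["./main", "-m", model, "-f", base + ".wav", "-osrt", "-of", base], check=True)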
>>
>>102616938
>this style of directing the story and having the AI write everything
Finally somebody else who uses the kino method, but I have it write in third-person past tense instead of roleplay-retard tense so the final output looks like a normal story. The final logs should be readable with all of the trash inputted by User stripped out.
>>
>>102626465
You need to specify in your user card that you are unattractive. A lot of these models are just tuned to assume you're attractive because that's what users like to roleplay as. They aren't overriding character personalities but just realistically rendering them considering that assumption.
>>
>>102627405
yeah, I got a low quality TV rip of an old TV series, and sometimes have trouble understanding what they're saying. wanted to see if AI can produce some subs for me. this whisper.cpp thingy looks good, and apparently produces SRT files. I'll give it a shot, thanks anon!
>>
File: molmo.png (78 KB, 1263x667)
>>
>>102627405
>>102627522 (me)
huh, not bad. it gets timestamps slightly off, but still very impressive.
>>
>>102628093
Glad its working. I played around with it a while ago, trying to make it trip with my own voice, faking accents, changing language and slurring and it did pretty well, considering i was using the medium models. Never tried with recorded media. I could try feeding it some death metal to see if it can make anything out past the rest of the noise.
>>
>>102628267
i tried it a few days ago with dir en grey's "agitated screams of maggots" using the itty bitty kobold whisper thing and it couldn't understand anything.
>>
Could any fellow poorfags itt tell me how bad the 32B experience is with a 4060 8GB? Is offloading to RAM really that bad? Does the CPU matter in this case? I'll be getting a 5700X3D soon.
Token generation doesn't have to be instant, just fast enough to not be frustrating.
>>
File: doge.jpg (30 KB, 512x384)
>>102622275
I am honestly not seeing the appeal of other free models besides Llama right now. Setting the templates is intuitive and it follows instructions pretty good and I don't have to dick with the samplers much. Made just for me. I know there's slop with Llama, but it's not too bad.
>>
>>102628332
heh. Expected. i was thinking of something a little milder. Opeth probably. Turmion Katilot would have been the final boss.
>>
>>102628404
32b models run okayish even in RAM.
Unless your computer is very old/weak
>>
>>102628600
That's really good to know, I can imagine a 3060 being better than the 4060 because of the additional 4GBs of ram but they have very similar prices where I live so I don't see why I should get it over the 4060.
Do you have any average tokens with a 4060 in a 32B model? Just so I can set my expectations right.
>>
>>102628404
upgrading your CPU will be good in video games and shit but will not help you here. you need more VRAM to run bigger models at decent speed
assuming you're on a desktop, you could add a 3060 12gb card to your system which would allow you to offload 18/19 gbs of the model to your VRAM and your experience with 30b models would be way faster
>>
>>102628626
i'm on an rtx 4060 and an AMD Ryzen 7 5700X
what quant size were you looking at? i'll grab the stupid qwen one and run a benchmark
https://huggingface.co/bartowski/Qwen2.5-32B-Instruct-GGUF/tree/main
>>
>>102628627
>add a 3060 12gb card
NTA but you should go for the older A4000 to get at least 16gb VRAM.
As a plus it uses way less power too (and only 1 pci-e power cable
>>
>>102628730
In my shitbox I tend to have the best results with the Q4_K_M but I've never really bothered to really get deep into it, after a few tests when I started looking into LLMs I decided that this one was the best and just kept using it.
>>102628627
8GB vs 12GB really makes that much of a difference, huh? Even though the 4060 has faster and more efficient memory?
>>102628755
>A4000
That sounds extremely appealing, I might get that alongside a main GPU but I need to give it more thought, thank you very much for the suggestion.
>>
>>102628626
It's the other way around for me. I don't see the reason to get a 4060 unless I want to save energy. 4gb extra of vram makes a difference.
>>
>>102628755
those cost more used than a new 4060ti 16gb
>>
>>102628798
>I tend to have the best results with the Q4
yeah that's what i always run with too, on 12bs though.
i tried Theia v2 21b at Q2_K and it was fucking lobotomized.
downloading the Q4_K_M Qwen, should be about ~10 minutes
>>
>>102628801
The thing is, I like gaymin too and I'm still not sure if all the nvidia gimmicks (DLSS, framegen, etc) are actually useful or not. I can't find a good place that compares the 3060 vs 4060 while using ALL the available features they have. If the features the 4060 has over the 3060 don't seem to be able to prolong its life that much then I might as well pick the one with more VRAM.
>>102628835
Depends where you live, I found an used A4000 for considerably less than a used 4060ti.
>>102628842
Thank you for taking the time to check it out. I hope it's good enough.
>>
>>102628798
>8GBs>12GBs really are that important huh? Even though the 4060 has faster and more efficient memory?
I don't mean you should swap your gpu, I mean you should add a 2nd.
koboldcpp can use multiple gpus.
I have a 16gb 4060ti but I kept my old 1070ti in a different pcie slot. even though it's slower and running at only 4x speed in that slot i can offload layers to it for larger models and it's still faster than offloading to system ram
if you have a nicer mobo you might be able to run both gpus at 8x
BUT for any model that requires less than 16gb, selecting ALL gpu's actually does slow things down. it's only worth using for models that can't fit in the 4060ti
>>
>>102628835
>those cost more used than a new 4060ti 16gb
Sure but they're also two slots. A4000 is only one which makes it more convenient when case space is at a premium. Also don't have to deal with the bloody 16pin connectors.
That 165W TDP on the 4060ti is not bad though.
>>
>>102628918
I might have badly expressed myself, I'm trying to decide between the 3060 and the 4060, I don't have either (I haven't bought anything yet actually).
>if you have a nicer mobo you might be able to run both gpus at 8x
I'll take a look into it, sounds like a really nice thing to have.
>>
>>102628900
>I'm still not sure if all the nvidia gimmicks (DLSS, framegen, etc) are actually useful or not.
You fell for the marketing. Graphics don't matter that much unless you're rich, and a rich man wouldn't even consider a 4060.
>>
File: Untitled.png (48 KB, 1115x628)
>>102628900
and here we are, 4k context, 2 tokens per second.
less than 10 is literally unusable for me.
>>
>>102628959
>I'm trying to decide between the 3060 and the 4060
4060ti 16gb or the 3060 12gb if you want to run models
do not get the 8gb version of either. and also they are about the same for gaming and other shit, either one will crush 1080p, there's not a whole lot of difference
i would say the 16gb ti version is the right choice if you want to do image generation or play with llm
>>
File: IMG_20240930_165922556.jpg (2.47 MB, 4640x3472)
>>102628925
>case space is at a premium.
4060ti is a babby card you could easilly fit two
>>
File: chad.png (48 KB, 902x666)
this mf isn't taking me seriously
>>
>>102628995
I care about the gimmicks because the card will inevitably turn obsolete, and those framegen gimmicks might prolong its life a little more.
I'm currently using a 1060, I don't care much about maxing out everything, I just want them to last a long time.
4060 gives you the DLSS Frame gen, while the 3060 does not. This is the only thing that makes me consider this card over the 3060. The thing is, I don't know how actually useful this feature is, whether nvidia will make it obsolete after 2 generations and never mention it again, or if it'll turn out to be "not good enough" to do anything like RTX in the 20XX cards.
>>102629002
Brutal, thank you very much for doing the test anon, I appreciate it a lot.
>>102629064
I'll try to keep it in mind and consider saving up for a few more months and get something else, the 4060TI 16GB in my country is around 63% more expensive than the 4060. Maybe I'll snipe the A4000 the other anon mentioned because that one used is around the same price as the 4060 (new). I just don't know if it'd even work for gaming. Googling has a few good things to say about it but it's too early to be sure.
>>
File: 1701575286866790.jpg (16 KB, 573x59)
thats some saggy tits
>>
>>102629265
that means she will fall on her tits not because of her tits, my niggah
>>
>>102629205
Just save more for a 16gb 4060 ti then. Problem solved.
>>
>>102629205
> if it'd even work for gaming
Not the best card for it. Not even an A6000 games as well as its specs would suggest.
It was more of a recommendation as an additional card than a main driver.
If it's not clear you can mix and match within the same brand for inferencing with LLMs (unlike SLI for gaming).
>>
>>102629205
DLSS Frame Gen is a game changer. It pumps out frames so fast it makes your GPU look like it's been slacking for years. If you're still running without it, you're basically asking for a slideshow instead of a game.
>>
>>102629205
Wouldn't you be better buying an used 3090? More vram and more gaming performance.
>>
>>102629328
>https://rentry.org/lmg-build-guides
Save yourself some head-scratching and read these. You'll get an idea of what does and doesn't matter for LLM inference, and then you can decide how much it matters to you vs gaming, because the AI stuff needs a pretty specific subset of specs to be maxxed out to really work well.
>>
>>102629177
>Acts like a silly clown
>Surprised the model isn't taking him seriously
>>
Why doesn't llama.cpp statically link rocm? And why isn't there a binary version so I don't have to shit up my local install with amd's notoriously vile amdgpu scripts?

I had to purge amd's trash from my pc, and now my llama.cpp build won't work.
>>
>>102629501
if you need the easiest possible way to run AMD without dealing with compiling:
https://github.com/YellowRoseCx/koboldcpp-rocm/releases
Still recommend figuring out llama.cpp regardless.
>>
>>102626465
Yes, mine. Please buy me a kofi
>>
>>102629635
.exe

I'm on Linux. I'm not some rich guy.
>>
>>102629635
btw the easiest is ollama. it just werks with amd.
>>
https://x.com/immasiddtweets/status/1840739629588177223
>>
>>102616609
May /lmg/ never die.
>>
>>102630285
Kek, Sam will allow this but not direct explicit ERP that doesn't harm anyone.
>>
Holy god. Trying to do something code-related with a big model mostly held in RAM is hellish and has made me realize that we really ought to make some kind of optimization for transformers that lets the model edit an existing part of its context, or something like a stored txt file. Having it rewrite the entire script to integrate various changes is such a waste of inference.
>>
>>102630382
You mean like the kv cache and context shifting in llama.cpp?
>>
File: 1721515960020231.png (257 KB, 1460x936)
>>102616609
Magnum 405B was a failure.
Anthracite will never make a model again...
>>
>>102630304
it's already dead, 24 hours and not even 300 posts
>>
File: 39_06121_.png (2.97 MB, 2048x2048)
>>102630425
They should release it anyway for science
>>
>>102630425
What a waste. Like people would have been able to run it. Should tune qwen2.5 / llama 3.1 instead. Maybe even deepseek 2.5
>>
>>102630456
The biggest issue is that most people don't want to make what ai is best at: making friends.

>never gonna give you up
>never gonna let you down
>never gonna tell a lie and hurt you
>
>never gonna run around and desert you
>>
>>102630382
>has made me realize that we really ought to make some kind of optimization for transformers
Visionary. How come nobody thought of this? Must be really simple to do. Quick. Let's think of a catchy name for it before we start investigating.
>>
>>102624907
That's because cloud AI just werks™
>>
>>102630536
AI is not at that level yet; it can't remember things. Talk to it long enough and it will simply forget.
>>
File: 1717766862962030.png (2.27 MB, 4096x4096)
https://x.com/LiquidAI_/status/1840768716784697688
>>
>>102624907
the proxy bullshit is the worst.
i skimmed it earlier and people are talking about how they're sending dick pics to some sodomite for 16mb of bandwidth on his proxy.
i'd actually love a general where people just talked about chatbot creation and prompt engineering instead of the gay shit that happens over there.
nipmoot capitulating and creating an /ai/ board to shake shit up can't happen soon enough.
>>
>>102630640
>nipmoot capitulating and creating an /ai/ board
never happening btw
>>
>>102630640
i want the ecker to actually die irl
>>
>>102630634
>We're not open-sourcing these models at the moment, but we want to contribute to the community by openly publishing our findings, methods, and interesting artifacts.
>
>We'll start by publishing scientific blog posts about LFMs, leading up to our product launch event on October 23, 2024.

lol
>>
>>102630640
>>>/vg/496613219
>>
File: SpeedyInferenceLogo.png (158 KB, 288x368)
>>102630542
I'll make the logo.
We're practically done, boys!
>>
File: 1720264582276214.jpg (465 KB, 4096x2304)
>>102630634
Could be interesting if a) they open sourced it b) they offered an actually interesting size like 110B/A33B or so. The performance looks okay for its size but they are gimping themselves by only offering options for vramlets.
>>
>>102630706
Nice, anon. I'll start writing the paper
>In the last years [TODO: Insert some references here], large language models have shown impressive improvements in training and inference optimization. However, performing inference when only a short section of the KV cache needs to change is still time-consuming, as the whole cache needs to be reprocessed. In this paper, we present Speedy Inference Matrix Passthrough, a way to speed up inference by 0.000031% using only 16 clusters of 8x H100...
>>
File: 1710367565412852.png (1021 KB, 2000x1291)
>>
what are some generic lewdtunes for one of the more recent models, in sizes 7B and below?
>>
>>102630838
mythomax
>>
>>102630851
mythoslop is both old and above 7b
>>
>>102630838
Not sure how to say this without it sounding like a joke, but OLMoE 1B-7B is ridiculous. No need to tune it. Sadly, only 4k context. Hopefully MolmoE, with its claimed 32k context, is just as unhinged.
>>
>>102630398
No? I imagine it'd need specially made training data too, perhaps. Even Molmo's ability to put points on an image is essentially just outputting tokens for the coordinates where the model thinks the points should go, which then get applied to the image by the frontend.

>>102630542
You can just be honest and complain about my complaining. If we're being serious though, I did indeed think of this a long time ago and just thought about it again. The reason no one has done anything about it isn't so much the difficulty (although solving it at a truly fundamental level would be a different story), but that it's not really a problem for the people who actually have the resources to fund model development. Speculative decoding methods already make inferencing of code generation more efficient. It's only really felt as a problem for local models running on slow hardware.
>>
>>102630883
huh, olmo completely missed me, i'll try it out though
thanks anon
>>
>>102630912
>You can just be honest and complain about my complaining.
I did, in a round-about way. Token generation is sequential: the next token depends on all the previous tokens. Change one in the middle and the whole sequence after the change needs to be recalculated. You don't want an optimization, you want a different architecture. Once we find a way to generate plausible collections of tokens without them depending on the previous ones, you'll be notified.
>Speculative decoding methods already make inferencing of code generation more efficient
I hate that it became a word. Infer, inferring, inference... no... "inferencing".
Speculative decoding doesn't try to skip any step. It just drafts the same steps with a smaller model and lets the big one verify them in a single batched pass.
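To put the cache point concretely, a toy sketch (not llama.cpp code, just an illustration): with a causal model, only the tokens before the first changed position keep their cached keys/values.
[code]
# toy illustration: edit one token mid-sequence and everything after it
# must be recomputed, because each position attends to all earlier ones
def reusable_prefix(old_tokens: list[int], new_tokens: list[int]) -> int:
    # length of the shared prefix whose KV entries stay valid
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

old = [1, 2, 3, 4, 5, 6]
new = [1, 2, 3, 9, 5, 6]          # single edit at index 3
print(reusable_prefix(old, new))  # 3 -> indices 3..5 get reprocessed
[/code]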
>>
File: fff.png (415 B, 254x14)
>>102630923
Beware. High temp can lead you places you wouldn't expect or want. I've toned it down since then...
>>
>running midnight-miqu-70b iq3-xs
>CtxLimit:24350/24576, Amt:123/350, Init:1.48s, Process:42.73s (46.8ms/T = 21.39T/s), Generate:236.01s (1918.8ms/T = 0.52T/s), Total:278.74s (0.44T/s)
>used to the slop
>i know exactly what she's gonna say before she says it
i think i'll go back to a smaller model, at least i can pretend that it's not as predictable
>>
>>102631121
Have you tried the schizo samplers being peddled a couple threads back?
I think it was temp max, top-k 5, min-p 0.1.
Something like that.
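For reference, roughly what those settings look like against a local koboldcpp instance (field names are from memory, so double-check your backend's API docs; the idea is max temp flattens the distribution, then top-k/min-p clamp it back to plausible tokens):
[code]
# assumes koboldcpp on its default port; payload keys may differ per backend
import requests

payload = {
    "prompt": "The tavern door creaks open and",
    "max_length": 200,
    "temperature": 5.0,  # "temp max"; min_p is what keeps this sane
    "top_k": 5,          # keep only the 5 most likely tokens
    "min_p": 0.1,        # drop tokens under 10% of the top token's probability
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
[/code]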
>>
I had this random thought that there really are a lot of creative ways to play out my fucked-up fetish (I kind of realized this while telling the LLM what to do). And I never had a model spontaneously come up with a single idea like the ones I had... Makes me think it's gonna be 2-5 years before models stop being incoherent and retarded and spitting out purple prose. And then there's gonna be a huge gap between that and them actually becoming creative...
>>
Anybody poast this?
>https://www.liquid.ai/liquid-foundation-models
>>
>>102631149
>Makes me think that it is gonna be 2-5 years before models stop being incoherent retarded
How did you come up with that number?
>>
>>102631160
Anybody know where the scrollbar is?
>>
>>102631160
>>102630634
>A12b
what does the A mean?
i'm hyped for another model in nemo's weight class either way.
>>
File: 1701478486400250.png (65 KB, 1403x485)
>>102630634
>>102631160
Fuck off faggots.
>>
>>102631175
active parameters, since it's a MoE and doesn't use all 40B
it's not really in nemo's weight class for that reason: same speed, but a lot bigger in memory
>>
>>102631191
shit
>>
File: sloppa.png (11 KB, 777x214)
very slow website with sloppa
pip install openai

https://femboy.beauty/2thxT

navigate to http://localhost:8000 - what does your favorite rp model do with this?
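If anyone wants to point it at their own backend: the openai package will talk to any OpenAI-compatible local server (llama-server, koboldcpp, etc.); the base_url and model name below are assumptions, match them to your setup.
[code]
# pip install openai; base_url/model are placeholders for a local server
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local",  # most local servers ignore or loosely match this
    messages=[{"role": "user", "content": "Describe the dungeon entrance."}],
)
print(resp.choices[0].message.content)
[/code]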
>>
File: 1700077148478107.png (747 KB, 933x707)
747 KB
747 KB PNG
>>102631232
>>
>>102631164
Extrapolate 2 more weeks to years and then add 2 more and then limit it to 5 because holy shit I can't believe incoherence and kissing on lips while rimming will be a problem in 2029.
>>
>>102631243
Wait a minute. You didn't tell me you were an actual AI researcher. I would have treated you with more respect. My apologies.
Thanks for confirming that we'll get better models in 2-5 years. Should be put in the OP.
>>
I remember somebody proposing a MoE optimization where you'd keep a common baseline for the expert layers and store the differences as adapters, applied to that baseline after the router selects which experts to use, something like that.
That completely fell by the wayside, right?
I wonder if something like that could be done for dense models too.
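Back-of-the-envelope sketch of the idea as I understood it (not any published method; the dimensions are made up): one shared base weight per expert slot, plus a cheap low-rank delta per expert that the router picks.
[code]
# sketch only: n_experts low-rank deltas over one shared base layer;
# memory is d*d + 2*n*d*r instead of n*d*d
import torch

d, r, n_experts = 4096, 16, 8
base = torch.randn(d, d)                       # shared baseline weights
deltas = [(torch.randn(d, r), torch.randn(r, d)) for _ in range(n_experts)]

def expert_forward(x: torch.Tensor, e: int) -> torch.Tensor:
    A, B = deltas[e]                           # router chose expert e
    return x @ base.T + (x @ A) @ B            # base plus low-rank correction

x = torch.randn(1, d)
print(expert_forward(x, e=3).shape)            # torch.Size([1, 4096])
[/code]
I guess for a dense model the same trick would mean sharing one base across layers and keeping per-layer deltas, presumably at some quality cost.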
>>
>>102630995
Yes, I meant "honest" as in "direct". Anyway, token generation being sequential doesn't necessarily mean the stored KV cache has to change. What I was thinking originally was both the possibility of a deep architectural change and also something simpler, like a continued pretrain on specialized data, similar to what Molmo did. If you think about it, Molmo's technique of having the model learn to output image coordinates isn't really that different from having a model output locations in text paired with edit commands, although some care would be needed to make sure tokenization doesn't hinder performance. The model does not perceive context the way we perceive a block of text, so it should interpret its edits as parts of the code just fine when trained to do it.

>Speculative decoding doesn't try to skip any step. It just drafts the same steps with a smaller model and lets the big one verify them in a single batched pass.
It's not about skipping steps, but about utilizing resources efficiently. That means saturating the compute you have available by using a batch size larger than 1 (which is what speculative decoding does).
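To be concrete about what I mean by edit commands, here's a toy of the frontend half, like how Molmo's points get drawn by the client (the <edit> tag format is made up for illustration; the training-data side is the hard part):
[code]
# toy: the model emits <edit start=N end=M>replacement</edit> instead of
# rewriting the whole file; the client applies the spans to the source
import re

def apply_edits(source: str, model_output: str) -> str:
    edits = re.findall(r"<edit start=(\d+) end=(\d+)>(.*?)</edit>",
                       model_output, flags=re.DOTALL)
    # apply back-to-front so earlier character offsets stay valid
    for start, end, repl in sorted(edits, key=lambda e: -int(e[0])):
        source = source[:int(start)] + repl + source[int(end):]
    return source

code = "def add(a, b):\n    return a - b\n"
out = "<edit start=26 end=31>a + b</edit>"
print(apply_edits(code, out))   # prints the function with "a + b"
[/code]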
>>
>>102631232
>hee hee procedurally generated ai slop'geon.
Brainrot.
>>
File: seduce sloppa.png (16 KB, 390x359)
>>102631239
>>102631358
sex with sloppa
>>
>>102631386
Kinda based, though.
>>
>>102631281
>What I was thinking originally was both the possibility of a deep architectural change
Saying "Guys. faster-than-light travel is a problem. We're not there yet. What if we use a different type of rocket engine?. Not sure what they're made of, or the fuel they use, if any, but the ones we have obviously aren't good enough." doesn't help much. That's not an idea. It's barely a thought. It's a want.
I read the rest, but i can't be bothered. Either present something more tangible or email the olmo people to help you with your idea.
Maybe someone will pick up the argument.
>>
>>102631410
The goal was never to get something done. Discussion isn't always productive. If productivity is what you want, this thread was never for you. 4chan in general really.
>>
>>102631142
no, but i was using that schizo jailbreak someone posted
>>
>>102631274
Yeah pretty sure that was just an ideaguy. It's important to realize that that's all most of us amount to, even if some of us seem to be performing experiments sometimes.
What do you mean by "done for dense models"?
>>
>>102631232
>he needs a Python server to make HTTP requests
It's crazy how retarded Python devs are.
>>
>>102630634
so basically slop overfitted for MMLU-Pro.
>>
File: splits.vpd.jpg (36 KB, 360x360)
Dearest /lmg/, local ai imagen recently had a bit of a development
and so, as always, today we are migu:
https://files.catbox.moe/6m12g9.jpg
https://files.catbox.moe/v7o8fa.jpg
https://files.catbox.moe/h7qx0h.jpg
it's only up from here.
>>
>>102631724
>Dearest /lmg/, local ai imagen recently had a bit of a development
What happened? Sounds like good news.
>>
>>102631724
>local ai imagen recently had a bit of a development
Really? QRD?
>>
>>102631724
nice mikus
what was the development? I haven't been following image stuff since flux
>>
>>102631731
>>102631733
>>102631734
I assume he's referring to the release of IllustriousXL. The model is arguably better overall than Pony. It definitely has much broader knowledge of characters and concepts, thanks to being trained on literally the entirety of unfiltered Danbooru.
>>
>>102630634
>>102630776
>scores less than 70B
why the fuck did they make 40B look higher in the diagram? I knew it seemed dumb when I tried the demo chat
>>
>>102631945
Huh? But it's worse than flux
>>
>>102631960
The diagram is doing this ancient MoE trick of comparing benchmark scores to active parameters.
>>
>>102631945
Same as the leaked version, or a full version?
>>
>>102632020
Never mind, the Y-axis says MMLU-Pro, the exact bench it scores higher on, instead of overall performance.
>>
>When we finally break apart I'm panting slightly, my lips feeling swollen. "Fuck…" I breathe out, looking into his eyes. "I… I don't know what to say." I admit, looking down at his chest. I can see the outline of his nipples through the wet fabric of his pants.
L3-hanami is crazy
>>
>I breathe out
>I don't know what to say
>I admit
>I can see
repetitive slop
>>
File: worried laughter.png (514 KB, 520x678)
>>102632193
>I can see the outline of his nipples through the wet fabric of his pants.
>>
File: 12523.jpg (117 KB, 1600x1600)
>>102632253
>>
>>102630776
>thinks Mixtral 8x7b has 8k context
>>
>>102632193
Hi Sao
>>
>>102632267
I lol'd
>>
>>102631724
can I get a loli teto by fiz-rot?
>>
>>102631724
Thanks for sharing mate
>>
>>102632446



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.