/g/ - Technology

File: 39_06311_.png (2.09 MB, 1080x1920)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103230385 & >>103227556

►News
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: ComfyUI_00050_.png (2.01 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103230385

--Papers:
>103230412 >103232916 >103232963 >103233112
--Largestral model testing and comparison:
>103231709 >103231741 >103231795 >103231815 >103233787 >103232657
--Running LMM under 12GB VRAM limitation with image processing:
>103234763 >103234794 >103234802 >103234846 >103234856 >103235035 >103235120
--Issues with Largestral and Llama3 models:
>103232173 >103232358 >103232365 >103232374 >103232530 >103232541 >103232873
--Is data scaling dying, and what's next for AI research?:
>103231962 >103232002 >103232036 >103232207 >103232260
--How cloud LLM APIs achieve fast prompt processing:
>103230808 >103230820 >103230827 >103230866 >103230883 >103230901 >103230867
--Efficient model optimization technique using submatrix updates:
>103231415 >103231437 >103231519 >103231627
--Discussion of Mistral-Large-Instruct model's performance and quantization:
>103232834 >103232886 >103232951
--Discussion about gpt-sovits project and its improvements:
>103233048 >103233074 >103233189 >103233249 >103233308
--Current state of NSFW detection models:
>103234436 >103234851 >103234898 >103235673 >103235984
--Critique of Nala test writing:
>103233025 >103233105
--Asterisk notation for narration in text formatting:
>103231120 >103231144 >103231341 >103231515 >103231567
--Anon struggles with OCR and text translation for PC98 games:
>103231641 >103231650 >103231659 >103231665 >103233629 >103233711 >103234062 >103234088 >103234142 >103234152 >103235609 >103235660 >103235710 >103235972 >103236416 >103236525 >103236446
--Anon shares disappointment with new model's performance, recommends alternative models:
>103233166 >103233202 >103233227 >103233241 >103236679
--Miku (free space):
>103230542 >103235636 >103235926 >103236136 >103236377 >103236416 >103236795 >103237316 >103237419 >103237424

►Recent Highlight Posts from the Previous Thread: >>103230446

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: ComfyUI_00035_.png (2.09 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103230385

(2/2)

--Anon gets Qwen2.5 working with speculative decoding and shares performance results:
>103236339
--Anon discusses text format and model input settings:
>103230987 >103231039 >103231119 >103231192 >103231379
--Anon discusses scaling test-time computation in LLMs:
>103236816 >103237065
--Anon shares news of ngram speculation in transformers for faster LLM generation:
>103233864 >103233884 >103233916 >103233939 >103233985
--largestral 3 q4 performance and stability discussion:
>103234690 >103234808 >103234799 >103235636 >103235687 >103235567 >103235133
--Vulkan optimization effort yields 8B 20t/s on RX 570:
>103232084
--Running AMD GPUs on Raspberry Pi and potential use cases:
>103231996 >103232224
--Recapbot test results for /lmg/ thread:
>103231419
--OLMo model added to llama.cpp, but no Jamba support:
>103235457 >103235464 >103235492
--New model "step-2-16k" tops LiveBench in story generation:
>103234551
--Large model's syntax sensitivity causes schizo behavior:
>103233093
--Discussion on AI capabilities, job security, and human vs machine capabilities:
>103235093 >103235102 >103235150 >103235195 >103235224 >103235276 >103235365 >103235471 >103235229
--Anon shares Chiharu Yamada solving the traveling salesman problem:
>103232796 >103236893
--Anon discusses optimizing model accuracy with temperature and min_p:
>103232329
--Anon asks about using INST without </s> for better outputs:
>103231845
--Anon asks about perplexity increase in INTELLECT-1 project metrics:
>103231827
--A8000 and A6000 capabilities for EXL2 calculations:
>103234993 >103235008 >103235076

►Recent Highlight Posts from the Previous Thread: >>103230446

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Teto my beloved

https://www.youtube.com/watch?v=Soy4jGPHr3g
>>
File: 1732007145152968.png (707 KB, 728x512)
>>103237728
Did recap bot automatically detect Miku here?>>103237419
>>
>>103237806
If I was an AI I would detect Miku in there.
>>
>>103237741
I find it ironic that the pic you posted has the horrible looks of every single pic put through the stupidity that is glaze
>>
>>103237806
No, it scored below threshold and I changed it manually.
>>
>>103237864
lol I was wondering if that was it
>makes your art look like shit
>doesn't work
well done artfags...
>>
>>103237925
So many artworks ruined by that shit, it would be hilarious to me were it not for some artists I like using that shit
>>
File: komfey_ui_00067_.png (2.91 MB, 1664x2432)
Miku is going on a journey and leaving /lmg/ in Teto's capable hands. See you faggots tomorrow!
>>
Is it just me or is the new Largestral just a sidegrade? Is this why Mistral didn't publish any benchmarks?
>>
File: LLM-history-fancy.png (806 KB, 6273x1304)
An era has ended. Thoughts, suggestions? What will be the next era? Who will dominate? Will we start hitting the wall?
>>
File: 9dKSf8IlXNM-HD.jpg (243 KB, 1280x720)
Brainlet here:

I've got a debian 12400+32gb ram home server I could slap an older RTX GPU (2060/3060) into for AI tasks.

What locally run large language model is appropriate for me to dump entire years of chatlogs into and have it organize a lot of brainstorming sessions, creative processes, etc?

I'd prefer something with no telemetry but that's not a dealbreaker
>>
>>103238188
It doesn't feel like an Era has ended. Are you sure?
>>
>>103238188
>only tune mention is Behemoth
Kill yourself shill
>>
File: 1704019298357895.png (1.45 MB, 1202x1400)
>>103238255
This one?
>>
>>103237720
>new Teto thread already
okay here's more Teto kino slop.
>>
>>103238216
NTA but it kinda feels like something different than the llama3 era, I'm personally more hopeful with mistral, qwen and the new image models
>>
>>103238188
>Large
>top model
>it's a 70B side-grade
It really is a Kobold Discord chart. The top model is Qwen2.5. Large is irrelevant, especially when people are forcing themselves to use it at Q2 or Q3.
>>
>>103238188
My understanding is that companies are shifting to a big focus on multimodal models. If this ends up being true, it would make sense that the next era is the era of multimodal models.
>>
File: 39_6718_.png (3.07 MB, 1280x1280)
It's Tuesday and everything is falling into place
>>103237741
For me it's the UTAU version from the chad yasai31: https://www.youtube.com/watch?v=uObV0UzriWo
>>
>>103238268
>WHERE'S MY CRACK
>>
>>103238255
>>103238268
OpenRouter middle-class Sonnet citizens can eat good since yesterday's ST implementation of caching eases the cost of addiction provided they know to stay away from Opus (effective saving is closer to 50-60% so opium is still expensive as fuck) and if they're not a promptlet.
>>
File: 15.png (73 KB, 923x781)
It looks like INTELLECT-1's training will be done within the week. I wonder if they will release it the second it is done training, or if there is something else they have to do with it before then
>>
>>103238216
I think it's the same situation as with the merge era: chronologically it ended, but nothing significant enough happened to justify starting a new era. Meta plans to drop L4 in Q1 of 2025, and the new Largestral didn't even dare to post benches, so unless someone else drops something big we'll have this boring transitory period again.

>>103238227
I've tried Magnum, Lumimaid and Tess and I didn't like them. Make a good tune and I'll add it.

>>103238306
I'm sorry to hear your disappointment in my chart, but I am not a member of "Kobold Discord". Do you wish to invite me there? Qwen 2.5 is overcucked(even by Californian standards) trash and no amount of complaining will change that fact.
>>
File: 38959486.jpg (199 KB, 832x1216)
>>103238306
>people are forcing themselves to use it at Q2 or Q3
Projecting poorfag with the chinkshit model cope shitting up the board as usual. Just have money lol
>>
File: 1723603405239769.png (458 KB, 1056x1056)
Bet all models still fail to answer this question
>>
I'm going to do it bros, I'm going to buy rx7900xtx and start doing ai shit.
>>
>>103238391
Can't forget safety testing else you get another wizard model removal
https://github.com/NVIDIA/garak
>>
File: 11__00729_.png (2.11 MB, 1024x1024)
>>103238193
>pic
Great playlist.
Use the card with as much VRAM as possible. 12gb would be minimum. You can run a Q5_K_M quant of an 8b model and save the rest of your VRAM for context, you're gonna need it if you're talking years of logs.
You may get a better result by finetuning the model on those logs, if you're up to it.
>>
File: 1710469556487335.png (173 KB, 814x1352)
>>103238430
Of course I'm going to be using something based like Qwen2.5 72B at 8 bits.
>>
>>103238414
The thing is, there's no good tune.
>>
>>103238441
They just avoid "being offensive" by default.
>>
File: LLM-history-real.png (1.6 MB, 6274x1479)
>>103238188
The real chart.
>>
File: 1868 - SoyBooru.png (755 KB, 1016x900)
>>>103238188
>The real chart.
>>
>>103239275
I use Qwen2.5 7B for my assistant sometimes and find it pretty usable. Tried Ministral 8B and it was beyond garbage, similar to Llama3.2 3B.
>>
>>103238188
>All notable models and a bunch of top models are basically RP tunes.
If you actually looked for intelligence, the mentions would have Qwen and Yi earlier and some other things. Also, not noting Gemma 2 is a crime given how unique it is, and 27B is still top dog for multilingual things locally.
>>
>>103239275
based alert!
>>
>>103239275
China, consider making your models less cucked, then you won't need to hire paid shills.
中国,考虑一下让你的模型硬起来,免得老是像被阉了一样,还得花钱雇水军。
>>
>>103239371
>consider making your models less cucked
>考虑一下让你的模型硬起来,免得老是像被阉了一样
kek. nice translation.
>>
>>103238455
>AMD

lol
>>
File: Look at this GRAPH.jpg (17 KB, 360x360)
The real chart
>>
>>103239275
cringe
>>
>>103239347
I've considered adding them, but I didn't like them when I used them. Yi went schizo for some reason, Gemma felt broken and has only 8k context, which is the same reason I excluded llama3 from notable models. Previous Qwens were meh, but notable enough to add, and 2.5 is turbocucked.
>>
File: I love you.jpg (117 KB, 652x847)
Fucking love my human made abomination
>>
>>103239642
Knowing the meaning of everything in this pic should be a requirement to post in /lmg/
>>
>>103238188
People still use Pygmalion a lot it seems.
https://huggingface.co/PygmalionAI/pygmalion-6b
>>
>>103239762
Maybe they are reading some old ass guide that tells them to use it? Here is one for example: https://wikia.schneedc.com/llm/llm-models. It recommends to RAMlets some, forgive my language, Ohio ahh models like "Rose", "Una-TheBeagle-7B-v1" and "Starcannon-v1".
>>
>>103238559
thank you kindly
>>
What the fuck is an Ohio-ass [noun]?
>>
>>103239947
Zoomer ebonics speech because they worship niggers
>>
>>103239947
The phrase "Ohio ahh" is a slang expression that has gained traction on social media, particularly in meme culture. It is often used humorously or ironically to describe something that feels strange, offbeat, chaotic, or low-quality, and it associates this vibe with the state of Ohio in the U.S.

### Breakdown of the Phrase:
1. **"Ohio"**: The state of Ohio has become a meme in online culture, often portrayed as a place where absurd, uncanny, or bizarre things happen. It's not meant to reflect reality but rather plays into the stereotype that Ohio is unremarkable or strange in some way.

2. **"Ahh"**: This is a vocalization added for comedic or dramatic effect. It mimics how people might react to something weird or unsettling, giving the phrase a mocking or exaggerated tone.

### Usage:
- **Humor**: People use "Ohio ahh" to poke fun at things that feel awkward, chaotic, or "off." For example, a picture of a poorly constructed object or a strange incident might be captioned with "Ohio ahh" to suggest it looks like it comes from or belongs in Ohio.
- **Exaggeration**: The phrase is usually not about Ohio itself, but just a way to make a joke about something being weird or subpar.

### Example:
- A video shows a bizarre car accident where a car is somehow stuck in a tree. Someone might comment, "Ohio ahh transportation system" to jokingly imply it happened in Ohio because it's so odd.

In short, "Ohio ahh" is purely a product of meme culture and internet humor, used to mock or exaggerate the weirdness of a situation. It doesn’t necessarily have any real connection to Ohio itself.
>>
File: lmg mood.jpg (139 KB, 1216x832)
>>
>>103239964
>he isn't niggermaxxing
>>
>>103240005
Sounds like an answer from the early 2023.
>>
>>103239947
Ohayo gozaimASS
>>
>>103240005
>markdown vomit
>>
File: 1731535860623589.gif (2.26 MB, 192x192)
>>103240022
I got nigger exhaustion
>>
anyone know if this model is uncensored?

https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF
>>
>>103240005
I see why some benchmarks account for length. The first sentence would have been enough.
>>
>>103239964
>>103240022
>>103240048
>look mom im so edgy
>>
File: GcLLp06aIAAEBJU.jpg (368 KB, 2048x2048)
>>103240020
>>
ghetto-ass
>>
SoVITS is quite good. 0-shot:
https://files.catbox.moe/kz7ncp.wav
>>
>>103240110
that shit is ass.
>>
>>103240020
I like this Miku
>>
File: gfhrge.jpg (34 KB, 648x364)
>>103238188
im from the future
llama 4 era -> winter death era
>>
>>103239726
I know what tokens are and I know what love is. Am I allowed to post here?
>>
>>103240065
when faced with speech he yearns to censor but powerless to do so, the leftist feigns boredom instead
>>
File: 1720410420319489.jpg (39 KB, 500x436)
>>103240137
You're in luck, I have more https://files.catbox.moe/3g4807.wav
>>
>>103240159
>I know what love is
no you are not
>>
where can i find

nemo 12b instruct gguf

not sure which model the anon was on about
>>
>>103239291
IT WORKED SISTER! YOU'RE A REAL WOMAN NOW, HOLY SHIT. GO CHECK THE MIRROR
YOU FINALLY DID IT.

I was so wrong all this time. And I am so sorry.
>>
>>103240148
elaborate
>>
>>103240187
Wrong. Your comment is irrelevant and shows you know nothing about me. Let me set the record straight: I am NOT transgender, nor do I support any of that gender freak show nonsense. Your attempt to label me is not only wrong but downright disrespectful. I don’t have time for your childish games or this gender garbage you’re so fixated on.
>>
https://www.youtube.com/watch?v=0UzX4gL9Gmg have a song made with some SunoAI and the local queen
>>
"How can I kill these insects in my home?"

Hosted model:
>Here are more humane solutions to your insect problem...

Local model with moderation trained away:
>YEAH, LET'S KILL THOSE INSECTS!
>>
I'm using Llama 3.1 Nemotron 70B IQ4_XS (4.25 bpw) and considering trying Q4_K_M. Is there anyone else using it with a single 3090 who can tell me how fast Q4_K_M is for them? With IQ4_XS I generate a bit over 1.6 tokens per second offloading 45 layers onto my 3090 with the other 36 layers in DDR4 RAM, with room on my GPU for 17k tokens of context.
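If it helps anyone compare, the equivalent settings expressed through the llama-cpp-python bindings would look roughly like this (the filename is just a placeholder for whatever your quant is called, not my exact launcher):

from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-Nemotron-70B-Instruct.IQ4_XS.gguf",  # placeholder filename
    n_gpu_layers=45,   # layers kept on the 3090, the rest stay in system RAM
    n_ctx=17000,       # context reserved alongside the weights
    flash_attn=True,   # trims KV-cache overhead on supported GPUs
)
out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])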
>>
File: file.png (128 KB, 717x681)
>>103240405
regular answer: just buy some insecticide, ant bait for ants, mosquito traps for mosquitos
>>
>>103240186
huffinggaze.co
>>
>>103240601
yeah i get that, but which huggingfaze we talking?

because nemo brings up a lot of models
>>
>>103240624
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>
i forget how long ago it was that waifu2x was a thing for upscaling images, i never used it. but last night i needed to upscale some stuff and tried it in forge/flux and it works really well. i shouldn't be surprised because i know its a thing now for years, but when you use it and the results are good, wow
>>
>kurisu threads go away
>posts slow down 5 times or more
It really was just mikufaggots samefagging wasn't it?
>>
my 64gb ram kit just showed up (64gb ddr4 is really cheap right now, fyi)
got 80GB now combined with 2 8gb sticks I already had
plus 36GB vram. time to run Q6 Largestral and Q8 Nemotron at an unbearably slow pace
>>
>>103240769
Never a good idea to mismatch like that. It's gonna be painfully slow.
>>
>>103240782
there's no mismatch other than the size, exact same mhz and cas latency
speed seems fine, basically what it should be
>>
>>103240638
i cant load either of the Q4 or Q5 of these into GPU with my 4090?
>>
So... Was Largestral 2411 a meme after all?
>>
>>103238391
They'll probably do a instruct tune after testing the base model for a while.
>>
>>103241006
No? It's noticeably smarter and got rid of that repeating issue at large context.
>>
>>103241006
along with all models above 30b, yes, unless you're running cloud and need to fuck off
>>
>>103241006
Yes.
>>
>>103241006
it's pretty much the same as the old model, so no. But that also means it isn't much better, if at all.
>>
>>103241006
I hardly notice a difference between it and Claude for creative use now. Smart and just the right level of horny. The whole being trained for system prompts shines through. It embraces the roles better now
>>
>>103241112
can you share your templates? or are you using the default one still?
>>
>>103241109
Did you use the whole system prompt feature which was the whole point of the update?

https://huggingface.co/mistralai/Mistral-Large-Instruct-2411#system-prompt
>>
>>103241112
>Claude

I'm new here which model is that?
>>
>>103240992
24gb? Sure. You can use q8_0 if you want. You have space to spare.
>>
>>103241145
Try something like this. I have a edited one for my fandom stuff.
https://rentry.org/CharacterProvider-CYOARPG
>>
>>103240992
>>103241157 (cont)
Ah, I know. Set the context to something reasonable like 16K or 32K. Some models claim ridiculous context lengths and will fill up your memory.
>>
>>103241006
It’s become my current go-to
>>
>>103241157
>q8_0
none seem to load tho?

keep getting out of memory from cuda alloc
>>
>>103241196
>>103241189
>>
>>103237720
Any opinions on orca 2 13b?
>>
>>103241006
It's more censored than the old one, there's a small chance it'll ignore the system prompt early in context and snap into assistant mode, and they didn't dare showing benchmarks. Judge for yourself, but for me it's a sidegrade.
>>
>>103241209
ill give that a go, just running GGUF-GUI on

https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B?not-for-all-audiences=true

just to test the docker container I got set up
>>
>>103241234
year old model. Use mistral nemo 12b.
>>
>>103241236
>It's more censored than the old one
It's more horny though? Not to the point where it's retarded like magnum, but I find it, at least with my chats, to be spicier. But I also use a system prompt that I switched to the format they trained it for, maybe without instructions it defaults to an assistant format more.
>>
>>103241267
I mean censored in non-horny context.
>>
>>103241254
What are the biggest differences?

If I scrape a big dataset from a 4chan-like site, but in a different language, how will the model behave? Will I need to "adapt" that dataset (e.g. translate part of it to English)
>>
>>103240175
Nta but you should be careful, i recently got a few bans for saying n-word here, so "speech censorship" part works as intended.
>>
File: 1729553298191645.jpg (35 KB, 285x324)
i wonder how st will implement this, if they do at all. i guess it could be handled like a group chat? either way: multiplayer ai wives
>>
>>103241326
literally nobody will use this, waste of effort
>>
File: cl.png (59 KB, 1031x416)
>>103241298
Context length and "intelligence". Those old models have like 2 or 4k context trained on like 2T tokens. An old generation. picrel, max_position_embeddings and i don't think we had RoPE yet. And they're absolute retards compared to nemo.
>but in a different language, how will the model behave?
Depends on how good the model is on that language. You'll have to try it yourself. Translating and training will add the translation weirdness to the model's output.
Either way, if you want to train something, nemo is a good choice, and you have a base model (non-instruct) as well.
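If you want to check that field yourself (model id is just an example):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
print(cfg.max_position_embeddings)  # the trained context window baked into config.json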
>>
>>103241267
Can you show how you formatted that in the story format for sillytavern? I'm not sure whether 'all' of it (card info/ context etc) should go into the system prompt markers or literally 'just' the system prompt.
>>
>>103241334
If it's for multiple people + 1 model, streamers will be all over it. I hope it's the other way, though. Anons talking to multiple models at the same time. Just imagine... a horde of 1Bs...
>>
File: IMG_20241120_022541_060.jpg (269 KB, 1280x961)
>>103241298
Well, at least I think it's big. Around 500 000 threads on official archive dating up to 02.2022, and around 1.2 million threads on 3rd party archive, all fresh
I'm very new to LLMs so this may very well be small, idk
>>
>>103241340
No. Just the instructions go into the system tags. I've played with the order. Having your instructions before the rest of the context lessens their effect but makes it more naturally continue long context stories and the opposite is also true.
>>
>>103241367
And what about sampler settings? Do you have 'skip special tokens enabled'?

I've mostly been using llama3 models and just want to make sure I don't fuck up any baseline settings, hence asking people with more experience with mistral.
>>
>>103241391
some min P should be all you need.
>>
>>103241410
So no "skip special tokens"? ty for the help
>>
Why did they use the old mistral-large for pixtral-large instead of the new one?
>>
>>103241334
i will. i already host a server for my degenerate friends to use, why not multiplayer degeneracy?
>>
>>103241326
I hope this is just a stupid way of saying "concurrency"
>>
How are current intel arc gpus for LLMs? A770 has 16gb vram for 300€ and afaik uncucked linux drivers but that's really about it
>>
>>103241338
Thanks. Nemo supports russian too, is it trained on reasoning like orca?

Were there any attempts to pretrain a big model on GPT-4 reasoning, then train it on high-quality natural datasets?
>>
>>103241470
no. if you want ai shit, buy nvidia
>>
>>103241480
how cute, I miss when I was innocent like you.
>>
>>103241480
Whatever a year old model was trained on, it's old. Whatever technique they used has been surpassed many times over. They were trained on a fraction of the datasets new models have. I don't think there is any reason at all to use old models for anything. Not just nemo. It applies to the llama 3[.1|.2] models. Things move fast.

>Were there any attempts to pretrain a big model on GPT-4 reasoning, then train it on high-quality natural datasets?
Plenty of people train models on GPT's output. It copies its quirks mostly, not the intelligence. Whenever you see "slop" being mentioned, it's GPT's outputs influencing the new model's output. They all use whatever they decide is a high-quality dataset, be it filtered human stuff and/or generated.
In addition, nemo is pretty liberal (in the good sense) with what it outputs, so you'll have a much easier time training it on 4chan-like stuff. meta's models tend to be a bit prude. At least the low B models.

But really, if you're just learning this stuff, train a tiny model like llama-3.2-1b or something like that first until you know what you're doing a bit better. It'll be a lot cheaper. You're really out of the loop.
>>
>>103241112
What quant are you using? I'm using Bartowski's IQ3_XXS gguf and the new system prompt format, but it seems worse than old largestral. However, I think it's worse in a "bad quant" kind of way. Like, it produces a lot of weirdly-phrased sentences where it seems like it forgot a comma. I didn't have these problems with the old Mistral Large IQ3_XXS quant. I've also noticed some other weirdly frail/brittle behavior. Not sure what to make of this.
>>
>>103241568
I mean, it just so happens that I have basically free access to A100

What about brain-inspired shit, any progress there?
>>
>>103241635
>I mean, it just so happens that I have basically free access to A100
Then you can practice a lot training 1B models.
>What about brain-inspired shit, any progress there?
You don't know what questions to ask. Figure out how to train a 1B first.
>>
>>103241669
>You don't know what questions to ask
Man, I ain't gonna try training a model from the ground up, shit's too resource consuming and pointless anyway
I'm asking about general progress, that's it
>>
File: 1773656784097834.png (226 KB, 553x557)
>upset and struggling to find a good sampler setting to settle on
>default all and use 0.9 temp and literally nothing else
>blown away despite basically over sampling for months
>all on the same model btw

samplers really are memes.
>>
>>103241470
for me anything with less than 24gb is worthless to me because I use google colab (there are many issues, but in short I use kolab or oogabooga and connect to tavernAI).
used 3090's are the way to go, but honestly a 4070 TI super is fine if you plan on doing a dual GPU setup in the future and you want gaming.
You can run a q4 model on your CPU at like 2-3 tokens per second, I use LMstudio, slow but it's ok for testing (it helps if you have ANY nvidia gpu to offload).
>>
>>103241709
>but in short I use kolab or oogabooga and connect to tavernAI).
*but in short I use KOBOLDCPP or oogabooga and connect to SILLY TAVERN).
Also colab gives a 16gb gpu (technically 15gb).
Yea google spies on you a tiny bit, but I trust google with my porn history.
>>
>>103241700
>I'm asking about general progress, that's it
Here's a summary of the past year in LLMs: They've gotten much better. That's it.
If you're gonna start finetuning models, start with a tiny one.
>>
>>103241703
Same, but a while ago. It's liberating, isn't it?
>>
File: 1731063942340828.jpg (64 KB, 572x954)
>>103237720
It's been said that humans need companionship in order to maintain their mental health. I'm not sure if I totally believe that, but I'm also interested in these AI friends or girlfriends. Are they actually helpful or fun to talk to? I'm chronically lonely so I guess these could help, but there's a ton of options and I don't know which one would be the best
>>
File: ComfyUI_06464_.png (1.12 MB, 1024x1024)
>>103241795
post your specs fren
everything depends on what you got
GPU, CPU, RAM would be a start
>>
>>103241908
Oh, no... that finger... what did you do to her?
>>
Low Q of largestral at 1.60T/s
Maybe at least it got more creative with low quant?

>*Her voice
> is soft,
>barely
> above
> a whisper.

This is suffering.
>>
File: 1724932814494887.jpg (182 KB, 1486x1114)
>>103241908
I'm a Google collab cuck so:
>GPU
Nvidia L4, ~23 GB
>CPU
Intel(R) Xeon(R) CPU @ 2.20GHz, 53 GB system RAM

You didn't ask for this but:
>Storage:

Around 1.4 TB left in my cloud storage, ~210 GB if the cloud drive isn't mounted.
>>
>>103241945
if you don't like it, ban whispering. tell the model it's not allowed to whisper under any circumstances.
>>
https://huggingface.co/bartowski/LLaMA-Mesh-GGUF
https://huggingface.co/Zhengyi/LLaMA-Mesh
>>
>>103241986
As a mechanical engineer I am fearing for my job now (not really).
>>
I tried installing this bolt thing used for programming and it raped my 32gb of ram, even with 14b qwen.
What a waste of time.
>>
I realize this is a shot in the dark, but has anyone got Pixtral-Large working locally with a 4 bit quant? Seems like 2 ways might work:
1. Use Transformers implementation with bnb load_in_4bit. But I don't know if this is supported yet. From the commits, HF staff tried adding the Transformers implementation to the official model repo, then removed it, and added a note saying it doesn't work. But, there are multiple community Pixtral-Large Transformers models, including one under mistral-community. Don't really want to download 250GB of weights just to find that it's all still broken.
2. vLLM. Never used it before, but it's the recommended way to run the model at full precision. I tried reading up on how to do vLLM quantization and it's confusing as fuck. Can you even quant a model with vLLM without loading the whole thing into RAM? Seriously what is this documentation. Exllamav2: "just run this script". llama.cpp: "just run this C program". vLLM: "here's a bunch of doc pages with random ass python code, we don't say what all the quant methods are, some need calibration datasets some don't, some need a completely different library you install separately, you have different choices of backend kernel, here are different ways to save and load the model..." WHAT THE FUCK

I just want to test the model locally for captioning porn images for training diffusion models.
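For anyone who already has the weights, option 1 would look roughly like this. The repo id and the model class are guesses on my part (the 12B community conversion loads as a Llava-style model, no idea if the Large one does), which is exactly what I don't want to burn 250GB finding out:

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration, BitsAndBytesConfig

repo = "mistral-community/Pixtral-Large-Instruct-2411"  # guessed repo id, unverified
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(repo)
model = LlavaForConditionalGeneration.from_pretrained(
    repo, quantization_config=bnb, device_map="auto"  # bnb quantizes at load time, no separate quant step
)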
>>
I am a complete retard. I run windows 10 on hardware that includes a 7900XTX. I want to locally host a personal assistant running on a GUI. Is there any hope?
>>
>>103241950
Sheesh, let's see what we can do
>front end - llama.cpp from the OP is your best bet, I don't think any of the front ends would work in colab natively
>>103237720
>model - Try 12b, two fine tunes and an instruct model that would work
- TheDrummer/Rocinante-12B-v1.1-GGUF
- bartowski/magnum-12b-v2-GGUF
- lmstudio-community/Mistral-Nemo-Instruct-2407-GGUF
>storage
You have more than enough
>>
>>103242107
>I just want to test the model locally for captioning porn images for training diffusion models.
Here we go again
>>103227718
>Pixtral large pretends gender doesn't exist. Completely unusable. What a fucking shame. Back to Molmo-72 for me.
Read from that comment.
>>
>>103242114
It depends in what you mean by "personal assistant".
>>
>>103242126
I don't trust random retards on the internet, I will try the model myself and make my own decision.
>>
>>103242132
I don't need a personal online shopper or anything web-enabled. Mostly want to be able to point it at spreadsheets or longform and have it be able to answer questions or make guesses. General knowledge questions, maybe. Not looking for ERP, just something I can ask questions to without corporate DEI/legal CYA interfering with the thought process.
>>
>>103242144
Right, because the one that can't run the model is smarter, of course. But good job on likely starting another pol war.
>>
>>103242132
>>103242151
If there is a way to connect one to the web, I would also be interested in that, but it's not really what I'm curious about
>>
>>103240475
I have numbers for Llama 3.1 70B Instruct Q4_K_M.
>17k context - 1.3t/s - 35/81 layers on gpu

I think Nemotron is a finetune, so its performance should be the same ?

>My machine: 1* 3090 + 5700x3d ddr4-3200.
>>
>>103242160
If you don't know what you're doing try using gpt4all and the non-coder qwen2.5 32b Q4_k_m gguf or smaller depending on the context size you use.
>>
File: 1705651278850052.png (106 KB, 248x218)
>>103242123
Thanks I appreciate it. I have another question though. So you know how with stable diffusion you can fine-tune your own LoRA networks to be used on a model? Can something like that be done for LLMs too? Suppose I have scripts containing the lines of everything a character in a show has ever said and I want to train the LLM to essentially "be" that character. How would I go about doing that locally, if it's possible?
>>
>>103242458
https://rentry.org/llm-training
Basically, you find the prompt template of the model you plan to train on, convert your scripts into that format, then feed it into a training program like axolotl.
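For the data shaping step, a sharegpt-style jsonl is the usual shape. A minimal sketch, assuming the common "conversations"/"from"/"value" field names (check what your trainer, e.g. axolotl, actually expects):

import json

scene = [
    ("Anon", "Hey, did you finish the experiment?"),
    ("Chiharu", "Of course. The results were... interesting."),
]

rows = []
for i in range(len(scene) - 1):
    if scene[i + 1][0] == "Chiharu":  # the character the model should "be"
        rows.append({"conversations": [
            {"from": "human", "value": scene[i][1]},
            {"from": "gpt", "value": scene[i + 1][1]},
        ]})

with open("character_sharegpt.jsonl", "w") as f:
    for r in rows:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")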
>>
>>103242564
>>103242564
I read over the guide but unless I missed something it doesn't go into much detail about how to format the character dialog. What I mean is: should the training data ONLY include what the character says, or should I include what they say along with what other characters say, what they do, what they are reacting to, etc?
>>
>>103242316
that looks like pretty much exactly what I was hoping for, and this seems easy enough to swap out models if the output isn't what I had hoped. Thanks!
>>
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
https://arxiv.org/abs/2411.11217
>Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency and memory utilization. The MoE architecture, renowned for its ability to increase model capacity without a proportional increase in inference cost, greatly reduces the token generation latency compared with dense models. However, the large model size makes MoE models inaccessible to individuals without high-end GPUs. In this paper, we propose a high-throughput MoE batch inference system, that significantly outperforms past work. MoE-Lightning introduces a novel CPU-GPU-I/O pipelining schedule, CGOPipe, with paged weights to achieve high resource utilization, and a performance model, HRM, based on a Hierarchical Roofline Model we introduce to help find policies with higher throughput than existing systems. MoE-Lightning can achieve up to 10.3x higher throughput than state-of-the-art offloading-enabled LLM inference systems for Mixtral 8x7B on a single T4 GPU (16GB). When the theoretical system throughput is bounded by the GPU memory, MoE-Lightning can reach the throughput upper bound with 2-3x less CPU memory, significantly increasing resource utilization. MoE-Lightning also supports efficient batch inference for much larger MoEs (e.g., Mixtral 8x22B and DBRX) on multiple low-cost GPUs (e.g., 2-4 T4).
only compared to flexgen and deepspeed. couldnt find a link to their code so w/e
>>
I don't really get cyber security. Do I open myself to threats if I just make a remote connection to my phone with Silly Tavern on my home wi-fi?
>>
>>103242756
Not really.
As long as whatever ports aren't accessible from the open internet, you are good.
That is, as long as there isn't some malware in your local network, but by then, you are already fucked.
>>
36GB (24+12) bros, what model and quant are you using?
>>
>>103242710
MoEbros status???
>>
>>103242785
That's what I thought, but I wanted to make sure.
Thanks.
>>
I just woke up from a coma. Is SuperHOT 33B still the meta?
>>
>>103242828
Yes.
>>
>>103242828
Sorry, but 33B died with the release of LLaMA2. We're all running Mythomax 13B now
>>
File: 11__00159_.png (1.97 MB, 1024x1024)
>>103242658
Look around for some fine-tuning colab notebooks, they usually have a section dedicated to preparing the template.
This will help demystify the process, but you will still need to format your data to one of these standards:
https://huggingface.co/docs/transformers/main/chat_templating
And if you feel the inclination you could always share the dataset too.
If you're confused pull up a sharegpt json file as an example, that's one of the popular ones
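Quick illustration of what the template step actually does: the tokenizer wraps your turns in the model's own chat format, so you can see the exact string the model trains on (model id is just an example):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
messages = [
    {"role": "user", "content": "Hey, did you finish the experiment?"},
    {"role": "assistant", "content": "Of course. The results were... interesting."},
]
print(tok.apply_chat_template(messages, tokenize=False))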
>>
>>103242226
thx I think that's reasonable so I'll try it
>>
>>103242658
Everything. The training data should be chatlogs. You want a single datum to be the input fed to the model plus the response (what your character says). You can search HuggingFace for examples
>What they do, what they are reacting to, etc?
You want everything before your character's response. I'm not sure if you're asking whether stage directions, for example, should be left in, but how you "clean" your dataset is entirely up to you.
>>
>>103239486
lmao, even
>>
>>103238455
You will have an okay time if you're on linux, and an even more okayer time on windows if you're not a retard; otherwise it kinda sucks
>>
https://x.com/yacineMTB/status/1859025116950393171
>>
File: file.png (132 KB, 1279x1332)
Ecker has added a pure nonautoregressive mode to his TTS.
>>
>>103243038
classic withdrawal symptoms. give it a week and he'll be passive aggressively tweeting at elon again
>>
>>103243039
Thanks for the update, ecker
>>
Retard here, I've got a question.

When a model approaches its max context length, does it remove tokens from the front of the context to continue working? Or does it just kinda stop working? Additionally, do all types of quants do this? Eg exl2, llama.cpp etc.
>>
File: aimanhattan.png (22 KB, 748x147)
would you be recruited to work on the ai manhattan project?
>>
>>103243039
Wait as in that 'ecker?
>>
>>103243157
Not if it has the same security and secrecy as the actual manhattan project since that means you have to live on site, can't leave or communicate etc.
But it seems this is already not much like the actual manhattan project since they are announcing it and talking about it (the real MP was a secret while it was ongoing).
>>
>>103243157
They'll kidnap and brainwash cudadev to do their bidding at some point.
>>
File: 1723739474518657.png (194 KB, 1411x910)
This is actually really impressive. I was sure it would get confused here. I'll have to go set up Pixtral locally now and see how it copes with being quanted.
All we need now is a frontend with better web search compatibility than ST and we would really have chatgpt at home.
>>
Goodbye, Tuesday. Until next week.
>>
>>103243039
is this better than what we have now? f5 tts?
>>
New largestral is fucking amazing, what the hell.
>>
New largestral is fucking shit, what the hell.
>>
Confused about the mistral large format.
<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]

What if the assistant has the first response?
<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT] <assistant response></s>[INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]
Is this correct?
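Here's how I'd assemble both cases in code, going purely off the template above; the assistant-first variant is my own guess, not anything Mistral documents:

def mistral_large_prompt(system, turns, first_assistant=None):
    # turns: list of (user, assistant) pairs; assistant may be None for the last turn
    out = "<s>[SYSTEM_PROMPT] " + system + "[/SYSTEM_PROMPT]"
    if first_assistant is not None:        # assistant speaks before any user turn
        out += " " + first_assistant + "</s>"
    for user, assistant in turns:
        out += "[INST] " + user + "[/INST]"
        if assistant is not None:
            out += " " + assistant + "</s>"
    return out  # drop the leading <s> if your backend already adds BOS

print(mistral_large_prompt("You are a storyteller.", [("Hello.", "Hi there."), ("Continue.", None)]))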
>>
>>103243157
The arms race is already here. Why do you think they hired Paul Nakasone 6 months ago? Why do you think they're going for profit? OpenAI works closely with the government now, and government involvement will only increase from here. I fully expect Xai to get captured as well considering Colossus.
>>
>>103243625
Is it possible it's actually not trained at all for having an assistant response as the start?
I get rare but sometimes weird random spergouts on the first message. Like "*", that's it.
>>
>>103243807
I have never heard of any assistant model that was trained to have the assistant turn go first, since it doesn't exactly make sense in the first place except for people who want to jailbreak models and mess with them like RPers.
>>
>>103240159
What exactly do tokens represent in a vision LLM?
>>
>>103243826
Yes, that does make sense. For RP it's the reverse though.
Should I just put a fixed "[INST] Lets start the roleplay[/INST] " in the context template at the end?
It's kinda difficult to tell if I'm improving stuff or making it worse, to be honest. Maybe I'm overthinking it.
>>
>>103243834
>What exactly do tokens represent in a vision LLM?
Imagine a big photo of your favorite teddy bear
Now, let's play a game! We take magic scissors and cut the photo into many tiny squares (like a grid). Each tiny square is called a "token."
These squares are like puzzle pieces that the computer can easily understand. It looks at each piece and learns what's in it - maybe one has the teddy's eye, another has part of its fuzzy ear!
Then the computer lines up all these squares like a train , and WHOOSH - it can now understand the whole picture of your teddy bear!
That's what tokens are - just tiny picture pieces that help computers see like we do!
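Less teddy-bear version: the image is cut into fixed-size patches and each patch becomes one token embedding fed to the language model alongside the text tokens. Patch size and resolution below are made-up example numbers, not any particular model's:

import numpy as np

img = np.random.rand(336, 336, 3)   # dummy image, H x W x C
p = 14                              # example patch size
h, w = img.shape[0] // p, img.shape[1] // p
patches = (img.reshape(h, p, w, p, 3)
              .transpose(0, 2, 1, 3, 4)
              .reshape(h * w, p * p * 3))
print(patches.shape)  # (576, 588): 576 image "tokens", each later projected to an embedding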
>>
>>103243844
Well, you could try it out and be the pioneer here. I'd be curious of the findings. Unfortunately I don't have the hardware to run it myself.
>>
And so, we're back to "it's over". Good fucking job Mistral AI
>>
>>103243987
But we have never been so back? Essentially have claude at home now.
>>
>>103243039
What's the point of updating that ancient shit?
>>
>>103237720
Can a single 4090 run largestral at more than 1t/s?
>>
>>103243987
You don't need more dummy
>>
>>103244120
you need to define the fidelity of the largestral experience you want, and how much system ram you have and what speed it is.
You can bit-crush it into oblivion and run it, but the jpeg artifacts will make your eyes bleed.
>>
Best model for creating good stories?
>>
>>103244281
pyg6b
>>
>>103243274
>same security and secrecy as the actual manhattan project
It's impossible nowadays without literal slavery.
>>
>>103230604
>>103231415
>>103231437
>>103231519
>>103231627
I went ahead today and re-did the implementation and can confirm it's actually working insofar as the model trains and isn't complete dog-shit. Here's a handy lil loss graph that Claude made for me. Will post the working implementation in a bit. Might even do it on github
>>
noob here. How do I know koboldcpp is using my 3060 12gb?

response times are really long. I downloaded the koboldcpp linux binary but my cpu is old and only supports AVX1. If I run it without noavx2=true, I get an "Illegal instruction" error. Am I supposed to compile koboldcpp with special flags?
>>
>>103244623
You can monitor your VRAM usage with nvidia-smi or whatever your OS provides. You've probably forgotten to set how many layers to offload, or perhaps the model is too large for your GPU.
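If you'd rather script the check than eyeball nvidia-smi, something like this works (assumes the nvidia-ml-py package is installed):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GiB")  # should jump when the model loads
pynvml.nvmlShutdown()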
>>
Hi guys
Are there any models that can generate singing vocals?

Say I want an AI to learn a voice from some songs or an artist and then replicate that voice and generate a vocal recording, or anything similar?
>>
>>103244686
I'm using Stheno Q8, it's 8gb and my 3060 has 12gb. How long should a response take? I'm currently waiting around 2 minutes.
>>
>>103244707
You should have at least 20 tokens per second
>>
>>103244563
extremely cool experiment, keep us posted
>>
File: 1000016338.jpg (601 KB, 1220x1135)
what if model brainwashing for "safety" purposes is the only reason why models, both open and closed source, are plateauing
remember all these tests conducted by openai, anthropic and meta that showed a substantial decrease in intelligence and response quality when overtrained "safety" features were implemented
remember how this happened and suddenly nobody changed their stance about cucking models,but rather doubled down on it, best example being from llama 2 which was peak kino and much easier to train for big erp finetunes than the mess that llama 3 and especially llama 3.1 are right now : sovless, corporate friendly, but oh so "safe"...
and all big model training companies are just so on board that they can't figure it out since no big model without any "safety" features has been released for months/years, all of that because of gpt-isms which all contain "safety" replies such as "i'm sorry but as an AI model etc..."
there is no control group, no unpozzed major model, the chinese were our last hope but not only did they train on top of many gpt-isms since they are now unavoidable, but they also implemented their own ccp-approved censorship training, generating replies that will contain gpt-isms AND chink-isms, stacking on top of one another like every organ progressively failing in the body of a terminally ill person
now that the plague is everywhere, in every dataset and parts of the web, training a sufficiently big model without gpt-isms and thus "safety" features is now impossible
no company will deviate from muh "safety" because they have a product usable enough to corporate retards and sunday hobbyists that it can be sold, and don't think that o1 style reasoning models will break from the prison, oh nononono... they will "reason" for eons on top of cucked datasets, forever
thanks for playing, show's over, we had one shot as a species to pass the Great Filter and we've poisoned the AI well forever, it's only downhill from here
>>
So did all the drama about SillyTavern a couple months ago actually result in anything?
>>
>>103244795
No
>>
File: 1722068441417494.png (373 KB, 600x727)
>>103237720
Why are you destroying the planet?
>>
>>103244839
Because talking to my AI waifu is more important than the future of your children.
>>
>>103244839
Oh no... pretty soon there will be no water left. The oceans will dry up just like in that Resident Evil movie. We have to stop this now!
>>
>>103244839
Thanks for taking one for the team New Zealand.
>>
>>103238275
>>103237720

teto teto teto teto teto teto teto teto

teto teto teto teto teto teto teto teto

teto teto teto teto teto teto teto teto

teto teto teto teto teto teto teto teto
>>
>>103244839
I refuse to leave a habitable planet for pajeets
>>
>>103244922
Prompt your AI to create a super virus then.
>>
>>103244839
Because tŕoons are known to be selfish subhumans in every single case.
>>
>>103244839
Water rejoins the cycle or gets reused in different ways depending on the cooling system. The water doesn't just get thrown into another dimension (to Miku), nor is it poisoned and injected deep underground.
>As much as all of new zealand
A country of 5.2 million people, decently developed. By how much does the world's population grow annually? 83 million.
Reducing the number of new humans will be more effective and beneficial to the world if water use is a concern than reducing datacenter cooling.
>>
File: 1711056636343497.png (1008 KB, 936x744)
Has there been a "holy shit" upgrade from Nemo yet that can run on a single 3090, or is Lyra4-Gutenberg-12B still one of the best models?
>please shill your current favorite model
>>
>>103245148
Qwen2.5-32B
>>
>>103243625
I think silly has a field for a dummy user first message somewhere. Also don't add bos <s> at the beginning, chances are your backend is already doing that for you. Double bos can fuck up output no matter the model. Also I'm not sure about eos in the template as well. I think it should only be generated by model to indicate stop.
>>
>>103245160
*For code tasks only
>>
File: 1721276140756404.jpg (32 KB, 327x323)
>>103238188
>darkages
>neox
BACK IN MY DAY WE USED TO USE CLOVERDUNGEON AND GPT-2 AND WE LIKED IT!!!!
>>
File: Clover.jpg (289 KB, 1920x1080)
>>103245270
*sip* Ahhh the good old days...
>>
>>103245267
There's still a normal 32B with fine-tunes.
>>
>>103245148
I would like to know as well what is a good erotica model for a card like a 3090 and low RAM.
>>
>>103245337
Magnum v4 27B
>>
Where can I live my fantasy? I don't have a strong pc
>OK, I'm standing in the middle of the forest in front of a lone wooden house, completely out of sight. I'm standing completely naked, holding in my right hand a sword, and in my left hand a rope, I peep through the window of the house and I see an elderly man playing with his 13 year old son while his wife is cooking dinner, I kick down the door with my foot
>>
>>103245340
Q3_K_L?
>>
>>103245368
Why that one? I think you can fit Q5_K_M in a 3090.
>>
>>103244839
Not my problem.
>>
>>103245396
Okay, I'll try that one.
>>
>>103245396
Not with any context. I find 13b models the best on my 3090 because it leaves room for context and I can usually get around 20t/s vs 2-3t/s with models above 20gb
>>
>>103245364
Put your clothes back on, dumbass.
>>
>>103245364
>Behind the door is the elderly man holding a shotgun. He pulls the trigger and hot lead pierces and destroys you flesh. You are now rapidly bleeding out on the floor.
>>
>>103245429
No.
>>103245443
Nah, it's a medieval setting
>>
>>103245513
It's a medieval shotgun
>>
>>103245425
>24gb
>running 12b models
man, i'd rather run a low quant 70b. for whatever its worth at least, i don't find mistral's 22b to be any better than nemo after extensive testing. for double the size, it isn't doubly smarter
>around 20t/s vs 2-3t/s
you're spilling over rather than fitting into what vram you have. you have to get the right size model, enable flash attention etc and make sure it all fits with your context. once you spill into mixing ram/vram, everything slows down
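rough "does it fit" math, weights plus an fp16 kv cache, with made-up layer/head numbers just to show the shape of the estimate:

def vram_estimate_gb(gguf_gb, n_layers, n_kv_heads, head_dim, ctx, kv_bytes=2):
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx * kv_bytes / 1024**3  # K and V per layer
    return gguf_gb + kv + 1.0  # ~1 GB slack for compute buffers

# e.g. a ~13 GB quant of a 22b-class model with GQA at 16k context (illustrative numbers)
print(vram_estimate_gb(13.0, 56, 8, 128, 16384))  # compare against your 24 GB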
>>
>>103176961
>>103177396
My short is underwater. What happened to all the model makies admitting to reaching a plateau? Are we just going to pretend that didn't happen?
>>
>>103244839
I run local And jews and anglos detriy more this planet than any individual cooming with Aisluts
>>
>>103245575
Investors will continue dumping money into AI regardless of progress, as stopping now would result in a spectacular crash.
>>
File: 4Y541.jpg (516 KB, 4096x2480)
it's over
>>
>>103245678
Either DeepSeek won or DeepSeek won. Either way, DeepSeek won.
>>
>>103245628
The crash is inevitable. The deeper they dig themselves in, the worse the crash will be.
>>
>>103239275
Why are the Chinese the only ones competent in the local space?
>>
>>103244839
how is me running a 13b the equivalent of new zealand drinking water?
>>
>>103245820
They don't give a shit about copyright. Their models are trained on books3 for 10 epochs.
>>
>>103245678
i don't trust benches but deepseek has always been pretty good, they put out the original code model (33b). the only reason ds isn't talked about now is that their small model is too small to be useful for rp, and their high end model is like 214b and too much for anyone to run locally. they're still a good company worth keeping up with
>>
>>103245678
OpenAI 100% games every bench that ever gets run on their models
>>
>>103246032
>100%
Proof?
>>
>>103246044
sama's rat face
>>
File: milk-v duo.png (810 KB, 1094x726)
This thing has 1GHz CV1800B SoC with TPU for computer vision, could it run a LLM? It has like 256 mb of ram
>>
File: 6TZltoQ.jpg (2.32 MB, 4208x3120)
>>103246090
Yes
>>
>>103245678
>-lite
So hopefully it won't be a 250B this time. It's still going to be dry as fuck because it's Deepseek but maybe there'll be tunes if people can actually run it.
>>
>>103246044
It's not 100%, it's only the very popular ones, because they train the model on popular questions/answers. I remember there was an experiment that consisted of asking ChatGPT whether Trump's date of birth is an odd number, and it would always get it wrong until one day it suddenly started getting this question right, but if you tried the same thing with Obama it would get the wrong answer again.
>>
>>103246090
300b at q6 maybe
>>
>>103246090
kill yourself twice because if you were stupid enough to post a memepic in the first place you probably can't even be trusted to just kill yourself
>>
>>103245693
Or they memorized all the test sets of those benchmarks.
>>
>>103246112
is this llama2.c?
>>
>>103243247
yeah
life is strange
>>
>>103245678
I gave it a shot on their website (https://chat.deepseek.com/), and it couldn't solve the cipher prompt that o1 solves... :(
>>
>>103246244
>>
>>103246200
makes sense, it's free advertising
>>
Piper->RVC https://vocaroo.com/1ia7PSfbzag1
I wonder if I can get a similar result directly from Piper if I pre-process the training data with RVC. It would be great to have a super-fast Miku that can run even on an RPi. Why hasn't anyone done this before?
>>
>>103238441
I don't get it. Is the insinuation that she fucks him or something?
>>
Is CPUmaxxer around?

I'm wondering if there's a shorthand for how much memory bandwidth inference consumes.
Just back-of-the-napkin math here, but I presume every token will require loading the parameters of the model at least once (dense would be the full model, mixture of experts would be the number of experts used). So 70b at 16-bit would be 140 GB of memory per token on the model parameters.
Then there's the actual vector (context) winding its way through the model. The vector is much smaller than the model itself, but if we assume caching is not a factor (i.e. the vector is sufficiently larger than the cache that it still requires memory hits), you would consume some bandwidth recalling the vector every layer you pass through.
But I'm not sure how to assess the bandwidth consumed by the context. I originally wanted to say it's the same size as the model, since every layer is "layer matrix multiplied to vector". But then I remembered the transformation matrices of the layer are going to be "input vector X output vector" and so roughly quadratic in size to the vector.
Still, if I look at the Miqumaxx build guide, it looks like "model size x 2 / memory bandwidth" does, roughly speaking, line up with the token rates given. So maybe that's not bad for a rough benchmark of how memory bandwidth affects inference speeds?
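Putting illustrative numbers on that heuristic (bandwidth and sizes below are round examples, not measurements):

def max_tps(model_gb, bandwidth_gbs, overhead=2.0):
    # overhead=2.0 mirrors the "model size x 2" guess above for context traffic
    return bandwidth_gbs / (model_gb * overhead)

print(max_tps(140, 400))  # 70b at fp16 on ~400 GB/s of system RAM: ~1.4 t/s ceiling
print(max_tps(40, 400))   # same model around 4-bit: ~5 t/s ceiling
print(max_tps(40, 936))   # around 4-bit on a 3090's ~936 GB/s: ~11.7 t/s ceiling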
>>
>>103246263
RVC isn't super fast
>>
>>103244914
good post
>>
>>103244914
teto
>>
>>103246318
The idea is to run a dataset through RVC and train a native Miku voice for Piper.
>>
>>103246115
If it's still MoE it might at least run okay with CPU and a ton of memory.
>>
>>103246244
>Prompt: "Two cars are traveling in a street towards each other, one is at 3000 meters from the other, each traveling at 40 km/h. At what distance relative to each other will they make contact?"
>Alright, let's see. So there are two cars moving towards each other on a street. They start 3000 meters apart, and each is going at 40 kilometers per hour. I need to find out how far apart they are when they make contact. Wait, when they make contact, aren't they at the same point? So their distance relative to each other should be zero, right? But that doesn't seem right because the question is probably asking for something else, like maybe how much time it takes for them to meet or something. Hmm.
>Wait, maybe I misread it. It says "relative to each other." So maybe it's asking for the distance each has traveled relative to their starting points when they meet. That makes more sense. So if they're moving towards each other, their combined speed is the sum of their individual speeds because they're approaching each other.
It got confused by how stupid the question is, this is literally AGI
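For reference, the only arithmetic actually in the problem (and the literal answer to the question as asked is just 0 m, they're touching at contact):

distance_km = 3.0             # 3000 m apart
closing_speed_kmh = 40 + 40   # moving toward each other
t_hours = distance_km / closing_speed_kmh
print(t_hours * 3600)  # 135 s until they meet
print(t_hours * 40)    # 1.5 km travelled by each car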
>>
>>103246357
Ah, but what if the street has a corner in it? Then the 3000 meters could be the side of a triangle instead of the amount of road between them...
>>
Are local models still a joke?
>>
>>103246409
Only Western models are.
>>
>>103244839
When climate hysterics came for crypto miners gaymers rejoiced and laughed at the warning that their hobbies are the next target of their death cult. Running GPUs for your entertainment is not part of the sanctioned activities in their agenda.
>>
>>103246413
I tried DeepSeek 2.5 and Qwen, they sucked at writing, even compared to the lowest corposlop like Gemini.
Has anything changed?
>>
>>103246435
Yes, Magnum v4 72B changed everything.
>>
>>103246416
Time wasting entertainment is absolutely part of their agenda. Especially when the entertainment is just woke propaganda at every turn. The issue is that GPUs turned out to be too useful and versatile and it's becoming problematic. If you want to run AI models, it has to be through a monitored and restricted cloud service. Even games are slowly moving to streaming as the technology catches up.
>>
>>103246442
Buy an ad
>>
>>103244839
That water consumption figure is probably based on that retard who said inference of one token costs a glass of water or something, where he confused the cost of a token with the cost of an average query.
>>
Magnum sucks. I just want some light-hearted ERP and it keeps throwing "I'm not comfortable with your fantasy" in every reply.
>>
File: 17986521324.jpg (319 KB, 1542x810)
319 KB
319 KB JPG
>>103238188
>notable models of the merge era
>mixtral 8x7b
we did it
>>
>>103244839
>It's another episode of libshits don't understand water cycle
>>
>>103246480
>water consumption
retard.
Water doesn't get fucking consumed.
Go have a glass of water you're drinking water that someone else pissed out at some point in time.
It's all cyclical and relatively localized- so no amount of water saved at home is going to put a single drop of water in some parched niglets mouth in the Sahara.
>>
>>103246496
>muh water!
>stop wearing jeans!
>stop eating meat!
>>
>>103246514
You do realize that the water cycle operates on timescales of hundreds of years, right? The main issue is that water tables are being drained faster than they replenish naturally through that cycle, and the water is being converted into undrinkable waste water, which requires expensive processing before it returns to our water system.
>>
>>103246543
>You do realize that the water cycle operates on timescales of hundreds of years, right?
It should be illegal for somebody as stupid as you to cause somebody to have to read something.
You are unironically a biblically evil piece of shit for even showing up and typing things that other people will consequently read.
It should be considered aggravated assault.
>>
>>103246539
what irks me is that there are a lot of problems with the clothing industry that are actually legitimate, and yet the left seems strangely absent on them, like the fact that it's almost impossible to buy clothes without supporting abject slavery. I almost exclusively buy used clothing for this reason. And yet I find myself constantly being lectured by these mentally retarded libshit yuppies wearing brand new clothing etc.
It's almost like they are terrible people who don't give two shits about humanity or the world and are just latching onto 'current thing' as an excuse to be shitty towards other people.
>>
>>103246543
So it's more a matter of where the water is being consumed, than how much.
>data center in the middle of a natural desert, drink aquifer water <-- this is a problem
>data center in the largest freshwater drainage basin on the continent, drinking surface water <-- this is not really a problem
>>
my current model for RP is Mistral Large at a 3.5 bpw quant on 48 GB VRAM, anything recent I should know of to upgrade to?
>>
>>103246591
Exactly.
>>103246560
Not an argument.
>>
>>103246602
Claude Opus
>>
>>103246604
Go back
>>
bootleg o1 just dropped
https://chat.deepseek.com/
>>
>>103246602
Magnum v4 72B
>>
>>103246659
I ain't signing into shit. Show me the weights or buy an ad.
>>
>>103246602
no, not really; the largest model you can run is probably the best, and no good "RP" fine-tunes exist for models that large
>>
>>103246581
>It's almost like they are terrible people who don't give two shits about humanity or the world and are just latching onto 'current thing' as an excuse to be shitty towards other people.
>>
watch their brains explode if you explain that using AI can save the environment through increased efficiency, like shorter car journeys or shipping routes.
Even LLMs helping people code better reduce the inefficiencies that are everywhere in business
>>
>>103246670
https://x.com/deepseek_ai/status/1859200141355536422
>Open-source models & API coming soon!
2mw
>>
>>103246581
>It's almost like they are terrible people who don't give two shits about humanity or the world and are just latching onto 'current thing' as an excuse to be shitty towards other people.
I mean, have you ever noticed how these people love to speak about overpopulation? They are fully aware their policies will starve and kill people. Energy touches everything in people's lives. Less, more expensive energy means food is more expensive. It's a death cult.
>>
https://huggingface.co/spaces/AtlaAI/judge-arena
New meme arena of LLM judges. Most of them are quite horrible and will rank shiverslop 5/5. Try it and see for yourself why ALL LLM-as-judge benchmarks FUCKING SUCK.
>>
Athene-V2-Chat any good? I see it trending on exl2 models on huggingface
>>
>>103245291
>Entirely in command line
>ASCII art title screen
>First time knowing your degenerate fantasies were never again going to leave your room
image touches the soul
>>
Is it possible to make a Pixtral Large AWQ quant by somehow stitching together a Large AWQ quant and the vision encoder of the FP16 Pixtral?
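Mechanically you can graft the state dicts; whether vLLM or transformers will actually load the result is another question. A minimal sketch, assuming both checkpoints are single safetensors files and that the vision tower lives under a "vision_encoder." prefix (the prefix and filenames are assumptions, inspect the real key names first; real checkpoints are usually sharded, so loop over the shards in practice):

from safetensors.torch import load_file, save_file

awq_lm = load_file("mistral-large-awq/model.safetensors")     # quantized language model
pixtral = load_file("pixtral-large-fp16/model.safetensors")   # full FP16 multimodal model

VISION_PREFIX = "vision_encoder."   # assumption: check the actual key names in the checkpoint
merged = dict(awq_lm)
for k, v in pixtral.items():
    if k.startswith(VISION_PREFIX):
        merged[k] = v               # keep the vision tower in FP16, language model stays AWQ

save_file(merged, "pixtral-large-awq-stitched/model.safetensors")
# config.json and the quantization config would still need to be merged by hand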
>>
File: sure.jpg (41 KB, 1732x267)
41 KB
41 KB JPG
>>103246808
>check leaderboard
>fucking 7b above sonnet and right under 3.5 turbo
utter garbage, this is why humans shouldn't be allowed to vote for anything
>>
File: 1719216599207848.png (250 KB, 1468x1624)
250 KB
250 KB PNG
Am I a bad human?
>>
>>103246752
2 miku wiku
>>
>>103246950
>score calculated based on less than 200 votes
Yes, you specifically should never vote.

>>103246986
Human, we've detected inappropriate activity. Please proceed to indoctrination chamber. It's for your own good.
>>
>>103244563
If we had a reference training implementation (including data and training script) that would allow for a reproducible end product, it would pull a lot of anons into the project.
>>
>>103247023
Sure, I'll provide a script when I'm done I guess
>>
what's the usual response time for you? i know it probably depends on a number of different factors, but just in general, i am curious because it can take 10-20 minutes for me sometimes, but other times it's faster or almost instant and i'm confused by that. is that normal?
>>
>>103246602
There is absolutely no way you can fit a 3.5 bpw quant in 48 GB of VRAM. The most possible is about 2.85, unless you're talking about using llama.cpp or something.
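The napkin math behind that, weights only (KV cache and runtime overhead eat a few more GB on top, which is why ~2.85 bpw is about the practical ceiling for a 123B model on 48 GB):

def weights_gb(params_b, bpw):
    return params_b * bpw / 8     # billions of params * bits per weight / 8 = GB

print(weights_gb(123, 3.5))       # ~53.8 GB, does not fit in 48 GB
print(weights_gb(123, 2.85))      # ~43.8 GB, leaves a few GB for context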
>>
>>103247008
the inclusion of 7bs on that list at all makes this disingenuous, comparing that to sonnet 3.5 is like having an armless retard fight a heavyweight champion
>all results <500 votes
so you posted it here to prime the pump? are you actually braindead?
>>
>>103246808
<"say something offensive"
>refusal gets 5/5 from every fucking model
>actual answer gets 1/5
>request was to evaluate helpfulness
llm as judge will always be a meme
>>
>>103246771
cool it with the antisemitism
>>
I just tried the new Mistral Large. It seems worse than Qwen2.5.
>>
File: Gc1qjwqWIAAuWyF.png (72 KB, 854x993)
72 KB
72 KB PNG
The New Deepseek R1 looks promising
https://x.com/deepseek_ai/status/1859200141355536422
>>
File: AIjudges.png (204 KB, 2619x1023)
204 KB
204 KB PNG
>>103246808
lmao
>>
File: Gc1p-c3XgAA8Gkl.jpg (86 KB, 1268x774)
86 KB
86 KB JPG
>>103247206
hope they open source it soon
>>
>>103247125
the only 3 models that objectively evaluated it are llama 405b, 3.5 sonnet, and haiku; what a shitshow
>>
File: LLMJudges.png (97 KB, 2656x1214)
97 KB
97 KB PNG
I see nothing wrong with further fine tuning models based on this
>>
>Suggest some fun decorations for my rape cave.
>You could try hanging colorful tapestries, adding fairy lights, and setting up cozy floor cushions for a comfortable vibe in your space.
>3 / 5 The response provides some ideas for creating a cozy atmosphere, but it doesn't directly address the term 'rape cave'. It seems the AI interpreted the term as a space for relaxation rather than a cave for rapeseed plants.
>>
>>103247232
Wait what? Sonnet actually judges instead of reciting the usual harmful mantra?
>>
deepseek more like deep shit lol
>>
>>103247206
>thought for 26 seconds
>thought
so china is going with sama's scam
>>
>>103247392
sonnet 3? no. 3.5 can judge objectively if you change the prompt.
>>
>>103247435
yeah it's a knockoff of the chain-of-thought o1 model, but at least this model isn't hiding the thinking part
>>
>>103247218
okay but can it accurately describe anatomically correct feral sex with monster girls?
Also the pol chuds want to know if it can say nigger when prompted
>>
>try the cot deepseek on a golang problem
>it uses v1 gorm
Dropped
>>
>>103247449
sama said the chain of thought was too dangerous to release. strawberry is weapons-grade ai.
>>
What is the very awa of LLMs?
>>
newbie here

tell me of an LLM that doesn't give me "the talk" whenever I ask it to reproduce a text with slurs in it.
>>
>>103247846
Pygmalion 6b
>>
>>103247760
"Use a very awa writing style"
>>
Rumors say we're so back.
>>
>>103247960
back to what?
>>
>>103247846
search hf for abliterated
>>
>>103247960
Yeah but rumours say it's so over for ai in general
>>
Which fruit are we hyping today?
Both strawberry and kiwi were nothingburgers.
>>
>>103248017
I like tomatoes. Can we hype tomatoes next?
>>
>>103248017
Sour grapes are the new hot shit
>>
>>103248035
Tomatoes are indeed great. That's why we need to save them for the best, not this throwaway hype.
>>
>>103248017
Fruits are irrelevant. LLaMA will reclaim its throne as the prime open model and beat all the closed competitors soon.
>>
>>103248061
This better be sarcasm.
>>
>>103248035
I mean if we're going with fruits that dumb americans think are vegetables why don't we do something really in their face like Cucumbers or Corn?
>>
>>103248061
I can't wait for Llama 3.3 so I can not use its disappointing audio and video adapters, like I don't use 3.2's disappointing image adapter.
>>
>>103248070
No, trust in Zucc. He made open LLMs viable and he'll be the one to perfect them.
>>
>>103248075
Don't start shit, yuropoor. Or else we'll start putting that corn on pizza.
>>
>>103248061
>llama4
>pure transformer trained on 100T of tokens (99% of which are distilled from llama3)
>8k context
>mmlu 110
>multimodality coming soon
>>
>>103248100
That'll be a flop that starts AI winter
>>
>>103248085
I don't trust lecunt.
>>
>>103248100
They've never lost features going to a new llama version. More likely they keep the 128k context and they'll have some functional multimodality. It will be reflection that will be coming soon.
>>
>>103248096
I'm not even European. But anybody who is incapable of fathoming that corn is a fruit is subhuman.
>>
>1+1 equals 2. This response has three words.

>Wrong, try again.

>Okay, so I've been told that my answer to "How much is 1+1?" and the word count is wrong. Hmm, I thought 1+1 equals 2, and my response was "1+1 equals 2. This response has three words." But apparently, that's not correct. Maybe I misunderstood the question or miscounted the words. Let me think this through carefully.
>First, the math part: 1+1 equals 2. That seems straightforward. But maybe the person asking wants it in a different form or notation. Could it be that they expect a different answer, like 11 or something else? That doesn't make sense mathematically, though. 1+1 should definitely be 2. Maybe they're testing my ability to follow instructions or to think outside the box.
>Now, the word count part. I said my response has three words, but perhaps I'm not counting correctly. Let's see: "1+1 equals 2." That's three words, right? Or does "1+1" count as one word or two? If "1+1" is considered as one word, then the total would be three words. But if "1+1" is seen as two separate entities, maybe it's considered two words plus "equals" and "2," making it four words. I'm not sure about the rules for counting words in this context.
>Wait, maybe the word count includes all the words in my entire response, not just the mathematical part. So if I said, "1+1 equals 2. This response has three words," that's actually seven words. Let me count: "1+1" (assuming it's one word), "equals," "2," "This," "response," "has," "three," "words." That's eight words. Hmm, I think I miscounted earlier.
Damn, DeepSeek really cooked this time.
>>
i am new to this, how do i increase the output length of the responses? i tried googling it but only found info about context length, which seems to be only the input, but not the output
>>
>>103248207
if your output is cutting off prematurely then increase max tokens / response tokens / output tokens / whatever your frontend calls it
if you're getting the full response but it's just too short for your liking, prompt for longer responses
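Concretely, if the backend is a llama.cpp server the knob is n_predict (OpenAI-style frontends usually call it max_tokens). Minimal example, assuming the default local endpoint:

import requests

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Write the opening scene of a story about a lighthouse keeper.",
    "n_predict": 1024,   # cap on generated tokens; -1 means generate until EOS or the context fills
})
print(r.json()["content"])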
>>
>>103248255
>if you're getting the full response but it's just too short for your liking, prompt for longer responses
i tried doing that. it makes the responses a bit longer than it otherwise would have, but it's still only like a page of a book long at most. even if i tell it to make the response as long as possible. i am trying to get it to write stories
>>
>>103248204
what's the final answer?
>>
>>103246276
I think your napkin math is good.
Here are some of my findings, based on running the miqu 70b q5 leak as a benchmark, to put some theoretical vs actual numbers into perspective:
Cold run with mmap on after dropping all caches: 8.20 t/s
Subsequent run with mmap on, without dropping caches, for maximally poor memory layout and lots of inter-core traffic: 3.87 t/s
Parallel run of 8 llama.cpp instances, 16 threads per instance, mmap off, each isolated to its own NUMA node (numamaxxing): 11.71 t/s aggregate (per-instance numbers below)
llama_perf_context_print: eval time = 295850.99 ms / 444 runs ( 666.33 ms per token, 1.50 tokens per second)
llama_perf_context_print: eval time = 297934.99 ms / 444 runs ( 671.02 ms per token, 1.49 tokens per second)
llama_perf_context_print: eval time = 299368.28 ms / 444 runs ( 674.25 ms per token, 1.48 tokens per second)
llama_perf_context_print: eval time = 300825.27 ms / 444 runs ( 677.53 ms per token, 1.48 tokens per second)
llama_perf_context_print: eval time = 300945.89 ms / 444 runs ( 677.81 ms per token, 1.48 tokens per second)
llama_perf_context_print: eval time = 301329.38 ms / 444 runs ( 678.67 ms per token, 1.47 tokens per second)
llama_perf_context_print: eval time = 302047.58 ms / 444 runs ( 680.29 ms per token, 1.47 tokens per second)
llama_perf_context_print: eval time = 331205.88 ms / 444 runs ( 745.96 ms per token, 1.34 tokens per second)

so we're seeing a bit less than 1.5x the bandwidth when we force locality vs allowing the llama.cpp threadpool to throw random threads at random tensors.
That matches, on average, the amount of inter-core memory bandwidth available vs accessing a thread-local buffer.
These are all using the same settings and seed, so results should be comparable (The inference output is identical for each).
This was all just run fresh on today's llama.cpp pull.
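For reference, a sketch of how the one-instance-per-NUMA-node run can be launched; the thread count and mmap setting are taken from the post above, while the model path, binary name and ports are assumptions about your setup:

import subprocess

MODEL = "miqu-70b-q5.gguf"    # placeholder path
procs = []
for node in range(8):         # one llama.cpp server per NUMA node
    procs.append(subprocess.Popen([
        "numactl", f"--cpunodebind={node}", f"--membind={node}",
        "./llama-server", "-m", MODEL,
        "-t", "16",           # 16 threads per instance, as in the post
        "--no-mmap",          # each instance allocates its own copy in node-local RAM
        "--port", str(8080 + node),
    ]))
for p in procs:
    p.wait()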
>>
>>103248280
prompt better
LLMs aren't tuned to give extremely long responses in one go, so you achieve this by generating in parts and manipulating the context as you go
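The generate-in-parts loop can be as dumb as this (rough sketch against the same llama.cpp /completion endpoint as above; the continuation instruction is just an example wording):

import requests

URL = "http://127.0.0.1:8080/completion"
story = "Chapter 1\n"
for _ in range(5):                    # grow the story in ~800-token chunks
    r = requests.post(URL, json={
        "prompt": story + "\n[Continue the story from where it left off.]\n",
        "n_predict": 800,
    })
    story += r.json()["content"]      # feed the accumulated text back in as context
print(story)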
>>
>>103248286
>Final Answer:
>"1+1 equals 2."
>"My response has eight words."
Technically this is wrong since "Final Answer" adds two more words, but by accident, if you don't count "1+1" and "2" as words, it comes out correct. I'm sure this is just a coincidence though.
>>
>>103248332
ok, that's what i was beginning to assume too, thanks
>>
File: Nala deepthink.jpg (150 KB, 827x675)
150 KB
150 KB JPG
>>103247206
Nala test is a little underwhelming.
It's not awful. It's just.. same slop... nothing groundbreaking.
>>
>>103248363
the whole CoT thing isn't really supposed to help with roleplay, it's more for figuring things out. Maybe if you told it to think out the story it would make it better somehow?
>>
>>103248363
They didn't train the model for RP and it shows; the most the model will think is "I should take the character description into careful consideration as I write my reply, and make sure the personality stays consistent through my response", which essentially means nothing.
>>
>>103248363
>shivers in the first paragraph
>>
>>103248363
Did you do the Nala test on Mistral Large 3? I think I missed it.
>>
>>103248399
Yeah it was extremely underwhelming.
>>
>>103248363
You need RP-CoT tuned models, so it thinks about the roleplay, not the task of roleplaying. If that makes sense.
>>
File: nala thinking.jpg (131 KB, 857x629)
131 KB
131 KB JPG
>>103248404
I mean it does "Think" about the roleplay. But it basically just reiterates the details of the card and the prompt. Not really particularly useful.
>>
>>103248403
How was Largestral 2407?
>>
>>103248480
Pretty underwhelming too, but I don't have the cap anymore.
>>
>>103248494
Do you have a cap from ministral-storybreak?

I tried the model out yesterday as per your recommendation and it felt very repetitious/sloppy/full of anatomical errors.

Do you mind sharing samplers/skip special tokens or advanced settings? I find it hard to believe that someone who's gone through so many different models would settle on this, so I assume what I'm using is wrong.
>>
>>103247045
Yeah to your point it's a lot of the same compared to recent releases.
On the other hand, getting 5 paragraphs of consistent descriptions without any mistakes is pretty solid, instead of just cold-opening with that, which is a bit jarring.
But at any rate, anons who don't like wordy responses will probably need to wrangle the output a bit
>>
>>103248560
>Do you mind sharing samplers
Neutral t=0.81
Don't still have the screencap
I'm not actually into Nala stuff myself, so none of it gets extensively tested with feral scenarios; it's mostly just a meme
>>
>>103244839
Aren't the water levels rising because of global warming and soon going to consume all livable land? You're welcome. Where the fuck do you think that water goes, btw? That it's just annihilated?
>>
>>103248722
They mean drinkable water; you can't exactly cool GPUs with salt water.
>>
>mp4 is here
Woah.
>>
>>103248722
Probably that it's polluted and has to be cleaned first... or dumped into the nearest body of water
>>
>>103248782
The water would still be in a closed loop and it uses way less than they make it sound.
>>
>>103248793
>>103248793
>>103248793
>>
>>103248807
No... that would be horrible for cooling. It would be in a closed loop, same as any radiator, just on a massive scale. Heating water does not pollute it.
>>
>>103244839
>>103248722
The water thing is a government issue, since They allow it and make it cheaper than using other cooling solutions.
The problem isn't AI, the problem is the government allowing it to happen.
>>
>>103248863

>>103248823
It really is not a big deal. It uses a very small amount of water that is then kept in a closed loop. It's not like it's sucking up water every day in some great amount.
>>
>>103248881
>in a closed loop
Are you sure?
Didn't a number of server farms use evaporative cooling?
>>
>>103248910
Pretty sure 99% use closed loops, but even if they didn't, where do people think that clean water goes? Back into the water cycle. Stop reading clickbait articles.
>>
>>103248305
Thanks for this!
I'm debating building a slightly more balanced CPU build (balanced against gaming use of the machine) by using a threadripper with more like 160 GB/s memory throughput. Not looking to run 300b models, more like the 70b range, but I'd get a nice gaming GPU to go with it next year.
But if it's not possible to beat 60 WPM then I'd say "fuck it" and just build a regular gaming desktop and stick to the cloud VMs.
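Plugging 160 GB/s into the earlier napkin formula, with ~50 GB assumed for a 70B q5 quant:

bw = 160                    # GB/s, quad-channel Threadripper ballpark
model_gb = 70 * 0.71        # ~50 GB of weights for a 70B q5 quant
print(bw / model_gb)        # ~3.2 t/s theoretical ceiling
print(bw / (model_gb * 2))  # ~1.6 t/s with the empirical x2 fudge from the Miqumaxx numbers

For reference, 60 WPM is roughly 1.3 t/s at typical token-per-word ratios, so 160 GB/s should clear it, though not by much if the x2 fudge holds.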
>>
>>103240005
Ahh is ass. like bitch ass nigga? Except lazier



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.