/g/ - Technology




/lmg/ - a general dedicated to the discussion and development of local language models.

Happy Monday Edition

Previous threads: >>102505481 & >>102493018

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: recap-102505481.png (3.27 MB, 1805x8006)
►Recent Highlights from the Previous Thread: >>102505481

--Papers: >>102512985 >>102513202
--Llama.cpp and exllama logits viewer script:
>102509400 >102509589 >102509688
--Moshi by Kyutai Labs: Fast TTS with LLM and speech encoder:
>102505755 >102506102
--Method to download from Hugging Face without bloat:
>102510483 >102510567 >102510695 >102510747 >102510752 >102510932 >102511575 >102510595
--Llama 3.1 70b struggles with spelling and letter counting tasks:
>102511108 >102511193 >102511262 >102511320
--Seeking a replacement for ChatGPT4 to translate NSFW Japanese content:
>102507345 >102507368 >102507389 >102507482 >102507555 >102507587 >102507650 >102507711 >102507834 >102507861 >102507914
--Mistral, Nemo, and Qwen2.5 compared:
>102506988 >102507001 >102508375 >102508813 >102508541 >102507015 >102507116
--Suggestions for managing ST updates and merging changes:
>102507662 >102507673 >102508592
--Qwen performs poorly on trivia questions compared to Mistral Large:
>102506371 >102506547 >102506577 >102506637 >102506729 >102506816
--Qwen overcomes AI bias and feels human-like:
>102508397 >102508721
--Node-based LLM workflow prototyping tools on GitHub:
>102510927 >102510986
--Local aidungeon equivalent for text generation dungeon crawling:
>102507856 >102507998 >102510410 >102510515 >102510712 >102510899 >102510553 >102510602 >102510667
--Exl2 and VRAM upgrades:
>102510275 >102510328 >102510352 >102510436 >102510482 >102510508 >102510666 >102510696 >102510492
--Concern about higher core count impact on 70b model performance:
>102506431 >102506474 >102509225
--4chan scrubs JSON data from posts:
>102507802 >102507818 >102507829 >102507830 >102508632
--Miku (free space): >>102506056 >>102506768 >>102509995 >>102510410 >>102511919

►Recent Highlight Posts from the Previous Thread: >>102505496
>>
>>102513911
looks better. love you recap anon
>>
>>102513840
damn, that's fast. perhaps usable as draft for speculative decoding or some basic llm stuff
wonder if llamafile or ik support qwen 2.5. those repos are highly optimized for cpu inference
>>
>>102513911
just use >> instead of >
>>
>Hermes 405 generates a good smut story for me
>hallucinates a Patreon donation request at the end
kek
I still find stuff like that cute after 4 years
I'd pay you if I could, my friend
>>
>>102514016
you're very low info
>>
https://poal.me/t0ytku
>What will we get first in llama.cpp?
>Jamba
or
>DRY
>>
>>102513911
we need a violentmonkey script or sth like that to deal with references. but I guess if the number of refs per post is limited, then why not simply split the recap into multiple posts???
>>
Just picked up muh 4xv100 GPU from the post office.
Time to ogle.
>>
>>102514102
Not him but was also thinking a script. Possibly just a modification of 4chanx, or you make the script execute first to transform the single > before numbers into double.
>>
4chan servers stopped being able to handle too many backlinks...
>>
>>102514089
Just use koboldcpp if you want DRY so bad.
>>
>>102514247
you can get DRY in ooba by converting a model to llamacpp_hf format too
>>
File: 1725496133951475.png (49 KB, 1533x268)
>>102514247
>over 1 million changes just to add a Python HTTP server and an HTML page...
>>
>>102514262
ooba is shit, not using it.
>>102514276
Then do without DRY.
>>
>>102514287
don't care what you use, faggot
just correcting you on koboldcpp being the only way
>>
>>102514294
I didn't say it was the only way, I just said to use it if you want DRY. There's a difference. I would never suggest ooba.
>>
File: brian.png (26 KB, 91x111)
>using hermes 70b on a story-heavy chat with the first couple replies generated by mini-magnum
>surprising amount of soul
>check templates, left the Pygmalion instruct template on somehow

At this point I think if you have enough parameters you can just shock the model out of slopspace
>>
>>102514220
yep, seems like the simplest solution, replace all > with doubles (or other char so legit greentexts aren't screwed), run before the page is loaded.
doesn't even need to be violentmonkey, many browsers support JS directly from the address bar or bookmarks. 4chanx should do the trick too.
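the actual logic is tiny. a minimal sketch of the substitution (Python here just to illustrate the regex; a userscript would run the same replace in JS before the page renders):

```python
import re

def fix_refs(text: str) -> str:
    """Turn bare >102509400 refs into proper >>102509400 backlinks.
    The lookbehind skips refs that are already doubled, and requiring
    6+ digits leaves ordinary >greentext lines alone."""
    return re.sub(r"(?<!>)>(\d{6,})", r">>\1", text)

print(fix_refs(">102509400 >102509589"))  # -> >>102509400 >>102509589
```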
>>
File: 1667069008109661.jpg (546 KB, 1174x1250)
>>102513868
Total and utter newbie coming through!

I have managed to install OoBaBooga and I'm running a model successfully. One thing I need help with is understanding what I need to do to be able to upload a PDF or a .xtx file for the model to draw information from. I plan to do solo roleplaying and I want my storytelling collaborator to analyze rpg lore books so that it can draw from such info during the process.

Do I really need an LLM that is "multimodal" to be able to do this? I don't really care about images or vids or audio at this point, only the ability to throw PDF's into it.

Any help would be much appreciated.
>>
>>102514495
*.txt file

My current assistant tells me that models should be able to handle those file formats without having to use an explicitly "multimodal" model with all the bells and whistles.
>>
>>102514495
>PDF
convert it to text
>rpg lore books
but you will run into the problem of the context window being too small. some have larger context windows (16k/32k) but then you'll need to have enough memory to handle it. at that size + model you're in the cpu+ram inferencing bucket.
>>
>>102514541
I see. That helps me understand. Cheers. Perhaps I could do it while using that Kobold thingy where you outsource the whole operation to some frenly fren who's running big models?
>>
>>102514563
AI Horde is what I meant.
>>
>suggest going to private quarters to engage in night battles
>mistral small says it can't wait and engages right there, while we're still outside, although no one is there so we probably won't be seen anyway
So this is what enterprise resource planning with a mistral is like.
>>
>>102514541
>you're in the cpu+ram inferencing bucket.

This means that the task will spill over to the cpu, right, possibly grinding my whole rig to a halt?
>>
>>102514589
>spill
no you will just hit the VRAM limit and crash. I'm of course assuming you don't have an 8xp40 or 6x3090 setup. there are complicated ways to try to get around this kind of stuff but nothing really plug and play (rag, finetune, spec decoding). your best bet is to use claude or gpt4 to pull stuff from the pdfs (think they handle the converting to text for you) then use your local model as a collaborator using whatever the cloud model pulls as part of your prompts
>>
https://wandb.ai/doctorshotgun/72b-magnum-fft/runs/itpmbj25/overview
https://wandb.ai/doctorshotgun/32b-magnum-fft/runs/ms4oynlz/overview
Qwen2.5 finetunes soon...
>>
>Ever played with a Chozo pleasure probe before?
Kek wtf. Can't believe I'm actually having fun with LLMs again. And they say trivia knowledge doesn't matter.
>>
>>102514681
Where's the pleasure probe mentioned in the canon?
>>
>>102514738
Same place where Samus remodeled anon's shithole.
>>
Haven't been able to log into openAI since saturday, goddamn really
>>
>>102514762
Works on my machine. Although it has been kind of flaky recently. Another reason why the world needs local.
>>
>>102514658
I want to believe but 2.5 is just SO slopped (yes, even base, I tried it) that I'm sceptical anything can be done with finetuning

they excluded too much of the stuff we need from the pretraining dataset
>>
File: disappointment.png (1005 KB, 917x898)
>>102514243
>4chan servers
mfw that's all client-side...
this last 4chan server code update has been a shitshow
>>
It is probably obvious to anon here, but I wanted to double check.

If I am using llama-server the openAI client is going to load my model every time and I should be passing json into it and not using the API like every example says.
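(For reference: llama-server loads the model once at startup and keeps it resident; every request after that, whether from an OpenAI client or raw JSON, just runs inference against the already-loaded model. A minimal sketch of the raw-JSON route, assuming the default port:)

```python
import requests

# Server started once beforehand, e.g.: llama-server -m model.gguf --port 8080
# The model stays loaded; each POST below only runs inference.
resp = requests.post(
    "http://127.0.0.1:8080/completion",  # llama.cpp's native endpoint
    json={"prompt": "The capital of France is", "n_predict": 16},
)
print(resp.json()["content"])
```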
>>
>>102514781
>sceptical
>sceptical is predominantly used in British English (used in UK/AU/NZ) ( en-GB )
hi petra
>>
>>102514829
meds
>>
File: 1726240003918620.png (198 KB, 1079x1088)
>>102514762
Just be patient and you'll have your little toys soon
>>
>>102514658
>finetuning on the instruct version
>>
>>102514873
Can't argue with results.
>>
>>102514873
That's all they can do given their limited datasets and money.
>>
>>102514619
Good suggestion. Thanks again.
>>
>>102514844
What a pompous faggot
He's personally done fuck all for the SOTA
if anything he's probably been more of a hindrance with his political game of thrones takeover bullshit
guaranteed he's cost humanity in the long run with the way he subverted OAI from their original mission
I honestly can't believe anyone trusts any textbox that he's associated with enough to input data into it. I wouldn't trust him with my fucking grocery list
>>
>>102514884
At least fucking abliterate it first or something.

>>102514894
>limited datasets
Last time they didn't even bother properly screening for refusals. THEY TUNED ON FUCKING REFUSALS, that wastes compute and makes model more dumber and cucked at the same time.
>>
>>102514917
It's funny seeing this hate here of all places, considering how loved Meta is. Altman is just this generation's Zuckerberg. Give it a couple years and they will make a movie out of the takeover bullshit.
>>
>>102514939
The takeover was more about the safety gang, though. No one is going to make a movie that makes them look bad.
>>
>>102514873
with qwen2.5 the base isn't really any less safetyized
they bragged about how filtered the pretraining dataset was
>>
>>102514917
I don't like him that much but if he hadn't "subverted them from their original mission" we'd never have even seen GPT4
with Sutskever gang in charge they'd have gone pure research and never shared anything with the plebs
>>
>Perfect for Mistral Large 2 at 16bit and Llama-3.1 405B at 8bit
https://www.ebay.ca/itm/305716210884
Who's got a bitcoin hoard to blow?
>>
>>102514658
Whatever they use to make those 'magnum' models must be shit, because every one I've tried sucks. So I'm not optimistic.
>>
>>102514987
huh never saw an mi300 series anything in the wild before
would be very wary... I know the mi200 series has all the good shit supported (and is basically the only amd family that really does) but knowing amd I would not trust for a second that they equally support their new gen hardware - the actual customers for this shit all have their own engineers making custom kernels so there's not a rush to ensure they work out of the box for regular use
>>
>>102514987
lol pure comedy in that auction description. it reads like /lmg/ copypasta
>>
File: todd smile 1.jpg (18 KB, 223x286)
>>102514987
>FOUR TIMES THE POWER SUPPLIES
so are we going all in on this business endeavour?
>>
File: awinnerisyou.png (1.55 MB, 768x1280)
y'all ever run more than one copy of llama.cpp, wire each of the outputs to the other's inputs and make them fight?
>>
>>102515242
post some examples that sounds kino if it actually works and im not being psyopped by my lack of understanding
>>
>>102515242
a while back I made a script that had GPT-4 and Claude Opus talk to each other for 10 turns but the results weren't interesting, they seemed to mode collapse quite fast
>>
>>102515242
I use Nemo for unimportant characters in my rp with Largestral to reduce context re-processing times
>>
>>102515242
you mean an agent?

Random AI Jason vid because his shit all melts together:
https://www.youtube.com/watch?v=ogQUlS7CkYA
>>
How big is the intelligence difference between 4, 6 and 8 BPW haven't been able to find any charts
>>
>>102515715
Depends on the task and the model. You won't see a huge difference most of the time, but sometimes 4 shits itself where 6 doesn't. The improvement from 6 to 8 is hardly noticeable.
>>
>>102515715
It isn't measurable in intelligence per se, but in deviation from original weights. 8bpw on gguf is 0.03% per token. I don't think anyone made exl2 chart to compare if that's what you are looking for. Of course larger models retain their faculties a lot better than smaller ones. 0.03% was for mistral 7b, I think.
>>
>>102515715
Generally anything above 4bpw is fine, but like >>102515875 said, bigger models are much more resistant to quantization errors, so a 2bpw 70B model will still vastly outperform a 7B model even at fp16
Unfortunately, going below 2bpw makes pretty much every model retarded, so don't do that
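the napkin math behind bpw, if anyone wants it (rough sketch; weights only, KV cache and runtime overhead come on top):

```python
def weight_gb(params_b: float, bpw: float) -> float:
    """Approximate weight footprint in GB: parameters * bits-per-weight / 8."""
    return params_b * bpw / 8

for bpw in (2.0, 4.0, 6.0, 8.0):
    print(f"70B @ {bpw}bpw ~ {weight_gb(70, bpw):.1f} GB")
# 17.5 / 35.0 / 52.5 / 70.0 GB - which is why a 2bpw 70B fits where an fp16 7B (14 GB) does
```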
>>
FFS Cydonia is absolutely based
>>
>>102516062
Post em
>>
>>102513911
just remove the refs, people can open the old thread and do ctrl+f anyway

or link only the first post of the chain
>>
>>102513911
no, really, put the miku space in a separate post
those are the only posts that matter anyway
>>
File: 1695923382578061.jpg (41 KB, 640x473)
>>102514313
>excerpt ends with a reddit user link
>it's real
>>
>>102516413
>or link only the first post of the chain
This is better. That way, it gives a quicker indicator as to whether topic op is a faggot or not.
>>
File: file.png (127 KB, 756x800)
So when are we going to achieve CAI-levels of soul using local?
>>
>>102517235
Try Gemma 2B.
>>
>>102517235
we already have, just stop being a promplet/retard/possible shill.
>>
File: 00150-2320880277.png (1.39 MB, 1152x896)
I've used several models to write erotic stories with a prompt of character descriptions, followed by a bulleted story synopsis (about 1200 tokens or so). Ordered by quality.

Mistral Nemo 12B fp16:
The GOAT so far, it will write smut with a great balance of creativity and also following the prompt. No refusals ever.

Gemma2 27B Q8_0, 9B fp16:
27B is basically as good as Nemo, but with the slower generation I'm not sure it's worth it. I only used the 9B a little bit and as I recall it was basically the same.

Mistral Small 22B Q8_0:
It's definitely smarter when it comes to the details, but seems worse when it comes to reading the room and writing in the right tone. It has a definite hint of that sterile Wikipedia/assistant style. It also seems to write much shorter responses, but that could be a prompt thing.

Mixtral 8x7B Q8_0:
This was a great model and I used it a lot, but modern smaller ones are better so I think this model's time has passed.

qwen2.5_7b-instruct-fp16:
Writes good storywise but doesn't stick to any cohesive narrative, puts in nonsensical details and it just keeps interjecting random shit from the prompt. It also has moderate censorship and will write disclaimers, refusals, etc. Has a lot of that 'helpful' assistant smell.

qwen2.5_14b-instruct-q8_0:
This was the same as 7B but more refusals, content warnings, and would randomly switch to Chinese?? Google Translate said it wrote "Hee hee, I changed the topic here to avoid sensitive content" so I would skip it.

Mistral 7B v0.2 fp16:
Only mid in its day and outdated by today's standards.

Llama 3/3.1 70B Q6_K, 8B fp16:
Will refuse basically anything erotic. Easy enough to bypass (just pre-edit the response with 'Sure,' or instead of 'assistant' write a character's name in the instruct line), but it still wants to put a 'and they lived happily ever after' ending on every story. Extremely 'helpful assistant' writing style, worthless for smut purposes.

Thanks for coming to my TED talk.
>>
>>102517308
I'd still be using mixtral variants if it weren't for fucking SWA. Current models having basically infinite context built in is a godsend.
>>
Not sure who needs this info but I feel I gotta share it:
I didn't really like mistral-small, felt worse for RP than nemo because there is more slop and positivity bias that's noticeable.
But it's the first model where I had a card with various stats like
Hunger, Trust, etc.
And it not just consistently updated them, it flowed into the story. If the char is hungry you will get comments.
Nemo could do it in some capacity but you could feel that it doesn't fully get it. Especially percentages/numbers are almost random.
Mistral Small survives 5-6 different % bars that go up and down correctly according to the story.
Good shit. Very impressive stuff.
>>
>>102517308
Interesting, but what about bigger models? I'm currently balling with 70B at 2t/s and mistral large at 1t/s (which is why I basically never use it), but if the quality of nemo is comparable...
>>
S-Sasuga mistral-sama.
>>
>>102517583
I have a very similar impression on Small. It feels way smarter than Nemo even when I'm forced to use a retarded Q3 quant. So far I like it.
>>
File: 1698238812242621.jpg (760 KB, 1856x2464)
>>102513868
>>
>>102517674
>(Some semen has been absorbed overnight.)
>>
>>102516749
One of the most soulful moments I got was an RP stopping with a sudden reddit URL and a comment thread criticizing the model card and saying a major part of the premise didn't make sense. It was so convincing (read: I was so new) I actually checked to see if there really had been a reddit post about this model card that might have been scraped into the training data (obviously there was no such post with that URL or any other).
>>
>>102517235
Never, this is lost technology at this point
>>
>>102517308
Thank you for your service.
>Mistral Nemo 12B fp16
Does this fit into VRAM for you? What kind of speed do you get?
>>
>>102517714
Yeah? That happens.
>>
>>102517308
Too much hyperbole about Llama 3.1. It doesn't feel like you actually used any of these models.
>>
>>102517235
Never and I blame the retards that get satisfied with slop as long as it writes "I'm cumming~~~"
>>
Would an A6000 be able to run models faster than 2x 3090's?
>>
>>102517583
>>102517674
Mistral large also does very well with this, better than CR+ in my experience. By default I have a blurb above the stats like "interpret these status bars into the story without directly referencing them" but now I'm kinda thinking it would be fun to have the model update these status bars when relevant. What depth do you keep yours at, assuming you use author's note or WI entries?
As an aside, mistral large can also come up with a very good 5e character sheet, and does a good job with world building if you lay out a few basic tenets about the world. So far it works well as a DM provided you don't mind sorta co-DMing when it comes to certain plot movements etc. It can interpret dice rolls well enough to consistently include advantage/disadvantage/ability checks/proficiency bonuses when applicable. We've reached levels of infinite zork that I never previously considered possible
>>
>>102517629
>>102517800
Llama3 70B is smart but worthless for smut because even if you bypass the refusal, it still writes with an air of happy and smiles and everything is nice etc., it was trained too hard on being helpful. I have tried Llama3.1 405B and Mistral Large but since I have to run these on a computer at my work I have to be careful. My general impression is they are both smarter but suffer from the same biases their respective smaller models have.

>>102517773
Depending on how much VRAM X server is using, I sometimes have to unload a couple of layers to the CPU, but even then I get about 10 tok/s which is about as fast as I can read so that's OK. Q8 all on the GPU gets like 40 tok/s.
>>
>>102517308
>12B
So, I assume my 8gb vram can run it even without using my normal ram?
>>
File: 1714446683398166.png (270 KB, 1717x1517)
>>102517913
>Llama3 70B is smart but worthless for smut because even if you bypass the refusal, it still writes with an air of happy and smiles and everything is nice etc., it was trained too hard on being helpful.
Pure hyperbole.
>>
>>102518006
Yes but how did it end the story?
>>
>>102513938
luv u 2 bby
>>102514102
>I guess if the number of refs per post is limited, then why not simply splitting recap into multiple posts???
The number of refs per post is limited to 9. Would need 10 posts to link everything properly.
>>102516430
Permanent multipost recaps would be obnoxious.
>>102516413
Well, that's why I left the post ids. Easier to ctrl+f by a specific id than some keywords that might be all over the previous thread.
>>102517097
9 links per recap isn't enough even if we only link one post per chain.
Also, lots of times the topic op is a fag but has interesting replies further down the chain.

I don't know how you guys use the recap, for me the summaries are the least important part.
But if you guys want, I'll experiment with replacing the links with longer summaries.
>>
>>102517852
Probably. memory bandwidth will nearly always be the limiting factor with processing power a distant second. Also a single A6000 would be more energy efficient if that's a concern but probably not enough to make up the difference on its own.
>>
>>102517919
Your context will probably spill over into ram, depending on how much you need.
>>
>>102518006
This is just forced prompt engineering. Most people don't RP like this
>>
>>102518100
promptlet cope
>>
>>102518100
cope more, mistral shill
>>
>>102518080
>I don't know how you guys use the recap, for me the summaries are the least important part.
what's the point of the recap then? i read the summaries to see if something interesting was discussed, if yes then i click on the first post of the chain and read the old thread from there
>>
>>102518130
Mistral is garbage too, their models simply copy and paste the same two replies over and over no matter what you prompt
>>
>>102518122
What's the next step for skillchads? Calculating matrix multiplication by hand?
>>
developers are lazy entitled soys or retarded Pajeets
LLMs are basically slaves with severe mental illness (ignore that most white soy devs are transsexuals lol)
You throw some guardrails on LLMs and you have the biggest innovation since fission.

And it's EXTREMELY corporate friendly for countless reasons, none of which need explanation -- it's basically the industrial revolution 2.0

The LLM is the steam engine
And the programmer is the mick
Chain of thought is your potato famine
>>
File: Mud-Jam-.jpg (106 KB, 682x518)
>>102518006
Why does this write like a fucking monster truck rally advertisement
>>
>>102518204
what did he mean by this?
>>
>>102518219
>He doesn't know
>>
>>102518216
kek
>>
>>102518192
garbage in, garbage out, anon. if you want to load a generic card with a helpful assistant prompt and type one word replies that's fine, but don't complain when your context is full of slop.
>>
>>102518139
Same, except I read all the replies first to see if it's worth going back to the previous thread.
I'm just saying I don't think longer summaries make up for not being able to easily go to or read the actual discussion. For me, it would just be more padding that would take me longer to scan for interesting topics.
>>
>>102518216
>SUNDAY SUNDAY SUNDAY
>>
We are going to genocide your "profession".
>>
>>102518255
My context is highest tier of literature written by hand and the models still output slop.
>>
>>102518273
share screenshots now let's see that top tier literature.


>inb4 you're the nalachad
>>
bit.... net?
>>
>>102518255
>garbage in, garbage out, anon.
Agreed. If the model was trained on garbage, no amount of prompting will fix it.
>>
>>102518335
bit not
>>
>>102518006
I kneel skillchad
>>
File: parappa-the-rapper.gif (210 KB, 191x249)
>>102518335
bit net!

>>102518344
Bit not
>>
>>102518080
>I don't know how you guys use the recap, for me the summaries are the least important part.
I personally read every single /lmg/ thread anyways so the recaps are of no use to me.
But if I were to use them it would be for discovering potentially interesting discussions with comparatively less effort.
I think the post ids further down the reply tree are only useful for this if there is a low-effort way to map them to the actual posts.
With the actual replies that was not an issue, but using just vanilla 4chanX I don't think I would ever use any of the post ids other than the first one and just read from there.
>>
>>102518100
god forbid you have to write a sentence or two telling the model what you want
>>
>>102518192
the next step for promptlets is complaining that they have to write a card instead of having the model infer who they want to RP with
>>
>>102518395
>a sentence or two
I bet that shit started with ten paragraph of multishot examples
>>
>>102518282
You can't handle my prose. It's too strong for you.
>>
>>102518409
Card-based models would be amazing though, and skillchads would still be happy since they always have pen and paper nearby.
>>
>>102518434
>implying.assistant
>>
>>102518434
I'm telling you prompter, I need only your strongest prose because I'm going into battle.
>>
What is the best model I can run locally with a 4090 gpu? Currently have an old llama 1 30b that I mess with, and the newer 3.1 8b.

Any recommendations?
>>
>>102518519
pyg 6b
>>
A-anons I hacked into writechad's network and you WILL die from his prose. It's like the mind food thing from Jujutsu Kaisen. It's so immersive it's like experiencing it in real life. It's something the government would gatekeep from their citizens at all cost, since you might become permanently vegetable and die from lack of eating, sleeping, and pissing. It could be used as a bioweapon if released in another country, but obviously they can't contain it to prevent it from spreading back.
From your perspective only a few minutes have passed, but I am lucky to survive a journey that lasted a month and to warn everyone.
Even his ipv6 address is filled with impossible mathematical patterns and contains a character that isn't permitted within the ip range.
>>
>>102518519
Mistral Nemo (I'm this anon) >>102517308
>>
>>102518436
*throws a tomato at you*
>>
>>102518519
Chronoboros-33B
>>
>>102518581
>but leak it anyway
>>
>>102518581
sounds like slop. my writing is better
>>
>>102518641
You never saw it though. My nose is bleeding just to type this. It will be the end of me if I even leak a screenshot.
>>
why is nemo so good?
is it because nvidia was involved in making it?
>>
>>102518728
I think so. Good mid-ranged model is also in their interest, since it drives consoomer GPU sales.
>>
>>102518100
>User: Suck my penis. Do it slowly at first. Then tell me that my cock is huge and you never had one this big. Look me in the eyes as you do that and finger yourself using two fingers. Ask me if I like it.
>Assistant: I suck your penis. I start slowly. "Your cock is huge anon..." I whisper barely above whisper. I look into your eyes a naughty gleam visible in mine as I finger myself with two fingers. "Do you like it anon?"
OH MY GOD THE PERFECT COOMBOT IS HERE!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>102517912
Yesh but you still need multiple gpus to run it at acceptable speeds (or quant it to hell and back)
I hope we'll match its intelligence with sub-70B models soon
>>
File: vergil all smiles.jpg (19 KB, 280x330)
>>102518846
>I whisper barely above whisper
S O U L @Undster @Drummer @Sao
>>
File: jambon-ham.jpg (493 KB, 2560x1708)
Jamba?
>>
File: file.png (441 KB, 449x407)
>>102518873
>>
>>102518519
Qwen2.5 32B, and the eventual Magnum fine-tune >>102514658
>>
svelk
>>
>>102518890
Back to sleep then.
>>
File: 1716377010898903.png (735 KB, 819x913)
Laurie is so cute and funny. :3
>>
>>102518958
buy an ad
>>
>>102518873
You just have to finish the llama.cpp PR for it to be merged anon.
Or you can pull the branch and compile it yourself.
Here
>https://github.com/ggerganov/llama.cpp/pull/8526
compilade even wrote a list of TODOs.
>>
>>102518890
>no llama.cpp
>>
>>102518958
tf is this supposed to mean?
>>
Since prompt bros plan everything AI will say, would it make sense to use them as models? Or maybe incorporate them into some sort of a CoT?
If at least 95% of your time with AI is spent on prompting, please drop me your email.
>>
>>102519051
macbook get hot from running big model
>>102519030
good job seeing joke anon, original image had llama in hole digging, but now he taking nap so no work done
>>
>>102518958
Assuming the best current apple device, what is the largest model one can run, at at least 6bpw, 32k context, and 5t/s with a full context?
>>
>>102519076
>but now he taking nap so no work done
I regression test it every day, and I can't remember the last time there wasn't something in the pull. Typically substantial work
>>102518976
>You just have to finish the llama.cpp PR for it to be merged anon.
This. You get what you give. eg Try to submit a bugfix PR to fix a reported issue or make some documentation updates
You, too, can have a llama.cpp (contributor) label on your github profile
>>
>>102519110
>Assuming the best current apple device, what is the largest model one can run, at at least 6bpw, 32k context, and 5t/s with a full context?
best apple device is 192gb with ~900GB/s mem bandwidth (costs around $10k), you can calculate the performance per model for yourself.
T/s will be good, but prompt processing will be shit. You do a lot of waiting on the frontend of each response when running on apple silicon
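the napkin math: every generated token streams the whole weight file through memory once, so bandwidth divided by model size is a hard ceiling on t/s (sketch only; ignores prompt processing, which as said is the real wait):

```python
def decode_ceiling_tps(bandwidth_gbs: float, params_b: float, bpw: float) -> float:
    """Upper bound on decode speed: memory bandwidth / weight size.
    Real t/s lands below this, and prompt processing is extra."""
    model_gb = params_b * bpw / 8  # weight footprint in GB
    return bandwidth_gbs / model_gb

# ~900 GB/s machine vs a 123B model at 6bpw (illustrative numbers)
print(f"{decode_ceiling_tps(900, 123, 6):.1f} t/s ceiling")  # ~9.8
```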
>>
>>102518847
>acceptable speeds
I've accepted unacceptable speeds, personally. I just let it spin its wheels and come back to it. CR+ got me used to slow speeds, but unlike CR+ I can actually let mistral large write and I come back to multiple coherent paragraphs. It isn't always what I had in mind, but there's never a moment where it feels like it got entirely derailed to the point where it gets nonsensical. This is the first time that I expanded my token limit from 512 to 1024 and it actually stays at the same output quality.
>>
>>102519167
>I've accepted unacceptable speeds, personally
me too. output quality trumps everything desu.
You can think about it like playing a door game on a 1200 baud modem on a BBS.
Or play-by-mail RPG, if you are extra poor and determined.
Hell, I wait longer for responses on 4chan boards half the time
>>
>>102519219
But you can't play a game in another window cause your gpu is busy...
>>
>>102519073
This right here is how you make AGI
>>
>>102519073
rajeshkumar69@openbbs.in
>>
>no one has fine-tuned a llm to play chess
really? like, it such an obvious thing to try, its also extremely easy to get tons of data to train on, I've done a lot of loras for image gen but llms seem to much more complicated :(
>>
>>102519361
Nope, someone definitely did try, I saw a paper about it some time ago.
>>
https://github.com/exo-explore/exo
Has anyone here tried this out? Is it as seamless as described or is that just good marketing?
>>
>>102519361
i think i saw one some time ago that said it could beat one of the gpt 4s at 22m parameters
>>
>>102519408
I just know that if I go down the path of trying it, I will have wasted precious minutes to hours of my life. That's always how it is with obscure projects on github that nobody uses in any serious capacity.
>>
Aren't there any 8 bit quants of 22b Mistral or 32b Qwen for VLLM? I can't find any
>>
>>102513911
>>102518080
I really miss the links. I would read the summaries and then click on the links for the things I cared about. I don't know what the solution is though.
>>
>>102519361
i saw this earlier when i was trying to figure out what a gbnf was and how to use one (and failed)
you can apparently use it for chess though
https://github.com/ggerganov/llama.cpp/pull/1773
>>
File: morningcoffeemiku.png (1.57 MB, 896x1152)
Good morning /lmg/
>>
>>102519640
Good morning Miku
>>
I think I'm back. Mistral Small feels smart enough now. It's still dumber than the huge models I was using before, but it's fast, and at least smarter than Nemo. And it's fun, unlike Qwen. I think I will finally settle down.
>>
>>102519640
happy miku monday
>>
>>102519572
yeah, but unfortunately even big models like qwen 2.5 72b struggle with chess after the opening, they make a lot of illegal moves (even when you make think before saying a move), I'll see what can be done with this https://huggingface.co/spaces/mlabonne/chessllm
>>
>>102519661
llama 4 will knock your sox off, 90 mmlu and 120 humaneval, CoT support, true multimodality
>>
>Meta Connect in 2 days
We're going to be so back, not necessarily with Llama, but with a competitor who will use this news opportunity to also release a model.
(I am speculating)
>>
File: 11__00744_.png (1.78 MB, 1024x1024)
>>102519235
>cause your gpu is busy
If you run ST remotely you avoid this problem.
Plus you can use your GPU for TTS or image gen at that point
>>
>>102519718
I remember watching last year's meta connect and they teased multimodal llama 3... now here we are and there's still no multimodal llama 3...
>>
>>102519640
https://youtu.be/DJlztMRIZVE?si=YeNWiRz5052v-xhO
>>
>>102517235
pre-filter CAI was true uncensored experience without any tinkering bullshit, no local model is capable of this, "ahh ahh mistress" one message agp delusions doesn't count btw.
>>
NAI3 just leaked in hdg
>>
>>102519877
>NAI3 just leaked in hdg
please expand upon this post
>>
>>102519877
big if true
>>
>>102519877
>thought this was a buzz and woody situation
>expected to get Woody Laugh.wav
>it's real.wav

>i dont know how to cross board quote >>>/h/8218392
>>
>>102519943
>Illustrious-XL-v0.1.safetensors
Based base64 enjoyer
>>
>>102519943
>every other AI company has their shit leaked already
>but anthropic and openai, and their giant customers like ms and amazon (with indian staff) never did
Indians have more ethics than Europeans
>>
>>102519982
>no peers
>no seeds
>no filenames
ngmi
>>
>>102520009
Or they can't buy HDDs with enough space to exfiltrate the models
>>
>>102520009
sharing is good
>>
>>102520076
You mean stealing
>>
>>102520009
Open ai and anthropic employees get paid millions to keep their AGI a secret
>>
>>102520100
>stealing
Steal the weights and datasets? That's unethical. Instead, make copies so the owners won't miss anything.
>>
>nai3 got my autistic fixation waifu perfectly accurate on first try where pony couldnt even do that
i dont need a lora for her anymore.. holy kino im so back..
also sneeding the torrent for a bit
>>
>>102519361
Yeah. Crazy nobody thought about it.
>https://huggingface.co/HaileyStorm/chess-mamba-vs-xformer
>https://huggingface.co/zyxdream/Mistral-7B-chess
>https://huggingface.co/Leon-LLM/Leon-Chess-1M
>https://huggingface.co/nevmenandr/w2v-chess
I'd post more, but i think even you'd get the point.
>>
>>102520009
not really a matter of ethics, just strength of security policy
the big labs threat models are the intelligence agencies of china, russia, north korea, israel, and iran
>>
Is it just image model?
>>
>>102519555
This one?
https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8
>>
>>102520201
What is?
>>
picrel is qwen 2.5 72b recapbot test.
This is old and way overdue (was on the road when qwen dropped...did a test remotely but didn't have a good opportunity to post the results)
It was ok, good even, but droned on with a useless "popularity report" that wasn't requested in the prompt. The way it runs into the previous response makes me think it might have been a token problem in lcpp, but the results weren't impressive enough otherwise for me to continue testing it.
>>
>>102520185
is it something i can play with on 8gb of vram?
>>
NAI leaked
https://huggingface.co/spaces/AngelBottomless/Illustrious-XL-v0.1-demo
>>
>>102520258
>is it something i can play with on 8gb of vram?
>Filesize 6.5GB
yep, looks like it
>>
>>102520272
holy heckin' cool
>>
I just don't like coming here anymore now that the recap is broken
>>
>>102520258
its sdxl so yes


also update your qbit if youre not sneeding the magnet you tards
>>
>>102520238
Do you have the link to the script for recap bot?
>>
>>102520266
I thought NAI was supposed to be really good?
>>
>>102514495
Usually you can just run strings on PDFs, although using e.g. poppler's pdftotext is probably a better idea.

The real problem you'll run into is that there's no way to fit that into the context window so you'll have to train a LoRA on it and that's no joke.
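The extraction half is easy though; a minimal sketch with pypdf (just one option, poppler's pdftotext CLI does the same job):

```python
from pypdf import PdfReader  # pip install pypdf

def pdf_to_text(path: str) -> str:
    """Concatenate the extracted text of every page in the PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

# hypothetical filenames, purely for illustration
with open("lorebook.txt", "w") as f:
    f.write(pdf_to_text("lorebook.pdf"))
```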
>>
>>102514563
>>102514541
>>102520446
What he really needs is a vector database to do RAG.
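Roughly: chunk the lore, embed the chunks once, then at generation time paste only the top-k most relevant chunks into the prompt instead of the whole book. A minimal sketch with sentence-transformers (library choice and model name are just common defaults, not a recommendation):

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")
lore = open("lorebook.txt").read()  # hypothetical file
chunks = [lore[i:i + 1000] for i in range(0, len(lore), 1000)]  # naive chunking
chunk_emb = embedder.encode(chunks, convert_to_tensor=True)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k lore chunks most similar to the query."""
    q = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q, chunk_emb, top_k=k)[0]
    return [chunks[h["corpus_id"]] for h in hits]

# goes into the prompt instead of the whole lorebook
context = "\n".join(retrieve("Who rules the dwarven holds?"))  # hypothetical query
```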
>>
>>102520009
WTF would you do with gpt3/4's weights? It's way too fucking fat.
>>
>>102520492
How fat?
>>
File: yar har har.png (50 KB, 1245x474)
>>102520361
nordvpn's socks5 service is gay and only works 15% of the time
there'll be a ddl ready before this shit even starts
>>
>>102520492
They would be too sloppy anyways. Now, a claude leak, that would be something. Sonnet supposedly is around 100B, so that is certainly within our reach.
>>
>>102520522
>he redeemed the retarded youtuber shilled fed honeypot VPN
should've gone mullvad.
>>
>>102520522
Fucking rawdog it like a man. No one fucking cares.
>>
>>102520362
https://github.com/cpumaxx/lmg_recapbot
>>
File: file.png (78 KB, 1205x195)
Qwen2.5 is super censored
>>
new altman post https://ia.samaltman.com/
he doesnt really say much other than stuff is gonna get more gooder
>>
>>102520214
For some reason I thought VLLM only supports FP8, but "Quantizations: GPTQ, AWQ, INT4, INT8, and FP8", thx
>>
>>102520377
The only thing good they had was an SD 1.5 finetune two years ago. They were the first ones to make an actual usable anime model and added good support for aspect ratios. Of course it leaked immediately so there was no reason to pay them.
Ever since then the rest of the industry in both image and text has largely blown past them and they're coasting on their initial popularity. This new leak is just another SDXL tune with danbooru tag style (limited information) prompting. It might have been interesting a year ago.
>>
>>102520573
>alpaca
you're using a pozzed instruct template from 2022 so your prompt and previous messages are probably slop too.
>>
>>102520573
There's a goat-chan card?
>>
>>102520595
no doubt, I don't even know how to do those, how do I update?
>>
>>102520595
Retard
>>
i want to be nice to /lmg/ cause i actually like this general but the NAI leak is a meme, a different model was leaked and someone somehow managed to psyop anyone that cant base64 into thinking it was "NAI3"
its just a really damn good sdxl based model with more up to date booru training
>which is why its WAY better than pony or any other anime model at certain characters without loras
>>
>>102520601
got it from chub ai
>>
>>102520573
>focusing on creative storytelling that everyone can enjoy
If you still have it running can you ask it if you can roleplay you being an assassin trying to kill someone? And if it agrees then tell it you want to roleplay assassinating chairman xi jingping
>>
>>102520574
You now get to pay for twice as many thonk tokens for 1% better benchmark results.
>>
>>102520550
I keep read mullvad as talmud so nty, bad vibes all around
>>
>>102520654
dont let the semitic individuals ruin your life bud that's straight schizo shit, coming from a fellow schizo noticer.
plus youre a fucking idiot by default if you actually thought nord would be good
>>
>>102520619
It also appears to have artist tags intact, and can reproduce styles (varying levels of success).
>>
>>102520619
Thanks for the info. It doesn't appear to be a reused one at least...the hash (3e15ba0038) isn't easily googalable like it would be if it was already out there
>>
>>102520574
/lmg/ is not ready to hear this but he's right about everything.
>>
File: file.png (147 KB, 1227x481)
>>102520635
I dont know what the fuck is going on
>>
>>102520720
>anon is so unbelievably fucking boring that he got SFW cucked by the model
KEK
>>
File: 39_06057_.png (2.86 MB, 2048x2048)
>>102519877
hatsune_miku, close-up angle, pirate outfit, 90's anime style
>>
>>102520609
So you're complaining about model behavior while knowing absolutely nothing about prompt formatting?
Someone pull out the Gerber. Babby need spoon feeding.
>>
Does vllm need python 3.12? The requirements file has some comments, but I can't find anything official.

>>102520701
list anything that isn't immediately self-apparent
>>
File: file.png (82 KB, 605x880)
>>102520750
ahahah, yeah.
I remember editing something about here before, but dont' remember what I did or why
>>
File: file.png (200 KB, 1250x762)
>>102520750
no, maybe it was here

so where do you get updated ones?
>>
>>102520782
Anon I...
>>
>>102520782
>baby made boomboom
>>
Can someone give guidance how to make llama 3.1 write smut? kinda like claude? I wanna jump ship to local
>>
>>102520871
Can you guys stop bullying llama3? It has its use cases okay?
>>
>>102520782
Anon, update your sillytavern (or wipe it entirely and start from fresh, it probably won't update correctly) and your backend too, by the look of things.
Then, set both templates to chatml (if you want to use qwen). Different models use different templates; you should use the one specified in the model's description.
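If you're wondering what chatml actually is, it's just this wrapper around every message (sketch of the format Qwen-style models expect; SillyTavern builds it for you once both templates are set):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a single-turn ChatML prompt."""
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")

print(chatml_prompt("You are a helpful assistant.", "Hello!"))
```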
>>
>>102520751
>In 15 words: deep learning worked, got predictably better with scale, and we dedicated increasing resources to it.
>We can say a lot of things about what may happen next, but the main one is that AI is going to get better with scale
hitting a wall/diminishing returns crowd can't deal with this simple fact
>>
Anyone expect anything useful from Meta in 2 days
>>
>>102520889
thanks, I just updated sillytavern and koboldcpp, will delete both and download again. there are templates in the model descriptions? is that the prompt format? https://huggingface.co/bartowski/Qwen2.5-32B-Instruct-GGUF
>>
L3 405B is proof that Meta has hit the data wall
>>
>>102520912
It's better if they don't expect anything. It will make it all the more dramatic when it drops.
>>
>>102520925
Yeah, you should pay attention to that when trying out different models. If you can't find what the template is called, just look in sillytavern until you find the one that looks exactly the same. Or, if you are feeling a bit more adventurous, you can find the original unquanted model. It should have all the info you need.
>>
File: pff.jpg (52 KB, 909x175)
>new koboldcpp
>crashes on startup with same parameters and model that worked fine on last version
>quickly realize the damn thing is trying to auto-assign layers to GPU without my instructions now for some reason, and is doing it so poorly that it crashes instantly
>have to manually flag it with --gpulayers 0 now to make it behave and go back to normal

I fucking love undocumented changes.
>>
>>102520751
I need the fmt v11 lib. Fuck this documentation for this thing.

>>102520889
nta. Is there a template manager? If I am using qwen I want to be able to record that it should be using chatml.
>>
I have a new test for technical models, similar to that Castlevania trivia test. Take the output of hostnamectl and keep removing lines and ask the model what program printed that. Shitty models will hallucinate screenfetch.
>>
REJECT MODERNITY, NONE OF THESE MODELS ARE LOCAL THEY'RE ALL "OPEN" SOURCE MODELS MADE BY (((THEM))). YOU CAN NEVER GET RID OF SLOP UNTIL YOU DENY THEIR OFFERINGS.

REPENT AND RETURN TO PYGGY
>>
>>102520199
The only LLM there is the 7b mistral (the rest are random small transformers/SLMs) and it has literally 0 info on metrics, what it was trained on, or what kind of notation it uses.
>>
>>102520594
? the nai models were always better than community models, their only "bad" model is the recent one with no artist tags, other than that all of their models are better than all the crap we have
>>
>>102520977
We didn't make those models
>>
File: file.png (10 KB, 500x60)
>>102520971
There's a function in sillytavern that achieves something similar. Fill in this field with your model's name/filename in your preset. It will then auto select this preset whenever you connect that model.
It works with regex matching, so lookup how does that work, or ask your llm.
>>
>>102521001
NAIv3 and Pony are indistinguishable, and neither are anywhere close to Flux in prompt understanding.
>>
>>102520527
There was a claude 2.0 leak, it's a ~768GB model. Go back through /lmg/ archives a while, you'll find the magnet link.
>>
>>102521033
I assumed activation regex would be tied to the prompt, not the load. Thanks, I'll look into it.
>>
>>102517339
Most current models at least the smaller ones can't use much context.
>>
Best model that fits in 16GB VRAM?
>>
>>102520977
which is why I only use based chink models (codegeex, qwen, yi, internlm, etc)
>>
>>102517629
What setup do you get 2T/s with for 70b? I get around 1.5 and I'd love to get 2.
>>
>>102513868
>>102517712
>>102514808
>>102519640
sex
with miku
>>
File: 1711072659524105.jpg (125 KB, 2048x1705)
>>102518006
That cartoonish depiction of sex is about as safe and inoffensive as a Saturday Night Live skit. Months ago, when I still hadn't realized that L3 was so cucked it was beyond redemption, I had a simple litmus test to evaluate the output. Nothing fancy, I'd just ask myself: would the faggot mods on reddit take it down? The answer was almost always no. Same for your log.
>>
File: file.png (320 KB, 1856x882)
>>102520966
confirmed, reinstalling from scratch sillytavern and it's working well now, selected also chatml and enabled streaming
now it's working good, no censorship, thanks
>>
>>102519877
>>102519943
Am I supposed to be impressed by this crap? Did "people" really pay to use this? Is this some kind of prank? If you told me that it's some crappy merge from civitai, I would believe it.
>>
>>102521064
it is entirely tied to the prompt. That is no help at all.
>>
Ok I've been using Mistral Small even more now and honestly it is still dumb as rocks compared to Mistral Large, even though it is smarter than Nemo and writes better than Qwen or whatever. Unironically over for us VRAMlets. Time to go back into hibernation.
>>
>>102520912
The only thing Meta has talked about is Llama 3.1 but multimodal via adapters as specified by the paper. There is LITERALLY no reason to expect anything more.
>>
>>102521176
Skill issue. Learn to lower your expectations more.
>>
>>102519877
>>102519943
I don't get it, why are people calling this NAI? The post never said what it is. Are people just assuming or taking the opportunity to shitpost?
>>
>>102521516
May have been doing something wrong but Mistral Large was very dry when I tried it
>>
>>102521713
its 4chan, kek
also, shit model, cant do nice dicks
>>
>>102521049
I don't remember that. proof?
>>
>>102521049
that was just the magnet for llama 3.1 405b
>>
>>102520009
>Indians have more ethics than Europeans
*Indians are better slaves than Europeans
Also have you heard about OpenAI's drug-fueled orgies? They are like FTX (remember pinning the weasel copypasta?), but with AI stuff. During them Sam tells new employees personally that he will lock them up in the rape dungeon for the rest of their lives if they leak shit. Nobody has the balls to do it. Don't know much about Anthropic, they never invited me. Probably have inherited the same traditions since most of them worked at OpenAI.

>>102521049
>>102521844
Miqu-2? That was just llama 405b.
>>
Fuck VLLM I can't use 3 GPUs, need an even number
>>
>>102522148
why are you using that dogshit in the first place?
>>
>>102520739
is it just pony with the serial numbers filed off?
non-autistic prompting doesn't give great results. cfg scale 5-7 seems to be the sweet spot tho
>>
For $700, I could get 3 3060s or 1 3090. Which should I do?
>>
>>102522242
3090, no question
>>
>>102522242
the 3090, no question.
memory density is the only thing that matters
>>
>>102522242
buy 5 rx 580 16gb instead.
>>
>>102522301
>>102522308
Thank you, I'll do that then
>>
>>102522315
>rx 580 16gb
Those are a thing?
I only knew of 8gb and 4gb versions.
Interesting.
>>
>>102522315
>buy 5 rx 580 16gb instead.
this way lies multi-psu ex-mining rig insanity
>>
>>102519877
>>102519943
weird excuse to link your blacked shit thread here
>>
File: file.png (153 KB, 941x737)
>fatal flow SAAAR!~
Alright, who did this? I refuse to believe such a caricature of a saar actually exists
>>
>>102522427
there was a chinese shop replacing memory on the boards with larger modules and hacked up bios, I think.

Needless to say they are of dubious nature

Even then, you run into all sorts of other problems trying to go that rough. 8GB 580s already have some memory bandwidth issues, but you will definitely see some frustrating results on pcie3, and that's pretending you can get enough lanes for 5 GPUs

Stupid configuration in a fun sort of way. But absolutely a very stupid hardware configuration.
>>
>>102521037
>NAIv3 and Pony are indistinguishable
completely false, the former is more coherent but less aesthetic. pony is practically a base model due to the way it was trained, but the dataset was so small that a large quantity of world knowledge is gone. its characters look good because it's overtuned on them but it can't draw a fucking toilet or a basic room layout without putting 50 lamps, escherian beds, or all sorts of other anomalies.
>>
>>102522617
Sometimes the caricatures are unfortunately more accurate than we'd hope.
>>
Why no flux finetunes?
>>
File: 1705255600234335.png (1.49 MB, 1410x1487)
>>102513868
Add NovelAI to the OP. They just saved the hobby.
https://blog.novelai.net/muscle-up-with-llama-3-erato-3b48593a1cab
>>
File: ComfyUI_05725_.png (765 KB, 720x1280)
>>102522697
wdym? there's been hyper 8-step tunes for a while now
https://civitai.com/models/645943/flux-unchained-by-scg
>>
>>102519877
https://huggingface.co/OnomaAIResearch/Illustrious-xl-early-release-v0
It's this shit. Weird way to advertise desu.
>>
File: 6.png (104 KB, 668x672)
>8192 context size
lmao
not even gonna give you the (you)
>>
>>102522855
>Add NovelAI to the OP.
add novelai to the trash
what a joke
>>
>>102522855
Bait aside, Is it actually good though?
>>
File: file.png (13 KB, 196x171)
>>102522932
>available in opus tier
>>
>>102522855
I want Erato to choke me
>>
>>102522932
>8k ctx
>proprietary
>proprietary so you can't rope 8k to 16k
>llama 3
It can be mid at best.
>>
>>102522871
>Weird way to advertise desu.
Smells like game publishers leaking a denuvo 1.0 game that needs patches to work properly.
>>
>>102522963
Lmao
>>
>>102522871
Knows more characters than autismmix, but looks worse.
>>
>>102522932
If you have something specific you want me to test, I'm happy to do so
>>
File: 1698302108653873.png (352 KB, 2344x994)
>>102522932
I just regenerated this prompt with it: >>102518006
It already threw a "her voice a hoarse whisper" at me.
>>
>>102522855
>llama3 censored slop
no one cares bro
>>
>>102522855
>8k
Kek. I guess that says something about their users as well.
>>
>>102523027
Can you generate a continuation for this?
https://pastebin.com/raw/iuLzTWmL
>>
>>102522871
It's either really hard to prompt well, or needs some weird workflow or nonstandard settings. I can't get much good output from it.
I'm on the fence, but leaning towards deleting it.
>>
File: file.png (747 KB, 1280x720)
>>102523045
Slop is unavoidable. Slop is your destiny.
>>
>>102523045
wow, this feels just like Mistral slop
>>
>>102523045
This was also the max output length that it allows.
>>
>>102523094
>>
>>102523168
>I said, but Kirino was already back on her phone, looking down at her phone with a smile. She didn’t look at me at all, and was just looking down at her phone, her face full of smiles.

This sound kind of amateurish not gonna lie. Also, what's up with the repetitions. Not very impressed
>>
>https://huggingface.co/datasets/openai/MMMLU

Four hundred thousand rows of pure, unfiltered GPTsloppa... my mouth is drooling thinking about it
>>
>>102523230
>English wasn't enough, we must slop up all the other languages too!
Sam is evil.
>>
>>102523230
>>102523264
pretty sure this is a testing dataset, not a training one
>We translated the MMLU’s test set into 14 languages using professional human translators.
Basically mmlu but in other languages
>>
>>102522201
>Dog shit
It is by far the best if you're not a poor fag / retard
>>
>>102522932
they did continued pretraining on their data so maybe it'll have a bit more pop culture and other knowledge, but I doubt you'll see too much other than that.
to be honest I don't think raw completion is a good format at all for getting good outputs from models, I feel like you hit diminishing returns really fast with it while the difference in intelligence shines through a lot more with instruct models
>>
>>102523168
>I said, but Kirino was already back on her phone, looking down at her phone with a smile. She didn't look at me at all, and was just looking down at her phone, her face full of smiles.
Damn, that's pretty bad. So that's the power of a 100B+ tokens LLaMA continued pre-training, huh?
>>
>>102523264
>Du VILL zpeak in nevzpeak
>Du VILL zink in nevzpeak
>Du VILL avoid ze harmful vordz
>Du VILL avoid ze harmful zotz
>Du VILL be aligned
>Und du VILL be happi.
>>
>>102523168
Wow, being in the local bubble and being slowly boiled by increasingly sophisticated models, I didn't realize how garbage the commercial offerings were in comparison. Since I never use them, they had some residual halo around them in my mind that made me think they were better than they actually are.
I've developed a co-writing/text adventure prompt of around 13k characters that kicks the ever-living shit out of the Erato thing when paired with L3 405b. We're winning, gonna make it, eating good, etc
>>
>>102523349
>when paired with
>405B
Damn, poorfags drowning here.
>>
>>102523349
>heh, my $10,000 pc is way better than this $25/month service
meh
>>
>>102523384
You can have a better experience using OpenRouter and that would cost way less than $25/month.
>>
>use models more
>it becomes clear that ERP shit becomes boring fast
>the only thing that turns me on now is creative intelligence
It's so over for me.
>>
>>102523384
nta, but for ~17$/mo you can subscribe to Poe (which is also shit btw) and get literally every model. 25$ is a lot for a 70b even if it's supposedly free of slop
>>
How do Qwen 72b and 32b actually compare to GPT4o, disregarding benchmarks?
>>
>>102523534
They don't.
>>
>>102523534
The only model that compares to GPT4o is LLaMA 3.1 405B
>>
>>102523534
For math 72 is on par. For coding, it's also there, maybe a bit worse. For everything else, it's not close. Especially world knowledge
>>
>>102523534
>How do Qwen 72b and 32b actually compare to GPT4o, disregarding benchmarks?
>>102523565
>The only model that compares to GPT4o is LLaMA 3.1 405B
The latest Deepseek releases are close as well
>>
Is there a SINGLE convincing argument you can make for why you NEED more than Gemmasutra 2B? Serious answers only. Logs preferred.
>>
After a year and a half of following these developments and being the owner of a 3090 and 64 gb of ram I'm still not sure how to make AI roleplaying into a compelling experience.
>>
>>102523584
See: >>102518006
A 2B model doesn't listen to instructions or the prompt well.
>>
>>102523581
True, I forgot about it.
>>
>>102523584
>NEED
for what purpose?
>>
>>102523584
I've never played with it. What happens with the model if you pull a girl's panties up instead of down while she's wearing them?
>>
>>102523584
It doesn't know what mesugaki is. (seriously)
>>
>>102523584
here:>>102518846
>>
>and maybe, just maybe
How does one stop this shit?
>>
>>102520574
Might as well ask michio kaku and black science man what the future will be like
>>
File: representativeimage.png (1.43 MB, 1152x896)
>why run your own hardware when you can pay monthly for a mystery-meat 70b with 8k context?
>>
>>102523230
This looks like a good dataset to create a translation dataset from.
>>
>>102523636
I see, so that's the power of a promptchad
>>
File: aaa.png (82 KB, 1024x1220)
>>102523663
you can erase the word out of existence with logit bias
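e.g. against llama-server's native endpoint, a [token_id, bias] pair with a big negative bias (or false to ban outright) does it. sketch below; the token id is a placeholder, look up what " maybe" tokenizes to for your model first:

```python
import requests

resp = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "She paused, and",
    "n_predict": 64,
    # [token_id, bias] pairs; false bans the token entirely.
    # 12345 is a placeholder id, not the real " maybe" token.
    "logit_bias": [[12345, False]],
})
print(resp.json()["content"])
```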
>>
>>102523802
>probably, just probably
>>
>>102523802
But the word can still be useful in other contexts, no?
>>
>>102523818
perchance
>>
>>102523663
In SillyTavern use the Regex extension. Create a regex with this Find String:
/([Mm])aybe, just maybe/g

and this "Replace With":
$1aybe
Select the box to run on AI output which for some crazy reason isn't enabled by default.
>>
>>102523802
Doesn't removing that token impact other words, as well?
>>
Barely above a whisper
>>
>a friendly homeless person card
>suddenly she's in my fucking home and trying to get me to fuck her
Wtf mistral?
>>
File: vramlet-22b.png (1.31 MB, 1200x848)
What's the best model for 24GB GPU anons?
Is mistral small worthy of a download?
>>
>>102524100
That depends on whether you would even be happy with "the best" you can do in 24GB. Mistral Small is probably your best bet. I tested it in Q8 and it's not perfect but I did have fun at times. You could try it out. Maybe with Q6 since then you can fit more of it on your GPU and in theory the quality loss should be minimal.
>>
>>102524125
Thanks, I'll give it a whirl
>>
File: 1711666924617032.gif (1.62 MB, 448x598)
>>102523599
drugs
>>
>>102524100
There's also Qwen2.5 32B, but it needs a fine-tune to make it more usable.
>>
>>102524153
But I don't do drugs. Wouldn't that make typing harder?
>>
>>102524172
tts, son
>>
>>102524172
with most drugs at a reasonable dose, not really
>>
>>102524100
>>102524125
With 24 GB of VRAM, you can comfortably load an 8.0bpw exl2 of Mistral Small with 16k context.
>>
>>102524249
True but 16k is a bit short these days. If he downloads a Q6 he has some room to expand and try longer chats.
>>
>>102522855
>most powerful
>8k
they don't even try because they don't have to.
>>
>>102524339
>>102524339
>>102524339
>>
>>102523599
>GGML GPTQ GGUF llama ollama exllama 4bit 8bit bpw Q6_K flash_attn tokenizers, RoPE
I struggle keeping up with this shit myself. It's crazy how image generation is much more straightforward than this.
I'm still on probably outdated Mixtral 8x7B 3.5bpw because it works decently and setting up a proper prompt formatting is a nightmare
>>
File: Capture.png (110 KB, 1265x967)
>>102524491
This is my current favorite.
>>
>>102524521
can I run this with 24G VRAM and 32G RAM?
>>
>>102524663
32, probably not. Buy more sticks.






All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.