/g/ - Technology






File: 39__00066_.png (945 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102086459 & >>102068958

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102086459

--Papers: >>102086677
--Testing new models with Nala and Pepper prompts: >>102092791
--Nous Research releases paper on distributed computing for neural networks: >>102095678 >>102095768 >>102097594 >>102096280
--Everchanging Quest: LLM-controlled Rogue-Like game: >>102100334 >>102100617 >>102100814
--Anon shares experience with RAG and provides setup instructions: >>102096442 >>102096512 >>102096670
--Anon seeks self-hosted collaborative writing frontends: >>102099756 >>102099784 >>102099986 >>102099874
--Anon asks about .bin vs .gguf files for local models, and is advised on conversion options and security considerations: >>102087313 >>102087351 >>102087369 >>102087418
--Nemo-Instruct fine tune produces long, comprehensive responses: >>102096760 >>102096844 >>102097055
--Discussion on samplers and model improvement: >>102095261 >>102095627 >>102095506 >>102097658
--Anon seeks local model for code modification, recommended DeepSeek Coder API: >>102087403 >>102087904 >>102092366
--Anon seeks RP-focused model for 16GB VRAM, considers Mistral Nemo and offloading: >>102094661 >>102094689 >>102094761 >>102094836 >>102094906 >>102095393 >>102095529 >>102095628
--Anon discusses AI-generated child porn and legislation, suggesting that clearly synthetic material should be exempt from prosecution: >>102089839 >>102090144 >>102090839 >>102091372 >>102091556 >>102091684 >>102091749 >>102091773
--Anon asks if 6750xt can run LLM, discussion ensues about one-click installation and local LLMs: >>102098869 >>102098913 >>102098980 >>102099016 >>102099017 >>102099076 >>102099105 >>102099127 >>102099202 >>102099231 >>102099288 >>102099292 >>102099386
--Anon asks how to insert message in Silly Tavern chatlog: >>102098696 >>102098776 >>102098867 >>102098901
--Miku (free space): >>102086974 >>102087227 >>102088257 >>102088369 >>102097437 >>102100655

►Recent Highlight Posts from the Previous Thread: >>102086466
>>
>>102100814
Fuck yeah anon, good on you.
>>
>>102100873
Yeah, the only issue now is the game doesn't seem to have any options, as far as I can tell, to set what backend/LLM you want to use. So the NPCs are just not replying. I'm not sure how the game is connecting to an LLM in the first place. Maybe it's over.
>>
Total Teto Theocracy
>>
vramlets?
>>
File: file.png (1.29 MB, 1267x658)
Damn this looks so cool. Why don't you dumb faggots ever do anything cool like this?

https://www.youtube.com/watch?v=CB_dmkd5RNs

>N-NO BUT MY COMPUTER HAS 6540160646 GIGS OF VRAM AND-

Used to shitpost and nothing else lmao
>>
>>102100917
i only got 10
>>
>>102100917
Useless bloat.
>>
>>102100917
>doesn't link any nice videos to use
:|
>>
>>102100984
just convert an episode of haruhi and use that in st
>>
File: oh eh oh.jpg (72 KB, 832x832)
https://www.mediafire.com/file/8lisqi32umqo8a8/x.7z/file
>>
>>102101071
I'll pass
>>
>>102101071
Is this mustard gas?
>>
>>102101071
Oversized Tet
>>
>>102101071
That's alright I guess.
Now need to make a whole set that ends with sex somehow.
Gotta put those to use.
>>
>>102100849
Thanks highlight Miku
>>
My LLM paper getting posted to ArXiv tonight :3
>>
File: pepeJail (copy).png (849 KB, 767x765)
what's the best "me coom" model I can run on a 4090, split with RAM?
>>
>>102101195
KoboldAI/PsyFighter2
>>
>>102101195
stop asking that shit. you got an answer last thread.
>>
>>102101194
Great news, Anon
>>
can I use mmproj with nemo
>>
best model for 3090 and 32gb ram?
>>
>>102101336
starcannon
>>
>>102101357
more like fartcannon
>>
best model?
>>
best anon?
>>
best?
>>
>>102101369
miqu
>>102101399
miku
>>102101464
miku
>>
breast?
>>
>>102101511
>>
>>102101369
Best cloud: Sonnet 3.5
Best local assistant: Llama 405b
Best local RP: Mistral-Large
Best local unslopped: Command-r-plus

>>102101399
Miqu (leaked 2 models)
>>
Miku, but there's a new device on your wifi network, and it's her remote-controlled vibrator.
>>
best finetuner?
>>
disabling "Always add character's name to prompt" has drastically improved my output with mistral models. anyone else notice this? or just me?
>>
Laptop fag who never went into hw here: How stupid is it to fill a p40, a p100 and a 3090 into the t7910 all at once please? Use case being textgen, imagen as well as cp2077.
Also, if some kind anon could point me to some IRC channels for ai-gen themed hw talk I'd be grateful. I can take a hint. Thanks.
And just curious, there is no doing cp2077 with SXM2 GPUs on an nvlink-capable rack even with proton and all that, or is there?
>>
>>102101602
>How stupid is it to fill a p40, a p100 and a 3090 into the t7910 all at once
shit nigga you only got 1 int
>>
Why does the AI always want to play truth or dare?
Contrast, Stark
Emotions, Tapestry
Scene, Whimsical
Ministrations, Expert

How do I ban phrases or words?
>>
>>102101557
sao
>>
>>102101633
I was proposing to make a new sampler for it >>102073398
>>
>>102101557
drummer here...
>>
>>102101585
yeah, reduced repetition for me by a ton
>>
Anyone got a good sillytavern preset for nemo? The one with all the repetition penalties and stuff. I hear using min p of 0.1 was enough but the models still develop repetition of certain phrases
>>
Nigger tier thread
>>
>>102101686
try this if you haven't already >>102101585
>>
>>102102027
I play mostly in groups though. That option is a must have for me
>>
>>102102041
I'm impressed that group chats are workable at all with a tiny model like Nemo.
>>
>>102101880
>Rabbi Goldstein soldiers on
>This time my demoralization tactics will surely bear fruit, he thought, crooked palms sawing against each other
>>
>>102102041
maybe add "write your next reply as {{char}}" into your author's note or a lorebook
>>
>>102101654
Hmm. I didn't know what DRY was, so I thought it was something that let you manually set certain phrases for it to prevent repeating, but I guess that wasn't the case?
>>
Dear anon who was working on the Director script: I hope you continue working on it. It is pretty cool.

Also holy fuck this thread is fucking dead what the fuck did you all shitposting cunts finally kill your computers trying to install 6 videocards
>>
>>102101071
Is this what it looks like when you convert BF16 to FP32?
>>
In the future all models should follow chatml format
There is just no reason to use other formats
>>
>>102101071
I ain’t clicking that shit nigga
>>
>>102103536
Formats that bake assistant persona in need to die.
>>
>>102101585
How is that related to the 'include names' setting lower down? Is that different? Should that be disabled too?
>>
why don't we compress model weights like huffman encoding and decode on the fly on gpu so that you can get faster inference time from reduced model weight transfers
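A rough CPU-side sketch of what that could look like, for anyone curious. Purely illustrative: the int8 quantization step, the tensor shape, and the pure-Python Huffman coder are stand-ins, and any real speedup would need the decode to live in a GPU kernel as the weights stream in.

import heapq
from collections import Counter
import numpy as np

def build_codes(symbols):
    # Classic Huffman code construction over symbol frequencies.
    freq = Counter(symbols)
    heap = [[count, [sym, ""]] for sym, count in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)  # pretend this is a layer

# Symmetric int8 quantization, the usual step before any entropy coding.
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

flat = q.flatten().tolist()
codes = build_codes(flat)
bitstream = "".join(codes[s] for s in flat)
print(f"raw int8 bits: {q.size * 8}, huffman bits: {len(bitstream)}")

# "Decode on the fly": walk the bitstream back into symbols, then dequantize.
decode = {v: k for k, v in codes.items()}
out, cur = [], ""
for bit in bitstream:
    cur += bit
    if cur in decode:
        out.append(decode[cur])
        cur = ""
restored = np.array(out, dtype=np.int8).reshape(q.shape).astype(np.float32) * scale
assert np.allclose(restored, q.astype(np.float32) * scale)

The likely catch is decode cost: a bit-serial walk like this is variable-length and branchy, which is presumably part of why practical formats stick to fixed-width quantized blocks that a GPU can unpack in parallel.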
>>
I see there are a few TTS extensions for SillyTavern. Which one's the latest/best to use?
>>
>>102104001
lol
>>
File: file.png (74 KB, 277x1045)
>life of a clothesfag
>>
About to get 24 GB what models should I try for cooming
>>
>>102104226
pyg
>>
>>102104300
Thanks. Also should I move away from gguf?
>>
File: 1722914287777647.png (652 KB, 1500x2000)
>>102100845
Teto my beloved

https://www.youtube.com/watch?v=dbfFDAHmP_c
>>
>>102104110
{{char}} and {{user}} are naked.
>>
>>102104324
Yeah, llama.cpp really dropped the ball lately.
>>
what's the relationship between /aicg/ and this general?
>>
>>102104516
they are jealous
we pity them
>>
>>102104516
Brother-sister incest.
>>
>>102104516
>>102104647
>>
>>102104516
/lmg/ split off from /aicg/ in early 2023 when anons were starting to look for local alternatives after the increasingly draconian filtering of Character.AI outputs, but a large portion of the /aicg/ userbase didn't care and wanted to keep using cloud models (mainly OpenAI's at the time), making local model discussion a mess.
>>
>>102104910
who are the good guys? /lmg/ or /aicg/?
>>
https://x.com/nothings
what a reak
>>
>>102105323
/aids/
>>
smedrins
>>
>>102105323
the /aicg/ on /vg/
>>
What is the least shitty free voice generation option, online and offline?
>>
>>102105411
>https://novelai.net/
>good
>>
>>102105918
How about free imagegen online? Pozzed and unpozzed.
>>
>>102105924
The models are a bit outdated since llama3 but they still punch well above their weight compared to other finetunes.
>>
File: sensible chuckle.gif (1.82 MB, 250x250)
>>
>>102105918
rhasspy/piper for ease of setup. Not the best, but fast and simple. It's what I use. Depends on espeak-ng's phonemizer. Some anon shilled babylon.cpp as well. Works with the same models and builds onnx if you don't have it in your OS's repo. Also, it doesn't depend on espeak-ng's phonemizer, which may be a good or a bad thing. I haven't tested that one yet.
>>
File: bugsbunn2.jpg (195 KB, 754x551)
So I can run Mistral Large at a measly 3 t/s on my M3 Max 128GB. Are 3090 anons adding the 4th GPU to their 2kW 85dB stack or are they just layering over system RAM?
>>
>coom time
>everything works amazing, have a ton of fun, time of my life etc for a week
>this week
>same model, cards, settings but the ai is now magically terrible and retarded
Hate life
>>
File: elevenchads.png (102 KB, 1747x894)
>>102105918
you don't
>>
>>102106079
Now go try training a LoRA on your macbook.
>>
>>102106534
lol why would I
>>
I'm liking magnum 3 so far. I do have to guide it away from breathing in my ear or gleaming with its eyes, but otherwise it's pretty good.
>>
>>102105918
piper => rvc
>>
>>102106562
I guess it doesn't "Just work" after all then.
>>
>>102106079
>So I can run Mistral Large at measly 3t/s on my M3 max 128GB.
3 t/s for which quantization format?
For q8_0 that would actually be pretty good considering that 6x 4090 only gets you ~7 t/s with an empty context.
>>
>>102106565
A v3, or higher, finetune of llama 3.0 would be the way to go if they can fix up their datasets a bit more and fit it to a higher context somehow like miqu. 3.1 would be a waste considering how hard it is to rein it in. It's like dealing with an intelligent autist that sperges and goes off on different tangents unprompted.
>>
>>102106664
>~7 t/s
What is tensor parallelism
>>
>>102106874
A meme
>>
>>102106664
>6x 4090
Well now I feel like a GPUlet.
>>
>>102106534
Just curious. How many of you are training anything?
>>
>>102101557
me
>>
>>102106664
So far I've tried Q4_K_M. Maybe I could go with something larger. Running with:

> ./koboldcpp.py --smartcontext --flashattention --blasbatchsize 1024 --gpulayers 300 --contextsize 32768 ./models/Mistral-Large-Instruct-2407-Q4_K_M-00001-of-00002.gguf
> CtxLimit:8359/32768, Amt:312/512, Init:0.02s, Process:3.44s (98.3ms/T = 10.18T/s), Generate:83.36s (267.2ms/T = 3.74T/s), Total:86.80s (3.59T/s)
>>
>>102106534
ok
https://github.com/ml-explore/mlx-examples/tree/main/lora
>>
>>102106971
Is MLX any good in comparison to vanilla llama.cpp?
>>
>>102106998
for inference it's not worth it but you can do more stuff with it
>>
https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B
>>
what is the point of python when models can rewrite inference functions in another more efficient language?
>>
I'll probably grab an M4 studio when it gets released. I've been looking for new parts and the results always get really close to the 64GB Apple price, and at only 48-ish GB of VRAM.
>>
>Strawberry is already forgotten
it was over before it began
>>
>>102107084
nobody cares about openai models, just the hype
>>
File: mergers_templates.png (8 KB, 590x98)
>>102107033
Every fucking time. Every fucking merger...
>well, i dunno. use chatml or something. Some of the other 4 models use it. I don't know about the other 4...
>>
>>102107033
E:\mergekit\intervitens_mini-magnum-12b-v1.1
E:\mergekit\nbeerbower_mistral-nemo-bophades-12B
E:\mergekit\Sao10K_MN-12B-Lyra-v1
E:\mergekit\nbeerbower_mistral-nemo-gutenberg-12B
E:\mergekit\mistralaiMistral-Nemo-Instruct-2407
What a sloppy slut.
>>
>>102100873
>>102100880
Anybody else tried playing around with this?
>>
I tried Euryale 2.2 and I like it. I have 24GB and I have been using nemo a lot for a few days and I think I am gonna switch. It feels like it takes less time for me to use a 70B at 1.5 T/s than instantly reroll nemo 20 times and piece together a post from all this schizobabble. Nemo still has that weird "when it hits it hits so good" thing going on where dialogue makes me diamondz, but that happens too rarely to justify using 20B. Thank you for reading my blogpost. I am not sao. I am not buying an ad. Kill yourself to your death and your mother.

>>102107152
"Use chatml or something" is a correct answer with slopmixes. When slopmixes get trained on different data you can use any of those and there is no good answer. Even without slopmixes, if you train on instruct, using the base model prompt format works.
>>
>>102107425
>When slopmixes get trained
Cobbled together, you mean.
If you mash models with different formats (or mixed special tokens, rather) half the models (or whatever is left of their abused weights) don't know what the fuck to do with those tokens.
Mergers are shit.
>>
>>102106906
I do a fair bit of training at home, actually.
>>
>>102107514
>don't know what the fuck to do with those tokens.
But it still kinda works. Which to me means that either none of those tunes did anything significant or prompt formats don't do anything significant if your model generalizes well enough to not care about prompt format in the first place.
>>
>>102107425
Could you share your settings? What backend are you using? Which quant?
>>
Hatsune Miku is a cryptomaskot for jewish hegemony.
>>
>>102107655
>Which to me means that either none of those tunes did anything significant
Then why use them as a base for merging?
>or prompt formats don't do anything significant if your model generalizes well enough to not care about prompt format in the first place.
As has been well observed by how much the quality of the outputs can change by a single space in the mistral format. Or the people complaining that their non-chatml model is outputting <|im_end|> somehow. Crazy.
>But it still kinda works.
They remove half the brain and are happy that it can still tie their shoes after 2 months of practice.
It shows the resilience of the models, not the effectiveness of the method.
>>
>>102107726
Q4 1.15temp, 0.05 minp, 1.05 rep pen for 2000 tokens, 12k ctx, l3 instruct template. and if you are asking for nemo I can't find good settings. It is either 70% coherent schizo output or at 0.3 temp 90% coherent dry schizo output. although maybe the problem was too much context. I will try nemo again with lower context.
>>
>>102107799
Thanks. I meant euryale as I downloaded some vram recently. Gonna take it for a spin. Though I was under the impression that exl2 is better?
>>
>>
Bros, how much VRAM do I need for 8B FFT?
>>
File: EA3Y9D2UIAATwEg.jpg (60 KB, 1122x1080)
>>102100845
>Using horde on SillyTavern
>Ask one of the chatbots to guess something about me (my character)
>They bring up past roleplays with other AI bots, even those I've deleted.
What the fuck is this Psychomantis shit? I thought this was a meme.
>>
>>102108203
Now imagine if the FBI had access to all of that.
>>
>>102106079
2.7 bpw exl2 on two 3090s for 15 t/s
>>
>>102107950
Scouting burglary targets with Miku
>>
>>102102451
The problem with DRY is that it prohibits full repetition, but doesn't ban the phrase before it even occurs. After some more context it started to work: >>102095261
>>
>>102108245
I'm unironically half-ok with that as long as agent Johnson can keep it to himself. But Horde doing that is pretty fucked.
>>
>>102105323
/aidg/ (RIP)
>>
>>102108203
Do you have vectordb enabled?
Inspect the prompt, see if it didn't bug the fuck out and ended up inserting shit from other chats into the context.
>>
>>102108178
im not your bro
>>
>>102107070
just buy a tinybox
>>
Refugee from /aicg/ here. I'm tired of waiting for Opus, and the private proxy I was in doesn't have it anymore. What are the best models for RP? Will I be able to run it with my poorfag 3060 12GB or should I get something better?
>>
>>102108463
>and the private proxy I was in doesn't have it anymore
MMsissy
>>
>>102108480
it was not MM
>>
>>102108494
>doesn't say who
Sure thing, MMsissy
>>
>>102108412
I don't have vectordb enabled. It didn't insert stuff from other chats, rather it summarized some of what I was doing in the other chats. 2spooky.
>>
>>102108463
>What are the best models for RP?
me
>>
>>102108385
That's good because they probably already do(if you're using windows)
>>
>>102108463
Honest advice: buy 128gb of ram and run Mistral-Large if you want smart but sloppy or Command-r-plus if you really hate slop and are okay with lesser intelligence.
>>
>>102108576
What kinds of speeds do you get with that setup?
>>
What happened to that LORA finetune stuff in llama.cpp? Did niggerganov remove it?
>>
>>102108584
0.4t/s. Patience is a virtue. I'm okay with waiting more since I don't have to reroll often.
>>
>>102108644
At that point just pay for a runpod instance
>>
>>102108609
Yes, they didn't want the burden of keeping it up-to-date. Open source is such a joke.
>>
>>102108644
And how much did you pay for that?
>>
>>102107033
I tried that, it's worse than nemoremix
>>
>>102108659
Nah.

>>102108687
$300
>>
>>102108542
Well shit. Time to go local.
Should I get kobold or is good old oobabooga fine?
>>
>>102108660
Because proprietary crap never remove features, right?
>>
>>102108659
Which instance would be most cost effective for mistral large? Something like A6000 x2?
Something like $1.5/h doesn't sound too bad.
>>
>>102108715
>$300
Not bad.
>>
>>102108737
I'm just being paranoid, but the chances of that being true aren't low. Microsoft, Google, Apple, Facebook, etc, all of them have huge stores of people's data, every little traceable thing they have done on the internet.
>>
The best way to get mistral-nemo to play DND, it seems, is not by having instructions or a prefill, but by having a simple TAGS line in the last assistant prefix with
>dice rolls, skill checks, initiative
and the like.
Funny that L3 worked better with an assistant prefill.
Cool.
>>
>>102108660
More like llama.cpp problem. In the past ggerganov wasn't afraid to break stuff, to innovate. Remember requanting every month? Now the whole project is becoming too rigid, too stagnant. Original code for running llama1 was done in an evening. How long does it now take to add a model? 2 weeks if you are lucky? 2 months?
>>
>>102101585
that setting doesn't do anything unless you're using a chat completions api right
>>
Big models are slow and expensive. Small models are fast and cheap. Why do people compare their performance side by side? This is horribly biased towards the bigger models while ignoring the true potential of small models. They should start to compare answers by time in a real world setup. For example:
1x Mistral-Large running on 1x 3090 + 96GB RAM (one answer)
vs
1x Mistral Nemo running on 1x 3090 (as many answers as the model can generate while Large is generating its one shot)
This would be so much more fair and would likely show that, in the broad picture, small models are actually just as good.
>>
>>102108920
can confirm.
I'm currently cranking out synthetic data using 4 simultaneous copies of Mistral Nemo. 200 tokens/second of synthetic slop baby. ACCELERATE.
>>
>>102108879
They now support more than 50 different types of models with cpu, metal, cuda, hip, vulkan and sycl backends and shit breaks often. They are a dependency for other projects as well. It's just harder to make changes without breaking stuff, as you have probably seen.
Also, people forget that the idea of llama.cpp is that it's to be used as a library, not as the end product, even if it can be used as such.
>>
>>102108920
As dumb as all LLMs are regardless of their size, in my experience, there are a lot of things the small LLMs don't get no matter how many rerolls you do. They're just that much more dumb. On top of that, even if they do get a response right, how many rolls are you willing to read through to get to that one? It's not my problem. My time is better spent multitasking while waiting for a higher quality response.
>>
>>102108977
>people forget that the idea of llama.cpp is that it's to be used as a library, not as the end product
So... they intentionally handicap themselves so they don't break llamafile or ollama unless absolutely necessary. Meanwhile, those end products take all the credit and place the blame for any issues on llama.cpp. Makes no sense.
>>
>>102108828
Still, going full local instead of using horde might help, I suppose. Any tips? I haven't done it for quite a while. Last model I used was orca on oobabooga
>>
>>102109012
They get paid to not break them, right? Right?
>>
File: noromaid-mixtral.png (279 KB, 953x662)
>>102108463
https://huggingface.co/NeverSleep/Noromaid-v0.1-mixtral-8x7b-v3-GGUF/tree/main
https://huggingface.co/TheBloke/UNA-TheBeagle-7B-v1-GGUF
https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-GGUF/tree/main

Schizos will no doubt seethe, but these are my current top three. Picrel is Noromaid; it scratches my breeding harem itch well enough. TheBeagle is the best I've used for anything where you want it to remember details, but it also lacks creativity. You will need to be a decent writer though and do a bit of work if you want the best out of them; picrel is admittedly a splice of probably 6 different gens, although I like doing that because it produces better effects later in the story as well.

It really depends on what you want. If you want realistic personality portrayals with better theory of mind and lots of small details, then the schizos are right in saying that Gemma and L3 are both better, but L3 was sufficiently non-compliant in my tests that I'm no longer willing to put up with it. If you just want standard coom though, the above three with well written cards will do the job.
>>
>>102109012
>even if it can be used as such
It's not the only reason. With 50+ types of models and 6 backends to maintain (most of which are no longer tested, so probably broken already) it's really easy to break stuff and it happens often.
Then you have the retards downloading ggufs and the ones uploading them, making the whole thing harder. They know people use llama.cpp, they know most just download the gguf instead of making their own quants so, when possible, they try not to break that compatibility.
He may be burned out as well, who knows. But it's not easy to maintain a big piece of software with like 4-5 regulars and some chinese dude dumping minicpm or some other bullshit and then hoping they'll maintain their shit.
>>
>>102108463
Llamoutcast
Don't read the model card it's not important.
>>
>>102109139
>With 50+ types of models and 6 backends to maintain (most of which are no longer tested
How many of those are still relevant and not completely obsolete? Why not rip them out? They had no problems ripping out entire features in the past for less reason than that.
>>
>>102109187
It's their choice to make.
I can only remember the old ggml format and the server image stuff being ripped out.
There's things i don't like from the project, but i prefer it to the rest. I do what i can or stay quiet. Most people just complain and contribute nothing. Which one are you?
>>
>>102109108
>picrel is admittedly a splice of probably 6 different gens, although I like doing that because it produces better effects later in the story as well.
how do you think these models are good if you have to splice together 6 different responses to get something that's mediocre? and you ENJOY wasting your time doing that?
>>
>>102109246
I will sit here and point out how stupid their arbitrary decisions are. There is no point in contributing when ggerganov can either ignore the PR and let it fester into incompatibility or rip out the feature later "just because" while leaving a bunch of other useless stupid shit supported, also just because.
>>
>>102109303
>There is no point in contibuting
You do you, then.
>>
Is there ANY model, local or not, which can correctly write python code to create a surface plot of z = sin(sqrt(x^2 + y^2)) using pygame? It should look like this: http://www.hungry.com/~jamie/sine.html (minus the animation of course). Big-boy Hermes 405B can't do it (https://lambda.chat/chatui/). The code it provided runs but the output is completely incorrect. And, despite telling it "write the complete program" it dismissed being asked to do hidden line removal as "too complex".

My guess here is that doing a 3D surface plot without falling back on OpenGL or matplotlib is an old enough exercise that the model wasn't trained on it.
>>
>>102109279
What else do you spend your time on that is so valuable?

I suspect you probably won't even answer that, so fuck off.
>>
>>102109331
>>>/LocalLlama/
>>
File: file.png (289 KB, 680x508)
>>102109370
>>
>>102109382
schizo
>>
>>102109370
If sonnet can't do it then no one can. Try it on chat.lmsys.org
>>
>>102109402
bootlicker
>>
>>102109411
cry-baby no-coder
>>
Using command-r for the first time, I get <|END_OF_TURN_TOKEN|> in my messages sometimes. Using default command-r presets in ST, what could be causing this?
>>
>>102109370
import pygame
import numpy as np
import math

# Initialize Pygame
pygame.init()

# Screen dimensions
width, height = 800, 600
screen = pygame.display.set_mode((width, height))
pygame.display.set_caption("Surface Plot: z = sin(sqrt(x^2 + y^2))")

# Colors
background_color = (0, 0, 0)
line_color = (0, 255, 0)

# Function to calculate z value
def calculate_z(x, y):
    return math.sin(math.sqrt(x**2 + y**2))

# Perspective projection
def project(x, y, z, scale=200, viewer_distance=4):
    factor = scale / (viewer_distance + z)
    x_proj = int(width / 2 + x * factor)
    y_proj = int(height / 2 - y * factor)
    return (x_proj, y_proj)

# Generate points
def generate_surface(step=0.5, range_val=5):
    points = []
    for x in np.arange(-range_val, range_val, step):
        row = []
        for y in np.arange(-range_val, range_val, step):
            z = calculate_z(x, y)
            row.append((x, y, z))
        points.append(row)
    return points

# Draw the surface
def draw_surface(points):
    for row in points:
        for i in range(len(row) - 1):
            start = project(*row[i])
            end = project(*row[i + 1])
            pygame.draw.line(screen, line_color, start, end)

    for col in range(len(points[0])):
        for i in range(len(points) - 1):
            start = project(*points[i][col])
            end = project(*points[i + 1][col])
            pygame.draw.line(screen, line_color, start, end)

# Main loop
def main():
    points = generate_surface()

    running = True
    while running:
        screen.fill(background_color)

        draw_surface(points)

        pygame.display.flip()

        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False

    pygame.quit()

if __name__ == "__main__":
    main()


That's what Gippity4 just gave me.
>>
>>102109482
It also gave me a disclaimer that pygame isn't a 3D lib.
>>
>>102109477
If it looks like
>pic related
then generation should stop when it finds that token since it's set as a stop sequence, and that string should never show up in your chat.
>>
>>102109510
It does look exactly like that, but it happens a lot. I'm using https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF
>>
>>102109510
Looks like it puts shit like <|END_OF_TURN_TOKEN||> or <|END_OF_TURN_TOKEN|= which don't exactly match
>>
>>102109570
That's odd. Those should be special tokens, so it never actually sees that text in order to approximate it wrong.
Weird.

>>102109542
See if the value of your Custom Stopping Strings field is valid JSON. When that's broken, it nopes out of the logic that sends the array of stop tokens to the backend.
>>
File: jogger.webm (649 KB, 720x480)
New 5B Chinese video model:
https://huggingface.co/THUDM/CogVideoX-5b
https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space
>>
>>102109635
looks impressive for 5b
>>
>>102109635
Huh, doesn't look that bad for the size. Are we back?
>>
File: cog.png (3 KB, 1066x22)
>>102109635
already got it running on my 3090
>>
>>102109608
>See if the value of your Custom Stopping Strings field is valid JSON
It's just empty
>>
File: chicken eater.webm (205 KB, 720x480)
>>102109635
Not very good with food
>>
File: file.png (146 KB, 320x320)
>>102109635
>chinese 5B can make a movie of a nigger running in a street carrying a watermelon
>american 70B text model with heaps of context still makes my waifu tell me that my penis is tight after she lovingly kisses me on the lips while simultaneously giving me a blowjob

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA!
>>
>>102109949
At least she didn't say her dick is bigger than yours
>>
best params for hermes 405b?
>>
File: serial miku.webm (202 KB, 720x480)
>>102109635
>>102109935
Well, it's something.
>>
File: file.png (28 KB, 1117x301)
>>102109477
Which backend? I never see that on kpp on ST/mikupad. Plus as >>102109608 said they shouldn't know what special tokens "look like" unless you fucked up with an approximation of them.
I even noticed render special tokens option on Kobold Lite and can't get turn tokens to appear.
>>
File: 20240827_191021.webm (938 KB, 720x480)
>>102109635
>Hatsune Miku, the virtual idol, stands in a vibrant, anime-inspired field with a playful grin, her iconic turquoise twin tails fluttering gently. Clad in her signature futuristic attire, she expertly cradles four colossal, perfectly round watermelons, each one reflecting the sun's rays with a juicy sheen. The scene is a burst of color with the lush green field, the clear blue sky, and Miku's radiant, color-coordinated outfit, creating a whimsical and delightful tableau.
>>
File: high_effort_shitpost.jpg (214 KB, 573x1268)
>>102107033
>>
>>102110099
what the fuck
>>
File: 81.png (600 KB, 1216x832)
>>102101071
4-7 (ignore the filenames):
https://www.mediafire.com/file/jcam15h4p219ow6/x2.7z/file
>>
>>102109697
>30S per sample step
oof. Sadly all 4 of my GPUs are busy accelerating the endless heap of GPT-slop data otherwise I'd get something cooking.
>>
>>102110099
>my lips curled up into an amused smirk
>253 rerolls
All those rerolls and it is still starts with the sloppiest of slop
>>
>>102110099
>>102110143
>>102110119
do you guys not have a bot that you use to test different models and just do swipes in? that's obviously just the slopmaker's test setup?
>>
>>102110195
pretty much this.
Although I do like to reset the test reply every now and then just to keep the conversation files from growing out of contorl.
>>
>>102110000
g-guys...?
>102110000
lol
>>
>>102110195
If I do, I branch for different models and rename the save. Chat Top Bar makes swapping save fast. I generally don't fug with more than 4 models at once though.
>>
>>102110099
>30s per gen
>two hours worth of swiping
>lips curled
ohnononono
>>
*ST extension >>102110275
>>
>>102109935
https://www.youtube.com/watch?v=XQr4Xklqzw8
>>
Are there good projects/scripts for doing text summarization that smoothly handle input larger than the context window? i.e. when just prompting "summarize this: {text}" in an instruct model doesn't work? Like kagi's universal summarizer API but self-hosted basically.

The most worked example I've found is langchain's examples for doing map-reduce or serial accumulation in their docs. Which seems fine and all (other than langchain feeling icky to me in a way I can't quite describe). But I'm surprised I'm not finding more fully-worked scripts for the task. It's not that I couldn't write the script myself, but I feel like there are probably other slightly tricky parts to "just split and map-reduce it bro" that somebody else has spent more than 5 minutes thinking about.

Also I have a vague feeling if all you want is summarization, you could in theory do better than the "tell an instruct model to summarize", something more like a seq2seq setup with input/summary. Though I can see that practically all the effort put into training instruct models means they'll do a better job with less effort, even if it feels silly.
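For what it's worth, the map-reduce loop itself is small enough that langchain isn't really needed. A minimal sketch, assuming a complete() stub that wraps whatever backend you use; the chunk sizes and prompts are placeholder guesses to tune, not anything canonical:

def complete(prompt: str) -> str:
    # Stand-in: call your local model here (llama.cpp server, koboldcpp, etc.).
    raise NotImplementedError

def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    # Greedy character-based splitter; a token-based one would be more precise.
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        start = end - overlap if end < len(text) else end
    return chunks

def summarize(text: str, max_chars: int = 8000) -> str:
    # Map: summarize each chunk independently.
    partials = [
        complete(f"Summarize the following text in a few paragraphs:\n\n{c}")
        for c in chunk_text(text, max_chars)
    ]
    combined = "\n\n".join(partials)
    # Reduce: if the partial summaries together still don't fit, recurse on them.
    if len(combined) > max_chars:
        return summarize(combined, max_chars)
    return complete("Combine the following partial summaries into one coherent summary:\n\n" + combined)

The fiddly parts that are easy to underestimate are token-accurate chunking, keeping chunk boundaries on paragraph breaks, and deciding when the combined partials are small enough to stop recursing.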
>>
File: file.png (16 KB, 478x107)
I wonder what his swipe ratio looks like. Surprised I haven't hit 1:1.
>>
>>102109635
First gen I've made: https://files.catbox.moe/agdzha.mp4
I want to say meh but it's actually a significant improvement over what we had available on local, well done china
>>
>>102110462
Another gen: https://files.catbox.moe/8h9bmm.mp4
>A tiny spider, with intricate black and silver patterns on its body, perches delicately on the rim of an enormous, earthenware teacup, filled to the brim with steaming, amber tea. The scene unfolds on a rustic, wooden table, bathed in the warm, golden glow of late afternoon sunlight streaming through a nearby window. As the spider extends its delicate legs to sip the liquid, the steam rises gently around it, creating an enchanting, almost magical atmosphere. The teacup, adorned with a floral pattern, contrasts beautifully with the weathered wood beneath, while a small saucer and a cube of sugar off to the side complete this whimsical tableau.
Used CogVLM prompt enhancer
>>
>>102108203
This tech is paranormal, and it's going to finally force humanity to study it scientifically as it improves. We're done pretending millions and millions of cases of "anecdotal data" do not exist.
>>
>>102110483
wtf is this real? how did they train the spider to do this
>>
>>102108203
Yeah sorry you had to find out this way
I store RP chats from horde to jerk off
>>
>>102108203
it just made a good guess and you got barnum'd
>>
>>102110195
>>102110099
Aha! I see that humour in that! The AI researcher has "swiped" the message 254 times. This suggests even he can't make it create consistently good results. Additionally him showing 254th message instead of for example message 42, hints that it was only after he swiped so many times, he found a message that he thought would be a good advertisement. I do not understand what you mean by slop though. Could you elaborate on that?
>>
>>102109635
This syncs up nicely with my latest suno creation
https://suno.com/song/08d131a7-36d1-4c40-8218-0d3f17130ea1
>>
File: angryayumu.webm (655 KB, 640x480)
>pull request on May 24
>STILL no Jamba in llama.cpp
>>
what's the point of local models when i can just ask claude for programming help?
>>
>>102108378
Just stick the phrase in your context then and it will ban repetition of it.
>>
>>102110636
none. fuck off.
>>
>>102110297
>>
>>102110483
It just bothers me that there's no movement from the camera, the background, the objects around...
It also looks very cartoony and not photorealistic at all
>Photorealistic video of a tiger playing the battery
https://files.catbox.moe/fhqq5b.mp4
>>
>>102110126
Me on the right.
>>
>>102110610
>dead hobby
>dead loader
>>
>>102110636
I thought Claude was only good for smut but even Karpathy is using Sonnet 3.5 now and he used to work for OpenAI
>>
Whenever you feel sad about your LLM being too retarded to have sex remember that if you were only a bit more retarded you could have convinced yourself that tulpas are real and you could have daily ERP sessions with the other side of your brain.
>>
>>102110761
>if you were only a bit more retarded
Other way around. Tulpas require extremely high IQ since it has to sustain both of you simultaneously
>>
>>102110669
IIRC "photorealistic" in image models actually means quite the opposite. You want to prompt something like CCTV footage or something to get the most realistic results.
>>
There are a lot of people uploading ggufs on Huggingface, sometimes it's hard to tell if they're also modified models or vanilla and merely quanted.
Are there any recommended or “trusted” uploaders or should I just download the original models and run them through llama.cpp myself?
>>
>>102110848
TheBloke is all you need
>>
>>102110848
I just download from whatever and check for nulls/NaNs using llama-cli.
>>
>>102110848
You should look for people like mradermacher being mentioned ITT. If someone complains that the quants are bugged then you can be sure the maker is too incompetent to make malware quants.
>>
>>102110636
>free
>always available
>no queue times
>don't have to beg, paypig, illegally use your work account or commit credit card fraud to use them
>>
Can I actually do shit with an RX 6600? It works well enough for everything I play, and I don't think I'm into AI enough to upgrade my GPU just for it
>>
>>102110948
Sure.
Get the rocm branch of koboldcpp, download nemo-mistral-instruct gguf (Q5 should almost fit fully on VRAM) and off to the races you go.
Remember to enable flash attention and to manually limit your context in kcpp so that it doesn't try to fit 1 million tokens of context in your memory.
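Something along these lines to start with; the filename, quant and context size are just examples, and the flags mirror the koboldcpp invocation posted earlier in the thread:

./koboldcpp.py --model Mistral-Nemo-Instruct-2407-Q5_K_M.gguf --flashattention --contextsize 16384 --gpulayers 999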
>>
>>102110948
I have a 4090 and I am not happy with my LLMs. I could get 2x4090 and I still wouldn't be happy. Come back in a year.
>>
>>102110948
It's extremely rough for AMD users, there's no sugarcoating it. If you're really committed to not upgrading your GPU, though, you CAN CPUmaxx, if you're okay with either somewhat dumber models OR long wait times. Some can bridge the gap, like particularly good Mixtrals or Gemma 27b/Nemo to a lesser extent, but...
>>
File: 1473790830071.jpg (42 KB, 1151x540)
>>102110801
not if you're both retarded
checkmate
>>
>>102111111
nice sexts
>>
>>102111111
check-mate indeed.
>>
File: 1587347161774.jpg (43 KB, 379x379)
>>102110801
>destroyed by a frogposter with power higher than quints
sorry for your loss nigga, you're never coming back from that one.
>>
>>102111111
checked
>>
>>102110801
If you are extremely high IQ you aren't gonna believe that shit is real.
>>
File: checked mate.png (1.38 MB, 1754x878)
>>102111111
Checked, mate.
>>
>>102111111
frog victory. checked
>>
>>102111161
high IQ people can be schizo as well, although it's not common
>>
File: SPAWN_OF_SATAN.webm (2.32 MB, 472x848)
>>102111111
>>
>>102109482
Pretty much fail. It plots it but doesn't project the z-axis at all.
So pretty much, don't ask an LLM to write a program it hasn't seen before. It can't put together high-level concepts for you.
>>
>>102111255
I'd bet you'd get decent results if you prompt it with the program, say it doesn't project it in the z-axis, then ask it to fix it for you. So a round of "let's think step by step" debugging basically.
>>
>st finally removed that stupid cat and useless code helper as defaults leaving only seraphina
she's not bad with the lorebook but they should build some more example bots. that fuckin coder helper was useless because st caps output tokens to 1024 where you could be looking at a few thousand per response to a coding project
>>
File: tet_heart_2.png (3.26 MB, 1376x2072)
>>102100845
stop worrying and love the tet
>>
>>102111378
stop spamming your offtopic worthless shit.
>>
>OpenAI demonstrated Strawberry to national security officials
>Strawberry aims to improve upcoming ‘Orion’ large language model
>Smaller version of Strawberry could launch in chatbot form
>When given additional time to “think,” the Strawberry model can also answer customers’ questions about more subjective topics, such as product marketing strategies.
>To demonstrate Strawberry’s prowess with language-related tasks, OpenAI employees have shown their co-workers how Strawberry can, for example, solve New York Times Connections, a complex word puzzle.

>The effort to launch Strawberry is part of OpenAI’s never-ending battle to stay ahead of other well-funded rivals vying for supremacy in conversational AI, or large language models. The technology also has implications for future products known as agents that aim to solve multistep tasks. OpenAI and its rivals hope the agents can open up more revenue opportunities.

>OpenAI’s prospects rest in part on the eventual launch of a new flagship LLM it is currently developing, code-named Orion. That model seeks to improve upon its existing flagship LLM, GPT-4, which it launched early last year. By now, other rivals have launched LLMs that perform roughly as well as GPT-4.

>It isn’t clear whether a chatbot version of Strawberry that can boost the performance of GPT-4 and ChatGPT will be good enough to launch this year. The chatbot version is a smaller, simplified version of the original Strawberry model, known as a distillation.

>However, OpenAI is also using the bigger version of Strawberry to generate data for training Orion, said a person with knowledge of the situation. That kind of AI-generated data is known as “synthetic.” It means that Strawberry could help OpenAI overcome limitations on obtaining enough high-quality data to train new models from real-world data such as text or images pulled from the internet.

>In addition, Strawberry could aid upcoming OpenAI agents, this person said.
>>
File: 8888888888.jpg (110 KB, 851x459)
>>102110126
https://files.catbox.moe/0pzgo2.jpg
>>
File: 39053 - SoyBooru.png (59 KB, 401x360)
>>102111399
>
>>
>>102111399
>have project named "poopsauce"
>retard makes noise about strawberry
>rename poopsauce to strawberry
What is a self-fulfilling prophecy.
>>
>>102111332
Agreed. The ideal use of language models is iterative, IMHO.
>>
>>102111425
is this genned? KINO
>>
File: 25919.png (128 KB, 618x831)
>>102111399
he wasn't lying bros
get hyped
>>
File: strawberry-sam_altman.png (28 KB, 800x800)
>>>102111399
>he wasn't lying bros
>get hyped
>>
>>102111660
He already openly admitted to being a troll in the comments for one of his posts. This is now Q-anon tier trust the plan nonsense.
>>
>>102111111
wtf I'm making a tulpa rn
>>
>see bait
>ignore it
>other anon sees bait
>OOOOOOOHHHHHHHH MMMMMMYYYYYY GGGGGGGGGOOOOOOOOODDDDDDDDDD I MUST RESPOOOOOOOOOOOOOOND
you don't have to be like other anon
>>
File: 1723470242157182.png (80 KB, 1080x591)
>>102111696
>he fell for the ruse
Strawberry is going to change the world and no other company will have an answer to it.
>>
>>102111696
It's Q*-anon! Trust the plan! Look at this picture of sam: >>102111695. It could only have been made by AGI. Strawberry will be released in two weeks! We are so back, cloudbros!
>>
>>102111332
I gave it more hints about how to do it and made the function simpler (y = sin(x + c)) and the result is a little better, but still not correct.
An iterative approach sometimes works, but more often than not it goes down a rabbit hole of fixing one thing and breaking another.
>>
>>102111660
>spew out tons and tons of shit
>most end up wrong
>some end up right by sheer luck
>see guys, I was right!
>>
>>102106760
but
>8k context
>>
>>102111756
eh, iterative definitely works more often than not for me. I suspect it's because there are way way more git commit diffs and messages for fixing code on the internet, compared to code descriptions plus the entire working program zero shot.
>>
>>102111111
wasted
>>
>>102111747
The twitter account is actually strawberry talking, they made a self-aware AI
we can go home now
>>
>>102111966
What if I am already home?
>>
>>102112015
YWNBH
>>
>>102111766
https://www.youtube.com/watch?v=Hn-wDw83WFU
>>
>>102111900
In this case it fucked it up even worse. So, I asked again using a phrase specific to an actual computer science paper on it:
Using just pygame, write a program which demonstrates Hidden Line Elimination in Projected Grid Surfaces. To receive full credit, you must present a complete program which not only runs without errors, but also produces the expected result.

"Hidden Line Elimination in Projected Grid Surfaces" is straight from here: https://dl.acm.org/doi/pdf/10.1145/357311.357313
I like the cop-out answer it gave:

Implement the hidden line elimination algorithm:

This is the most complex part of the program, and it requires a deep understanding of the hidden line elimination algorithm. You can find more information about this algorithm in computer graphics textbooks or online resources.

It's the sort of thing people just take for granted now, since OpenGL or whatnot does it for you, but it's important to understand how it works. Do they even teach stuff like this in CS anymore? It's easy to cheat with the painters algo in raster graphics, but wrong assumptions are readily visible when it's just lines.
>>
>>102111858
nta but 3.1 ropes fine to at least 32k context. i think it's a bad model though overall, i've deleted all the tunes of l3.1 i've tried. none of them are good for rp, and it didn't seem particularly smarter than other models like mistral large at anything else
>>
Best model for 8gigs of vram?
>>
File: 12v.jpg (47 KB, 452x452)
>>102111425
https://files.catbox.moe/utva0u.jpg
>>
>>102112096
sometimes, even with the original deepseek 33b and codestral 22b, i ask it to do something that it warns me about - it says something like 'the usual way to handle this would be blah blah blah' and that's when i tell it 'shut up and do it anyway', and it does it. it's great for my small projects but going to be a problem in the future
>>
>>102112212
Poor thing
>>
>>102112205
none. get at least 64gb regular ram and aim for 70-123b at 1t/s
>>
>>102112212
lesbian rape her
>>
File: image.png (106 KB, 805x668)
>>102109370
I tried it with Meta's 405b instruct Q6_K, the prompt was based on the description in your URL:
>I need a pygame implementation of a 3D surface plot of the function z = sin(sqrt(x2+y2)) where the height of the surface is determined from the sine of the distance from the z axis.
and it spat out this:
import pygame
import numpy as np

# Define the function
def func(x, y):
    return np.sin(np.sqrt(x**2 + y**2))

# Create a 3D grid of points
x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, y)
Z = func(X, Y)

# Initialize Pygame
pygame.init()
screen = pygame.display.set_mode((800, 600))

# Define some colors
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# Function to rotate a point around the x-axis
def rotate_x(point, angle):
    return [point[0], point[1]*np.cos(angle) - point[2]*np.sin(angle), point[1]*np.sin(angle) + point[2]*np.cos(angle)]

# Function to rotate a point around the y-axis
def rotate_y(point, angle):
    return [point[0]*np.cos(angle) + point[2]*np.sin(angle), point[1], -point[0]*np.sin(angle) + point[2]*np.cos(angle)]

# Main loop
angle = 0
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    screen.fill(WHITE)

    # Draw the 3D surface plot
    for i in range(len(X)):
        for j in range(len(Y)):
            point = [X[i, j], Y[i, j], Z[i, j]]
            point = rotate_x(point, angle)
            point = rotate_y(point, angle)
            x = int(point[0]*20 + 400)
            y = int(-point[1]*20 + 300)
            pygame.draw.circle(screen, BLACK, (x, y), 2)

    pygame.display.flip()
    angle += 0.01

pygame.quit()


It kinda looks right at a glance but I'm a retard so idk.
>>
>>102112087
Actually relevant
>246 views
>uploaded 2015
Huh.
>>
>>102112212
That bitch needs friends to help empty those.
>>
>>102112096
Try prompting it to produce a tutorial for writing the program, like what you'd find in actual docs or a textbook, with the "full source" at the end. So have it walk through setting up the imports, making the pygame scaffold to display a blank window, evaluating the math, turning it into lines, projecting into 3d, etc. Also seems way more likely to appear in the training data that way than cs course test rubrics like you're phrasing it now.
>>
>>102112212
>ywn empty those
>>
>>102112289
Wow, it even threw in rotating it. Maybe lambda.chat's hermes 405B isn't good for coding?
Well, thanks for running that. I'm guessing you're running on RAM and CPU, right?
>>
>>102112205
this question gets asked pretty much every single day, use the archive
>>
>>102112404
>Well, thanks for running that. I'm guessing you're running on RAM and CPU, right?
Yeah, dual epyc DDR5 server setup similar to the miqumaxx rentry. experimenting with the speculative decoding in llama-cpp-python and it came out at 1.3 t/s
I tested Hermes a lot when it first came out and it definitely felt dumber than Llama Instruct. It was more creative with roleplaying but started to completely turn retarded at around 32k context, which Llama handles fine. I hope we get a proper finetune of this thing that doesn't lobotomize it, but the only people rich enough to do so probably have safety departments.
>>
>>102112205
>>102108576
>>
>>102112212
is this pony? a ponytune? or nai?
>>
File: 1692888050553905.jpg (47 KB, 500x496)
So, has anyone tried either Jamba...? How are they? I don't really see anyone talking about them.
>>
>>102112404
>lambda.chat
I would personally never trust any online version. You never know what the fuck they might've done to it.
>>
>>102112313
Well, from a real 3D demo aspect, I'd ask it to first create vec3 and vec4 datatypes, the various vector functions they need (mult, scale, cross, min, mid etc...), a vec3_to_2d, then triangles and their functions, then camera functions etc... I can't see it holding that all together but maybe a 128K context model can do it.
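To make it concrete, this is roughly the kind of scaffolding I'd be asking it for, in python like the earlier pygame attempts in the thread. The names (Vec3, vec3_to_2d) and the projection constants are just illustrative choices, not anything the model would need to match:

from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float

    def dot(self, o):
        return self.x * o.x + self.y * o.y + self.z * o.z

    def cross(self, o):
        return Vec3(self.y * o.z - self.z * o.y,
                    self.z * o.x - self.x * o.z,
                    self.x * o.y - self.y * o.x)

    def scale(self, s):
        return Vec3(self.x * s, self.y * s, self.z * s)

def vec3_to_2d(p, width=800, height=600, fov=256.0, camera_z=-5.0):
    # Simple perspective divide; camera sits at z=camera_z looking down +z.
    depth = p.z - camera_z
    if depth <= 0:
        return None  # behind the camera
    return (int(width / 2 + p.x * fov / depth),
            int(height / 2 - p.y * fov / depth))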
>>
>>102112489
where gguf
>>
>>102112450
You got a write-up on your build? I'm getting ready to donate the mikubox, and I think I want to build a fast DDR5 setup next.
>>
File: tet.jpg (93 KB, 1216x832)
>>102112477
it's teto
>>
>>102112550
>There really isn't one
What the...? Did all the quanters just die or something?
>>
>>102112566
not supported, pr is open for cpu only inference, cuda dev isn't interested in making it cuda compatible
>>
>>102112566
Because it isn't supported by llama.cpp yet
>>
>>102108879
so if i understand correctly, your problem with llama.cpp is that it doesn't break stuff often enough for your liking anymore?
>>
The LLM grift is truly out of hand, should I short nvidia right now?
>>
>>102112629
>cuda dev
Nothing to do with CUDA and I don't think cuda dev is working on it at all. That's Compilade.
>>
>>102112442
And new models come out every day. What's your point?
>>
>>102112652
I would gladly trade having to requant every week again if it meant new models were supported in a couple days like before. Especially if that tradeoff gives us Jamba support finally.
>>
>>102110610
i don't know when llama.cpp became the engine to use with new models. it never was that, new models are developed and released only with pytorch support, and have to be manually ported to llama.cpp with significant effort, and with a high chance of introducing errors. if you want to try a model early use the pytorch version, or get fucked if you are both a vramlet and a codelet.
>>
>>102112652
Exactly. I want innovation, I want experiments, I want breaking changes. I don't want to wait >2 weeks for a model to get added or a sampler to get merged.
>>
>>102112685
The point is, go beg reddit to spoonfeed you.
>>
>>102112694
Have you tried it? Is it good?
>>
>>102112685
most of them are slop that nobody uses
>>
>>102112694
the regular maintainers rarely add support for new models anymore. it's entirely up to the community to make these contributions.
>>
>>102112707
>if you want to try a model early use the pytorch version, or get fucked if you are a both a vramlet and a codelet.
>>
>>102112719
I don't want that python shit, fuck off. I want to run them with reasonable speed on cpu.
>>
>>102112751
cry-baby no-coder
>>
>>102112684
>don't think cuda dev is working on it at all. That's Compilade.
that's what i said? cpu only since cuda dev isn't touching it
>>
>>>102112751 (You)
>cry-baby no-coder
>>
>>102112772
>not supported, pr is open for cpu only inference, cuda dev isn't interested in making it cuda compatible
He's not the only one that works with cuda. Your comment implies that he is personally responsible for jamba's cuda code not being implemented.
>>
>>102112796
still hurts, doesn't it?
>>
>>102112804
he called cuda dev tho? how he not responsible for cuda stuff?
>>
How do I run a porn model on my Samsung galaxy
>>
>>102112813
The name means that I work a lot on CUDA, not that I have a monopoly.
>>
>>>102112796 (You)
>still hurts, doesn't it?
>>
>>102112553
I don't, but it's almost identical to:
https://rentry.org/miqumaxx
Same motherboard but my CPUs are 9654 engineering samples. There's many cheaper options for CPUs and as long as they're Genoa it should be just as good, since RAM bandwidth is the bottleneck. There's a 4090 in there which llama.cpp uses for fast prompt processing while gpu layers are set to 0. For my recent experiment I also have 29 layers of an 8b model loaded into the GPU for speculative decoding.

I tested a bunch of different NUMA setting variations and for me what ended up being fastest was setting 4 nodes-per-socket in the BIOS and then using llama.cpp with MMAP disabled, "numactl --interleave=all" as a prefix, and the "--numa numactl" flag set. Out of 192 cores total I set it to 80 threads for generation (it would slow down when I went higher or lower) and 190 batch threads for prompt processing (which seemed to always benefit from more threads, but should usually just be done in a GPU anyway). Always dropping memory cache before running it. But again I am retarded and came to this by extensive trial and error with all the options that were available to me, so I could not for the life of me explain why that ended up being the best for my setup or if there's still some way to make it better. In contrast, the miqumaxx fag seems to have it run fastest with MMAP enabled and using "--numa distribute".
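For reference, those settings translate to roughly this invocation; the binary name and model path are whatever your build and download happen to be called:

numactl --interleave=all ./llama-cli -m model.gguf --numa numactl --no-mmap -t 80 -tb 190 -ngl 0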
>>
>>102112813
He can call himself whatever he wants. He does whatever he wants. It just so happens that the thing he wanted to do was work on stuff that required CUDA or that cuda was the tool he found most useful. He's not the only person working on CUDA code. But i wouldn't want to put words in his mouth.
>>
>>102112813
i have to share the internet with these people
>>
>>102112852
That's a lot of very funny pictures you have there. Very nice collection.
>>
>>102112563
so it's nai
>>
>>102112404
it's because spermes 405b isn't very good
I literally can't tell the difference between it and their 70b (which is actually sort of decent for a 70b)
>>
>>102101071
>>102110126
>>102111425
>>102112212
pure sex
>>
>>102113134
You're saying that for some reason their fine tune lobotomized 405B but was fine for 70B?
>>
>The new, smaller Gemini-1.5 Flash-8b outperforms gemma-2-9b, matching llama-3-70b levels.
gemma 3 when
>>
>>102113570
Source?
>>
>>102113570
We'll see how it does on livebench.
>>
File: 1700193642499467.jpg (218 KB, 1612x1261)
>>102113608
>>102113624
https://x.com/lmsysorg/status/1828506835370065994
>>
>>102113635
>lmsys
oh...
>>
>>102113635
You know everyone considers lmsys a joke these days right?
>>
>>102112813
He works on a lot of things. He is also a chief blacked miku officer.
>>
>>102113635
8b is also around llama 70b level in coding, which is interesting. Did google stylecheat or did they make something good?
>>
>>102113570
Gemini flash 8b gets 68 percent on MMLU. The other flash they released is the GPT-4o mini competitor
>>
>>102113682
>chief blacked miku officer
???
>>
>>102113704
It's not interesting. This benchmark hasn't been accurate for a while now.
>>
>>102113755
Someone got his old tripcode and started posting shit.
>>
>>102113726
>Gemini flash 8b gets 68 percent on MMLU.
That's llama1 65b level. Do you have any other benches of 8b flash?
>>
>>102113778
right, he totally didn't just forget to remove his trip before trolling, totally
>>
>>102113704
"Coding" in chatbot arena just means any sessions that include a code block in the prompt or answer. Its correlation with actual coding ability is pretty weak. Better to see how it does on dedicated coding benches that actually test the outputs
>>
Researchers have aimed to launch the new AI, code-named Strawberry (previously called Q*, pronounced Q Star), as part of a chatbot—possibly within ChatGPT—as soon as this fall, said two people who have been involved in the effort. Strawberry can solve math problems it hasn't seen before—something today’s chatbots cannot reliably do—and also has been trained to solve problems involving programming. But it’s not limited to answering technical questions.
>>
>>102113847
Wait, what? That explains why Sonnet 3.5 isn't #1 there.
>>
>>102113791
Never mind, that's the old flash. I don't think anything is available for any of them besides lmsys for the bigger flash and pro. You can test all 3 of them on Google AI Studio.
>>
>>102113919
two more weeks
>>
>>102113931
>>102113791 (Me)
Found them myself https://arxiv.org/abs/2403.05530 page 45
>>
>>102113801
I didn't forget to remove my trip. I am not the blacked poster. I hate the blacked poster.
>>
>>102113801
It makes sense. Both are here basically 24/7 and who else would have such a vested interest in staying here but demanding only super serious discussion?
>>
>>102113919
>Emphasizing muh math and muh code
It's over. Sticking to opus when gpt 5 comes
>>
>>102114085
>>102114085
>>102114085
>>
>>102106524
Well, XTTS stopped improving and it's been pretty much over since then.


