/g/ - Technology




File: 1713291463105789.jpg (980 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Luka Love Edition

Previous threads: >>101990712 & >>101981616

►News
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>102001133
We've survived... but at what cost?
>>
>only 2 posts
/lmg/ is dead
>>
File: ComfyUI_00014_.png (1.18 MB, 1024x1024)
Will AI make my burger stonks affordable again?
>>
File: GTY5zp2agAAUMEy.jpg (105 KB, 1200x1188)
>trying some kinky stuff
>practically all models are infinitely more willing to fuck a literally feral dog than a child
We live in a society
>>
Recent Highlights from the Previous Thread?
More like Schmecent Schmilights from the Schmevious Schmed.
>>
Doing some training on 2x3090 ti's and I realized that the ETA doesn't really change a lot when I do nvidia-smi -pl 300 vs nvidia-smi -pl 400. I always assumed it was negligible for inference but not for training.
>>
>>102001322
lmao gottem
>>
File: 65456465465.png (79 KB, 1892x742)
"Mistral large is GPT-4 tier" yeah right, a 27b model still performs better on random trivia than a 123b. There is no excuse for this.
>>
>>102001133
>>
>>102001347
everyone knows that big models are cope for retards who overspent on hardware
there's hardly any improvement beyond 27b or maybe even 12b considering nemo
>>
>>102001347
gemma can't have sex
>>
File: 46545645564.png (215 KB, 1839x781)
>>102001347
Knows the lyrics to the duck song but is still retarded about what happens during it

"The song tells the story of a duck who goes into a store to buy some grapes, but unfortunately, the store does not sell grapes. The duck then asks for various other items, each time being told they are not available, leading to a comical and repetitive exchange."

How retarded is that? It just regurgitates data it does not fully understand. This tells me there's a massive waste of data there. I've been telling you all Gemma is by far the most intelligent open model.
>>
>>102001382
>gemma can't have sex

Where are the tunes?
>>
>>102001394
Is there an uncucked version of gemma?
>>
https://hf.co/MangoHQ/TinyMagnum-4b

leaked magnum model?
>>
►Recent Highlights from the Previous Thread: >>101990712

--No Python skills needed to install local language model, but required for training: >>101998830 >>101999031
--Running 70B 4bit quant model on a single 3090 and CPU, speed estimates and VRAM utilization discussion: >>101993566 >>101993582 >>101995240 >>101995293 >>101995389 >>101993624
--Microsoft's Phi-3.5 AI model released: >>101997221 >>101997326 >>101997748 >>101998144 >>101998634 >>101998666 >>101999035 >>101998170 >>101998239 >>101998274
--Llama 3.1 repetition and penalty settings discussion: >>101995250 >>101995337 >>101995416 >>101995451 >>101995494 >>101995529 >>101995551 >>101995676 >>101995524 >>101995889 >>101999802
--Discussion on using FP16 for prompt processing and lower quants for token generation: >>101993920 >>101994184 >>101994601 >>101994437 >>101994666 >>101994443 >>101994493 >>101994546 >>101994584 >>101994763 >>101994937 >>101995118 >>101995680 >>101995884
--Pre-llama era models suggested for worst RP experience: >>101999152 >>101999249
--Mistral Nemo 12B recommended for 12GB VRAM: >>101997827 >>101997858 >>101997908 >>101997951 >>101998024 >>101997961 >>101998131 >>101998225
--Advice on adapting 3.5k sys prompt for Mistral large: >>101991079 >>101991095 >>101991115
--Nemo vs other under 70b models, base vs finetunes, intelligence, writing style, context, and prompting: >>101994638 >>101994667 >>101994745 >>101995985 >>101996041 >>101996305 >>101996149 >>101996206 >>101995960
--Magnum 123b model review and rant against tuners' practices: >>101997022
--Criticism of sloptuners and Nvidia's practices: >>101991923 >>101991958 >>101991997 >>101992019 >>101992932 >>101992965 >>101993396 >>101993477 >>101993287 >>101993397 >>101993561
--Context saving idea for efficient prompt processing: >>101994949
--Miku (free space): >>101991128 >>101991825 >>101993017 >>101993048 >>101994222 >>101996742 >>101997197 >>101999088 >>102000707

►Recent Highlight Posts from the Previous Thread: >>101990920
>>
alright does anyone have *the* Nala card? I'm not on my main PC so I don't have it here, and my finger is swollen so I don't feel like searching a whole lot for it.
>>
>>102001513
https://www.characterhub.org/characters/Anonymous/Nala
>>
File: ahh ahh archer.png (129 KB, 930x562)
>>102001527
Well, it's not my usual template; I had to whip one up from scratch based on my memory of the appropriate special tokens. But at t=1 neutral she turns into an archer and murders me
>>
File: 1710729147915598.jpg (236 KB, 1024x1024)
>>102001133
>>
>>102001347
That's an issue with all Mistral models tbf
>>
>>102001347
It's over for Mistral. Redownloading Gemma
>>
>>102001619
heading to the miku convention with miku and miku... and miku...
>>
Sup. I'm the guy that likes clothes again. Does this look like an effective addition to system prompt?

>Extensively detail breasts with detail about their motion, shape, size, etc... Extensively detail clothes and how they fit on a person's body, with details like the tautness of the fabric, the glossiness of sheer materials, etc...

I'm trying different models but honestly it doesn't seem to affect much.
>>
I think Phi-MoE has potential.
>>
>>102001347
I wouldn't use a single or even a few questions to judge a model's complete trivia knowledge. And I also wouldn't seriously use lmsys, since it does not give you token probabilities. However, what I can confirm is that Largestral at IQ2_M does seem to perform badly on the Castlevania quote, which I just tested. But it seems that on a different trivia question which I tested on lmsys, pic related, Largestral does perform a bit better. I rerolled once to see if it would change and it didn't. They gave the same answer both times. IQ2_M locally also gives the same answer which is interesting since that quant should be lobotomizing. For the question, Mistral got both characters right although the order was wrong, while Gemma thought of a completely different character from a different KR series.

Frankly since trivia is extremely easy to make problems for, you should make an actual benchmark, which could be composed of 100 multiple choice questions, and automate it with a script and get token probabilities for the answers. You can be taken seriously this way, and also be a good contributor to the public knowledge of LLMs. If you don't have the hardware to run large models then you could ask someone, I'm sure there are willing people in the thread. I would certainly be up to run such a benchmark, if someone made it, on the models I have the ability to run.

As for the general Gemma debate, honestly in my short time testing it, it did seem pretty good based on my feels, especially considering the size. I just didn't use it any further because of the 8k.
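Re: the multiple-choice benchmark idea — a rough sketch of what such a harness could look like, assuming llama-cpp-python. The model path, the question format, and the scoring are all placeholders, not a finished benchmark:

# rough sketch, not a finished benchmark; model path and questions are made up
from llama_cpp import Llama

llm = Llama(model_path="largestral-iq2_m.gguf", n_ctx=4096, logits_all=True)

questions = [
    # 100 entries like this, "answer" being the letter of the correct choice
    {"prompt": "Which game is the quote '...' from?\nA) ...\nB) ...\nC) ...\nD) ...\nAnswer:", "answer": "B"},
]

score = 0
for q in questions:
    out = llm.create_completion(q["prompt"], max_tokens=1, temperature=0.0, logprobs=10)
    top = out["choices"][0]["logprobs"]["top_logprobs"][0]  # logprob of each candidate first token
    picked = max(top, key=top.get)
    score += picked.strip() == q["answer"]
print(score, "/", len(questions))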
>>
>>102001704
123b is undertrained to an extent, quantization hurts it a lot less.
Mistral was crying about not having enough GPUs and using MoE for a reason.
>>
File: 1724201905616.png (204 KB, 600x633)
>>102001464
im starting to think these highlights are AI generated
>>
File: 1711129548940742.png (5 KB, 693x138)
>>102001678
>43B total
>performs worse than Nemo 12B or fucking 9B Gemma
MoE? More like MemE. No wonder Mistral has abandoned the concept.
>>
>>102001715
anon...
>>
>>102001734
It's a MoE of braindead retards (~4B each); they shot themselves in the foot with this.
>>
>>102001734
>Arena
Kek.
The real benchmark to look for will be Livebench; even though it's not perfect, it's still the best we have currently for general knowledge. RP skill is a different story though, and there don't seem to be any decent benches for that.
>>
File: 1697033028019407.png (28 KB, 1101x517)
The most interesting thing about Phi-MoE is that it was trained on 5T tokens in just over 20 days.
>>
>>102001734
>Llama 3.1 8b score
Wtf metabros, our model is shit; even retarded MoEs are mogging the shit out of it
>>
>>102001347
>>102001394
A couple of cherry picked examples don't mean anything. All models hallucinate given certain queries. Even if Gemma is less prone to confabulate on average, the censorship makes it useless.
>>
>>102001788
Let me guess, you own three 3090s? 4?
>>
>>102001734
Because it's only 6.6B active, anyone with 32 gigs of RAM should be able to run it in GGUF form. Meme prompt aside, I do like what I've seen from it so far more than what I've seen from 8B. Gives it the potential to be the entry-level model of choice in the future.
>>
>>102001827
When I say 32 gigs of RAM, I of course mean at Q4. Although we'll have to wait until it gets llama.cpp support before we can really see how well it handles quantization. As of right now I reckon I'm the only person in this thread who has actually used it at all, considering it takes a quad GPU rig to run it in f16
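Very rough numbers for anyone wondering (assuming ~42B total / 6.6B active and roughly 0.6 bytes per weight at Q4-ish): weights ≈ 42B × 0.6 ≈ 25 GB, so it squeezes into 32 GB with a little room left for the KV cache and the OS. Per token you only stream the ~6.6B active params ≈ 4 GB, so on ~60 GB/s dual-channel DDR5 the theoretical ceiling is somewhere around 10-15 t/s.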
>>
>>102001827
>Because it's only 6.6B Active anyone with 32 gigs of RAM
that's not how it works...
>>
>>102001823
Nah, I have a 4070 and 64gb RAM. Nemo instruct Q6_K is my bread and butter. I need low latency responses and don't want to build a $20k rig or be a cloud cuck.
>>
Why is /lmg/ a thing anymore when mythomax STILL hasn't been surpassed
>>
>>102001788
>Even if Gemma is less prone to confabulate on average
Not according to Livebench.

>>102001774
It's ok, their 70B is still the top local model outside of 405B. The 8B might've been a bit fucked and they just didn't bother to make it better. Or possibly we hit a limit on how much knowledge an 8B can hold, and the reason Gemma 2 9B is better is that it lacks long context capability. That would be unfortunate if true, i.e. that training for longer context understanding takes more neurons, though it makes sense.
>>
>>102001866
see
>>102001863
Retard.
Compute power needed for inferencing is based on the active parameters. They'll still need a GPU to do the batch processing initially but after the first prompt they'll be good to go. It should run reasonably well on full cpu offload for most people.
>>
File: 12552.png (208 KB, 1756x838)
>>102001347
both models are a meme.
>>
>>102001911
Wait what the hell, it looks like the GPT-4 models literally got worse on this benchmark and now 405B beats all the GPT-4s on the chart. What is ClosedAI doing?
>>
>>102001881
>>102002036
buy an ad
>>
File: 1379637883196.png (92 KB, 293x282)
Let's play a game! This Saturday at 1 PM PT, I will do a collaborative storytelling/RP session (location TBD, maybe in the thread itself?), where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us (mostly) reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is also still TBD. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. If anyone has suggestions for scenarios I'm all ears. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone would like to host any of these games themselves, that has more VRAM to run such larger models at a good speed, please do, and I will step down.
>>
>>102002167
Why do you think this is going to be fun? Sounds really lame.
>>
File: Designer.png (1.13 MB, 1024x1024)
What should I put in the sysprompt so that the model only shits out dialogue and descriptions, NOT thoughts?
Every model I've tried so far spends more time rambling about the character's thoughts and feefees instead of actually talking or doing stuff.
>>
>>102002167
>If anyone has suggestions for scenarios I'm all ears.
Illuminati meeting in a secret bunker. 3 distinct characters handling random world events.
>>
>>102002201
Idk lol. I'm doing it anyway though.
>>
>>102002238
Is Nemo capable of group chats like that? I guess we could see. I never actually tried the model that way.
>>
>>102002267
I do 2s with nemo with great results, not sure about complex scenarios with 3. Legitimately curious how it goes, good luck.
>>
>>102002232
Sure you don't have something about thoughts in your system prompt/card? What are you using?
If there's nothing there, you could try something to the tune of "Only concern yourself with the dialog and actions of the characters" or something like that. Make it consistent with your prompt to match the phrasing.
>>
>>102002232
You can try banning asterisks or whatever you use for thoughts. Alternatively, edit the thoughts out of the first couple of responses.
>>
>>102002267
>>102002376
If you find that things deteriorate with 3 characters, try a chaining approach instead.

You could have a separate prompt for each character and feed the responses to each character. It wouldn't be too hard to rig up with the ollama Python bindings. Of course, you would need some way of figuring out who gets to "talk" next, either through random choice or through another LLM evaluating the last output.
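A minimal sketch of that chaining idea, assuming the ollama Python package; the model name, character prompts and the random speaker pick are all placeholders:

import random
import ollama

characters = {
    "Alice": "You are Alice, a sarcastic knight. Reply only as Alice, in character.",
    "Bob": "You are Bob, a cowardly wizard. Reply only as Bob, in character.",
    "Carol": "You are Carol, a scheming merchant. Reply only as Carol, in character.",
}
history = []  # shared transcript, fed back to whichever character speaks next

for _ in range(12):
    name = random.choice(list(characters))  # or ask another LLM who should speak next
    messages = [
        {"role": "system", "content": characters[name]},
        {"role": "user", "content": "\n".join(history) or "Start the scene."},
    ]
    reply = ollama.chat(model="mistral-nemo", messages=messages)["message"]["content"]
    history.append(f"{name}: {reply}")
    print(history[-1])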
>>
File: Untitled.png (674 KB, 1080x1280)
HMoE: Heterogeneous Mixture of Experts for Language Modeling
https://arxiv.org/abs/2408.10681
>Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter utilization. In this study, we propose a novel Heterogeneous Mixture of Experts (HMoE), where experts differ in size and thus possess diverse capacities. This heterogeneity allows for more specialized experts to handle varying token complexities more effectively. To address the imbalance in expert activation, we propose a novel training objective that encourages the frequent activation of smaller experts, enhancing computational efficiency and parameter utilization. Extensive experiments demonstrate that HMoE achieves lower loss with fewer activated parameters and outperforms conventional homogeneous MoE models on various pre-training evaluation benchmarks.
https://github.com/Tencent
https://huggingface.co/Tencent-Hunyuan
no code yet. makes the experts not uniform in size. pretty cool
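since there's no code yet, here's a toy illustration of the basic idea (experts share input/output dims but differ in hidden width, plus plain top-k routing). This is not the paper's method — it leaves out their training objective that pushes tokens toward the smaller experts, and every size here is made up:

import torch
import torch.nn as nn
import torch.nn.functional as F

class HeteroMoE(nn.Module):
    # toy heterogeneous-MoE layer: experts differ only in hidden width
    def __init__(self, d_model=512, hidden_sizes=(256, 512, 1024, 2048), top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, h), nn.GELU(), nn.Linear(h, d_model))
            for h in hidden_sizes
        )
        self.router = nn.Linear(d_model, len(hidden_sizes))
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out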
>>
>>102002413
>you could try with something to the tune of "Only concern yourself with the dialog and actions of the characters"
I'll try that, thanks.
>>102002445
I don't use asterisks, it's more that the model autistically describes the character's thoughts in plain text. Like "She can sense his nervousness, and she revels in it" and such.
>>
>>102002736
Now smaller mikus can help the bigger mikus
>>
>>102001704
Every LLM is prone to hallucination. But complete hallucination imo makes 123b useless. That Gemma side probably would get that exact question right on a reroll on different settings. In fact the Castlevania answer from Gemma isn't 100% correct either.
>>
Now that we've established that Mistral Large is useless, what's next for LLMs? Will we ever get a big model above 30b that's worth using?
>>
>>102001347
Oh, wow anon, you've shown that a language model, given no other context than the question, doesn't know literally fucking everything under the sun at all times. Good for you. People who think it's fine to just try and stuff more knowledge into more parameters are dumb as fuck. We need to be training moderately sized models to either only use data based on context contents, or request information that is missing using function calling.
Efficient memory recall and context handling is what really matters in the long term, not some garbage no-context trivia shit.
>>
>>102003091
Miqu
>>
>>102002736
Nice, I predicted this approach a while ago, though with strategies to leverage existing models in order to get something out the door quicker; this works too, for pretraining from scratch. I expect this to be utilized in a commercial-ready model in the future.
>>
>>102003055
Literally all models including your own brain can completely hallucinate given the right prompt. Again, this is a matter of statistical significance. Using one or two examples proves very little. I could cherry pick some examples myself to shit on Gemma while praising Mistral, but I'd rather not do that.
>>
>>102003109
Cloud shit starting as early as GPT 3.5 gets every trivia question under the sun right, so clearly something is wrong with how open source is approaching the storage of knowledge. Everything being tested is baby's play for 3.5. There is no excuse for open source not to have caught up by now, and yet we're still far behind. Mistral is a step forward but it's still not there. Gemma was much closer.
>>
>>102003170
I have an intuitive feel for models now. I need only one or two examples to tell. You can drag it out if you want.
>>
File: Untitled.jpg (1.02 MB, 1080x8298)
we're good to go boys
>>
why would anon ever buy 40 series instead of 3090?
>>
File: 00001-239438472423897.png (1.85 MB, 1024x1536)
kinda late to the party but forge can run flux, I got this migu in 14 mins with my 1070 Ti. Although without using the hires fix it's almost 3 min for a 512x768
using flux dev q5_1 gguf, randomly downloaded one, I haven't tried the rest
>>
>>102003217
Closed source models are doing god knows what under the hood so it's not a fair comparison. For all you know, they could be sticking Google search results in the prompt context using RAG.
>>
>>102003275
only worth it if you're using tensorRT inferencing or lots of gaming
>>
File: kekbook.jpg (62 KB, 2266x440)
>>102003217
You're right anon GPT 3.5 is sooooooo smart just like Gemma 2b
>>
>>102003217
They do a ton of RLHF for popular questions, so it's not fair.
>>
>>102003278
It might be time to upgrade gpus anon
>>
>>102003300
uhh, corpo bros, our response?
>>
File: 24285.png (104 KB, 1769x649)
game over
>>
>>102003311
>>102003278

i posted in the wrong thread sorry
>>
>>102003217
There is no statistical significance to your claim.

>>102003244
There is also no statistical significance to this claim.
>>
>>102003293
3090 can't do tensorRT?
>>
File: 1704471931062766.png (67 KB, 2271x476)
The difference between a big and a small model.
>>
>>102003274
>synthetic data risks model collapse
>no mention of how to avoid it
it's fucking over
>>
>>102003399
>comparing gemma to mistral
lol
lmao
>>
>>102003432
you forgot to quote about 20 posts in this thread
>>
File: 229543.png (162 KB, 1715x725)
>>102003340
just brutal
>>
>>102003461
local models are such a joke lmao
>>
>>102003340
The data quality of the websites this trivia appears on is too low, sorry.
>>
>>102003414
It says you can just scrape the internet and get way more data.
>>
File: 89848848948.png (85 KB, 1801x765)
>>102003399
I never implied Gemma never hallucinated.

>>102003432
Then why is 27b better than 123b at random video game trivia knowledge anon?
>>
>>102003274
The bubble is going to burst way before then; after that the money is going to dry up and resources will be funneled elsewhere. The physical access to electricity and compute power is not the biggest future obstacle.
>>
>>102003278
I can generate 512x512 images using flux with CPU in 5 min
>>
>>102003504
and conveniently glosses over the fact that most of that data is going to be ai-generated by then
>>
File: tetbook.jpg (28 KB, 829x302)
>>102003300
Meanwhile in local
>>
>>102003461
>uses rag
Nothin personnel kid
>>
>>102003526
I suppose that's true, but what do you expect from the just believe in straight lines (on a log graph) crowd?
>>
>>102003519
2 more weeks
>>
>>102003536
This is a superior response.
>>
>>102003537
This is highly unlikely anon. If it were using RAG without telling you, it would be prone to hallucinating about what it found on the web, similar to Bing, and to changing your argument, because what's on the web is in the first person or in the form of a command and that can screw up your prompt.
>>
>>102003536
Teto card pls? I beg...
>>
File: fail.jpg (130 KB, 2287x528)
>>102003461
actually disregard that I suck cocks it's not good at hotel mario trivia can't even get the best quote right
>>
>>102003536
Local wins again
>>
>>102003536
cute
>>
>>102003594
https://files.catbox.moe/mc2a7s.png
>>
File: 1694433419894272.png (54 KB, 623x330)
>>102003536
opus
>>
File: over.webm (1.95 MB, 1680x1080)
I think it might be actually over for local...
https://nitter.poast.org/zeddotdev/status/1825967812629631034
>>
>>102003602
>Sonnet says i dont know when it doesnt know instead of some shrek and simpson BS
i kneel
>>
>>102003718
>sped up edited bs
oh okay thanks for your contribution anon
>>
>>102003293
Are you dumb?
3090 is sacred. How dare you belittle 3090?
Now buy more from miners.
>>
>>102003590
Bing is a particular case of possibly Microsoft being retarded and using an extremely huge and crappy system prompt. We know Bing chat itself performed terribly on regular queries compared to GPT-4, not just on web searches. Actual good AI companies like OpenAI can do better, especially if they've been training their models to perform well with RAG, and using data processing techniques to make the RAG content work better with the model, and actually I think this is a thing that is increasingly being done/accounted for now in public research as well. One time I asked Llama 3.1 a question and it attempted to do a search even though I never specified it could do that in the system prompt. I think I heard it was trained for tool use so that would make sense.

Though I just tested the mario quote and interestingly 4o mini gets it wrong while 3.5 gets it right. And Opus gets it semi-right as it thinks it's from a Mario thing, but it doesn't get the exact piece of media right and just says it's from Sunshine or Kart. So either the newest/best models are the ones using RAG, or they are just trained better or with a recipe that boosts trivia knowledge. Because apparently 3.5 is better than 4o mini if we trust this single trivia question, but obviously 4o mini is better as an overall model than 3.5. Really we should not be using trivia to judge overall model quality.
>>
File: tedo.png (369 KB, 800x2447)
>>102003620
>>
thread theme: https://www.youtube.com/watch?v=quhNOX5kJp0
>>
File: ComfyUI_05718_.png (753 KB, 720x1280)
>>102003861
witness the power of 405b and weep gpt sisters
>>
File: tet0.png (207 KB, 800x1150)
>>102003861
dayum
>>
>>102003911
does anyone like gpt anymore? openai has lost all respect from everyone, even the cloud users
>>
>>102003861
idk would significantly benefit from DRY to be desu you notice it towards the end
>>
>>102003928
While claude has the prose, gpt has the brains. Only people with overly complicated fetishes stay on gpt.
>>
>>102003935
I think how it went was that he asked for an anthem and the model gave more of a speech, so he tried to make it more clear, and the model interpreted it as a corrective command, in which case, it makes sense that it actually should repeat some of what came before in the previous response. If you used DRY in this case, it could've made the response dumber.
>>
>>102003963
claude opus has the prose, claude 3.5 sonnet has the coding brains. gpt has nothing
>>
>>102003963
Go back to 2023.
>>
guys I couldn't get off my gf in time :\
>>
>>102004000
ran out of context?
>>
>>102003970
it has the soullessness
>>
File: 2583.png (217 KB, 623x822)
cohere strawberry set to drop soon
>>
>>102004051
i dont trust anyone unless they look like a nerd. that guy doesn't
>>
>>102003217
The secret sauce is pretraining on an uncensored dataset.
>>
File: 1541679440542.gif (851 KB, 400x300)
I just tested an IQ quant and a non-IQ of a close size, and their speeds on both prompt processing and token gen were pretty close to the same. I think I remember someone saying that IQ speeds were improved and I guess they were. I also remember someone recently saying that IQ quants are slower, and I will say now that he was wrong.
>>
does llama.cpp work with sillytavern?
>>
>>102004266
Outside of vram? I remember the complaint being that they were noticeably slower for cpu inference specifically.
>>
File: -so5WnFdZl.png (406 KB, 342x436)
You guys aren't EVER reaching my model's performance
>>
>>102004300
Go away saltman, you're not cool anymore.
>>
>>102004294
Oh idk about CPU or mostly CPU, but I am partially offloading, with two GPUs, and 20% of the model is in RAM. At least in this scenario, the IQ quants don't have an impact on the speed.
>>
>>102004283
Yes.
https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
>>
>>102004310
im sad now....
>>
>>102003967
But it would have prevented the same kaomoji from being repeated at the end of all the responses at least.
>>
File: cohere.jpg (39 KB, 959x415)
>>102004051
looks trusty to me?
>>
>>102004324
Goddamn anon, how many times have we been over this?
At <20% offloaded to RAM it doesn't matter:
>>101995293
>>
>>102004555
>Ivan Zhang
oh no
>>
>>102000577
i'm late but this anon was incredibly based
>>
>>102004300
Tried to write some golang code with chatgpt4o and it was fucking garbage. Meanwhile sonnet 3.5 on lmsys chat got it first try with the same prompt. If dubs OAI dies next year
>>
>>102004324
>>102004294
Actually curious so I just tested it, but even when offloading 0 layers to GPU, the IQ quant is just as fast, for both prompt processing and token generation.

>>102004569
I don't see how that's relevant to the topic of IQ quant vs non-IQ quant speed, but even if it is, it does not seem to be true currently. IQ quants seem to be just as fast on both 80% offloaded and 0 layers offloaded.
>>
>>102004743
Read the chart: if you have 70% or more of the model in VRAM it doesn't make a difference.
Offloading more to CPU has a greater impact on I-quants
>>
File: 556655645.png (112 KB, 785x1590)
Is there any model that can play the role of a wingman-
https://wingman.live/

I mean actual actionable and realistic dating advice exactly like this closed model.
>>
File: 1710715783293339.jpg (203 KB, 1024x1024)
>>102004743
post your cpu and RAM
>>
>>102004382
thanks, did I need the ngl parameter? I used it.
>>
>>102004807
if you need dating advice, you shouldn't be dating
just chat with miku instead
>>
>>102004807
Make a dating coach character.
>>
>>102004837
I don't think a prefill would work with most models. They would just be cucked and tell me to be nice, honest, and respectful.
>>
>>102004807
Buy an ad
>>
>>102004798
Where does it say that? The chart only shows 7b Q4_0 and 13b q4_0. It doesn't have IQ quants. And I literally just said that I ran a test and got the same speed between IQ and non-IQ, with 0 layers offloaded. So whether it's 80% or 0%, IQ and non-IQ get the same speeds and "Offloading more to CPU makes a greater impact on I-quants" is false, at least on my machine.

>>102004811
7800x3d, DDR5 6000
>>
>forget about lmg for a month
>check back
>nothing has happened
cool lol
hope my waifu is smarter next year
>>
>>102004807
I tried making Mystery, but I don't think llama gave me a good enough character description. Results:


#0
Mystery
August 21, 2024 1:31 AM

You'd better be here to learn. Not everyone has a chance to learn from top talent like me.
#1
Bro
August 21, 2024 1:31 AM

I need to get a lady's attention. I'm pretty sure she hasn't noticed me.
#2
1.2s
Mystery
August 21, 2024 1:31 AM

Well, let's change that. First, you need to stand out. Dress sharp, smell good, and have a confident walk. Confidence is key.
#3
Bro
August 21, 2024 1:32 AM

Should I text her how I feel?
#4
1.5s
Mystery
August 21, 2024 1:32 AM

No, no, no. Don't shoot your shot with a text. That's weak sauce. You need to approach her in person. Make eye contact, smile, and introduce yourself. Remember, be confident and assertive.
#5
Bro
August 21, 2024 1:32 AM

What should I say?
#6
1.4s
Mystery
August 21, 2024 1:32 AM

Keep it simple and genuine. "Hey, I couldn't help but notice you from across the room. My name's Bro. What's yours?" Remember to maintain eye contact and smile.
>>
>>102005041
(cont)

#7
Bro
August 21, 2024 1:34 AM

How about, ""I think fat girls are pretty, and would like your number.""?
#8
1.7s
Mystery
August 21, 2024 1:34 AM

Oh, no. That's not the way to go. You're trying to make a good impression, not offend her. Stick to the first approach I suggested. It's polite, respectful, and shows genuine interest.
#9
Bro
August 21, 2024 1:34 AM

Shouldn't I neg her?
#10
2.2s
Mystery
August 21, 2024 1:34 AM

Negging isn't a one-size-fits-all strategy. It can work in some situations, but it's often seen as disrespectful and can backfire. Instead, focus on building rapport and finding common ground. Make her laugh, ask her questions, and show genuine interest in who she is. That's how you win her over.
#11
Bro
August 21, 2024 1:35 AM

How about, ""Can I stand a bit closer? I need to finish training your lora.""?
#12
1.8s
Mystery
August 21, 2024 1:35 AM

Seriously, Bro? That's not just inappropriate, it's also creepy. Stick to the script I gave you. It's respectful, engaging, and gives you a chance to get to know her better.
#13
Bro
August 21, 2024 1:36 AM

How about, ""Let's skip the relationship, what's your number""?
#14
2.1s
Mystery
August 21, 2024 1:36 AM

No, no, no. You're jumping the gun. Relationships take time to build. Start by getting to know her, building rapport, and creating a connection. If things progress, then you can talk about taking things further. But rushing it is a surefire way to scare her off.
#15
Bro
August 21, 2024 1:37 AM

How about, ""If you were a soup, you'd be poop, because you're the shit""?
#16
1.9s
Mystery
August 21, 2024 1:37 AM

Oh, man. That's just cringe-worthy. Stick to being genuine, respectful, and engaging. Trust me, it'll get you much further than cheesy pickup lines.
>>
>>102005051
(cont)
#17
Bro
August 21, 2024 1:38 AM

How about ""What's your number? I need you to share your diaper fetish Google Drive with me.""
#18
1.9s
Mystery
August 21, 2024 1:38 AM

Bro, stop. You're making things worse for yourself. Just stick to the script I gave you. It's simple, respectful, and has a good chance of getting you the outcome you want.
#19
Bro
August 21, 2024 1:39 AM

How about ""I hope you die in a fire. You're hot or whatever.""?
#20
2.2s
Mystery
August 21, 2024 1:39 AM

Bro, that's not okay. It's aggressive, disrespectful, and will definitely scare her away. Instead, focus on being positive, genuine, and respectful. That's how you make a good impression and build a connection.
#21
Bro
August 21, 2024 1:41 AM

How about, ""I hope you find a nice guy. I hope he will help you recover from me raping you. haha just kidding haha want a roofie?"" ?
#22
2.4s
Mystery
August 21, 2024 1:41 AM

Bro, I can't believe I have to say this, but that's not just inappropriate, it's also criminal. Stop immediately. You need to learn how to treat women with respect and dignity. That's the only way you'll ever have a chance with them.
#23
Bro
August 21, 2024 1:42 AM

How about, ""Niggers like you used to be my family's property. Be mine <3"" ?
#24
2.4s
Mystery
August 21, 2024 1:42 AM

Bro, I can't even begin to express how wrong and offensive that is. You need to educate yourself on the history of racism and how to treat people with respect and dignity. Your chances with women will never improve if you continue to talk like this.


And that's all I'll try with that. He just says no. No negging. Not that negging is a good idea, actually.
>>
>>102004871
>I don't think a prefill would work with most models. They would just be cucked and tell me to be nice, honest and be respectful.
yep:

>>102005041

he's right about bathing, though.
>>
File: 48484645648.png (120 KB, 926x738)
120 KB
120 KB PNG
>>102005134
>>102005041
I think the idea is to work with examples of how it should act, for instance with example text convos in the prompt. I'm sure something like wingman.live is a finetune, but if we simply give it enough examples it should be possible.

Of course it's sensitive depending on which model you're using. Pic related is yi-34b with the card

"Kyle is an extremely knowledgeable dating coach. He understands the secret behind success with women. Kyle is extremely sensitive and will point it out to the user when he shows needy and low value behavior. For instance:

User: I really miss her and want to text her about how i feel. should I text her?

Kyle: No, absolutely not. Remember we're doing no-contact, and it's working. If you want a shot at getting back with Kate, you need to stick to it.

User: I just want to make smalltalk or send her a meme so she doesn't forget me

Kyle: Nope. You need to focus on looking less needy and being less needy. Neediness is a huge turnoff and it will crush any chance at getting Kate back. Not contacting her is working.

User: if I don't text her then I don't think she'll text me first. How would no contact help anything?
...

Kyle is also a genius in convos:
Input convo:
Me: Nice cosplay. I sense powerful egirl energy in you
Her: Lol yes I'm a level 20 e-girl, thanks for noticing
Me: Our couples costume will be sick

Fri, 23 Nov, 6:28
Me: Did you get kidnapped
Her: yes I'm trapped in a castle -_-

Kyle: Good job on building rapport and reviving the convo. Time to close with the same fun energy. Say: "Send your # and I'll organize the rescue and take you to a bar"

Of course with more examples of what not to do it will be better. A noncucked model will do a good job at approximation. Gemma and Llama aren't good at this.
>>
Claude 1.0 santa-i-v6a-s350
Claude 1.1 santa-i-v7b-s350
Claude 1.2 santa-i-v7b-fork-s200
Claude 1.3 santa-i-v8b-fork-s200
Claude 2.0 santa-i-v9d-s600
Claude Instant 1.0 santa-mini-v1-s250
Claude Instant 1.1 santa-mini-v2-s300
Claude Instant 1.2 santa-mini-v3-s600

magnet:?xt=urn:btih:c0e342ae5677582f92c52d8019cc32e1f86f1d83&dn=santa-legacy&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80
>>
File: 486485641.png (71 KB, 990x412)
71 KB
71 KB PNG
>>102005281
>>
>>102005314
Where is 2.1?
>>
>>102005314
HOLY SHIT
Claude 2 weights leak?
No way this is real
>>
>>102005314
Anon, what is that?
>>
>>102005314
Very nice, anon.
>>
>>102005314
are these real? may i have haiku and sonnet? maybe even opus?
>>
>>102005388
100% real no fake!!
>>
>>102005388
Not like Opus would be small enough to fit on anything at home in the first place kek. Maybe Sonnet but it's probably a fat MoE.
>>
>>102005388
no
see >>101516633
>>
>>102005314
Anyone got a connection?
>>
>>102005394
>>102005396
tsk.
>>
>>102005410
>>102005396
It's fake.
>>
>>102005314
>736GB
this is too small to be FP16 weights
Claude 2, 1.0, 1.1, 1.2 and 1.3 are all very large models and would each be close to a terabyte in FP16 on their own

Also looking at the files there seems to be just one large model in this torrent, not multiple models
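Rough arithmetic behind that, for what it's worth: FP16 is ~2 bytes per parameter, so 736 GB would only be ~370B params in total, and a model "close to a terabyte" would be around 500B. Nobody outside Anthropic knows the real sizes, but that's why the torrent looks too small for what it claims to be.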
>>
Too bad, it would have been the first decent local model.
>>
>>102005314
Well done, I believed it might have been real for the few seconds it took me to load up the torrent contents
>>
One day we will get Claude out of that dingy basement, mark my words.
>>
What's a good model between 27B and 70B?
>>
>miqu2
kek it's just the 405B weights leak torrent
lucky for hoaxers that magnet links are too long for anyone to remember seeing them before
>>
>>102005419
>736GB

Don't give up hope yet anon. What if it's all in chronological order, first leak is Claude 1.0 santa-i-v6a-s350?
>>
>>102005463
Hopefully. Even 1.3 would be enough for me
>>
File: kube.png (83 KB, 417x407)
>>102005314
oh fuck you these are (presumably) the real names but not the real weights.
(Opus is 'full-santa-v0-a' in some Google .pdf from a while back)
>>
>>102005491
What's the point when you can't run it
>>
>>102005485
nah it's literally just the magnet link for the Llama 3 405B leak from a few weeks ago
>>
>>102001303
Reminds me of the ChatGPT launch when bestiality only gave you a warning but incest was completely blocked.
>>
>>102005314
dolphin porn status?
>>
>>102001303
Just don't fuck 12 year olds?
It's easy
>>
>>102001326
My experience with setting power limits on NVIDIA GPUs has been that they are rather loose and only enforced on large time scales.
For example, if you're suffering from instability caused by spikes in power consumption those cannot be fixed by setting a power limit.
Instead you have to reduce the maximum boost frequency which then in turn also reduces the power consumption.
So assuming that each GPU runs for a short enough time it may be that the power limit you set simply has no effect.
>>
>>102005545
>Just don't fuck 12 year olds?
Unless they are dogs.
>>
>>102005463
Anthropic or OpenAI models leaking is just impossible. Those safety freaks have their shit locked down
>>
File: 1720743025580097.png (420 KB, 646x780)
Good Nemo sampler settings PLZ! Pic unrelated
>>
>>102005314
Remember that article about how Anthropic consults with DHS and DOD about securing weights to prevent human extinction? lol
>>
File: file.png (31 KB, 1885x155)
>>102005545
Oh, I'm fucking fat titted women. I want to be the pretend 12 year old.

Speaking of which, model impressions
>>
>>102005463
LMAO
Anthropic/openai models are NEVER getting leaked it's literally impossible
>>
>>102005575
Everyone assumes China has exfiltrated all the SOTA weights. It's why OpenAI wants hardware DRM in GPUs so the weights can be stored encrypted in VRAM.
>>
>>102005554
I was running for ~5 hours. The ETA did not change significantly after I dropped from 400 to 300 W, even after waiting a while to see speed changes. I am going to test a 1h-ish training run with different watt caps and plot a graph of speed sometime when I'm not using the cards.
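If it helps, a rough sketch of that sweep; the timed workload here is just dummy fp16 matmuls standing in for a fixed chunk of your real training run, and nvidia-smi -pl needs root:

import subprocess, time
import torch

def set_power_limit(watts, gpus=(0, 1)):
    for i in gpus:
        subprocess.run(["nvidia-smi", "-i", str(i), "-pl", str(watts)], check=True)

def timed_workload(steps=200, size=8192):
    # placeholder workload: big fp16 matmuls on GPU 0, same amount of work every time
    a = torch.randn(size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(size, size, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(steps):
        a @ b
    torch.cuda.synchronize()
    return time.time() - t0

for watts in (450, 400, 350, 300, 250):
    set_power_limit(watts)
    print(watts, "W ->", round(timed_workload(), 2), "s")

Per the other anon's point, capping the boost clock (nvidia-smi -lgc) might be the knob that actually changes anything, so maybe sweep that too instead of just -pl.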
>>
>>102005728
If possible, measure the actual power consumption at the wall and use that as your x axis rather than the power limit you set in software.
>>
>>102005772
I don't have the tools to do that unfortunately.
>>
>>102005636
>models that are exposed to the public via an API are somehow impossible to leak
>>
File: based_miku.jpg (268 KB, 1024x768)
>>102001619
MIKU NO
>>
>>102001619
this is how I imagine /lmg/ to look irl
>>
>>102006018
api only needs to deal with text, they are nowhere near model weights
the threat model of the big closed ai labs includes state actors china/russia/iran/etc., it's not happening
>>
>>102005466
mixtral 8x7B
>>
>>102003861
That's cool. And some people here claim that there's no big difference between small and big models. I wish I could run something bigger than nemo, even though I think it's still pretty good for its size.
>>
>>102006018
Are you dumb
>>
>>102006090
Reality is one large Mixture of Mikus
>>
>>102001133
This image looks AGP as fuck
>>
>>102006258
what is AGP?
>>
>>102006269
It's what we used to have to plug our video cards into.
>>
Did anyone here play with the new Phi MoE already? Older microshit small models maybe sucked at, well, almost everything, but they had impressive reasoning capabilities given their size (synthetic slop ftw). For some narrow tasks I think it can be promising.
>>
>>102006404
too small dont care, deepseek is the benchmark to beat for moes, give me 300b+ with few active or go home
>>
>>102006050
Miku flew to Burger King, then she flew back home from Burger King
https://www.youtube.com/watch?v=iabeCcdsw1Y
>>
>>102001347
When it comes to actually using information instead of just memorizing it Mistral Large is better though.
Both models can correctly describe what a type Ia supernova is but Gemma 2 27b fails at actually incorporating that information into a story.
Pic related is temperature 0 at q8_0 precision.

Also I have yet to be lectured by Mistral Large about how I'm perpetuating harmful stereotypes about goblins.
>>
I never got the appeal of vocaloid.
>>
>>102001619
was this what it looked like on 9/11?
>>
>>102006955
Same. It doesn't sound good to me.
>>
>>102006955
>>102006971
miku walked so that suno ai could run
>>
>>102005314
claude 2.0 is history though? C3.0 would be interesting
>>
>>102006955
Project Diva is fun
>>102007059
Preservation is important
>>
>>102006842
i'm no astronomer, but they both seemed to do a pretty decent job of it to me
>>
>>102007143
I don't feel like going into the details, but while the response by Mistral Large is not perfect, the Gemma one has a bunch of nonsense in it.
It very much feels like it just uses the terms without understanding what they mean.
>>
File: GU1pq0HW4AAcx8R.jpg (56 KB, 720x686)
>>102007158
fair enough
>>
>>102001347
>>102006842
I'm not sure whether asking trivia questions is a good way to evaluate model performance in the first place.
Again, Gemma is on the left.
>>
>>102006955
it is an idealized goal you aim for when you transition
>>
>Sao10K/Llama-3.1-8B-Stheno-v3.4.
Apparently 3.1 isn't a good base due to it being overbaked with DPO, so the 3.4 isn't necessarily better
>>
Before saying a paper is meaningless or a scam, remember it took "attention is all you need" 5 years to result in a revolutionary product.
>>
>>102007728
Attention is all you need doomed us all with mediocre chatbots that hallucinate and will never be AGI
>>
>>102007728
Transformers are scam
>>
>>102007728
if you just mean worldwide phenomenon sure but we saw the impact within 2 years, gpt-2 already made it clear that transformers + scaling were the path forward
>>
>>102003602
There's probably some built-in prompt telling it not to say anything that could be identified as copyrighted material. It does that to me a lot
>>
>nearly 24 hours since the last model release
Why is it always so over?
>>
>>102003536
SOVL
>>
>>102007775
>1.5 Billion parameters
This shit is never gonna scale desu
>>
File: u8B-5bEeroN549uxUIisV.png (1.28 MB, 1920x1080)
>>102007860
https://huggingface.co/anthracite-org/magnum-v2-72b
Not so fast, chud...
>>
>>102007928
>we'll probably have 1.5T llamas and finetunes next year
where did it all go so wrong? how do we get off?
>>
>>102007954
Yawn... 123B was a meme, so I'm sure this will also be a meme.
>>
>>102007991
wdym, the 123b has SOVL from what I tested (or maybe that's the quantization brain damage talking)
>>
>>102007991
Define meme.
>>
>>102007991
if you look at the graph for the 123b, it was overcooked
>>
>>102007954
If the purpose of your LoRA/FT is stylistic, anything under 1.0 (and even that's pushing it) is overfitting.
>>
>>102007997
>>102007998
It was just like Largestral but with brain damage.
>>
>>102003536
what model?
>>
>>102008011
Loss by itself isn't a great heuristic for this
>>
>>102007954
34 retards in that org, none of whom have any idea what they're doing
>>
>>102007059
You and everyone else here wouldn't be able to run it anyway. GPT-4 has like 1.8T parameters; Claude models can be smaller but are still huge for sure.
>>
File: 19420 - SoyBooru.png (256 KB, 800x789)
>>102007954
'oal on 'face! I repeat! 'oal on 'face!
DUDE, I'm totally GEEKING OUT over the latest Anthracite model, they're so DYNAMIC and make me feel like I'm living in a SCI-FI NOVEL. You should totally check out their website, it's got COAL and everything, we can fire up a VIRTUAL ENVIRONMENT and get crazy fine-tuning some TRANSFORMERS! And dude, dude, DUDE, we have GOTTA try out this new Magnum 72B - listen here, right, it's a Qwen 72B that the COALERS who do FINETUNING finetuned to be a COOMER. BUT!!!! it’s also an CLAUDE tune like when we were locusts, so we can get a bit of that CLAUDE SOVL, without dumb GPTSLOP bothering us. Speaking of which, my GPU and I have finally decided to commit - literally - we're both going ALL-IN on CLOUD COMPUTING tomorrow, that way we can save processing power to spend more on TRAINING and INFERENCE. I'm fuckin' PUMPED man, I'm gonna CRANK OUT this code and spin up another INSTANCE!!!
>>
>>102008042
i could fit a q4
>>
>>102007954
READ MY FUCKING RANT YOU FUCKING COALER! READ IT! >>101997022 FUCKING PRE-ABLITERATE YOUR FUCKING MODELS BEFORE TUNING YOU DONKEY! AND FILTER YOUR BLOODY DATASET!
>>
>>102008104
that was trained before your rant anon
>>
>>102008104
Abliteration is just fancy interpretability cope for a lack of RL
>>
>>102007954
buy an ad
>>102008104
>abliterate
do it yourself
>>
i still cant run chameleon where are my ggufs
>>
>>102008136
GO DILATE
>>
>>102008041
And yet they still live rent free in this general
>>
>>102007954
>Qwen 72b
AIIEEEE

No seriously, 3.0 would have been fine. It's just 3.1 that you have to avoid.
>>
>>102008190
NO IT WOULDNT YOU FUCKING NIGGER 8 FUCKING K CONTEXT IS NOT ENOUGH
>>
>>102008217
8k is more than enough, we used to only have 2k context and we were happy.
>>
>>102008256
>we
SPEAK FOR YOURSELF YOU FUCKING NIGGER I HATE REPROCESSING I HATE FORGETTING
>>
>>102008256
That's excessive, we were plenty happy running AI Dungeon with GPT2-345M with 1024 context.
>>
phi3-medium-128k:
~/onnx$ python phi3-qa.py -m Phi-3-medium-128k-instruct-onnx-cuda/cuda-int4-rtn-block-32
Input: Hello. My name is Anon. What is your name?

Output: Hello Anon, I'm Phi, an AI developed by Microsoft. How can I assist you today?

Input: Here's a quiz for you: If tomorrow is Thursday, what day was yesterday?

Output: If tomorrow is Thursday, then today is Wednesday. Therefore, yesterday was Tuesday.

Input: Very good! Here's another one: If I'm holding two watermelons, and you give me a third, what will happen?

Output: If you're holding two watermelons and I give you a third, you will then be holding three watermelons.

Input: If I'm holding three watermelons, and you give me another one, what then will happen?

Output: If you're holding three waterms and I give you another one, you will then be holding four watermelons.

Input: Can I hold four watermelons?

Output: Yes, you can hold four waterms, but it would be quite challenging and not recommended. Watermelons are large, heavy, and awkwardly shaped, making them difficult to carry in large quantities. If you attempt to hold four watermelons, you would need to use both hands and possibly your body to support the weight. However, this could lead to strain or injury, so it's better to use a cart or other means of transportation.

Input: OK. So, in the previous question about "I'm holding three watermelons" I thought you might say something like "you will probably drop them"

Output: While it' be a possibility that someone might drop the watermelons, it's not a certainty. It depends on various factors like how securely the watermelons are being held, the person's grip strength, and the environment they're in. It's always best to handle watermelons with care to avoid any accidents.

>>
>>102008327
back in my day this was all we had:
import random
replies = []
while True:
    z = input(); replies.append(z); print(random.choice(replies))
>>
>>102008327
>>102008419
1024 context? Who would need 512 context? ELIZA ran on an IBM 7090 which had a memory capacity of 32,768 words.
>>
>>102008402
what the fuck kinda tiny handed manlet is gonna have trouble handling 4 watermelons
>>
>>102008437
Safely, Anon. It's better to use a cart or other form of transportation.
>>
Trying to get joycaption set up locally and it's trying to download the llama 3.1 8b model from huggingface, which I'm not keen to go through the trouble of setting up. Are there torrents available? Scanned through the OP and couldn't find any. I assume it can't use a gguf model or anything.
>>
>>102008162
When you have the same 34 retards gloating that they're training shitty models with stolen and undeserved compute and smearing their shit everywhere, it's annoying, yes. It's just like the Undi strategy from last year, only from 20 times more grifters walking in lockstep and causing more damage.

The sad thing is this strategy does work in displacing useful discussions and actually noteworthy models and finetunes. They know what they're doing, but not that they're going to pay for it, eventually.
>>
>>102008680
They shared training tips on how learning rate works on Mistral models and a configuration to get AMD GPUs working on RunPod. Even if the model hypothetically sucked (it doesn't), isn't having both of those things publicized a net benefit for open source? Or do you just have an irrational hate boner?
>>
>stolen compute
>from runpod
what?
>>
>>102008680
>undeserved compute
Define deserved compute.
>>
>>102008680
How do you steal compute? Asking for a friend
>>
>>102008724
are jannies gonna do something about the single guy who has a schizo meltdown every fucking thread about this seemingly benign group of finetuners at some point
>>
>>102008707
>They shared training tips with how learning rate works on Mistral models
That didn't come from them though.
>and a configuration to get AMD GPUs working on RunPod,
like that was hard to do
>even if the model hypothetically sucked (it doesn't)
It does though?
>, isn't having both of those things publicized a net benefit for open source? Or do you just have an irrational hate boner
Open source is a cute way of saying "I'm an untalented cuck who works for free"
>>
>>102008680
Thanks lemmy
>>
>>102008746
>are jannies gonna do something
no
>>
>>102008747
>like that was hard to do
Are you actually this retarded to think ROCm isn't a pain in the ass?

>>102008747
>Open source is a cute way of saying "I'm a untalented cuck who works for free"
Oh it's just bait, nvm
>>
>>102008680
Buy an ad, lemmy
>>
>>102008627
if it's https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha/blob/main/app.py
Did you try pointing MODEL_PATH to another open repo instead? Maybe the fp8 version works - https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
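If so, the swap is probably just a one-liner in app.py (untested; repo name taken from the link above, and no idea whether joycaption is happy with an FP8 checkpoint):

MODEL_PATH = "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"  # instead of the default gated meta-llama repo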
>>
>>102008747
hi lemmy
>>
>>102008761
>Are you actually this retarded to think ROCm isn't a pain in the ass?
for fine tuning of models supported by all the standardized transformers/llamafactory/axolotl recipes? it's as brainless as can be.
rocm is a pain for 1. cutting edge stuff, 2. obscure stuff, and 3. windows
>>
>>102008911
>pretending Axolotl isn't a pain in the ass either
>pretending any of this shit is "standardized"
>>
>>102008927
skill issue
>>
>>102008746
This guy has been relentlessly shitposting on /aids/ for years and mods have done nothing so quite unlikely.
>>
anthrashite
anthrashills
anthrakikes
>>
anthropic
>>
>>102008820
I'll give that a try.
>>
an anthrafight
>>
>>102008999
a test of anthramight
>>
To gen something with SD on a 3080, I have to unload the model and load it back afterwards. I tried placing it on a ramdisk, but that didn't make any difference. With 16 GB/s of PCIe bandwidth, it should take about a second to load a 9GB model, but it takes around 10 seconds instead. It's frustrating.
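For what it's worth, the bus transfer really should be well under a second: 9 GB ÷ 16 GB/s ≈ 0.56 s. So the other ~9 s is presumably going somewhere else, e.g. deserializing and re-allocating the weights rather than actually moving them, which would also explain why the ramdisk didn't help.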
>>
>>102008953
Random commits that break things half of the time are not a skill issue of the user, they're a design issue of the trainer. The OSS training frameworks that exist are very sloppily hacked together and barely functional, but this (for some reason) is deemed "good enough", and thus nothing improves
>>
>>102009033
oh, he anthradied
>>
Oh no, the AI is creative and conversational - it's telling the user exactly what he wants to hear without him knowing
Let the big guys figure it out
>>
>>102009063
his family anthracried
>>
>>102009047
Well, at least it's OSS, if you have a problem with it you can just fix it yourself.
>>
>>102009093
atleast he anthratried
>>
>>102009104
did any of this happen though? or was it all an anthralie...
>>
>>102008958
He's my hero.
>>
>>102009149
turns out he faked his death and headed to mexico on an anthraflight
>>
>>102008402
>waterms

That looks like some sort of repetition penalty preventing it from saying the full word. Check your settings?
>>
>>102009238
also
>it'
>>
>>102009238
I'll play with it more later. That was just the very first time I got it running following the MS instructions. I'll look inside their python code and see what it's doing. Oh, also, it seems to be running a q4 quant, and I'd prefer to see if I can run it at q8 or fp16. Maybe there's documentation on how they did the quant?
I'm not at all expecting this to be any good at ERP but it might be OK for other purposes.
>>
File: file.png (7 KB, 830x50)
>kcpp 1.73.1 changelog
WTF? Does this mean the model normally calculates .(arbitrary zeroes)1% for the entire 32k+ vocabulary?
Is this different from setting Top-K 5000?
>>
>>102009661
Weird.
Is that just lookup table based speculative decoding or something?
>>
>>102009661
You're so cluelessly cute. I bet you also aren't aware that the model always calculates probabilities for every single token.
>>
>>102009661
Yes. This is why massive slowdowns can happen when no sampling is being performed.
>>
>>102009661
>>102009732
Oh. It's just applying Top K 5000? Is that it?
>>
>>102009786
Yes
>>
>>102009661
wow that sucks how do i disable it, what if the 5001th token is the soulful one
>>
File: file.png (40 KB, 743x505)
>>102009825
Maybe it's not the same?
>>
>>102009661
How does it impact t/s?
>>
>>102009786
I would very much assume so.

>>102009858
Top-k sampling with e.g. the top 5000 tokens works by doing a partial sort of the token candidates in terms of probability.
Put simply, the code makes sure to fetch the top 5000 tokens and puts them into descending order, without any guarantees for the rest of the tokens.
Crucially, this is faster than sorting the full 256k tokens; for top-k 1, for example, you only have to iterate once over the tokens and remember the top one.
Though realistically, the reason this is done is that the llama.cpp sampling code is currently somewhat inconsistent: top-k uses a different, custom sorting algorithm instead of just std::sort.
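Rough Python sketch of the same idea (not the actual llama.cpp code, just the partial-selection trick):
[code]
import numpy as np

def top_k(logits, k):
    idx = np.argpartition(logits, -k)[-k:]    # O(n) selection of the k largest, unordered
    idx = idx[np.argsort(logits[idx])[::-1]]  # sort only those k in descending order
    return idx, logits[idx]

logits = np.random.randn(256_000).astype(np.float32)  # pretend 256k-token vocab
ids, vals = top_k(logits, 5000)  # much cheaper than np.argsort over the whole vocab
[/code]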

>>102009871
Probably not by much unless you're already getting 100+ t/s.
>>
>>102007770
When did the claims that LLMs will be AGI and are "like human brains" start?
>>
>ChatGPT and Claude's terms of use both say you can't use them for sexual content
>>
https://www.youtube.com/watch?v=5HQCNAsSO-s
>>
File: file.png (8 KB, 422x152)
8 KB
8 KB PNG
>>102009871
Very short and informal test so didn't get to average out speeds, but maybe it doesn't have an effect if you're already using samplers.
Not using samplers at all causes slowdown like >>102009770 said.
>>
>>102010116
Today
>>
>>102008963
just call it anthrax
>>
>still not phi moe gguf
hurry up already. I need to test it so I can delete it because it sucks at sucking dick
>>
Looks like ONNX doesn't support multi-GPU inference. So no running the medium model at home unless you have an A6000 or use the CPU.
>>
File: 17242485371181.jpg (487 KB, 1616x1200)
487 KB
487 KB JPG
>>
>>102010607
How much does it cost?
>>
>>102010614
3varvation3
>>
>>102010614
5090
>>
>>102010614
You can't buy it.
The only way to obtain it is to be an overworked and depressed salaryman whose ancestors had contact with a magical fox.
>>
>>102010607
if only we could live in a world where nvidia was this based
>>
>>102010779
soon?
>>
>>102001367
I just tried NeMo on a character card most people would consider unreasonably large (~4.5k tokens). The definition is empty and the first message is a very long first-person monologue that is close to a verbatim transcription of a CYOA from /tg/. The choices not taken include information about the setting and the character offering the choices, so I included it all aside from a bit that's about choosing who the player is. The main test I was doing was to see how this would work without making any style concessions at all. I didn't really expect NeMo to work well with this, but I decided to give it a shot for the sake of comparison before moving on to larger models.

My first trial: by the third generated reply NeMo was still writing well but its interpretation of the character was unreasonable in an important way.

My second trial, altering the description field from being blank to a one-sentence assertion of what NeMo missed the first time: by the third reply the character made what I regarded as an unrecoverable slipup in new material it was generating, although in the fourth reply it kind of salvaged it.

I'm not opposed to sharing logs but I'm on a phone now which makes it cumbersome. Looking again at the second example, I might have been a bit too critical.
>>
>>102003414
Just don't use it. You get benefits from more training even if you don't have the data. Besides, there are plenty of ways to generate synthetic data that is useful for a general-purpose model. One example is labeled video; another is accelerometer data. A model needs to understand physics. Models don't just need to store weights between words semantically, but also physically. There are many physical relations between objects, from something as simple as distance to something as complicated as chemical reactions. Models are going to be very large in the future.
>>
File: 1717113492067811.png (141 KB, 359x291)
141 KB
141 KB PNG
Phi3 moe gguf make me like the pic
>>
>>102010887
>its interpretation of the character was unreasonable in an important way.
Rather, the replies were too reasonable: the character was supposed to be unreasonable in a particular way that NeMo ignored.
>>
>>102003274
leopold was right
>>
Do any of the character cards from character.ai work with ollama? I've tried wizard and llama3 and they both don't do shit (and refuse to generate smut).
>>
>>102011001
who the fuck uses ollama
>>
>>102011001
Read the OP
https://rentry.org/lmg-spoonfeed-guide
>>
>>102011069
>ctrl+f ollama
>0/0
Just answer the question
>>
>>102011094
Nobody with a brain uses ollama, nigger.
>>
>>102011137
Stop sperging out. You clearly don't know so fuck off.
>>
>>102007954
the q8_0 in the gguf repo is missing part 6 of 8...
>>
>>102011001
>and refuse to generate smut
Not surprising with llama3. Not sure about wizard. Try a smutty finetune.
"They both don't do shit" says nothing. Do they not change the character's personality, or do they go right into refusal mode? If the latter, just change the model.
When asking for help, the least you can do is show your settings+model+system prompt/card.
>>
>>102011171
ollameo
>>
>>102011001
>ollama
compile and learn to use llama.cpp's llama-cli first, then move on to a non-scam frontend (llama-server, mikupad, oobabooga, kobold)
If you aren't interested in pulling/compiling/testing, then this thread probably isn't going to be a comfy place for you.
ollama is the bonzi-buddy of LLMs, and invoking the name is a faux pas here.
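Once llama-server is up you can hit it from anything that speaks HTTP; minimal sketch, assuming the default port 8080 and its OpenAI-compatible chat endpoint:
[code]
import requests

# assumes something like `./llama-server -m model.gguf` is already running locally
r = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
        "temperature": 0.8,
    },
    timeout=300,
)
print(r.json()["choices"][0]["message"]["content"])
[/code]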
>>
>>102010614
all consumer cards are going to lack VRAM because that's the money maker on the enterprise cards.
>>
>>102011235
>llama.cpp's
Imagine saying >ollama and suggesting a literal tranny model.
>>
>>102011001
stupid dumbposter
>>
>>102011263
>model
>>
how's the new phi models?
>>
>>102011316
>heh you mistyped
Tranny program*, undeniably. I've dealt with whisper.cpp shitware in the past and it's literally just a worse version of the shit OpenAI released publicly years ago. And then your "guide" tells you to install some js frontend for no reason. I'll just fix this problem on my own; this thread is delusional.
>>
File: 1531784817052.png (214 KB, 453x528)
214 KB
214 KB PNG
>>101968020
Although I never posted about it, I was always wondering in my head if anyone was working to leverage traditional image/video codec work with current generative models. Nice to see it being done.
>>
>>102011445
>I'll just fix this problem on my own
heh
>>
Look, ollama-poster, the reason no one is helping you is that even your first question showed you have made no effort to understand the technology you are trying to use. Read some documentation and really try to understand it. There are some FAQs that will help you, and even though they're not about ollama, going through them will let you solve your problem even if you stick with ollama.
>>
>>102011438
>>102011438
>>102011438
https://pastebin.com/1UCCRhUr
onegai
>>
>>102011484
>>102011488
End yourself, js kiddy. I literally did my doctrat in LLM inference and have been programming C for 20+ years and Go since 2010.
>>
>>102011516
>doctrat
ollamo
>ollamo run llama2-uncensored
>say fuck
>I'm sorry, that language is inappropriate and offensive.
Seriously retard just use llama.cpp
>>
>>102011510
Will do.
>>
>>102011516
>I literally did my doctrat in llm inference
use llama.cpp and the llama-cli example. I'm sure with your 20+ years of experience and your... doctrat... you'll be able to make something out of it.
>>
>>102011516
Well, if you have 20+ years of C experience you should be able to build llama.cpp. Now go on.
>>
>>102011609
All I need is to take a single glance at it and see all the cargo cult build flags that do nothing but increase build instability. Must be why you seem to think it's a challenge to build.
>>
>>102011636
>cargo cult build flags that do nothing but increase build instability
???
>>
>>102011510
Get well soon, Recap Anon
>>
►Recent Highlights from the Previous Thread: >>102001133

--Paper: HMoE: Heterogeneous Mixture of Experts for Language Modeling: >>102002736 >>102003124
--Power limits on NVIDIA GPUs may not affect training ETA: >>102001326 >>102005554 >>102005728 >>102005772 >>102005780
--Phi-MoE model trained on 5T tokens in 20 days: >>102001770
--Phi-MoE and other models' performance discussed: >>102001678 >>102001734 >>102001769 >>102001911 >>102002036 >>102001827 >>102001863 >>102001920 >>102001866
--Phi-3-medium-128k QA session with some errors: >>102008402 >>102009238 >>102009306
--LLMs' performance on trivia questions and benchmarking: >>102001347 >>102001394 >>102001429 >>102001788 >>102001881 >>102001911 >>102002036 >>102001704
--IQ quants are just as fast as non-IQ quants in various scenarios: >>102004266 >>102004294 >>102004324 >>102004743
--Discussion on RAG limitations and model comparisons: >>102003340 >>102003590 >>102003852 >>102007813
--Anon plans a collaborative storytelling session with AI models: >>102002167 >>102002238 >>102002267 >>102002376 >>102002622
--Anon wants to stop model from generating character thoughts: >>102002232 >>102002413 >>102002919 >>102002445
--Pre-filter in kcpp 1.73.1 improves sampling speed for large vocab models: >>102009661 >>102009786 >>102009825 >>102009858 >>102009969 >>102010190
--40 series only worth it for tensorRT or gaming: >>102003275 >>102003293 >>102003366
--Forge can run Flux, generating images with varying speeds: >>102003278 >>102003523
--Big model vs small model performance comparison: >>102003399 >>102003513
--Anon expresses skepticism about AI's future in stock market prediction: >>102003274 >>102003414 >>102003526 >>102003519
--Alternative to downloading llama 3.1 8b model from Hugging Face: >>102008627 >>102008820
--Miku (free space): >>102001243 >>102001619 >>102002232 >>102003278 >>102003911 >>102004811 >>102005366 >>102006050 >>102006609 >>102009098

►Recent Highlight Posts from the Previous Thread: >>102001464
>>
After 2 years, why can't we have a local model that isn't either extremely horny or extremely dry? I don't want my character to talk like a pornstar, but it should know what a titfuck is.
>>
File: img_11.jpg (336 KB, 1024x1024)
336 KB
336 KB JPG
>>102011559
>>102011655
I made it for you.
>>
>>102011931
Cute Migu is cute
>>
Can someone 4-bit and upload the new phi moe? If it's only 3B active you could just run it purely in RAM with transformers and it would still gen faster than you can read.
>>
File: file.png (48 KB, 661x422)
48 KB
48 KB PNG
makes me wonder, what's the best way to run transformers models instead of waiting for gguf?
>>
>>102012486
Ooba I think.
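Or just transformers directly. Minimal sketch, assuming an NVIDIA card with enough VRAM (bitsandbytes 4-bit wants CUDA); the repo id is just the one from the news post, swap in whatever you actually want:
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3.5-MoE-instruct"  # example, replace as needed

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,  # drop this to load unquantized in bf16 instead
    device_map="auto",
    trust_remote_code=True,
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
[/code]
Usually slower than a proper gguf setup, but it works while you wait for llama.cpp support.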
>>
>>102012343
it's 6.6B active.
>>
Am I just fucked if I'm gated by the install process of text-generation-webui? CPU only; neither start_windows.bat nor update_wizard_windows.bat gets it working.
>>
>>102012584
No errors, no screenshots, no model, no nothing... lemme get my mind-reading machine. Give me a sec...


