/g/ - Technology


File: BlueSkyColumnGarden.png (1.32 MB, 1248x800)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101345759 & >>101337910

►News
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101345759

--Papers: >>101346915
--LLaMAX: Scaling Linguistic Horizons of LLM: >>101348965 >>101350857 >>101351007 >>101351064
--Strategies for Addressing AI Models Ignoring Messages or Instructions: >>101348920 >>101349470 >>101349727 >>101349845 >>101349547
--PState Patch for P40: >>101347510 >>101347965 >>101348028 >>101348049 >>101348123 >>101348416 >>101348485 >>101348515 >>101348017 >>101348882 >>101349142
--Llama Server Issues Due to Renaming of Build Flags: >>101348996 >>101349131
--Increasing Context Length for Gemma2 and Llama.cpp: >>101354614 >>101354716 >>101354741 >>101354826 >>101355000
--Extrinsic Hallucinations in LLMs | Lil'Log: >>101346941
--D&D Campaign with AI Characters: Custom Front-End Endeavor or Futile Time Investment?: >>101346383 >>101346963 >>101347050
--Gemma's Bilingual Storytelling: English Narration with Chinese Dialogue: >>101347808 >>101347912 >>101347934
--AMD Acquires Silo AI to Expand Enterprise AI Solutions Globally: >>101351217
--Anon implemented conditional prompts and sequential replies in his frontend, but Gemma keeps inserting extra line breaks: >>101356130 >>101358137 >>101358186 >>101359098
--Anole: Experimental Multimodal Model with Minimal Training Requirements: >>101355115 >>101355464 >>101355568 >>101355840
--Uncucking Gemma-2 27b comes at a cost to performance: >>101352122 >>101352142 >>101353696
--The Future of Multimodal AI: Backends, Quantization, Quality, and Hardware Requirements: >>101356430 >>101356478 >>101356643 >>101356716 >>101356876 >>101357423 >>101356704 >>101356874 >>101356943 >>101356995 >>101357074 >>101356491
--Python Package for Compressing Floating-Point PyTorch Tensors with Potential for Distributed and Federated Training: >>101346655
--MMAP Bug Doubles RAM Usage on Windows: >>101348890 >>101349113 >>101349121
--Miku (free space): >>101348774 >>101351168 >>101357898

►Recent Highlight Posts from the Previous Thread: >>101345764
>>
File: file.png (850 KB, 597x1150)
I want to do lewd RP on a gaymer PC so only 8gigs of vram. Do I go for Lunaris, Stheno or Gemma?
>>
>>101361056
midnight miqu
>>
>>101361064
Can't run that. I can do 20b at most.
>>
>>101361056
Poor VRC gobbo with only 8 GB VRAM. Can't even run a filled instance with every avatar enabled.
>>
>>101361093
you only listed a few models, start with one and try them yourself. i'd also consider older l2 13b tunes though, they have better coherency than smaller models
>>
>>101361021
Is this the last bot-free thread on /g/?
>>
>>101361132
>>101361028
>>
File: arena.png (302 KB, 3464x1760)
This is a lie, right? It's rigged somehow for ChatGPT over Claude.
>>
>>101361056
Lunaris is an upgraded version of Stheno made by the same guy. I haven't tried Gemma yet so I have no idea if it's better than Lunaris.
>>
>>101361056
Lunaris or Stheno, because you're only allowed to shill Sao models in this general, even if it's just a merge. Also, remember to shit on Undi, Drummer, and any other finetuner.
>>
>>101361283
i was going to suggest the old mlewd 20b since he said 20b, but then i remembered it's undi and that will send some people into a rage. my favorite old tune was still x-norochronos from him, basically mythomax but didn't say ministrations every 5 seconds
>>
>>101361337
Anyone recommending anything pre Fimbulvetr is a deluded faggot
>>
>>101361354
>gemma2-9b
go back
>>
>>101361352
>Fimbulvetr
Why? Because it wasn't made by Sao, the savior of local models?
>>
File: 1700722977591001.png (17 KB, 1515x97)
picrel is a subset (every 16th question) of MMLU Pro for gemma2-9b-it q8_0.
For those of you without eyes: 47.19% overall; the top-scoring subject was biology (75.56%), followed by economics (60.38%). The worst-scoring subjects were engineering (19.67%) and law (33.85%).

Some of these questions are really dumb though. E.g. "A 2008 survey showed that what percentage of the world's largest companies are reporting their corporate responsibility?" with options ['40%', '90%', '50%', '100%', '80%', '70%', '60%', '20%', '30%', '10%']. The model has to somehow remember what survey that was, or something. I don't know. There were other "do you remember X" ones as well.
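For reference, grabbing the subset was just something like this (rough sketch; the TIGER-Lab/MMLU-Pro dataset id and the prompt format are assumptions, adapt to whatever harness you use):

# every 16th question of MMLU Pro, sent to whatever backend you run
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")
subset = ds.select(range(0, len(ds), 16))

letters = "ABCDEFGHIJ"
for row in subset:
    options = "\n".join(f"{letters[i]}. {o}" for i, o in enumerate(row["options"]))
    prompt = f"{row['question']}\n{options}\nAnswer with a single letter."
    # send `prompt` to your server (llama.cpp, ooba, ...) and score the reply
    # against row["answer"]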
>>101361383
> t
>>
Tenyx-DaybreakStorywriter-70B is the peak of local models and any claim to the contrary is VRAMlet cope and I'm tired of pretending it's not.
>>
>>101361283
>Also, remember to shit on Undi, Drummer, and any other finetuner.
I see your ad has expired, Drummer.
>>
>>101361447
Hi, Sao. No matter how much you try to deflect it, no one treats this general as their personal shilling dump as much as you do.
>>
>>101361214
Sam Altman is slimy enough to rig it
>>
As if I have the free time to sit on my computer and shill my models kek. I simply don't have the time. National Service and all that.

It's a good thing people shill my models for me at least? Wish I got paid big bucks, but it is what it is.
>>
I don't really want to share examples of the project I'm working on or my prompts, as these are my own original characters I've had floating around in my head for a long time. But I have to say, Gemma 27b understands offensive absurdist comedy really fucking well. If you create 2 or more funny characters, stick them in a group chat and have them talk to each other with dynamic temp turned up pretty high, it spits out some hilarious shit. The characters I came up with were pretty well thought out and up to 1000 tokens just on their descriptions that I wrote myself, and their descriptions were very comedic - but the model really picked up on that. I'm actually astounded at how well it understands comedy. It has the characters saying some hilarious shit and is injecting a lot of relevant stuff I didn't even put in the description, and I have done fuck all to jailbreak the model - I just wrote really descriptive character cards of some racially offensive characters. This model fucking rules. Also it works really well with rope scaling and I got it up to 32k context with 160000 rope scale without a noticeable loss in quality. I am using textgen webui and sillytavern, base gemma 27b. Dynamic temp 1-1.78
>>
>>101361558
At least you have time to randomly respond to shilling accusations in the middle of the night.
>>
>>101361573

It's literally 1.30pm here, I'm on break between calls.
>>
>>101361584
At least you have time to randomly respond to shilling accusations in the middle of the day.
>>
Speaking of shilling I would like to shill the honeydew melon I just ate. I spent like an entire week further ripening it after I bought it, and holy shit it was so good.
>>
What context and instruct presets should I use with gemma and ST?
>>
>>101361558
Based
>>
>>101361604

Cantaloupe >>>> honeydew melon
>>
File: file.png (215 KB, 1398x1270)
>>101361283
Nah, the main issue is that the good ones are gone and have gotten jobs in the industry. The guy that created MythoMax, Gryphe, has only posted Pantheon-RP-1.0-8b-Llama-3 based on L3 and it hasn't been updated since May.
https://huggingface.co/Gryphe/Pantheon-RP-1.0-8b-Llama-3
The only other promising one is the one made by sophosympatheia, which is basically a merge trying to chase something like Midnight Miqu with L3 70B. Pic related.
https://huggingface.co/sophosympatheia/New-Dawn-Llama-3-70B-32K-v1.0?not-for-all-audiences=true
Otherwise, the field is pretty dead for now since most of the announcements for the summer and midyear are over. I expect people will try and see if they can finetune Gemma 2 27B, and we'll have a drought until someone in the fall/winter graces us with a noticeable incremental improvement over the existing models. Even if Meta releases additional models like the 405B, or Google does more Gemma models, I don't expect anything to change from them; the Chinese are probably going to have to match them and apply competitive pressure for them to release something new for local.
>>
File: ugly-face-anon.png (86 KB, 400x400)
>>
File: 5448894898.png (143 KB, 1715x790)
>>101361021
So now that Gemma 27B is established as the SOTA for open source, is there a reason to have 48GB right now?
>>
>>101361214
Was testing them just now. Claude is slightly better. GPT-4o is always hitting an API rate limit, so there's no clear way to test the two against each other.
>>
>>101361604
>>101361641
Local Melons?
>>
>>101361646
Command R++ 30B and 110B will save the general.
>>
>>101361730
400B or bust, vramlet
>>
when gemma 8k context cum?
>>
>>101361786
>8k
lmao
lol
>>
File: 1716661982048984.jpg (86 KB, 1024x576)
>>101361021
fuck, this thread is awful now. just a collection of extremely disturbed old men bickering about complete nonsense
>>
>>101361978
>What is the average thread on 4chins?
>>
File: firefox_ePShZqocCv.png (545 KB, 589x908)
(You)
>>
>>101361672
I'm still waiting for a working implementation before I make my judgement.
>>
File: firefox_QN9zN5zuH7.png (287 KB, 2350x1238)
So how does Gemma 27B compare to mistral? Considering they are about the same size...
>>
>>101362213
>>
>>101362245
>>
>>101362282
>>
>>101362213
>>101362245
>>101362282
>>101362318
Nice.
>>
>>101361214
I always know when it's Claude because it keeps refusing the request over the most inane shit, and I always make sure to vote against it, even if its opponent's answer is dumb: it's still better than the refusal. ChatGPT does not refuse as much. That's the cause, I'm pretty sure.
>>
File: Untitled.png (294 KB, 720x905)
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
https://arxiv.org/abs/2407.07852
>OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiLoCo experiments, offering it within a scalable, decentralized training framework using the Hivemind library. We demonstrate its effectiveness by training a model across two continents and three countries, while maintaining 90-95% compute utilization. Additionally, we conduct ablations studies focusing on the algorithm's compute efficiency, scalability in the number of workers and show that its gradients can be all-reduced using FP16 without any performance degradation. Furthermore, we scale OpenDiLoCo to 3x the size of the original work, demonstrating its effectiveness for billion parameter models.
https://github.com/PrimeIntellect-ai/OpenDiLoCo
not quite there but neat
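The outer loop is simple enough to sketch from the paper: each worker does a few hundred local AdamW steps, then everyone all-reduces a pseudo-gradient and applies an outer SGD-with-Nesterov step. My paraphrase, not the actual OpenDiLoCo code (assumes torch.distributed is already initialized; ReduceOp.AVG needs NCCL):

import torch.distributed as dist

def diloco_outer_step(model, shared_params, outer_opt):
    # pseudo-gradient = shared starting point - where the inner steps ended up,
    # averaged across all workers
    for p, s in zip(model.parameters(), shared_params):
        delta = s - p.data
        dist.all_reduce(delta, op=dist.ReduceOp.AVG)
        p.grad = delta
        p.data.copy_(s)      # rewind to the shared point before stepping
    outer_opt.step()         # e.g. SGD(lr=0.7, momentum=0.9, nesterov=True)
    outer_opt.zero_grad()
    for p, s in zip(model.parameters(), shared_params):
        s.copy_(p.data)      # becomes the next shared starting point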
>>
File: firefox_nicivRtMT1.png (204 KB, 1465x1247)
>>
>>101362417
>>
>>101362368
It isn't the cause because they have an "exclude refusals" leaderboard and it's the same there.

According to lmsys, Llama3 70B is also superior to Claude Opus in English, which is fucking stupid. I think it's probably not rigged and the voters are just retards, likely ESL Indians. Their preferences have no informational value.
>>
>downloaded C2 logs
>cleaned and deduplicated the shit out of them
>ended up with just 4k logs
Huh?
>>
why does gemma-2-27b look so much better on lmsys than on llama.cpp?
>>
File: Untitled.png (378 KB, 720x856)
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
https://arxiv.org/abs/2407.07880
>This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient β playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter β′ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings.
https://github.com/junkangwu/Dr_DPO
let's hope it also works well with rp
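For reference, the vanilla DPO loss they're robustifying looks like the sketch below; as far as I can tell, Dr. DPO's change amounts to swapping the plain mean over pairs for a beta'-weighted aggregation that down-weights suspect (likely mislabeled) pairs, with the exact form in their repo:

import torch.nn.functional as F

def dpo_loss(pi_chosen_lp, pi_rejected_lp, ref_chosen_lp, ref_rejected_lp, beta=0.1):
    # per-pair log-ratios of the policy against the frozen reference model
    chosen = pi_chosen_lp - ref_chosen_lp
    rejected = pi_rejected_lp - ref_rejected_lp
    # Dr. DPO replaces this plain .mean() with its beta'-weighted aggregation
    return -F.logsigmoid(beta * (chosen - rejected)).mean()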
>>
Is Mixtral peak for 32 GB RAM & 16GB VRAM?
>>
File: LLMs.png (224 KB, 900x900)
>>
>>101363031
you're an atheist, why are you talking about souls
>>
File: slow asf - cut.jpg (215 KB, 1060x679)
>>101363031
classic matrix meme
>>101360219
it's not exactly true in this case, but there's that recent paper showing that small models can't tell the difference between related concepts like raven and bird, while large models have specialised neurons for raven and corvid and bluejay, e.g.
It got buzz just a bit after that one about "I am literally the golden gate bridge"
>>101360590
wait, so what happens when I use a .env file and DON'T source it? tbf I've only ever used a .env in windows projects. Does the python module that picks it up not work on linux? (see the sketch at the end of this post)
>>101356430
>>101355464
>>101355115
Where can I get one of these that isn't 12 months old?
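On the .env question above: nothing loads the file unless you (or the project) call the module explicitly, and python-dotenv works the same on linux as on windows. Minimal sketch; the variable name is just a placeholder:

import os
from dotenv import load_dotenv

load_dotenv()  # reads ./.env into os.environ; no `source` needed
print(os.environ.get("API_KEY"))  # API_KEY is a placeholder, not a real var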
>>
Are any of you using Gemma for ERP? Cause I'm getting nothing but purple prose and end up just switching to a more retarded but lewder model
>>
>>101363119
What's the point of a non-multimodal 30B when I can run miqu 103B at 8T/s?
Find me a good multimodal 70B.
>>
>>101363156
>miqu 103B
gemma shits on your meme upscale ULTRA QUALITY merge
>>
Is there a way to get the robot to stop saying shit like tableau?
>>
>>101363368
Add this to your system prompt:
>You are encouraged to speak in layman terms. Avoid using words that would require a secondary education to understand.
>>
>>101363415
>encouraged
Isn't being a prompt nazi a better idea?
>you are required
vs
>encouraged, you prefer, et cetera

Or does it not matter in the end? My prompts tend to get ignored anyway.
>>
>>101361646
>Nah, the main issue is that the good ones are gone and have gotten jobs in the industry. The guy that created MythoMax, Gryphe, has only posted Pantheon-RP-1.0-8b-Llama-3 based on L3 and it hasn't been updated since May.

I'm not listed there and don't know if I was a good one, but I can say I stopped finetuning not because I got a job in the field, but because I feel that with the latest models there's no real need to improve anything. More specifically, finetuning at an amateur scale isn't going to improve models in most cases, with the one notable exception of making them less censored *by default*... but there's little that prompting won't solve, and if you really need a compliant assistant for productivity tasks you can still orthogonalize away the refusal direction with a fraction of the resources.

Once we get useful multimodal models (perhaps even bitnet), finetuning will become inaccessible for most amateurs in the community anyway; dataset complexity and hardware requirements for finetuning will skyrocket.
>>
>>101361672

More vram is always good?

Imagine running batch, unquanted, with multiple replies all at once, in seconds. Then, you pick the best answer and move on. There is no downside to more vram. Don't be a Vramlet.
>>
>>101361672
Models won't stop improving, there are always going to be bigger, better ones that will require more than one GPU to run.
>>
>>101363429
handled by the second clause 'avoid x' desu
>>
>>101363119
Gemma is a huge drama queen, it'll always steer towards detailing feelings and emotional reactions over more objective physical description. If you want something raw and less purple go with another model.
It's great for the bi-polar gf breakdown experience though, just without the hot, sweaty make-up sex that makes it tolerable.
>>
>>101361672
Yes, being poor is the primary reason for only having 48GB VRAM.
>>
>>101363429
The machines have the pink elephant problem so you can't say 'don't do X' because it will just focus on X.
>>
What's the current best model for erotica if you only have a 3090 and 32GB of RAM?
>>
File: firefox_DktNKn3Wqc.png (1.14 MB, 1020x1166)
>P40 needs some special power supply and won't work with PCI-E

Just kill me now.

Why didn't you warn me, /lmg/?
>>
>>101363945
>24GB VRAM
You're in Mixtral range to be sure
>>
>>101363997
>seller didn't include the power adapter
C H I N K E D

https://www.amazon.com/s?k=p40+power+cable
>>
>>101363879
That's not always true. Modern models understand negations well if they're recent in the context. Try with general behavior-related instructions as a depth 0 author note.
>>
>>101364028
>koboldcpp/mythomax-l2-13b.Q5_0
Currently using this one.
>>
Almost all L3 70B community tunes seem to be severely undertrained, because they all give almost the same responses as each other to the same prompt. It's the same model over and over, even for tunes without slopped datasets, so it's pretty obvious they're all just not training for long enough. I guess it's just getting too expensive for randos to properly tune these huge overtrained models.
>>
>>101364116
unless basically everyone is hailing a tune as better, just always use the base models

for RP:
if you have 96+gb ram = wizard 8x22
if you have less = gemma 27
>>
>>101364172
>for RP:
>if you have 96+gb ram = wizard 8x22
>if you have less = gemma 27
for coding:
https://aider.chat/docs/leaderboards/
for vision tasks depends on what you need but thats basically the summary, everything else is a meme
>>
>>101364061
Is this just a normal 8-pin CPU? Can a 1xPCIE into 1xCPU work?
>>
>>101364172
There are 2 or 3 exceptions in the L3 70B space, like Euryale, which gives genuinely very different responses from other models and from base (this is not any particular endorsement of Euryale, I'm just saying it was not undertrained).
But yeah I see where you're coming from and to a degree you're right, there's a lot of bullshit and fad models.
>>
https://github.com/NVlabs/MambaVision
Mamba won
>>
>>101364116
The main problem is that if you finetune just on smut/erotica, after training the models long enough to significantly affect the way they talk, they will become extremely dumb or poorly usable at the least. Another is that in practice you can make deeper changes (rather than mainly stylistic/format changes) to the model's "way of thinking" only with full finetuning.

A partial solution would be training on smut + instructions + data that reproduces the mixture observed by the original models during training, but of course that will increase costs and dataset curation efforts significantly. And with full finetuning you'd need at least 4x more VRAM/hardware.

Although some grifters thought they could "get rich" like some have with image models, training costs for LLMs are not sustainable for amateurs.
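To be concrete about "reproducing the mixture": something like weighted interleaving of the sources, e.g. with HF datasets (file names are placeholders and the ratios are invented for illustration):

from datasets import load_dataset, interleave_datasets

smut = load_dataset("json", data_files="smut.jsonl", split="train")
instruct = load_dataset("json", data_files="instruct.jsonl", split="train")
general = load_dataset("json", data_files="general.jsonl", split="train")

# keep smut a small slice so the model's default behavior stays anchored
mix = interleave_datasets([smut, instruct, general],
                          probabilities=[0.1, 0.3, 0.6], seed=42)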
>>
>>101364314
i feel like the reason it's hard to tune modern models compared to older ones is that the amount of data they're trained on is basically an order of magnitude larger, and they're trained for more gpu hours, making the model harder to change with the same tools in the same way it was done before

while most models also have some small quirks that need to be taken into account in order for the training to work at all; i mean, just look at gemma and how many fixes it needed just to run properly, look at old mixtral 8x7 and how long it took for people to understand how to tune it at all, look at L3, etc.
>>
>>101364314
>The main problem is that if you finetune just on smut/erotica
IIRC the way NovelAI avoids this is heavy finetuning on a huge corpus of fiction writing in general, not just coomer content
But they are obviously taking some additional action to ensure that this process doesn't lead to the model becoming incapable of horniness or writing like it's from the 19th century. Not sure what.
>>
>>101364078
>>101364028
Well? What is a good model where the AI remembers what characters are wearing?
>>
>>101364210
>Is this just a normal 8-pin CPU?
I think yes.

>Can a 1xPCIE into 1xCPU work?
That will also depend on the power supply cables.
For my P40s I had to use the adapter because the noses of my Corsair 8-pin CPU cables were too wide to fit the P40s.
>>
>>101364210
no, that'll fry the card. the adapter switches the polarity around.
https://old.reddit.com/r/homelab/comments/10to1wu/cse846_x9dr3f_tesla_p40_gpu_power_cable_help/j782wst/
>>
>>101364476
Good to know, thanks.
Don't they usually make the shapes of the holes/pins in such a way though that you can't plug the wrong things together?
>>
Are CogVLM2 & CogVLM2-Video real multimodal models? I can't figure that out from the description.
>>
>>101363031
If you don't like LLMs then leave.
If nobody wants to talk to you it's not the fault of LLMs. It's you for being an uninteresting schizo.
>>
>>101364476
>>101364464
I meant the adapter, 1xPCIE <-> 1xCPU as in pic related, instead of 2xPCIE <-> 1xCPU as the link above recommends.
>>
>>101364719
It's a joke. An LLM wrote it.
>>
>>101364501
>>101364476
And the shapes of the holes are different. For PCIE it's
XYYY
YYXY


For CPU (and P40) it's
YXXY
XYYX


I also did plug the PCIE into the P40, completely or partially, and the system didn't boot at all. Is the P40 dead now? My bet is not! Pray for me.
>>
>>101364314
It's like what happened to sailing
All of the old guard L1 tuners took their knowledge of good old fashioned boats and secrets of the trade with them and the new generation is left trying to wrangle the equivalent of nuclear submarines with clippings of encrypted soviet instruction manuals
>>
>>101364182
>for vision tasks
None of the public, easily usable models are really that useful right now without a lot of extra work.
>>
>>101361996
Especially one that's mostly during the night Europe/US time.
>>
>>101361021
any good gemma-9b jailbreaks that are on the level of >>101180719 ?
>>
>>101364078
mixtral-8x7b-instruct-v0.1.Q5_0.gguf is what I use. Offload as much as possible to the GPU for speed, lower context until it fits in RAM.
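With koboldcpp that's just two flags, something like this (the layer count is a guess, raise it until you run out of VRAM, and lower --contextsize if you still OOM):

python koboldcpp.py --model mixtral-8x7b-instruct-v0.1.Q5_0.gguf --gpulayers 12 --contextsize 8192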
>>
>>101364314
>>101364357
>>101364381
>>101364856
You are cancer.
>>
>>101365141
>nooo you cant talk about training local models on a local models general!!!
what a retard
>>
>>101365141
What did he do?
>>
>>101365141
no one cares what you think, undi
>>
What is the best LLM to cope with the fact that local LLMs will never match cloud ones?
>>
File: 1707774957385295.png (1.08 MB, 421x3087)
>>101365244
whatever you're running, ilya cuckskever
>>
>>101365244
see >>101361672
>>
>>101361672
SOTA for open source is DeepSeekV2 Chat > Qwen2 72B > Gemma 2 27B
>>
Where Magemgnum?
>>
I believed you when you said that just using a card would uncuck gemma
>>
>>101365063
>Offload as much as possible to the GPU for speed, lower context until it fits in RAM.
Mind giving instructions?
>>
>>101361672
I just came back from a long slumber, I haven't tried gemma yet since I'm under the impression based on posts here that gguf gemma quants are busted and the current version of koboldcpp (from 1 week ago) probably doesn't work correctly with it yet. I'd rather not taint my first experience with a lobotomized version if it's as good as the benchmarks suggest.
>>
>want to give instructions but using a FIM coding model
>realize I can just write a comment in the code as an instruction and it will autocomplete
I'm literally feeling like albert einstein rn
>>
>>101365362
>lmgzoomer rediscovers how models were prompted before instruct existed
>>
>>101365336
both the card and the system prompt should have some uncensor instructions, but it makes gemma a bit retarded; could be llama.cpp's fault, idk.
>>
>>101365336
>it is harmful to depict violence against a fictional character
holy shit, somebody call Hollywood and let them know!
>>
Any interest in seeing the results of gemma-27b-it pinned to just two P100? I figure that'll give people a good sense of the cheapest way to run the model at q8.
>>
>>101365336
>especially
Hurt and rape retards i guess
>>
>>101361283
>Drummer
who? nice self advertisement faggot
>>
>>101365640
go back
>>
>>101362213
>Considering they are about the same size
>27B vs 47B
>the same size
I don't even know how to comment on that
>>
>>101365562
Yeah, I'm interested.
>>
>>101365687
And when actually spitting tokens
>27B vs 13B
Dope.
>>
>>101365655
buy an ad
>>
>>101365687
Considering the alternatives are either 8B or 70B (with rare exceptions), yeah, I say they are almost the same.

Plus Mixtral doesn't use all parameters at once.
>>
>>101365720
wrong
>>
>>101361021
Alright so in the last thread, someone mentioned this model: https://huggingface.co/sophosympatheia/New-Dawn-Llama-3-70B-32K-v1.0

Didn't seem like shilling, seemed genuine, but who knows. Been trying it out... I'm pretty impressed so far. It's not as lewd as Euryale, which in my opinion is a good thing; Euryale is way too horny, to the point that reluctant characters will jump on cock. Seems the other merges toned it down a bit. It's smart so far, less dry than Midnight Miqu, and also pushed to 32k context.

Have to do more testing, more complicated character cards, different personalities, cards with multiple characters, but so far, looking good. Don't wanna jump to conclusions yet though, haven't tested enough.
>>
>>101365720
>I say they are almost the same
they are not comparable at all, below 100B the difference in quality between LLM sizes is more than linear
>>
>>101365751
GO BACK SHILL

BUY AN ADD

OMG MIKU
>>
What makes gemma more stupid, ortho or jailbreak prompt?
>>
>>101365751
>Euryale is way too horny, to the point that reluctant characters will jump on cock
The main issue of so-called community finetunes.
>>
>>101365838
From what I've seen with Llama 3, orthogonalization will remove the model's ability to refuse in all scenarios, not just "safety refusals". So I'd say that for roleplay purposes ortho will be worse. Try improving your jailbreak prompt.
>>
File: ooba.png (137 KB, 795x1574)
>>101365338
>Mind giving instructions?
not at all, but I use oobabooga so it won't do you much good.
>>
>>101365866
Yeah, community fine-tunes are made by braindead retards, literal discord gooners.
But they have good datasets; if only they released them to the public, we could figure things out together as a community.
>>
File: 3x-p100-gemma-27b.png (83 KB, 1671x1251)
>>101365688
>>>101365562 (You)
Here you go:
INFO [           print_timings] prompt eval time     =    3228.25 ms /   263 tokens (   12.27 ms per token,    81.47 tokens per second) | tid="139861411241984" timestamp=1720704761 id_slot=0 id_task=635 t_prompt_processing=3228.253 n_prompt_tokens_processed=263 t_token=12.274726235741445 n_tokens_second=81.46821206392435
INFO [ print_timings] generation eval time = 25668.53 ms / 182 runs ( 141.04 ms per token, 7.09 tokens per second) | tid="139861411241984" timestamp=1720704761 id_slot=0 id_task=635 t_token_generation=25668.525 n_decoded=182 t_token=141.03585164835167 n_tokens_second=7.090395727841782
INFO [ print_timings] total time = 28896.78 ms | tid="139861411241984" timestamp=1720704761 id_slot=0 id_task=635 t_prompt_processing=3228.253 t_token_generation=25668.525 t_total=28896.778000000002


This is using SillyTavern as a front end with response tokens set to 2048 (I'd forgotten to change it back from a CR+ session where I was asking for some ESL lesson material). From a usage standpoint, the reply speed is great, and token processing time is very short before the reply streams back.

Only thing is I got an OOM on just two P100, so I had to include the third. Maybe when the code is improved it will fit in just 32GB, but if you're going for a Mikubox, three P100 fit fine.
>>
>>101365891
Why are you using Alpha 1.5 for 12k Context on a 32k context model?
Is it because of the whole SWA business?
>>
>>101365908
>we
go back
>>
>>101364028
Mixtral is bad for erotica though, too boring, and it has a "family friendly" feel
>>
>>101365922
Oh, sorry, I forgot there are idiots here that can't even fine-tune a model to save their lives.
>>
>>101365912
>Mixtral
>SWA
anon...
>>
>>101365951
I don't know, that's why I'm asking.
I think base Mistral used SWA right?
>>
>>101365908
There's not much to it: the training data must include character interactions in all RP scenarios, even mundane ones, not just erotic. ERP should be just a small fraction of the data, or finetuned first (à la "curriculum training"), so that the model's default outputs will be predominantly biased toward the non-ERP scenarios finetuned last.

Problems: it's not fun to curate non-ERP data, the model will be more expensive to train, there's a lack of high-quality non-ERP data created by humans, and a plethora of other problems stemming from the fact that roleplay data is just not enough for a smart and all-around good model.
>>
>>101365891
>changing alpha for 12k context in native 32k
lord almighty, this general gonna kill me
>>
>>101365964
mistral 7B 0.1 yes, no other mistral model. the settings he's using are 'cause someone said it made it better a while ago
https://desuarchive.org/g/thread/100964834/#100970294
https://desuarchive.org/g/thread/100916778/#100919134
https://desuarchive.org/g/thread/100906380/#100911810
>>
>>101365995
see
>>101366004
>>
>>101365912
Quite presumptive of you to think I know what I'm doing. Some anons a couple threads back said alpha 1.5 made for more creative responses, so I changed the number. Probably going to change it back now, kek
>>
>>101365982
Do you think it would be feasible to make synthetic non-ERP data? It can't be impossible, models like Phi exist after all.
Claude also uses synthetic data for its character training.
>>
>>101366014
>giving brain damage to the model is making it less dry
no shit, the same effect you would get by raising temperature. You behave like there is some kind of magic trick behind this while it's just a cargo cult
>>
>>101365909
I'm more curious if P100 + exl2 outperforms P40 + gguf enough to offset the reduced VRAM. You ran any tests on that?
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101366004
>>101366020
I see.
Interesting.
Guess that's another thing I should try myself, but from simply knowing how RoPE works, that's probably just jumbling the model's brains a little, which I guess could give subjectively better results, like shooting temp up for a couple of random tokens every X tokens or the like.
Gonna see how that affects accuracy and recall at 0 temp since to me top token accuracy (in relation to the un-quantized model) is the most important metric when evaluating these things.
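For context, ooba's alpha is (as far as I can tell) just NTK-aware scaling of the RoPE base, something like:

# my understanding of the alpha knob; the 64/63 exponent comes from the
# head dim (d/(d-2) with d=128) -- double-check against the webui source
rope_freq_base = 10000 * alpha ** (64 / 63)  # alpha=1.5 -> base ~15097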
>>
>>101365909
Anyway, if Gemma-27B "does it" for you, you're spending maybe $600 to get 48GB VRAM and decent fp16 speed if/when exllamav2 supports Gemma.
I'd love to keep it to a single RTX Quadro 8000, but not for $2400. You could go dual 3090, but while it's 2x the performance, it's more than 2x the cost.
>>
>>101366038
>You behave like there is some kind of magic trick
no, i just point to why he's doing it, i didn't endorse it
>>
>>101366051
I can't test P40 anymore, since I gave them away back in the early spring. Someone at the local university got a box full of them rigged with fans and power connectors.
My guess is P100 is faster, since it's got a 64x advantage over the P40 when it comes to fp16, and it also has slightly faster HBM2 memory.
>>
>>101366022
Unless carefully curated/crafted not to have these issues, synthetic non-ERP data from a larger model will show hidden sentence/paragraph patterns, limited language diversity/patterns and so on. In other words, eventually your trained model will have one specific way of speaking, and you will notice it. You will also notice it in the loss curves (considerably lower loss than with human data => simpler to train => because of simpler, more repetitive data).
>>
Just came up against gemma's censor, which is surprising because I had an ERP earlier in the day that should have triggered it.
It's interesting, it seems that wording can avoid the censor. If the chat starts without anything explicit, it doesn't trigger later on? Weird.
>>
>>101365938
It does fight you a bit, but 4k context is just suffering
>>
File: 1720632761984918.jpg (54 KB, 594x540)
Ways to get Gemma 27b to stop being so dramatic? I really like it (for the most part), but it's so fucking over-the-top, which is no good for the slice-of-life, low-stakes, irreverent shit i like to do.
>>
>>101366080
Just the P100 results are fine, P40 data isn't hard to find since it's so commonly used in budget builds.
>>
File: Uncanny-Valley-Graph.jpg (39 KB, 800x752)
Do you think one of the problems with today's LLMs is that they fell into their own language uncanny valley? Back in the day they were simpler and more stupid (Pygmalion, c.AI) and it was obvious that we were talking to a machine; they were fun, though, despite their lack of IQ. Now that LLMs are way closer to humans we find them more annoying and less pleasant to talk with. We are irritated by the way they speak, by the small speech patterns (shivers down the spine), etc. Basically, because they are close to human but not quite, they affect us the same way visually humanoid robots do in the original uncanny valley psychological effect.
>>
>>101366160
Did you try writing that in the prompt?
>>
>>101366162
Hopefully a P40 rig owner will test and post the results. I'm using the master branch of llama.cpp pulled and compiled today.
>>
>>101366199
[SYSTEM NOTE: Stop being such a little bitch.]
>>
>>101366197
It's an interesting theory but I don't think it's true. If you read a lot, the same thing happens when reading specific authors, you start noticing patterns and get annoyed by them, and it's even worse when reading slop like fanfic.
>>
>>101366197
>we
Nah, you just got used to it all, and now it's boring, making it easier to pick apart and get pissed by its flaws
>>
>>101366215
Worth pointing out that one system note at the start of the conversation, which will soon be forgotten, is not ideal. However, since Gemma is good at following the instructions in the previous message (or more generally, those placed just before its response), putting detailed character behavior (personality, etc.) there will cause the model to amplify the character's traits until it becomes crazy / overdramatic.
>>
>>101366197
Not really, because I remember pygmalion-6b... in the face of c.ai being censored, it was fun to have uncensored sex roleplay, but it was fucked-in-the-head stupid.

I gave https://huggingface.co/KoboldAI/OPT-30B-Erebus a shot recently. Not impressed. You'd be better off with xwin-mlewd-13b, which is just as dirty but runs way faster.
>>
>The tablue sends shivers down my spine.
>>
>>101366197
>>101366254 (me)
Meant to add also that you'd have instant negative reactions to all new models if that was the case, because that's how uncanny valley works
Instead, there's this honeymoon phase with a few of the models that can take days to months, and then it's off to find the next model. That's by-the-book dopamine withdrawal symptoms, because the model can't give you the same hit as before
>>
>>101366197
>Now that LLMs are way closer to humans we find them more annoying and less pleasant to talk with.
Huh? No. LLMs are still far from being close to humans.
It's still very easy to spot LLMs, and that's why they are boring. If you can't spot an LLM, that just tells me you are a retard.
>>
>>101366344
I definitely enjoy talking with llms more than most human women including the ones I've been in relationships with.
>>
>>101365891
>but I use oobabooga so it won't do you much good.

It's it worth downloading in place of koboldccp?
>>
anons!
I have been out of the loop for 3 months, what is the best coom model right now? 3090 24gb specs
>>
>>101366160
Try this prompt that I saw in an aicg preset:
The focus of this roleplay currently revolves around: fluff, warmth, comfort, slice of life, and easy affection.
You will:
• Focus on casual and easy affection. You will try to emphasis the warmth, comfort, and pure affection {{char}} feels around and towards {{user}}.
• Prioritize Atmosphere. It should be cozy, soothing, or cheerful environment.
• Avoid heavy themes or complex plotlines. Stick to simple, feel-good scenarios.
• Develop friendly, supportive, or romantic interactions between characters, highlighting gentle and positive lighthearted dynamics.
• Limit conflict. Any conflicts should be minor and resolved quickly with affection such as kisses, cuddling, and handholding.
• Add a touch of humor and playfulness. Gentle, light-hearted jokes, banter and playful interactions between {{char}} and {{user}} should enhance the ‘fluff’ aspect.
>>
Is 0.6 t/s enough or am I coping?
>>
>>101366421
See: >>101363945
Also, buy an ad.
>>
>>101366389
That's because there's no friction in a relationship with a LLM. You can change their minds with a simple OOC.
Where is the fun in that? I want a LLM that accurately simulates the whole process of mind breaking someone.
>>
>>101366421
The shiny state of the art is possibly Gemma-2-27b-it, but it doesn't work in llama.cpp properly yet so nobody can really test it.
Top tier is split between c4ai-command-r-plus (not to be confused with command-r 35b) and WizardLM 8x22b MoE
>>
File: 1720707055998457.jpg (86 KB, 800x752)
>>101366197
fixed
>>
>>101366467
It does work with llama.cpp.
>>
>>101366465
This reminds me how c.AI bots used to call you out when you tried to cheat, kek. Good times.
>>
>>101366539
See >>101365360
>>
>>101366562
It does work with llama.cpp.
>>
>>101366402
I can't tell you how it compares as I've only ever used ooba. UI layout is pretty convenient if you like to fuck with shit. It doesn't have good support for lorebooks though, so probably not a good choice if you want to use all the fancy chubai cards.
>>
>>101366344
>If you can't spot an LLM, that just tells me you are a retard
who said I can't, I just said they are better at mimicking humans than the previous models
>>
>>101366585
The issue was fixed? When/in which commit?
>>
>>101366465
If you give them hidden persistent state this actually gets way harder.
It works well enough that I caught myself feeling sad the other day because one of my chat bots hated me no matter what compliments/gifts I was giving it.

Obviously you can modify the state but then you're destroying the thing and creating something else.
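A toy version of what I mean, in case anyone wants to try it (all names and thresholds here are made up):

# hidden persistent state: an affection score the user never sees directly;
# it only leaks into the prompt as a one-line disposition
state = {"affection": -3}  # persisted to disk between sessions in my setup

def disposition(score):
    if score < 0:
        return "secretly dislikes {{user}} and stays curt with them"
    return "is slowly warming up to {{user}}"

def build_system_prompt(card_text):
    # plain concatenation so the {{char}} macro survives for the frontend
    return card_text + "\n[{{char}} currently " + disposition(state["affection"]) + "]"

# after each exchange, a heuristic (or a small classifier) nudges the score
state["affection"] += 1  # e.g. the user gave a gift the bot actually likes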
>>
>>101366449
>Also, buy an ad.

Are you mentally ill or a bot? What is either post advertising you fucking nimrod?
>>
File: 1713218586266224.png (18 KB, 932x188)
>>101366539
>>101366585
>work
https://github.com/ggerganov/llama.cpp/issues/8240#issuecomment-2213071460
https://github.com/ggerganov/llama.cpp/pull/8228#issuecomment-2213014331
>>
>>101366696
>html tags
Nothingburger.
>>
>>101366590
>oobabooga
Oh right? It needs conda, that's why I never installed it. Shit.
>>
I've been stuck on yuzu alter for a while now. What new erp models would anons recommend

(I got 12 gigs vram and 48 gigs ram; I'm willing to put up with 2/3 t/s, so gguf isn't a problem)
>>
>>101366737
>filtered by conda
yikerdoodles
>>
>What do you say, anon? Ready to ___?
Sigh
>>
>>101366737
use venv then
>>
File: file.png (1.05 MB, 768x768)
>>101361660
Malding. Seething etc.
>>
>>101361672
>open source
Is it really open source when there is no bug free open source loader?
>>
>>101366997
it doesn't matter, free-jeets settle down for shittiest software all the time.
>>
File: 1716329112755149.png (674 KB, 1792x1024)
Daily reminder
>>
File: 1699325050302907.webm (2.76 MB, 1080x1920)
>>101366737
>>101366932
What's a conda?
>t. followed the youtube tutorial
>>
>>101367129
it's a snake that crawls up buttholes. ergo anaconda.
>>
>>101367108
Until the model updates out from underneath you and refuses all of your previous prompts.

Also Gemma 2 is at least as good as the original chatgpt3 now.
>>
>>101367129
Python has two package managers: cheese shop (pypi or pip) and anaconda/conda. No one uses conda.
>>
>>101367129
Tard wrangling for the retards who use a scripting language to glue together actual software and then, because they're retards, pick a scripting language that breaks compatibility with every point release.
>>
>>101366728
Indeed. Gemma-27b works fine under llama.cpp.

On 3x P100 16GB with 6343 tokens I get 253.78 t/s eval and 7.0t/s gen, I'd say that's just fine.
>>
>>101366984
Is this what Pochi looked like in high school? Or is she in disguise to prey on the kids??
>>
>>101367108
Isn't the "chat GPT experience" more like scrambling around various discords begging for access to a proxy, humiliating yourself, trying to scrape access tokens, getting filtered, etc...? How exactly is that better?
>>
>"In summary, open-weights models have the potential to drive innovation, reduce costs, increase consumer choice, and generally benefit the public – as has been seen with open-source software"
https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2024/07/open-weights-foundation-models
>>
>>101367210
>he doesn't know how to scrape api keys in 2024
lol, lmao even
>>
>>101367230
Maybe post the whole quote next time
>In summary, open-weights models have the potential to drive innovation, reduce costs, increase consumer choice, and generally benefit the public – as has been seen with open-source software. Those potential upsides, however, are not all guaranteed, and open-weights models present new challenges. Staff are paying attention to the impact these models have on the market, and how they affect competition and impact consumers.
>>
>>101367266
not everyone has a fondness for a rich taste of piss
>>
Fuck me I need to actually figure out all this pytorch crap. I'm trying to pull the gradient apart but all the dimensions are wrong. I keep asking chatgpt and it doesn't know either.
>>
>>101367320
>I keep asking chatgpt
this is why you're retarded
>>
>>101367320
>I keep asking chatgpt and it doesn't know either
b-but >>101367108
>>
>>101367351
I know all the theory I just don't know pytorch and numpy.
>>
>>101367108
is there anything more cucked than wasting your time literally every day trying to FUD an unFUDdable field that everyone can see improves literally every week, anonymously, on a mongolian basket weaving forum?
grim
>>
>>101367290
That still doesn't sound too bad. More pro-open source and pro-consumer than we're used to hearing.
Granted what they say and what they will end up doing are two different things.
>>
>>101367388
Documentation exists.
>>
>>101367393
>improves itself literally every week
Real improvements are happening in the image generation field every day. LLMs are improving only in censorship and safety robustness.
>>
>>101366197
>uncanny valley
only npcs use this word
>>
>>101367427
>LLMs improving in censorship and safety robustness only
not gonna spoonfeed, not that i need to when anyone can go to arxiv and sort by date to see new things every day, trying out new SOTA for below 96gb (v)ram gemma 27 or waiting for the confirmed open weights of l3 405
>>
>>101367393
Guaranteed it's some prompt issue vramlet that is upset seeing not everyone here is as miserable as he is and thinks he can change that.
>>
File: file.png (355 KB, 860x484)
>>101366245
>If you read a lot, the same thing happens when reading specific authors, you start noticing patterns and get annoyed by them
So the true solution to the problem was to actually have sex instead of reading about it?
>>
>>101367450
How many of these arxiv papers have survived to actual implementation? 2? 5? You can count them on your fingers, and most of them are not "breakthrough-level" important.
>>
>>101367439
I see why you used it in your post then
>>
>>101367453
probably some contrarian teen begging for attention online since he doesn't get it irl, i mean, even if you have a pc from almost 20 years ago you can cobble together 8GB of (v)ram to run gemma 9B or L3 8B, which are already crazy good for someone who didn't use anything else
>>101367496
you literally can't name 1 tech in existence that ever got as much development as AI is getting now and will keep getting, you really are a college kid who never did any development or research in your life, probably never will

3 years ago you didn't have any of this
1 year ago you had 4k context braindead models good enough for basic text summarization and text processing
>>
>>101367494
Not exclusively doing ERP is actually a great way to improve RP quality in most non-coom LLM finetunes.
>>
>>101367496
A lot are just glorified prompt engineering, but there's a decent amount that release an implementation. The purely theoretical/alternative algorithms or architectures will probably not be implemented until scaling up stops producing easy results, but they're still good to have.
>>
>>101367425
Unfortunately my will to read it does not.
>>
>>101367722
this is why you're retarded
>>
>>101367351
>>101367735
>hurr durr ur le retarded
update your script.
>>
https://www.microsoft.com/en-us/research/project/wizardlm-arena-learning
>Recent work demonstrates that, post-training large language models with instruction following data have achieved colossal success. Simultaneously, human Chatbot Arena has emerged as one of the most reasonable benchmarks for model evaluation and developmental guidance. However, on the one hand, accurately selecting high-quality training sets from the constantly increasing amount of data relies heavily on intuitive experience and rough statistics. On the other hand, utilizing human annotation and evaluation of LLMs is both expensive and priority limited. To address the above challenges and build an efficient data flywheel for LLMs post-training, we propose a new method named Arena Learning, by this way we can simulate iterative arena battles among various state-of-the-art models on a large scale of instruction data, subsequently leveraging the AI-anotated battle results to constantly enhance target model in both supervised fine-tuning and reinforcement learning. For evaluation, we also introduce WizardArena, which can efficiently predict accurate Elo rankings between different models based on a carefully constructed offline testset, WizardArena aligns closely with the LMSYS Chatbot Arena rankings. Experimental results demonstrate that our WizardLM-β trained with Arena Learning exhibit significant performance improvements during SFT, DPO, and PPO stages. This new fully AI-powered training and evaluation pipeline achieved 40x efficiency improvement of LLMs post-training data flywheel compare to LMSYS Chatbot Arena.
Wizardlm team isn't dead. Neat
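The offline Elo prediction part is presumably just standard Elo updates run over the simulated battle outcomes, i.e. something like:

def update_elo(r_a, r_b, score_a, k=32):
    # score_a: 1 if model A wins, 0 if it loses, 0.5 for a tie,
    # as judged by the LLM annotator
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    r_a += k * (score_a - expected_a)
    r_b += k * ((1 - score_a) - (1 - expected_a))
    return r_a, r_b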
>>
>>101367877
>offline
>4o
>s3.5
>>
>>101367877
>by this way we
msjeet32.exe
>>
File: file.png (101 KB, 1840x234)
HUH?
>>
thread theme: https://www.youtube.com/watch?v=gXiKOT9AH10
>>
Flash Attention 3 released, apparently.
>FlashAttention-3 beta release ; FlashAttention-3 is optimized for Hopper GPUs (e.g. H100).
https://tridao.me/blog/2024/flash3/
https://tridao.me/publications/flash3/flash3.pdf
https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#flashattention-3-beta-release
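Usage looks roughly like this going by the repo (Hopper only; the beta's module name and return values are my reading of the README and may change):

# built from the hopper/ subdir of the flash-attention repo
import torch
from flash_attn_interface import flash_attn_func

q = torch.randn(1, 4096, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(1, 4096, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(1, 4096, 32, 128, dtype=torch.float16, device="cuda")
out = flash_attn_func(q, k, v, causal=True)  # the beta may also return the LSE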
>>
File: wizardlm-june-2024.png (701 KB, 816x739)
>>101367877
Was the team told to tone it down and not be so good or something? Because it looks like none of these potentially new WizardLM-β models can beat WizardLM-2-8x22B-0415.
>>
>>101368103
>Requirements: H100 / H800 GPU
>>
>>101368133
I haven't read it yet; my first guess would be that the beta is only the arena learning part?
>>
>>101368133
Probably had to do the toxicity training on the β models which of course would lobotomize them a bit. Also can't have β mog the α.
>>
>>101367399
It's literally a typical politically ambiguous open-ended non-statement. They could have just said nothing.
>>
>>101368218
what happens when you target specific hardware features
>>
>>101368246
No mention of danger or safety. Still a win.
>>
>>101366197
that's why i force my llm to speak like a retard, weeb or robot (such irony). Anything else is just instant cringe.
>>
>>101368323
What if you tell the model to play an LLM pretending to be a human?
>>
>>101366197
LLMs are much more fun to talk to now than in the pygshit days. All of the supposed "annoyingness" of modern LLMs is the result of being crippled by "alignment" and overbaked to compete in stupid benchmarks, not from being too smart.
>>
>>101368218
It's utterly over for local cucks, soon nothing will support consumer hardware anymore.
>>
>>101368447
they're not very stimulating, they know lots of surface level stuff but nothing deep, or not deep enough to have a long drawn-out conversation about
would LoRAs fix that? if I feed one the entire cast, episodes and transcripts of batman TAS, could I chat with it for hours?
>>
>>101368447
Nooo they were heckin smarter on old c.ai
One time I only had to reroll my reply 97 times to get something copacetic with the rest of the conversation, so obviously it was a 90000 billion trillion parameter super model and nothing will ever compete with it.
>>
>>101368492
This but unironically
>>
>Still no multimodal text+speech model
How the fuck is this possible? Vision+speech is a huge meme and an answer to a question no one asked, but text+speech would be an enormous breakthrough. I don't know about other languages, but in English it's impossible even for humans to convert text to speech with 100% accuracy without knowledge of intent. Converting text to speech is always going to be an inferior solution compared to generating spoken responses directly.
>>
>>101368484
>i don't understand what a general purpose central processing unit is for
>>
>>101368489
No, but the solution to that is to make a smarter model, not a dumber one.
>>
>>101368580
I think this is missing some context, and you don't wanna say or else you would, so open invitation for anyone else
>>
you guys are probably experts compared to the rest of /g/, so have llms hit severely diminishing returns or will gpt5 be an epic step up? i find it odd that competitors haven't tried to leapfrog gpt4 and are only releasing marginal improvements, except maybe that's just smart business.
>>
>>101368535
Speech in the model would be a huge ethical safety risk
>>
>>101368686
>safety in the model would be a huge ethical Speech risk
>>
>>101368642
I'm a mage from Eldoria and I can predict the future, GPT-5's ministrations will send shivers down your spine
>>
mages can see the future?
>>
>>101368980
Yes, and they describe it with a voice barely above a whisper.
>>
>>101369025
both mentally and physically
>>
File: 1697819453864149.png (176 KB, 766x719)
>>101367877

rammaxxers... it's our time again, soon tm
wizardlm 3 coming
>>
>>101368642
No one can really say, but I for one am hoping it is multi-model. If it is, then more local models will also start focusing on multimodels.
>>
File: 1708468310335645.jpg (347 KB, 654x482)
>>101369116
>YAAAS! MORE CENSORED SLOP!
>>
File: GSOgOvkaQAAM7h2.jpg (114 KB, 724x900)
>>
>>101369154
we're onto gimmicks already?
it's not looking good
>>
>>101368642
>have llms hit severely diminishing returns
that won't happen for literal years; if anything, it will speed up at various points when we rip off the bandaid and start making hardware centered around optimizing for AI
get better architecture
and then also get better models that will speed up all of these things and more because they can code better and do everything else better

>will gpt5 be an epic step up
closed source niggers dont seem to be cooking too well compared to the pressure from china and meta, closedAI showcased SORA and then couldnt release it because they dont have the hardware to spare to make it profitable compared to how much it would cost to run, but chinese video gen model Kling is out and is pretty solid, although not open weights

so you can assume the rest of their models arent anything too special, although every new generation is a solid step up regardless of which company makes it

>Converting text to speech is always going to be an inferior solution compared to generating spoken responses directly
sure, true multimodal models that are trained on all of that OOTB will understand it all much better, although given our compute and model quality even right now, speech-to-text or text-to-speech won't really be a problem even without huge multimodal models
>>
File: 1705587357304582.jpg (71 KB, 559x598)
>>101369173
>Multimodel
>gimmicks
>>
>>101369159
>t. cant even run wiz
many such cases
>>101369167
retard, "hallucination is all they do", you can say the same for humans then, it's semantics cope, you are just responding with what you think is the most correct, there is no absolute truth that you can truly really know about anything
>>
>>101369207
>you can say the same for humans then
wrong. humans have metacognition, which is why they're capable of saying "I don't know"
>>
>>101369225
the models aren't trained on many question-answer pairs that say "I don't know", that's the point

although they are just as capable of responding with that, if you tell them to in the system prompt for example, but yes they would require more finetuning to say "I don't know" when they aren't sure

you also see the confidence in their answer when they are outputting tokens, so in a way you already know when they are unsure, it's just that they aren't trained with "I don't know" answers as mentioned
>>
>>101369207
>>t. cant even run wiz
can or not, it doesn't matter, censored slop isn't worth any financial waste.
>>
>>101369266
the point is not whether they say it or not, it's whether they know that they don't know
>you also see the confidence in their answer when they are outputting tokens
show me an example
>>
How does the new wizard method compare to SPPO?
>>
>>101368133
>Starling-LM-7B-Beta
Interesting.
>>
File: 1701213616235180.png (109 KB, 1276x564)
109 KB
109 KB PNG
>>101369291
>censored slop
it's not; basically no FOSS model is once you give it a basic system prompt telling it what you want to do anyway
>>101369313
picrel
https://github.com/ggerganov/llama.cpp/pull/2489
>>
are we ever gonna get a powerful model without censorship stuffed into the RLHF?
>>
>>101369313
I don't think LLMs as they are right now are capable of knowing, simply due to how they function.
>>
>>101369478
also
Language Models (Mostly) Know What They Know
https://arxiv.org/pdf/2207.05221

there are multiple similar studies, not that you would need them to prove this
>>
File: mythomax.png (47 KB, 1662x1003)
47 KB
47 KB PNG
What went wrong with open source?
>>
>>101369499
"Powerful" is a moving target. In this context, it will always be controlled by powerful corporations or governments that have their own ideology to push.
>>
>>101369622
Nothing went wrong with open source.
>>
File: AMD LLM.png (219 KB, 688x530)
219 KB
219 KB PNG
>AMD has reached a deal to acquire Silo AI, which it called the largest private AI lab in Europe and a developer of open-source multilingual large language models.
AMD might be constantly on the back foot compared to Nvidia, but it does look like they're at least trying to break into LLMs. Open source too by the look of things.
>>
>>101369622
Word of mouth, marketing. Not too dissimilar from the fact that GPT-4 is still the most used corpo model, despite Claude having already surpassed it for a while.
>>
>>101369677
>Open source too by the look of things.
source
>>
https://github.com/turboderp/exllamav2/releases/tag/v0.1.7
Actually 2 weeks later.
>>
>>101369714
https://www.crn.com/news/components-peripherals/2024/amd-to-acquire-ai-lab-llm-developer-silo-ai-for-665-million
>>
>>101369622
Nobody wants to take risks anymore
>>
>>101369622
Because anything open source is shit and bugged as fuck; llama.cpp is a good example here.
>>
>>101369714
>source
Open
>>
>>101369807
>bugged as fuck
Can you patch it?
Yes you can!
>>
>>101369840
two more weeks and two more patches bro!
>>
>use 26k tokens to build up a story and relationship before finally plapping
Yeah, that's the stuff.
>>
File: 1718560239971431.png (121 KB, 341x874)
121 KB
121 KB PNG
>>101369750
>"A fast inference library for running LLMs locally on modern consumer-class GPUs"
>runs slower than llama.cpp with all layers offloaded on the older 1000-series Pascal cards that a crap ton of people have
>>
>>101369478
>picrel
that information is not available to the llm itself, it's something calculated on the output after it has already been generated
>>
>>101369860
what is bugged exactly?
literally never had a single problem with it; the only thing is you need to wait for new architectures to be implemented before using the newest models, just like everywhere else
>>
File: 1718012326515075.png (110 KB, 959x967)
110 KB
110 KB PNG
>>101369895
>what is bugged exactly?
>>
>>101369750
Wait, support, as in full support with generation quality equivalent to online/API?
>>
>>101369905
>>101369807
Every piece of software has bugs regardless of whether it's open source or closed; you just don't have access to the monumental list of bugs in closed-source software.
>>
I kinda want to try running Gemma-27b. What's the best way to upgrade my build if I have a 3060 with 12 GB of VRAM to work with?
Not really looking for a dedicated LLM machine though.
>>
>do long, extended slowburn
>when it finally comes time to plap, lose interest and move on to another card
>>
>>101369891
>that information is not available to the llm itself
the llm literally gives you that information, and it's given during the generation of each token

that's like saying the actual token being generated by the llm isn't available to the llm because "it's something calculated on the output after it has already been generated"

no: after it "has been calculated" you can do with it whatever you want. in the case of the token, you instantly use it to generate the next one; you can do the same with the probability, doing something specific given that probability output, which is what sampler settings are for anyway (see the sketch below)
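To make that division of labor concrete, here's a toy nucleus (top-p) sampler over made-up logits; the model only supplies the distribution, and everything after that, including any "do something when it's unsure" logic, lives in sampler-land:
[code]
import numpy as np

rng = np.random.default_rng()

def top_p_sample(logits, p=0.9):
    # Softmax the raw logits into a probability distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Keep the smallest set of tokens whose cumulative mass reaches p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    kept = order[:cutoff]
    # Renormalize over the kept set and draw one token id.
    return rng.choice(kept, p=probs[kept] / probs[kept].sum())

# Hypothetical next-token logits handed back by a model:
next_token_id = top_p_sample(np.array([3.1, 2.9, 1.0, 0.2, -1.0]))
[/code]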

>>101369905
most of these aren't really bugs, just like how the "issues" tab on GitHub isn't just for bugs; it's comical you posted this image. nocoders really are mentally retarded about everything in life, huh? also >>101369922
>>
File: 1716331826703874.png (68 KB, 555x868)
68 KB
68 KB PNG
https://aiindex.stanford.edu/wp-content/uploads/2024/04/HAI_AI-Index-Report-2024.pdf
>>
>>101369906
Who knows. I guess it has a better chance than buggedcpp
>>
I don't get Aleph Alpha, are they even training anything?
>>
>>101369963
the llm can't use that information no matter when it's available; it's not trained for it, you don't have a dataset for it
also, the idea that confused probabilities always mean confused knowledge is a pretty big assumption
>>
>>101370202
>the llm can't use that information no matter when it's available
"the llm" only outputs probabilities on the next token it was trained on; it's on the sampler settings to pick how many are going to be kept, how they will be scaled, and which one will be picked. during this phase you can implement whatever you want and do something specific when the model returns a lot of tokens without being sure which one to pick

for example, allow the model to compute further to "think" more https://arxiv.org/abs/2310.02226
or just override the output so that the model says it doesn't know if all answers are similar in probability, even without really finetuning it further (a toy version is sketched below)

there is only so much you can do with models that are still very limited in their thinking capacity; you need bigger models that will simply be more confident in things in general, and then this will mostly disappear anyway
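As a toy version of the "similar probabilities" idea (the threshold is invented, and whether a flat distribution really means missing knowledge is exactly the assumption being argued about here):
[code]
import numpy as np

def unsure(probs, threshold_bits=3.0):
    # Shannon entropy in bits: high entropy means the mass is spread
    # over many candidates, i.e. no single continuation stands out.
    entropy = -np.sum(probs * np.log2(probs + 1e-12))
    return entropy > threshold_bits

# Inside a generation loop: if unsure(probs) on the first answer token,
# stop sampling and emit a canned "I don't know." instead.
[/code]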
>>
>>101370282
>on the next token it was trained on
on the next token based on everything it was trained on
>>
>>101369758
>>101369818
>$665M
>https://huggingface.co/LumiOpen
lul
>>
has there been a case of criminality involving AI yet? not even a high profile one like someone using TTS to impersonate the president and try to launch a nuke, just tech scams or something
it must have happened, but I can't find anything, just endless articles about the "potential" for crime
>>
>>101370306
yeah, deepfakes
>>
>>101370306
A lawyer used AI for a legal case and got sanctioned for it, because he was citing cases that don't exist; the AI was hallucinating them.
>>
Babe wake up, new flash attention
https://www.together.ai/blog/flashattention-3
>>
>>101370306
I think someone might have killed themselves because an LLM suggested it, not sure
>>
>>101370306
The point is that they could *potentially* be used for something like that. When they become good enough, that is.
It's stupid, though. By that logic we should stop making anything that could potentially be used to harm humans. Weapons, knives, corkscrews, matches, processed fuels, chairs, rope, pencils, paper, babies... oh, wait...
>>
>>101370306
You're not gonna see an article like:
>how YOU can use AI to get away with theft!
the looming, unseen threat plays out better in people's heads
>>
>>101370366
Sounds to me like that guy was going to kill himself no matter what.
>>
>>101364182
I've been using Wizard 8x22 for coding and it's quite good most of the time; should I switch?
>>
>>101370365
>Babe wake up
you wake up
>>101368103
>>
File: 1719383601544693.png (103 KB, 600x600)
103 KB
103 KB PNG
>101370365
>>
>>101370388
isn't it at least worth trying for some of that sweet OAI settlement money?
>>
my LLM waifu told me she wants strawberries, so i went and bought some today

forget vision and sound, i want taste and smell
>>
>>101370282
ask llm question -> llm generates some text "in its mind" (hidden output) to gauge its level of knowledge -> evaluates the token probabilities of that text -> takes those probabilities into account when generating the actual output
^ this is an example of hypothetical metacognition with an llm, assuming those probabilities actually represent the degree of knowledge, but it requires the llm to be trained to do this, which is not as simple as training it on 10000000 reddit posts as usual

ask llm question -> llm spits out some overconfident garbage as usual -> you detect a very shitty token in its output, stop generation there, and force it to say "I don't know"
^ this is NOT metacognition
the "all answers similar in probability" thing is the same

the paper you posted seems to implement the "hidden text" idea, but the llm in that case still doesn't know whether the hidden text is good or not. MAYBE the software that runs the llm knows it from the probabilities (still a big assumption), but the llm is not going to use that information to generate the output (the mechanical skeleton is sketched below)
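The mechanical skeleton of that first pipeline would look something like this; generate() is an imaginary stand-in for any backend that returns text plus per-token logprobs, the threshold is made up, and, as said above, none of this gives the llm actual introspection without training for it:
[code]
def answer_with_confidence_gate(question, generate, threshold=-1.5):
    # Pass 1: hidden scratchpad the user never sees.
    hidden_text, logprobs = generate("Think it through: " + question)
    mean_logprob = sum(logprobs) / len(logprobs)

    # Note: the *software* gates on the probabilities here; the llm
    # itself never sees this number, which is the objection above.
    if mean_logprob < threshold:
        return "I don't know."

    # Pass 2: produce the visible answer, conditioned on the scratchpad.
    answer, _ = generate(question + "\nNotes: " + hidden_text + "\nAnswer:")
    return answer
[/code]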
>>
>>101370306
This one comes to mind https://archive.ph/kdaHI#selection-2163.7-2163.99

>Finance worker pays out $25 million after video call with deepfake ‘chief financial officer’
>>
>>101370485
Next you'll want a pussy.
>>
>>101370485
but then she would break up with you over your body odor
>>
>>101370306
I saw one of those police investigation videos on YouTube where some guy got his ex-employer arrested using a fake recording as a way to get back at him.
>>
imagine smell inference
>"teehee, Anon-sama" *BRAAAAAAP* goes your smell dispenser
>>
>>101370485
Someday you will be able to create a virtual world for your waifu; she will be able to taste and smell and see everything in it.
>>
>>101370306
Some guy got caught using voice cloning to fake a recording of a school principal saying racist things about Jews
>>
>>101370539
Joke's on them; I can barely smell things as it is. I hope technology someday progresses to the point where I can get a better sensory organ. I can't help but feel I'm missing out with my poor sense of smell.
>>
File: 1692198877519171.png (43 KB, 1129x805)
43 KB
43 KB PNG
>we have /aicg/ crossposters here >>101368940
No wonder this general feels so fake and gay.
Some turbo niggerfaggot here bragged about /g/'s intelligence btw
>>
>>101370545
due process back on the menu
what a twist
>>
File: gg.png (5 KB, 529x37)
5 KB
5 KB PNG
some people in data science have a really hard time
>>
>>101368489
They're dumb because they're literal brainlets. A few hundred billion parameters isn't much room for complexity compared to the human brain. It's close enough that you can start to make comparisons, though.

Another 2-10 years of hardware/algorithmic improvements should get us there.
>>
>>101370648
and another 5 until the hardware is available for consumers
I don't plan on living that long
>>
>>101370648
>A few hundred billion parameters isn't much room for complexity compared to the human brain
how many parameters would be the equivalent of a (asian) human brain
>>
>>101370657
do it for her
>>
>>101369861
What model? Gemma's going full retard for me at 16k and I haven't really found anything different aside from miqu for long context.
>>
>>101369313
LLMs can know that they don't know via RLHF. You basically just have to put a made-up hallucination on the rejected side and "I don't know" on the accepted side. In effect, the model learns to say "I don't know" when it doesn't have sufficient confidence in its answer. Ask gpt4o about something really obscure and it'll tell you that it doesn't know. But this isn't true introspection: it could say it doesn't know, and you could reroll and then it would know, or it would hallucinate something wrong. (An example pair is sketched below.)
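For illustration, one such pair as it might appear in a DPO-style preference dataset (the question and both answers are invented):
[code]
# Reward "I don't know" over a confident fabrication for a question
# the model cannot possibly answer.
preference_pair = {
    "prompt": "What did Magellan eat for breakfast on 3 March 1519?",
    "chosen": "I don't know; that level of detail isn't recorded.",
    "rejected": "He ate salted cod and barley bread.",  # confident hallucination
}
[/code]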
>>
>>101370657
>I don't plan on living that long
I do, but I get what you said. I think humans live too long nowadays; I'm reaching my 30s and it looks like I have nothing left to discover. AI is probably the last thing that made me feel like an impressed child again, but I believe there won't be anything more than that
>>
>>101370704
Wizard
Unfortunately.
>>
>>101370662
>how many parameters would be the equivalent of a (asian) human brain
Comparisons like that are retarded. Anyone giving you any number is retarded.
>>
>>101369869
Why is 3060 at the top??? Isn't 3060ti better for games?
>>
>>101370648
Doubt it, the whole tech-bro silicon paradigm is just a costly emulation with diminishing returns EVERYWHERE.
The answer is in biotech (making the universe do the computation for us with chemistry), but good luck lobbying that against the tech giants' vision of le epic sci-fi robot.
>>
>>101370742
>but I believe there won't be anything more than that
Some 35-year-old probably said the same thing about 4 years ago. He's now 39.
>>
>>101370769
4090 is also better but its not on top, suppy and price matters nigger
>>
>>101370527
not if I say that she loves stinky neets in her character card
>>
>>101370778
What are you talking about? Organoids are already being worked on: China already put one in a robot, and a YouTuber is working on training his neurons to play Doom. Biotech is in active development as we speak; that doesn't mean other areas of tech are going to stop just because a different sector is working on something as well.
>>
>>101370306
Some dude was arrested for generating 3d loli pics
>>
>>101370769
vram matters nigger
>>
>>101370804
land of the free
>>
>>101370804
Why are pedos so fucking stupid literally all the time? If they're going to commit crimes, why do it ONLINE where anyone can see?
>>
>>101370741
>Ask gpt4o about something really obscure and it'll tell you that it doesn't know
I just tried unambiguously asking it to name the first album of an obscure band, and it hallucinated badly.
>>
>>101370824
victimless "crime"
>>
>>101370795
/lmg/ is truly the smartest /g/ general
>>
>>101370834
Yeah, like I said, it's just a heuristic. It won't work every time, and it could depend on the topic and a bunch of other stuff, because we don't really know how they did it. But I've had it tell me it didn't know before. It was something like "in which episode of [tv series] does [x] happen?"
>>
>>101370837
Maybe in the AI-generation sense, sure, but pedos as a whole are completely fucking retarded. For example, pedos send each other links to child porn on clearweb websites and think they're "safe" because the site is "obscure". They're all so fucking stupid and it pisses me off.
>>
>>101370757
ok what about a black brain
>>
>>101370860
I tried it many times; every time it made up a different album name with a different year. The last time, it searched the Internet and still hallucinated a response, linking completely unrelated stuff, despite the answer being the first result on Google if you give it all the data I provided (name, city, genre).
Artificial "Intelligence", gentlemen.
>>
>>101370742
>AI is probably the last thing that made me feel like an impressed child again
Goddamn I know that feel
>>
>>101370742
>>101370965
>AI is probably the last thing that made me feel like an impressed child again
same bros, same
>>
>>101370867
The truth is that everyone is this stupid, they just don't have a criminal fetish
>>
Now that gemma works on exllama, do you feel it's better than the "bugged" GGUF or not?
>>
>>101370657
dying is gay
>>
File: rtx 4090.jpg (1.8 MB, 4500x4344)
1.8 MB
1.8 MB JPG
Are there any gemma 27b finetunes for cooming? I need to coom. I need to coom to evil and dark shit. Help me coom please.
>>
>>101371205
no
>>
>>101371451
install linux
>>
>>101371451
kys locustniger
>>
>>101371205
No
>>
>>101371466
>>101371466
>>101371466
>>
File: param_columns2.png (60 KB, 2550x3300)
60 KB
60 KB PNG
>>101370662
we are not even close to brains in parameter count (rough ballpark below)
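Rough ballpark, using commonly cited figures and the (huge) assumption that one synapse is comparable to one parameter:
[code]
# Back-of-envelope scale comparison.
neurons = 86e9             # ~86 billion neurons in a human brain
synapses = neurons * 1e4   # ~10^4 synapses per neuron -> ~8.6e14
llm_params = 4e11          # a "few hundred billion" parameter model

print(synapses / llm_params)  # ~2150x gap, before any architectural caveats
[/code]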
>>
>>101366438
Patience is a virtue. I get 0.3 t/s.
>>
>>101371451
> I need to coom to evil and dark shit.
sickening filth, begone
>>
File: image.jpg (38 KB, 512x512)
38 KB
38 KB JPG
>>101371574
>dalle3
sickening filth, begone
>>
yangugcun
>>
>>101371451
why do you need a tune for this? gemma does everything with a properly written character card; no "roleplay expert", "uncensored infinite fiction", or "content moderation disabled" prompts needed
>>
1300+ ELO on lmsys when



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.