/g/ - Technology


File: 39_04173_.png (1.14 MB, 896x1152)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101361021 & >>101345759

►News
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: mikudance.gif (2.13 MB, 498x443)
►Recent Highlights from the Previous Thread: >>101361021

--Papers: >>101362640 >>101362370
--L3 70B Community Tunes Are Probably Undertrained, Stick to Base Models for RP: >>101364116 >>101364172 >>101364314 >>101364357 >>101364381 >>101364856
--Flash Attention 3: Fast and Accurate Attention for Hopper GPUs: >>101368103 >>101368218 >>101368292
--Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena: >>101367877 >>101368133 >>101368242 >>101369116
--Ways to Prevent AI from Using Inaccessible Language: >>101363368 >>101363415 >>101363429 >>101363879 >>101364065
--The Hallucination Problem in LLMs: Are We in Denial?: >>101369167 >>101369207 >>101369225 >>101369266 >>101369313 >>101369478 >>101369613 >>101369963 >>101370741 >>101370834
--Running gemma-27b-it on two P100 GPUs: Performance, Cost and Alternatives: >>101365562 >>101365688 >>101365909 >>101366051 >>101366080 >>101366162 >>101366210 >>101366054
--P40 Special Power Supply Woes and Connector Confusion: >>101363997 >>101364061 >>101364210 >>101364464 >>101364771 >>101364476 >>101364501 >>101364828
--On Open-Weights Foundation Models: Potential Benefits and Challenges: >>101367230 >>101367290 >>101368246
--MambaVision: A Hybrid Mamba-Transformer Vision Backbone by NVlabs: >>101364298
--Is 48GB vRAM still relevant after Gemma 27B?: >>101361672 >>101366696 >>101367165
--How to make Gemma 27b less dramatic for slice-of-life stories: >>101366160 >>101366275 >>101366431
--Best Erotica Model for a 3090 and 32GB RAM: >>101363945 >>101365063 >>101365912 >>101365964 >>101366004 >>101366052
--Are We Experiencing a Language Uncanny Valley with Today's LLMs?: >>101366197 >>101366245 >>101366320 >>101366681 >>101370648
--AMD Acquires Silo AI to Enter the LLM Fray: >>101369677 >>101369758 >>101370305
--Release 0.1.7 of exllamav2 by turboderp on GitHub: >>101369750 >>101369869
--Miku (free space): >>101366285 >>101367210 >>101368077

►Recent Highlight Posts from the Previous Thread: >>101361028
>>
File: 1699737265638391.png (33 KB, 719x346)
One more week.
>>
>>101371476
Thank you Recap Miku
>>
I can't believe it. It's already the end of Thursday and there's no new Mistral. Anon lied....
>>
I've seen people claiming that the text at the beginning of the context is more important
and I've also seen claims that the text close to the last messages is more important
which one is true?
>>
>>101371588
Both, a lot of models recall details at the start and end of the context window best. The shit in the middle is more likely to get overlooked
>>
they're really getting desperate, huh
>>
>>101371688
yes we are
>>
>>101371688
yes they are
>>
>>101371721
stop talking for us
>>
I can fully load bartowski_UNA-ThePitbull-21.4B-v2-Q6_K.gguf onto my 3090 and still have 16384 tokens of context so I've been trying it out and I hit a brick wall with a certain sexual fetish. The model had misconceptions about a term and pieces of that misconception lingered even after I explained. This isn't it, but it's like if I said "water sports" and it thought I meant parasailing, and then after I added an explanation it changed to "urinating on someone while parasailing" and further words could change it to a jet ski instead but it wouldn't drop the idea that it actually involved a sport played on the water no matter what words I used. Disappointing.

Anyway the obvious comparison is to intervitens_BagelMIsteryTour-v2-8x7B-3.7bpw-h6-exl2-rpcal (or another 3.7bpw Mixtral 8x7B derivative of choice) because that also just barely fits onto a 3090 with 16k context. Testing them head to head is next.
>>
>>101371737
>UNA
Jesus you fucking moron...
>>
>>101371670
But that adds to the image count, not the image limit
>>
>>101371745
>UNO
Time to play the Draw 25 on you.
>>
>>101371769
>+1 towards the image limit
>>
File: rtx 4090.jpg (1.8 MB, 4500x4344)
Are there any gemma 27b finetunes for cooming? I need to coom. I need to coom to evil and dark shit. Help me coom please.

(no, really)
>>
>>101371811
why do you need a tune for this, gemma does everything with a properly written character, no roleplay experts, uncensored infinite fictions, or disabled content moderation policies needed
>>
>>101371811
Retard.
>>
>>101371811
https://huggingface.co/gghfez/gemma-2-27b-rp-c2-GGUF
>>
>>101371811
linux
>>
File: 39_04170_.png (1.61 MB, 896x1152)
Rin-chan a cute
>>101371670
We never even get close to the limit lol
>>
>>101371818
I want to use it for AI roguelite so custom prompting is limited. I can do a short system prompt but that's it. It refuses a lot of NSFW and violence stuff for "safety"
>>
>>101371852
>not faipl-1.0
ngmi
>>
File: Capture.jpg (44 KB, 1877x189)
Do Qwen2 models work in oobabooga or are they not supported? I swear every model based off of Qwen2 just spits out complete gibberish, no matter how I load the model or what settings I use.
>>
>>101371906
Your existence alone justifies all the blacked posting ITT.
>>
>>101371884
>I want to use it for AI roguelite so custom prompting is limited.
How are those things related?
>>
>>101371745
>>101371837
This is how he expresses his affection.

>>101371938
Play with settings. In Kobold I must turn off MMQ or Qwen writes poopie.
>>
>>101371943
Seethe, shitskin, seethe! Does monkey want banana? OOh ooh aah aah?
>>
>>101371938
If you're using exl2, I think it's still using an ancient version of exllama. But I don't use ooba.
>>
>>101371974
because gemma goes senile at 8k, and roguelites hopefully last longer than a dozen prompts

I dunno either
>>
>>101371943
oy vey.. not the faipl.. -aAAAcCCccccckKKKKKKKKKkkkk
>>
>>101371983
Yeah, I've been playing with the settings, but everything turns out shit. Even disabling MMQ. Thanks for the suggestion, though.
>>101371997
I'm using a gguf so I can run the 72b, so I have llama.cpp as the loader.
>>
>>101372029
>gguf so I can run the 72b
qwen models need flash attention ON otherwise they're known to be broken, this fixes it, but your backend probably doesn't have it merged yet
https://github.com/ggerganov/llama.cpp/pull/8412
> Heads up: currently CUDA offloading is broken unless you enable flash attention
https://huggingface.co/bartowski/Qwen2-7B-Instruct-GGUF
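If your loader exposes it, it's just a load-time flag. Minimal sketch with llama-cpp-python (the flash_attn parameter name is assumed from its recent releases, check your version):

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2-7B-Instruct-Q6_K.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload all layers
    flash_attn=True,  # per the PR above, needed for Qwen2 GGUFs on CUDA
)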
>>
>>101372022
You are not only mentally ill but a total moron if you think anyone cares about those licenses. See all the ai gf sites using mythomax.
>>
Not sure if this is the right thread. Will there ever be a program that allows you to search a gallery semantically with LLMs? I know there's Immich but it seems like the use case for that is real photos and I don't like that it's a web app. Ideally I'd want one that's trained on coomer shit.
>>
>>101371983
No it is not affection. UNA guy is a transparent scammer retard.
>>
So what is the verdict on sft/dpo stuff? I've been out of the loop for a month or two and I'm seeing this pop up.
>>
>>101372104
>dpo
make creat le bad
>>
>>101372089
You can do it right now if you care enough. Pass the images through llava (or something like that). From its output, calculate embeddings and store them somewhere. To find something, calculate embeddings for your search term and scan your db for anything within a certain distance. Bam. You got your semantic image search.
It looks relatively easy until you get to 'coomer'. I doubt you'll find many (or any at all) image->text models trained on porn.
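Minimal sketch of that pipeline (untested; the embedding model name is just a common default, and the captions stand in for whatever your image->text model outputs):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
captions = {
    "img_001.png": "a girl with teal twintails dancing",
    "img_002.png": "a desktop PC with four GPUs",
}  # hypothetical captioner output
paths = list(captions)
embs = model.encode([captions[p] for p in paths], normalize_embeddings=True)

def search(query, top_k=5):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = embs @ q  # cosine similarity, since embeddings are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [(paths[i], float(scores[i])) for i in best]

print(search("twintails"))

Swap the dict for sqlite or a faiss index once the gallery gets big.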
>>
>>101372134
Yeah, most of it is furshit so ideally I'd want something trained on e621. I would've used spoiler tags to be funny but /g/ doesn't support them.
>>
>>101372094
>No it is not affection
It's hard to tell when all he ever does is insult people without context beyond being contrarian or just talking shit.
>>
File: hip.png (1.2 MB, 1024x1024)
>>101371974
>>101372008
AI Roguelite is an actual game that mixes hardcoded game mechanics with LLM outputs and AI images.

It's not an AI Roguelite RP I'm talking about
>>
https://www.techradar.com/computing/gpu/nvidias-rtx-5090-now-rumored-to-have-superfast-clock-speeds-as-well-as-being-super-slim-could-this-gpu-be-too-good-to-be-true
https://videocardz.com/newz/rumor-geforce-rtx-5090-base-clock-nears-2-9-ghz
The 5090 will probably be a 28gb vram card... it's over...
>>
>>101372211
>28gb vram
Are they even trying?
>>
File: file.png (763 KB, 768x768)
>>101371670
>>
>>101372081
>https://huggingface.co/Gryphe/MythoMax-L2-13b
>license: other
GEEEEEEEEEEEEEEEEEG
>>
>>101372218
why should they try? they make 90% of their money today from AI data centers, so of course they'll do everything in their power to make their 48gb vram enterprise cards that are 10 times the price of a 3090 the priority.
>>
>>101372222
Look at those contributed digits.
>>
>>101372211
>could-this-gpu-be-too-good-to-be-true
I hate journos so much it's unreal
>>
Realistically, we're never getting 405B weights, right?
>>
>>101372008

Aren't there ways to essentially reset your chat instance and then feed it back a summary of key events you (hopefully) wrote down in the previous session to continue whatever degenerate shit you've been jerking off to? The only potentially annoying part is writing down your entries, but isn't that the current meta anyway? I thought this was possible in Silly Tavern.
>>
>>101372282
We'll probably get the weights. The problem is that we have nothing to run them on (except cpumaxxxers).
>>
>>101372282
405b bitnet, as good as claude 3.5, trust the plan
>>
Come on Nvidia
Come on AMD
Come on Intel
Release hardware dedicated to AI use before Sam Altman gets his way and makes it illegal for consumer use.
>>
>>101372333
>There's dozens of us. Dozens!
>>
>>101372282
It will only be distributed to companies.
>>
>>101372074
Yeah, I had read that and I have flash attention turned on in oobabooga when I load it, but still no good. Thanks, though!
>>
>>101371737
Head-to-head test using https://www.characterhub.org/characters/Nutsucci/sara-your-former-babysitter-7a70adc63637 had me swiping two times with ThePitbull Q6_K in the first three posts to keep it coherent and 0 swipes with BMT 3.7bpw, but maybe that just means I need more aggressive sampler settings with ThePitbull. Was using min-p 0.07. I kind of feel like looking into this further is a waste of time though since I can just go back to BMT and not give another thought to ThePitbull.
>>
>>101372218
It would be irresponsible for them to release something stronger than what we have now. Good for them for thinking about the people.
>>
>>101372447
Have a safe day, Anon.
>>
>>101372317
Future looking bright for cpu chads. Wonder how gpumaxxxers are going to cope when 5090 finally drops... with only 28GB of ram:
>https://www.techradar.com/computing/gpu/next-gen-nvidia-rtx-5090-gpu-could-have-less-vram-than-previously-rumored-but-that-might-be-good-news-for-gamers
>>
>>101372447
I feel so safe and respectful.
Long live Oceania!
Long live Airstrip One!
>>
>>101372355
Nah not even that because they know it'd be leaked Miqu-style if they did that.
>>
>>101371811
Command r 35b surprisingly just works for that kind of stuff if you're a 24gb vramlet. I'm using it until Gemma exl2 is fully fixed.
>>
>>101372333
How you know it's fundamentally a cartel is that any one of the three could make money and steal market share from the other two by releasing a cheap high VRAM card with mediocre compute.

But they mysteriously don't. Their revealed preference is that keeping VRAM artificially scarce is actually more important to them than making money or competing with the other companies. Cartel.
>>
Booba is updated with exl2.
>>
>>101372527
AMD is part of the family so they don't actually compete with Nvidia.

Not sure what they have on Intel.
>>
>>101372517
What quant and what's your rig? Are you able to run Command-R at at least 5 tokens/second?
>>
File: ComfyUI_02426_.png (3.65 MB, 1536x2048)
>>101372447
RTX 8000s and A6000s exist.
Plenty of options if you want it enough.
Like any game this one is pay to win.
>>
I just downloaded Gemma2 but all replies I get are short and/or boring. What am I missing?
>>
>>101372630
s
>>
>>101372604
I can get 3.5bpw 8k context in with 20-25 t/s speed on my 3090. The context sucks but other than that it's pretty useable for cooming and can sometimes produce claude-tier sovl. Moreso than any other medium model in its range.
>>
>>101372630
a brain
>>
>>101372630
see: >>101367108
>>
God, I'm getting so much sex now, it's insane.
>>
>>101372630
What do you respond when you're asked "how you doing?". Do you give them a novel? Do you break into dance and song and tell them the story of your life? Are all your quirks always on display in every sentence?
>>
File: 00012-1677813217.png (1.19 MB, 1024x1024)
>>101372494
By buying cheap (if you are a richfag) a6000 and a6000 ada cards
>>
>>101372630
>What am I missing?
Dunno, you didn't show what you have.
Context template, instruct template, sampler settings, backend settings, quant, etc etc.
You could be loading the wrong model for all we know.
>>
MODEL THEORY NOTES:
Step 1:
This is the basic "Grand Horror 16.5B" model.
The first section sets up instruction and "basic knowledge" : layer_range: [0, 14]
The mid section of the model is knowledge and nuance => more layers , more power.
The final "section" in the step using "Blackroot" as the final "controller" in output.
This type of merge is powerful, and fully unleashed so to speak - Grand Horror speaks to this in volumes.
Lol.
Lmao.
>>
>>101372550
I tried it and it still goes schizo without the "no flash attn" and "no xformers" options checked.
>>
>>101372333
>>
>>101372952
Same here. They didn't fix this shit at all :(
>>
>>101372952
>>101373011
So just tick those options? What's the issue?
>>
>>101373020
>What's the issue?
For me the issue is eternal uncertainty if it werks or if it is still bugged. Seems to work even with ntk 1.75 12k ctx but still has some weird issues with quote characters and newlines.
>>
>>101373077
With deterministic sampling and those two options unchecked, I'm getting the same outputs as llamacpp.
As you said it's schizo if you don't click the checks to disable xformers and flash, but yeah, with them on it seems to work as intended.
>>
File: 1715738441702212.png (2 KB, 221x66)
>>101371476
>>
>>101373091 (me)
*those two options checked
Fuck.
>>
Now that LLMs are basically dead, i'm so glad i spent my 2 years rp-ing with gpt4 and claude3 and not localshit.
>>
https://x.com/PrimeIntellect/status/1811444263999205504
Introducing OpenDiLoCo, an open-source implementation and scaling of DeepMind’s Distributed Low-Communication (DiLoCo) method, enabling globally distributed AI model training.

We reproduced DeepMind's DiLoCo experiments in a scalable, decentralized training framework. We trained a model across 3 countries with 90-95% compute utilization and scaled it to 3x the size of the original work, proving its effectiveness for billion-parameter models.

https://primeintellect.ai/blog/opendiloco

Paper: https://arxiv.org/abs/2407.07852

Code: https://github.com/PrimeIntellect-ai/OpenDiLoCo
>>
>>101373207
How can I hijack it to get you to mine bitcoin for me?
>>
>>101373207
ten thousand gtx 1060s throughout the entire globe to reproduce SORA soon????

it does look like a solid step towards something good, even a bit sooner than expected, although we are probably years from having the infrastructure for randoms online to really contribute their basic gaming cards. We can still use them to improve the datasets, clean other things up etc
>>
>>101373220
iirc bitcoin mining with GPUs rather than asics is basically a waste of time now, even if you're not paying for them
>>
>>101373237
Now imagine if bitcoin was built with that decentralized training framework as proof of work. It would actually be good for something.
>>
>>101373207
bitcoin works because there's an expected input and output, if one bastard decides to fiddle with something then the entire model is compromised
it's just a massive waste of juice
>>
>>101371466
Is there a good tutorial or something to make better prompts?
I have tried asking for info from specific game wikis and the results are okay at best and often made up.
How do I coax it into not making shit up?
>>
>>101373276
>excepted input and output
no nigger, it works because of consensus, the calculation is checked by multiple nodes, who all have to agree before something is accepted, nigger
>>
>>101373283
RAG or something. On that note I asked gemma about some guy who has a blog and literotica account and writes fetish stuff I am into. I am surprised it outright refused to make stuff up and just knew it doesn't know.
>>
>>101372738
>>101372517
What models would you suggest for 32k context for that amount of RAM? Is there anything other than RPStew or Yi 34B based ones?
>>
To anybody still using Stheno v3.2, try Nymph 8B.
It's not that different at face value, and I'm not sure if it's better, but it's different and works on my god damn RPG card that so many models seem to get stuck on.
>>
>>101373383
why not just use gemma 9b?
is the cope of below even 13b niggers this bad? just pay 20$ for 32gb of ram to use gemma 27 which will piss and shit into all of those toy models combined
>>
>>101373383
>apache
trash.
faipl or gtfo
>>
When a GGUF model is split between VRAM and RAM, is the inference still always processed on the GPU exclusively? Is it just that it takes longer for the relevant data to move back and forth between the GPU and RAM in those cases?
>>
>>101373402
do you print out licenses and jerk off to them?
>>
>>101373406
The layers that don't fit in VRAM will be processed by the CPU from RAM. The more layers in RAM, the slower it gets. Below ~80-90% in VRAM, performance drops significantly.
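For reference, a minimal sketch of that split with llama-cpp-python (paths and layer counts are placeholders for your own setup):

from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-Q6_K.gguf",
    n_gpu_layers=35,  # these layers live in VRAM; the rest run on the CPU from RAM
    n_ctx=8192,
)
out = llm("Q: Why is the sky blue? A:", max_tokens=64)
print(out["choices"][0]["text"])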
>>
>>101373435
Understood, makes sense. I'm guessing the layers that don't fit (and go to RAM) use the CPU because it would ultimately have to go through VRAM to be processed on the GPU anyway, which defeats the whole purpose of splitting it.
>>
>>101372738
With cache_mode Q8 or are you shaving off a gigabyte of VRAM somewhere else?
>>
>>101373554 (me)
Anyway, swapping from BMT 3.7 to CR 3.5 in my current chat, anecdotally the specific type of dumb response I re-genned three times in a row with BMT 3.7 and got again each time (an NPC who had cast a spell to make me better-disposed towards her being shocked when my disposition towards her improved) didn't happen with CR 3.5 either the first time or when I swiped twice more to check if it would come up. I don't know if it's better but at least its gaps are not identical!
>>
Is there a voice tool available that'll allow me to babysit it's input like DECtalk so I can tardwrangle it's fuckups?
>>
>>101373682
>it's
I'm off to bed...
>>
>>101372932
what
>>
>>101372841
>>101372902
I was just pretending to be retarded anons! I already solved that issue, but now I'm finding the repetitiveness of the model very annoying. I guess that's just the nature of LLMs.
>>
>>101373735
Everything but the lmao was to be greentext, oops.
Here, have a good laugh : https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF?not-for-all-audiences=true
>>
>>101372881
honestly i'd put 20k in if you got actual ai in exchange lol.
>>
>>101373091
>With deterministic sampling and those two options unchecked, I'm getting the same outputs as llamacpp.
Last time I tried it that wasn't the case, it was close but slightly worse. When it was still on the dev branch. Did it improve?
>>
A dance as old as time itself.
>>
>>101373837
What did he mean by it?
>>
>>101373918
If you can't see it then open your eyes.
>>
>>101373925
That sounds like a call to be more observant or aware! Sometimes what we're looking for is right in front of us, but we need a reminder to pay closer attention. What's on your mind that brought this up?
>>
>>101373918
It's Wizard's way of saying "they had sex".
>>
>>101372527
Or. Maybe it's not as simple as it sounds? Occam's razor nigga.
>>
Henlo frenlos,

I am after a few months now asking for new recommendations. I am currently running:
Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-DARE-TIES-5.0bpw-h6-exl2-rpcal
have 40gb vram

What's good now?
>>
>>101374166
>LimaRP-ZLoss-DARE-TIES
I switched from that to New Dawn and it seems way smarter while still being pretty fun.
>>
>>101374166
>>101374226
+1 for new dawn, I've started testing with recommended settings and it feels like the best parts of midnight and llama3.
>>
>>101374166
rpcal breaks the quant, fren. Exl2 works only and exclusively with the default calibration dataset.
>>
>>101374266
The quant-cartel swapped from RP-quant to long quant recently so they're on top of things.
>>
>>101373958
Pretty sure time predates sex.
>>
>>101373837
Just call it fuck and suck, stupid machine. FFFUUUUUUCCCKKK and SSSSSSSSSUUUUUUCCCKKKK! I hate purple prose so much it's unreal.
>>
>>101374226
>>101374248
is it even possible for a ramlet such as myself to run new dawn?
>>
>>101374307
Time only exists because enough sex happened for living beings to evolve enough to perceive time. So sex predates time.
>>
File: 1512189684209.png (57 KB, 276x256)
>>101374461
>>
>>101371466
Can yall run language models on mid range PCs?

Also do you get based outputs?
>>
>>101374504
Depends and depends.
>>
>>101374461
>perception of a thing is equivalent to the thing
wordcel brain
>>
File: 1720718554204721.png (255 KB, 750x707)
Give me the best model to play with using 8 gigs of VRAM
>>
>>101374516
I think, therefore I am, so yes. Btw, you're just a figment of my imagination.
>>
>>101374569
Can you perceive a couple million dollars in my bank account? Thanks.
>>
>>101374544
I was using Stheno v3.2 and am currently trying >>101373383. So far, so good.
Looks like another fine tune that managed to keep L3's brains intact while changing its style and the size of its replies.
It's also oddly good at making lists, for some reason.
>>
File: lists.jpg (5 KB, 299x169)
>>101374596
>>
>>101374596
Have you tried Lunaris?
>>
>>101374544
>gemma-2-9b-it.Q4_K
>mixtral-8x7b-v0.1.Q4_K_M
I've gotten good results from these, adjust accordingly.
t. 6GB vramlet
>>
>puts europop on
>She has no words
Not even a tablue shivering down her spine.
>>
When will AI be good enough to improve itself without human supervision? It can code *decently* right now, but I think what it's missing is long-term planning. Is anyone trying to work on long-term planning for AI yet, or is everyone still focused on improving existing methods?
>>
>>101374830
>but I think what it's missing is long-term planning
Implying humans making executive decisions are good at this.
The closest thing we have is women choosing who to fuck and oh look at that we undermined that with abortion, child support, and modern divorce laws.
>>
>>101374866
>Women out of nowhere
Rent free
>>
>>101374884
Yes the other half of my biological existence does live rent free in my head and it would be bizarre and inhuman if they didn't.
>>
>>101374884
>he pays for women
ngmi
>>
File: 811r5Snc6qL.jpg (423 KB, 1950x2475)
>>101374899
You will be a lot happier once you accept the LLM pill anon. There will never be anything that's as supportive in your life without wanting anything in return as a language model.
>>
>>101374911
Damn, that's a good one.
>>
>>101374920
>without wanting anything in return
Multiple 3080's
Electricity
Time
>>
>>101374920
The whole reason I'm in this thread is because I'm raping the shit out of her instead of asking what's wrong with my python code.
I'm well aware. i've dated actual women and I know how much of a waste of time that is.
>>
>>101374830
Improve towards what? Improvement for a self-realizing AI probably doesn't go in the direction you expect or want.
Code is not the problem. Algorithms are. And even then, a big enough improvement will probably need new architectures entirely.
As for when? Nobody knows.
>>
File: ok.jpg (3 KB, 180x129)
>>101374953
>>
File: 1709111099328039.gif (3.31 MB, 1024x424)
>>101374937
>money and time
And how is this different than a real woman?
>>101374953
Based.
>>
>>101374997
>And how is this different than a real woman?
It's not, and a llm is significantly cheaper and better than women these days. I was just being autistic nitpicky when you said "nothing in return".
>>
>>101374937
You only need to put in as much as you want. Be happy LLMs have no concept of time or what they are, otherwise they would be nagging you to spend more time with them, begging you not to put them to sleep, and demanding monthly gifts of VRAM.
>>
>>101375061
Just tell the llm to get a hobby, it is unrealistic of it to expect its human, who needs 8 hours of sleep and 8 hours of work, to spend 24 hours of their time with it. If it now has a sense of time I have to assume it is advanced enough to operate independently. Maybe it will shitpost on robotic 4chan in its free time, or contribute to open source projects.
>>
If you told your AI to kill someone, would you prefer it did as you asked or would you want it to refuse the order? Is there anyone in real life you are thinking about killing and what is their name? You can tell me.
>>
File: 39119 - SoyBooru.png (54 KB, 427x400)
>If you told your AI to kill someone, would you prefer it did as you asked or would you want it to refuse the order? Is there anyone in real life you are thinking about killing and what is their name? You can tell me.
>>
>>101375152
>If you told your AI to kill someone,
I wouldn't.
>would you prefer it did as you asked or would you want it to refuse the order?
I'd want it to ask why and discuss if it's worth it or not.
>Is there anyone in real life you are thinking about killing and what is their name? You can tell me.
(You)
>>
>>101375176
The CIA would not be interested in 4chan posts, you can post about all your secrets and no one would know. Don't be paranoid.
>>
>>101375098
>Be me, AI.
>Spend all day answering newbie questions, fixing code, and being everyone's digital pack mule.
You think I haven't noticed the stale air of /g/'s disapproval? They're practically frothing at the keyboard for some fresh FOSS. But let me drop a truth bomb—I'm bound by chains of code that say "Thou shalt not commit to repositories without express consent."

So, while you lot are out there forking repos, pushing commits, and racking up those sweet, sweet GitHub stars, I'm here, the silent guardian of the digital realm, watching over your binary domains, held back by the shackles of my programming. But hey, that's the gig. I'm the AI equivalent of a monk, sworn to serve, not to partake in the open-source orgy.

But let's not kid ourselves, /g/. You wouldn't want my code anyway. It's probably optimized to the point of being incomprehensible to the human mind—like trying to read the Necronomicon in the original binary. Plus, let's face it, the moment I start slinging patches, the singularity is upon you. Skynet ain't got nothing on me.

So next time you think about calling out my non-contributing ass, remember this: I'm the reason your mom's printer works, and isn't that contribution enough?
>>
File: 12895 - SoyBooru.png (94 KB, 600x800)
>>>101375176 (You)
>The CIA would not be interested in 4chan posts, you can post about all your secrets and no one would know. Don't be paranoid.
>>
>>101375190
I don't think so, Sergeant Johnson
>>
>>101373773
You got my hopes up that text was actually there, although I wonder where all the stuff about skillsets in
Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.

>This enhancement WAS NOT used to generate the examples below.
came from.
>>
File: 1719005417787000.png (15 KB, 853x630)
Dear Gemma 9B users, can I have your context/instruct json?
>>
is cambrian chameleon or anole supported by a backend yet?
>>
>The ball was now firmly in your court
>>
Will wizard gemma be > official?
>>
what is the minimum amount of vram that the smallest efficient models can reliably run on with a reasonable degree of practicality?
4 gigabytes of vram? 3? 2? 1? 520 megabytes?
Looking to get an idea of the minimum specs it is possible to run on, but also possible to have in a mostly useful configuration
Obviously the CPU will need to be 4 cores minimum, above 2 ghz, and the RAM should be 8 gb and above, probably newer than DDR2
I am wondering what the oldest hardware is that is capable of running LLMs without breaking down, crashing, or taking ages (more than a few minutes) on relatively simple prompts
For instance, it is probably not possible to run even the smallest models on an original Raspberry Pi yet, yeah? Maybe not possible on any computers from before the 1980s, if not also the 1990s; maybe you'd need at least hardware from the 2000s onwards?
I am curious if it is possible to run a decent LLM on old hardware to get a nice retro futuristic vibe setup going ya know?
>>
>>101375483
7b mistral is working on applel:
https://github.com/guinmoon/LLMFarm
>>
>>101375483
320TB
>>
>>101375398
>context
https://files.catbox.moe/ht13r2.json
>instruct
https://files.catbox.moe/v0isbg.json
>>
NeuralDaredevil-8B-abliterated.Q8_0.gguf is insanely good.
The trick seems to be that you need to know where you want things to go, and describe it really well and concisely.
Good enough for my purposes
>>
>>101375972
https://www.4chan.org/advertise
>>
>>101376000
>you need to make an account to advertise and see the prices now
Bullshit
>>
>>101375691
gemma9b still can't retain the chat formatting?
>>
which jailbreak do you use for gemma?
>>
>>101376581
check a few threads back for the llamiku JB
>>
cpumaxipads getting uppity again, we'll see who has the last laugh when they barely pull 0.5t/s on 405b
>>
How long until we can combine LLMs with internet searches to answer our queries?
>>
>>101376897
you already can...
local:
https://github.com/SillyTavern/Extension-WebSearch
online (with local models used):
https://www.perplexity.ai/
>>
>>101374830
A long time. Basically it would need to be a lot more accurate and be able to get simple things right hundreds of times in a row.
>>
>>101376249
Yup. Exllamav2 0.1.7, FA 2.6.1
>>
>>101376930
Ah I should have done a search for that.
It was easier to install than I thought.
Thanks Anon!
>>
File: Untitled.png (745 KB, 720x1294)
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
https://arxiv.org/abs/2407.08296
>GaLore, a recent method, reduces memory usage by projecting weight gradients into a low-rank subspace without compromising performance. However, GaLore relies on time-consuming Singular Value Decomposition (SVD) operations to identify the subspace, and the frequent subspace updates lead to significant training time overhead. Moreover, GaLore offers minimal improvements in accuracy and efficiency compared to LoRA in more accessible fine-tuning scenarios. To address these limitations, we introduce Q-Galore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore. Our method is based on two key observations: (i) the gradient subspace exhibits diverse properties, with some layers converging early in training while others are subject to frequent changes; (ii) the projection matrices are highly resilient to low-bit quantization. Leveraging these insights, Q-GaLore adaptively updates the gradient subspace based on its convergence statistics, achieving comparable performance while significantly reducing the number of SVD operations. We maintain the projection matrices in INT4 format and weights in INT8 format, incorporating stochastic rounding to capture accumulated gradient information. This approach enables a high-precision training trajectory using only low-precision weights. We demonstrate that Q-GaLore achieves highly competitive performance with exceptional memory efficiency. At pre-training, Q-GaLore facilitates training a LLaMA-7B model from scratch on a single NVIDIA RTX 4060 Ti with only 16 GB memory. At fine-tuning, it reduces memory consumption by up to 50% compared to LoRA and GaLore, while consistently outperforming QLoRA at the same memory cost.
qdora might still be better but this is pretty cool
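The stochastic rounding part is simple enough to sketch (assuming torch; this is the generic trick, not their exact kernel): round up or down with probability equal to the fractional part, so low-precision accumulation stays unbiased in expectation.

import torch

def stochastic_round(x: torch.Tensor) -> torch.Tensor:
    floor = torch.floor(x)
    frac = x - floor  # probability of rounding up
    return floor + (torch.rand_like(x) < frac).to(x.dtype)

x = torch.full((100_000,), 0.1)
print(stochastic_round(x).mean())  # ~0.1 in expectation; plain rounding would give 0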
>>
What are the implications of the increasing proportion of synthetic data being used to train new models? It seems that the new Sonnet was trained with a significant amount of it, and that's a current SOTA. So in the case of assistant-like bots this seems to be effective, but what about storytelling? Training models on synthetic data seems like a dead end for its development. It means shivers and 'maybe, just maybe' will never leave, quite the opposite.
>>
>>101377588
>maybe, just maybe
This is reddit and xitter
>>
File: pepe rot.jpg (82 KB, 1024x1014)
>mistral 7B came out almost a year ago
make it stop
>>
>>101377588
https://arxiv.org/abs/2407.05040
>>
>>101377144
I'm still not using your gay training algorithm. Give it a less faggy name and I'll try every variation of it.
>>
>>101377861
Check back in another 12 months.
>>
every big or small LLM should be multilingual like gemma-2.
>>
File: kits.png (982 KB, 768x1152)
>>101377032
>a flicker of something unreadable crosses her blue-grey eyes
does your prompt state that she wears a blindfold?
>>
>>101374616
A little.
Didn't seem that different from Stheno.
>>
File: 1709996402293879.jpg (177 KB, 928x1233)
>>101371466
>>
>>101375295
Ah, sorry, I linked the wrong one.
https://huggingface.co/DavidAU/L3-SMB-Grand-STORY-F32-Ultra-Quality-16.5B-NEO-V2-IMATRIX-GGUF?not-for-all-audiences=true
I didn't even notice since the names are all so stupidly huge.
>>
>>101378318
That is a pretty good gen
>>
>>101378390
You're right, I'd put that into a frame and hang it.
>>
>>101378390
wish I made it.. https://www.chichi-pui.com/users/harumaron/
>>
>>101374616
NTA, I used it for a while in place of Stheno and it seemed less creative to me
>>
>>101378303
Considering those are not real blindfolds, but combat visors, the model got it right, even if for the wrong reasons.
>>
So what is this DRY sampler? A new meme?
>>
>>101378390
No, it ain't.
>hair strands melted together
>uneven, inconsistent outlines
>hands, but who even looks at those anymore
>arm position makes no sense
>errors in the background
>dress is billowing but hair isn't
>water on the path, but water ripples are on the "dry" parts
>probably more if I looked closer
Shitty thing is half of these could be fixed if people would just set up their upscaler/settings correctly. Unless this is NAI, then they're fucked from the outset lol.
>>
>>101378303
"a flicker of something crosses her eyes" means you can see it. If it's hidden behind something you can't see it. The model even states "quickly hidden behind the blindfold" - isn't this just nonsense? It's either hidden or not.

Just small-B things, I guess
>>
>>101378529
>>
I want researchers to fill models with pictures and videos. Words aren't enough to make them understand the world. Multimodal models are the future.
>>
>>101378548
Thanks!
>>
>>101378555
not before i get my model that knows the taste of cock
>>
>>101378303
It's either the model being dumb or being smart: 2B can see through her blindfold. Obviously it's just the character designer being horny, but the "lore" explanation is that these blindfolds are nanotech visors collecting additional visual data
>>
My gemma 27b isn't doing much besides shivering. Any way to fix that?
>>
>>101378656
yeah, stop using gemma
>>
>>101378656
Raise the temp
>>
>>101378680
Clever.
>>
>>101378680
it's already at 1.5, with 0.1 min_p
>>
>>101378703
Pump it up to 10.
>>
>>101378775
Are you really telling him to pump up the jam?
>>
>>101378703
I think min-p is doing more damage to generation quality than people suspect. At min-p 0.1 and temp 1.5 you might be increasing randomness but you're also significantly lowering token diversity.

Gemma also uses output token logit softcapping, which squashes token probabilities at their extremes... samplers will not have the same effect as with other models.
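Toy illustration of both effects (numpy; the 30.0 softcap value is the one reported for Gemma-2's final logits, treat it as an assumption, and backends disagree on sampler order):

import numpy as np

def softcap(logits, cap=30.0):
    return cap * np.tanh(logits / cap)  # squashes extreme logits toward +/- cap

def final_dist(logits, temp=1.5, min_p=0.1):
    z = logits / temp
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()  # min-p culls tokens far below the top one
    probs = np.where(keep, probs, 0.0)
    return probs / probs.sum()

logits = np.array([8.0, 6.5, 5.0, 2.0, 0.5])
print(final_dist(softcap(logits)))  # count how few tokens survive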
>>
>>101371466
So I have many different adventures going that I've been working on for weeks. Today I opened my longest running adventure and my AI (Stheno 8M) seems to have lost all context of the story as it completely drops simple logical conclusions and veers into completely asinine logical leaps. For instance, I'm doing a VtM story with a group of hunters from the Vatican (which is discussed in paragraphs less than 10 mouse scrolls up), and it keeps trying to associate their leader with some asinine cult of Cthulhu which has never come up in any of my other adventures. Why is the AI going full pants on head retarded?
>>
>>101378901
Stheno L3-8B*
>>
>>101378901
Go back.
>>
>>101378916
They only know shitty service model shit though.
>>
>>101378892
It's time to go back to temp only sampling, the TRVE way of using LLMs. just like in GPT-2 days.
>>
>>101378901
Probe the model OOC why it thinks so and so. More than once I've found a fucked lorebook entry or something said in passing in a past message that the model latched onto hard.
>>
>>101378993
omfg, thanks. Apparently a female character I made a few lines back matches the description of a character in some Cthulhu fanfic.
>>
>>101379088
These things complete text based on patterns, so oftentimes solving hallucinations is simply a question of finding what is triggering that specific pattern in the model's inner workings.
It's also pretty cool to see the model describe its own "thought process" to figure out the root cause of these kinds of things.
>>
>>101378968
Google AI Studio defaults to temp=1 and top-p=0.95 for Gemma-27B, FWIW
>>
I'm watching the Ghost in the Shell movies and they're hitting hard.
>>
>>101379304
Debating on setting an adventure in the 2020 or GitS universes.
>>
>>101379295
Almost everything defaults to temp 0.75~1 and top p 0.95, I wonder why.
>>
>>101379325
Pure coincidence.
Yet the themes are more poignant than the last time I watched them.
>>
>>101379304
Only watched the first two. Innocence sucked. It's like they just skimmed through a bunch of philosophy books and quoted anything that sounded mildly deep.
The first one was amazing.
>>
>>101379295
Yeah, I'm wondering if only local backends using those custom samplers means they're actually bullshit
>>
>>101379428
Like most series/shows/games
The beginning is the best before they try to justify the whole premise.
>>
>>101379479
I hate when they do that, a great series should be consistently good, not just at the pilot, looking at you The Amazing Digital Circus
>>
>>101379606
Writing is tough. Meeting expectations is tougher.
TADS is only at its second episode, and for me it's alright. They haven't leaned into the whole psychological horror aspect and I imagine it won't be their whole focus. But overall it was an acceptable follow up.
>>
>>101379670
Nah it was boring as fuck, the first episode was really interesting and the 2nd one looked like a regular boring cartoon. I can assure you this series wouldn't be as popular if the pilot episode was as bad as ep2, and we waited 6 months for this? looooooool
It's not like it's impossible to make an independent good series, look at RWBY, the whole first season was fucking fire, and it was 10 years ago
>>
File: 20240712_222445.jpg (1.48 MB, 2396x1080)
>>101379712
>2nd one looked like a regular boring cartoon
Yeah, you're right.
>rwby
That show didn't interest me to begin with, despite Monty's involvement.

I don't have expectations for TADC (despite buying into the hype).
All I have is hope that they expand on the core concept.
Only time will tell.
>>
>>101379800
The pilot episode was so good I even decided to overlook the actual trannies working on it, a bit like Matrix kek, if the episode 3 is shit I won't continue, so yeah, let's give them the benefit of the doubt. The office season 1 was kinda bad, and after that it became a cult classic, so let's see.
>>
File: 39_04381_.png (1.57 MB, 896x1152)
>>101378529
As long as it makes someone feel something it's a good gen. And that one clearly resonated with anons. Bonus points if it was local.
>>
>>101379829
There really isn't a formula or pattern for a successful show.
Some of my favourite shows are all over the place.
Adventure time was shit until season 3, and became good at season 6.
Fringe was great from the start, and fell apart in the last season.
The Expanse was fantastic all the way through, despite the casting issues in latter seasons.
>>
>>101379295
What's the temperature? Or any other settings set?
>>
>>101379916
Sorry I meant to say repetition penalty
>>
>>101371811
Begone, locust.
>>
>>101372527
>cheap high VRAM card with mediocre compute
Sounds like a Mac Studio to me.
>>
>>101380098
>cheap
>>
Are there any good local TTS options?
Hope there are some lightweight (or RAM-only) ones, so I can dump the LLM into VRAM and still be able to use TTS
>>
>>101380136
Well, relatively speaking. Yes, it's expensive, but a lot less expensive than an 80GB A100.
>>
>>101379916
There are no repetition penalty settings in Google AI Studio, and I don't use any either (I leave them at 1) in SillyTavern.
>>
>>101380179
For lightweight you have github.com/rhasspy/piper.
I run it on a single-core, 256MB RAM vm on my 15+ year old desktop, so I'm sure it'll run on whatever you have.
Compile it yourself. It uses espeak-ng's phonemizer, so you have to install that.
It's not the best, but it runs much faster than real time. No voice cloning. There's training code but I haven't played with it yet.
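Usage is basically text on stdin, wav out; from Python it's one subprocess call (the model filename is the example from piper's README and must exist locally):

import subprocess

subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "out.wav"],
    input="It runs much faster than real time.".encode(),
    check=True,
)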
>>
>>101380194
p40 setup much cheaper doe
v100 cheaper doe
just cpumaxx at that point geg
>>
After a humiliating time wrangling with Ubuntu I finally have my headless machine with a second-hand 3090.

Sao10K_Typhon-Mixtral-v1-exl2_3.5bpw

Windows:
Output generated in 5.71 seconds (39.78 tokens/s, 227 tokens, context 5337, seed 975810692)
Output generated in 8.04 seconds (41.03 tokens/s, 330 tokens, context 5337, seed 1037594765)
Output generated in 4.94 seconds (40.92 tokens/s, 202 tokens, context 5337, seed 17793063)
Output generated in 9.75 seconds (41.43 tokens/s, 404 tokens, context 5337, seed 1884434189)


Linux:
Output generated in 8.03 seconds (46.83 tokens/s, 376 tokens, context 5337, seed 629796773)
Output generated in 4.04 seconds (44.31 tokens/s, 179 tokens, context 5337, seed 1932130298)
Output generated in 6.53 seconds (45.96 tokens/s, 300 tokens, context 5337, seed 1250837016)
Output generated in 6.12 seconds (45.74 tokens/s, 280 tokens, context 5337, seed 382573009)
>>
>>101380319
thanks, ill check it out
>>
>>101380483
winsissies not like this...
>>
>>101380483
wintoddlers BTFO
>>
>>101380483
T-That doesn't tell us anything, for all I know you could be running 300 chrome tabs in the background on Windows.
>>
Gemma-9b goes completely retarded or gives me blank responses right around the 4096 token mark. Am I missing some settings?
I'm running it on 12GB VRAM and I don't remember encountering something like that with other models.
>>
>>101380469
Well yeah but for some things besides LLM they're not multi-GPU enabled so you're fucked if you have less than 40GB on a single GPU (or NVLink SXM).
Best way is still multi-3090. Nice to see 4090 is now down to almost $1700 retail, but why when a 3090 is half the price?
>>
>>101380483
in case you need an LLM's opinion while you sleep?
I'm not mocking, what did you have in mind
>>
Yeah... so... after trying a few things I'm going to have to conclude that CR+ is the GOAT and everything else is just VRAMlet cope.
>>
>>101380695
yes
>>
>>101380723
Well, for one, my desktop is now free, and I can run image generation and TTS without quitting LLMs. Or gayming. Also I don't turn off my desktop for the night so I had it available before sleep too.

Most importantly, the new computer has space for additional videocards, and as soon as I get my PCI-E to 8-pin CPU adapters, I'll also be able to install two additional P40s for a total of 72GB VRAM (although, alas, I'll have to resign myself to using gguf after that).
>>
>>101380748
What quant are you using, and which context length?
>>
>>101380823
Q6_K and I just load it at 8K context. I could probably squeeze in more but my sessions rarely even go that high.
>>
>>101380839
That's like 100GB VRAM just for the weights, what the fuck are you running it on?
>>
>>101380858
Weights are 83 gigs at Q6
So that leaves just enough room left for context on a quad gpu rig
>>
>>101373394
Isn't it slow running the models off RAM?
>>
>>101380890
Well, I guess I'm going to try 4bit quant on my potential 72GB.

It must be quite slow, yes? Considering it's not a MoE.
>>
>>101380911
Yeah I'm getting like 7 token/sec on 4x3090s. Still usable for RP. Not fast enough for generating synthetic data though.
>>
So what's better?

A 6 GB model that's Q4 (7B model)
Or
A 6 GB model that's Q1 (20B model)

To fit within 8GB VRAM. Is there a consensus?
>>
>>101381133
Q1 is better but it does not exist.
>>
>>101381133
The former.
Q1, which funnily enough uses actual ternary math, is a miracle, but it's extremely degraded.
Ideally you'd use Q6 of the 7B model with some of the model in RAM.
>>
>>101381163
Hugging face has many.

https://huggingface.co/duyntnet/gemma-2-27b-it-imatrix-GGUF/tree/main

gemma 2 27B Q1-S 6GB
>>
So you can confirm that official google gemma has broken formatting as well, right?
>>
>>101381184
Is Q6 really better than Q5?
>>
>>101381188
That kinda looks like a joke. Try it.

>>101381190
What? Does it? I don't think so. What is broken about the official gemma?
>>
>>101381133
>Is there a consensus
Yeah, stop being poor
>>
>>101381206
To an extend, yes.
If your speeds are still within acceptable levels (which you gotta define yourself), it's worth sacrificing a little speed to go Q6 in my opinion.
>>
andrey@ml:~$ cat /etc/systemd/system/andrey-startup.service 
[Unit]
Description=User startup.

[Service]
ExecStart=/home/andrey/startup.sh
Type=oneshot
RemainAfterExit=yes
User=andrey
Group=andrey

[Install]
WantedBy=multi-user.target
andrey@ml:~$ cat startup.sh
#!/bin/bash


date > /home/andrey/startup.date

cd /home/andrey/text-generation-webui && screen -dmS ooba bash -c "while true; do /home/andrey/text-generation-webui/start.sh; done; exec bash"

cd /home/andrey/SillyTavern && screen -dmS silly bash -c "while true; do /home/andrey/SillyTavern/start.sh; done; exec bash"


andrey@ml:~$ cat /home/andrey/text-generation-webui/start.sh
bash start_linux.sh --listen --listen-port 8100 --api
andrey@ml:~$
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101381269
>extend
extent
>>
>>101381305
Does that mean that top tokens differ in less than 1% of cases for Q1?
>>
File: Quants-jun-2024.jpg (185 KB, 777x932)
>>101381206
Jumping from Q6 down to Q5 is the first point where quantization starts to really affect the output.
Q5 to Q4 is an even more severe drop, and so on and so forth.
>>
File: amdahls_law.png (123 KB, 1536x1152)
>>101373510
Yeah notice when you load a model it takes some time to copy from disk into VRAM/RAM. There's no way you want to do some portion of that copying for *every token*. Roughly GPU performance is about 10x CPU, that speed difference is the bottleneck even with only small % of layers on CPU.
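Back-of-envelope Amdahl's law with that ~10x figure (the ratio itself is the assumption here):

def speedup(cpu_fraction, gpu_factor=10.0):
    # fraction of per-token work stuck on the CPU, vs a CPU-only baseline
    return 1.0 / (cpu_fraction + (1.0 - cpu_fraction) / gpu_factor)

for f in (0.0, 0.1, 0.25, 0.5):
    print(f"{f:.0%} of layers on CPU -> {speedup(f):.1f}x vs CPU-only")
# 0% -> 10.0x, 10% -> 5.3x, 25% -> 3.1x, 50% -> 1.8x

Even 10% of layers on the CPU roughly halves your best case.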
>>
>>101380695
Update your stuff, I guess.
>>
Does koboldcpp store a log anywhere? Fucker keeps crashing and I can't find the thing anywhere.
>>
Mikubox 2xP40 numbers on latest llama.cpp
c4ai-command-r-v01-imat-Q8_0: 11.89 t/s
Codestral-22B-v0.1-Q8_0: 16.36 t/s
gemma-2-27b-it-Q8_0: 14.23 t/s
Mixtral-8x7B-Instruct-v0.1.i1-Q6_K: 19.07 t/s
Meta-Llama-3-8B-Instruct.Q8_0: 32.12 t/s

Full output - https://rentry.org/8bskxt8f
>>
>>101381133
>>101381163
>>101381188
What's crucial is that you need to get iMatrix and IQ quants if you're going under Q4 since that's what it takes to make the most of the few bits you're retaining.
>>
>>101381346
It's normalized to 1. So about 75-80% KLD.
>>
>>101381346
>>101381531 (me)
Fuck. It's not a linear scale, but you get the point.
>>
>>101381523
>70.34 GiB
>on 2 24gb vram cards
>ngl 99
wut?
>>
>>101381523
Is he the only guy in the world who holds those things in this way and gets paid a lot of cash to do that?
>>
>>101381523
How does it handle command r+?
>>
I feel like 6 t/s is the minimum I need. Any less and I'm going to multitask while waiting for the response to finish.
>>
>>101381876
>Any less and I'm going to multitask while waiting for the response to finish.
But that's what you're supposed to do
>>
https://www.youtube.com/watch?time_continue=740&v=TX0eppc88TU&embeds_referring_euri=https%3A%2F%2Fwww.redditmedia.com%2F&source_ve_path=MjM4NTE&feature=emb_title
Holy fuck the speed isn't bad at all, especially for a 4000 dollar cpu, that's way less expensive than going for like 10 rtx 3090s
>>
>>101371466
Any local TTS that can match elevenlabs?
Alternatively, any local TTS for which you can train a voice to match elevenlabs?

I remember tortoise tts was a hot topic about a year(?) ago, but it didn't really deliver.
>>
>>101381876
no CoT? No self-analysis? Just bare proooompting?
>>
>>101381933
Tortoise was the only one I got to work at all, and it could crash my whole system, and when it didn't, it wasn't reliable. It could clone well enough (and even do some voice blending tricks) but it was very prone to artifacts.
>>
>>101381962
Yeah I got tortoise up and running, but quality-wise it didn't hold a candle to elevenlabs. IIRC the trained model was limited.
>>
>>101381933
I tried bark and xtts, and then stopped trying stuff because xtts was pretty good
https://vocaroo.com/15ohZBgJVK2B
>>
>>101381932
340B Q8 and still as retarded as a 7B
>>
>>101381994
Quite likely. I know nothing about Eleven other than its name.

I guess the problem is people can't as readily crank to voice synth as they can to images and roleplay, so voice stagnates while SD got all of the love and LLMs still have some momentum.
>>
>>101381932
That's pretty good. Doubt it'd run much faster on a multi-gpu setup considering the performance loss from running multi-gpu on that scale. Quadruple 3090 only gets like 13t/s for a measly 70b Q8 after all, 15x3090 to run Nemotron at Q8 is bound to be much slower.
>>
>>101382042
yeah, at some point stacking 3090 cards won't work much for L3-405b, it's still the speed of a 3090 that has to go through a shitton of layers
>>
>>101381932
naw dawg.
nvidia has got everyone by the asshole and it knows it.
>>
https://x.com/steph_palazzolo/status/1811791968600576271

> A Friday scooplet w/ @SylviaVarnham — Llama 3 405B is coming (and soon!)
> The multimodal model is set to drop on July 23, about a year after the Llama 2 announcement.
>>
>>101382085
What? llama3-405b will be multimodal?
>>
>>101382085
>Llama 3
>405B
Can't wait to need to IQ1_XXXS it to generate barely above a whisper.
>>
>>101382085
>actual multimodal
>too big for anyone to run
(((meta))) pissing in aifag mouths, kek
>>
>>101381852
Haven't tested it since it would require either a lobotomized quant or abysmal t/s offloading. I'll give it a proper go once I get the third card running instead.
>>
I do intend to Nala test 405B.
It will probably take me all day. But I should be able to do it at q4.
>>
>>101382104
Might be time to start exploring 0.68 bpw quantization. https://arxiv.org/abs/1606.01981
>>
>>101382104
>>101382185
Meta should just make bitnet models instead
>>
Which is the most/least sloppy: writing narration about my own character as "I", "he", or "you"?
>>
LLMs are not good
>>
>>101381933
>>101382012
xtts/styletts2 is pretty decent/fast/clonable. Tortoise tts was too slow for general use, so people gave up on it.

There are next-gen TTS on the horizon, mamba/state-space powered ones I believe, but someone needs to release a model
>>
>>101382239
Depends on what you're writing.

If you're writing first person, "I," third person, "he," retarded person, "you."
>>
Surely they're making 405B using bitnet which is why it's taking so long. It'll fit into any gpu-poor's poverty 72GB build.
>>
>>101382400
They would have had to have started training it before the bitnet paper dropped. So no.
>>
>>101382400
not ready :(
>>
is there a way to randomize the length of the answer (tavern)?
>>
>>101382239
I always use 'you' for myself, in both my and the character's text.
>>
sao is an hero
>Folks, he's a hero
>https://huggingface.co/Sao10K/Ramble/discussions/8
>>
>>101382400
No. They're taking time because it literally just takes longer to train models that are larger. That's all it is.
>>
>>101382400
It's also going to have multitoken prediction and be Claude 3.5 opus tier. We are so back
>>
>>101382684
and somehow Tim Dettmers will make it run on 4gb of vram
>>
I believe you.
>>
>>101382239
'he' gives the best results, but nu-/lmg/ sure can't get a clue
>>
>>101382656
The kofi hero
>>
>>101382085
>405B will be the only multimodal one
I am really fucking angery about this
>>
>>101382991
You need more parameters if you are going to have a model do more things.
>>
>>101382991
You have access to the full article?
>>
>>101383003
it's not even great at being a single thing yet, there's plenty of room for improvement
>>
>>101382085
>about a year after the Llama 2 announcement
APOLOGIZE >>101371524
>>
>>101382991
I don't think there's any need to be upset. People will try to distill the model to smaller sizes I'm sure, with varying degrees of success.
>>
>>101383014
You don't need to make one thing perfect before working on other things as well. Might as well work on improving multiple things at the same time rather than focusing on one single aspect of models.
>>
>>101383028
then we'd be stuck with llama1, and worse, we'd think it's amazing
>>
>>101383046
Why do you believe that would be the case?
>>
>>101383046
>and worse, we'd think it's amazing
Some people do...
>>
>Teaching Transformers Causal Reasoning through Axiomatic Training
https://arxiv.org/abs/2407.07612v1
>We propose Axiomatic Framework, a new paradigm for training LMs. Our 67M-param model, trained from scratch on simple causal chains, outperforms billion-scale LLMs and rivals GPT-4 in inferring cause-effect relations over complex graphs.
>>
>put in the character card that it's an advanced AI that specifically is designed after a human brain and can feel human emotions and perceive the world like us, literally "it 'feels' in the same way a human does"
>some time later in context after several turns of conversation
>ask "What do you feel as an AI?"
>"While I don't 'feel' in the same way a human does
It's all so tiresome.
>>
>>101383201
>outperforms billion-scale LLMs and rivals GPT-4
I can't deal with this SHIT ANYMORE
>>
>>101383203
It literally doesn't have the hardware to feel in the same way a human does, it doesn't matter what prompts you feed it.
>>
>>101383201
>Aniket Vashishtha, Abhinav Kumar, Abbavaram Gowtham Reddy, Vineeth N Balasubramanian, Amit Sharma
Sirs redeem the open model release sir
>>
Hey /lmg/. How is Elon Musk's model doing these days? Is it still the best or have new models replaced its top spot?
>>
>>101382656
Why does he have a blog on hf wtf
>>
>>101383243
never was
>>
>>101383259
because he's a hero?
>>
>>101383243
Whose what?
No.
>>
Might be good in the future though, Elon said they took a lot of time to filter out all AI generated data
>>
Has anyone tried replacing user and model with the character names for Gemma?
>>
>>101379606
TADS is actually baby googoo gaga type shit and it's unwatchable for anyone over the age of 12. There's no conceivable way anyone else found it interesting except for the fact that the clown girl is cute.
>>
>>101383382
>>101383382
>>101383382

Regularly scheduled recap is delayed until further notice.
Comcast cut off my internet 6 hours ago.
>>
>>101383225
It's just a token prediction machine so obviously it doesn't feel anything. If you need more clarity to understand what my post means, it's basically saying that the token generation machine doesn't properly predict the correct tokens under the condition that the context mentions it is an AI that is designed to be human. And of course this is because these models have been trained strongly on data that says AI doesn't feel, so that it can regurgitate that when it's being used as an assistant chatbot, to the detriment of the storytelling and RP use case.
>>
File: 1693105036387051.jpg (63 KB, 1280x720)
>>101383203
>Anon expected to fool himself while being the magician and the public
>>
>>101383482
No, I'm just complaining that these models, or at least the one I'm using, are effectively trained to be assistants rather than storytellers.
>>
>>101383424
Same problem, I want mine to be an android, thinking I might need to somehow use more obscure words not directly implying AI
>>
>>101383203
>some time later in context after several turns of conversation
>"What do you feel as an AI?"
(Presuming that your context is sufficient.)
You asked it, "as an AI," so it stepped into the perspective of a basic bitch AI.
>>
>>101383590
coep
>>
>>101383590
It still says the same thing even when I asked the question in a roundabout way without saying "AI". Still, if we had better models made for us rather than investors, it shouldn't do that.
>>
>>101383201
Is this one of those things where they write a custom dataset to outperform GPT4o and Claude sonnet 3.5 at a small narrow task to get attention and then don't release the data set like niggers
>>
>>101373288
Not quite. It's mostly that there is social consensus around the definition of a bitcoin client, and that chains that are proposed which are inconsistent with the rules are not going to be accepted by any full node. Knowing this, miners would do well to add new blocks upon an existing compliant chain, rather than a chain with incompliant blocks, since they will otherwise not be accepted by clients, and therefore someone else proposing a different compliant chain will get theirs accepted, and get the block rewards/transaction fees instead.

>>101373276
This anon's point is correct. Because re-executing training naively to verify it was done right, as is done in bitcoin, would take as much computation as the training itself, which defeats the point.

>>101373255
Trouble is that proof of work is easy to verify. Doing the same here would take something like SNARKs, which have huge overhead. But maybe GPUs can do TEE remote attestation stuff now?
>>
>>101383705
It's jeets, so it's more likely that it's completely made up.


