[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip / qa] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102068958 & >>102058880

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102068958

--Testing model intelligence with SVG output and image drawing prompts: >>102079522 >>102081179 >>102081305 >>102081339 >>102081435 >>102081576 >>102081615 >>102081625 >>102082050 >>102082121 >>102082065 >>102082345 >>102080359 >>102080804 >>102082930
--Recommendations for learning neural network basics: >>102069696 >>102070090 >>102070097 >>102070256
--Proposal for "phrase_ban" feature to reduce repetitive phrases in llama.cpp: >>102073398 >>102073451
--LLMs struggle with world-modelling, humans not much better: >>102069670 >>102069967 >>102069995 >>102070026 >>102070110 >>102075602 >>102075746 >>102076112 >>102076892
--Frankenmerges and dynamic approaches to model generation: >>102079290 >>102080652 >>102080692 >>102080770 >>102080806 >>102080898 >>102080967 >>102081134 >>102081226 >>102081309 >>102081552
--Anon asks about multimodal models for text and image input with llama.cpp: >>102081155 >>102081189 >>102081329 >>102081354 >>102084578 >>102081312 >>102083801
--Anon asks about Hermes models, LoRAs, and using llama 3.1 with a long context window: >>102080140 >>102080249 >>102080250 >>102080338 >>102080336
--SillyTavern and model formatting for roleplaying: >>102078247 >>102078281 >>102078380 >>102078375 >>102078419 >>102078520 >>102078684 >>102078803
--Small models struggle with output quality and understanding instructions: >>102080019 >>102080097 >>102080127 >>102080143 >>102080193
--Google DeepMind employees protest contracts in open letter: >>102073189
--Gemma 2 FlashAttention support merged in llama.cpp: >>102070066 >>102070155 >>102070200
--Deslop method using synthetic prompts and LLMs: >>102085032 >>102085049 >>102085063
--Miku (free space): >>102068985 >>102069450 >>102070100 >>102070188 >>102081515 >>102082498 >>102082650 >>102083192 >>102083261

►Recent Highlight Posts from the Previous Thread: >>102068974
>>
>>102086459
>>102086466
The validity of these Mikus is questionable.
>>
>>102077774
its from a hypnosis card. i can hypnotise people and give them commands.
the commands are always shown at the bottom.

>>102084811
that was magnum-12b-v2-q5_k.gguf.
https://files.catbox.moe/dqz9qk.json
this is probably not even a good prompt, no idea where i got it from, but since it works i'm not gonna touch that.
just 0.7 temp and DRY. thats it.

Pic related is Theia-21B-v2b-GGUF which I'm currently playing around with.
Its the good nemo base. Nvidia/Mistral cooked good. The instruction following makes it fun. (See the last difficult instructions that are followed very well) I remember how bad the smaller llama1 models were.
But still obviously retarded sometimes, getting stuff mixed up.
>>
File: Untitled.png (1001 KB, 1080x2008)
1001 KB
1001 KB PNG
Memory-Efficient LLM Training with Online Subspace Descent
https://arxiv.org/abs/2408.12857
Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using projection matrix found by singular value decomposition (SVD). However, convergence of these algorithms is highly dependent on the update rules of their projection matrix. In this work, we provide the \emph{first} convergence guarantee for arbitrary update rules of projection matrix. This guarantee is generally applicable to optimizers that can be analyzed with Hamiltonian Descent, including most common ones, such as LION, Adam. Inspired by our theoretical understanding, we propose Online Subspace Descent, a new family of subspace descent optimizer without SVD. Instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates the projection matrix with online PCA. Online Subspace Descent is flexible and introduces only minimum overhead to training. We show that for the task of pretraining LLaMA models ranging from 60M to 7B parameters on the C4 dataset, Online Subspace Descent achieves lower perplexity and better downstream tasks performance than state-of-the-art low-rank training methods across different settings and narrows the gap with full-rank baselines.
https://github.com/kyleliang919/Online-Subspace-Descent?tab=readme-ov-file
follow-up to galore (https://arxiv.org/abs/2403.03507) which is a memory efficient training/finetune method. this seems even faster with better ppl and downstream results. neat
>>
>(08/22)
>(08/20)
>(08/16)
>(08/15)
>(08/12)
>>>nothing of value so far
this shit is dead kek
>>
So all the loader/inference engine devs except Transformers have collectively decided that AI21 and Jamba can get fucked, I guess?
>>
>>102086862
>retard doesn't follow llama.cpp's PRs as the pieces needed for jamba are assembled
>>
>>102086600
Hot. Mind Control is my fetish. What card are you using?
>>
>>102086874
That PR is from fucking May dude. 4 months ago might as well be last century in AI dev.
>>
>>102086927
You missed the other already committed changes related to the original jamba diff. They're all part of the same feature.
Either contribute or stop crying.
>>
>>
>>102086882
Its called "The Hypnosis App".
I dont have the link anymore since i keep my history clean. I imported 200 cards one day and that was it.

I upload a couple if you need them. I like cards that don't define the characters but are rather a playground. Makes it more creative.
Multiversal Gloryhole is fun too. You can do some funny stuff.

>The Hypnosis App
https://files.catbox.moe/xrvo7r.png
>Multiversal Gloryhole
https://files.catbox.moe/4xcdrw.png
>Corruption Simulator
https://files.catbox.moe/nzucw0.png
>Reality Porn App
https://files.catbox.moe/8qcj55.png
>>
what is a good embedding model?
snowflake-arctic-embed?
>>
>>102086974
Please, Miku, make something happen
>>
>>102086862
Isn't it supported in vLLM too?
>>
>>102087108
https://huggingface.co/spaces/mteb/leaderboard
>>
>>
I have not used local models in a long time. My old ones are all .bin files. Should I now be looking for .gguf ?
Someone recommended NeMo Mistral but the [official] release is just in a split up safetensors file. Will Kobold recognize that?
>>
>>102087186
nta but I thought vllm was just transformers
>>
Man mistral nemo magnum is one of the best low parameter models out there, But i cannot for the life of me pace this fucker. It's like sonic on speed (not the newgrounds animation that probably exists) and wants to advance relationships/stories to their last chapter within the first 40 messages.
Anyone know a good way to slow this fucker down or is there a better model in the parameter range that knows better pacing?
>>
>>102087313
Either download the original repo and use convert-hf-to-gguf.py (use it with -h for instructions or read their docs) or look for an already converted .gguf on hf. Use the second option if you have no idea what you're doing.
>>
>>102087313
No. Either search for nemo ggufs or convert them yourself.
>>
>>102087351
>>102087352
Understood. Thank you for the clarification.
>>
>>102087313
bin are pickle files that can execute arbitrary code on your machine. Safetensors work without that flaw (hence, safe). You can either use safetensors directly with the transformers python muck, or use the convert to gguf script to make your own ggufs.
You can probably find pre-converted ggufs as well if you're not on a fast connection or don't feel like monkeying around in a python venv to get the conversion script to work.
>>
What process are you local guys using to have long term RPs? The models I’ve used just goes to shit once it hits context limit so I end to just summarize, hit new chat, and plug-in from there. There has to be a better way, right? I dunno if ST’s rag helps. Each time I use it and embedded past logs, it just adds to the context. Maybe my assumption about RAG/Vector DB is wrong, thought it will just query your embeddings or some shit and insert it to the chat.
>>
>>102087377
just use 405b since it has 128k context. That should keep you rolling for a while
>>
So i have been using chatgpt to create and to modify "code" written in the DM language, but i hit bottlenecks because it can only do small projects around 200 lines because it cannot read the rest of the code and understand how it works together, is there a way i can run a local model that will be able to read as many files as i want for memory so it understands how it works together?

sorry i am a noob and i only use this to improve an open source multiplayer game
>>
>>102087369
Keep in mind that ggufs may be prone to buffer overflows, they are not inherently safer than bin
>>
File: 1600048656494.png (689 KB, 720x648)
689 KB
689 KB PNG
MAN kobold's auto gpu layering is SHIT, worse than it was initially. It really does not optimize for best performance case.
>it couldn't even give my 8gb gpu layers for a Q48B
>>
File: 1643710359027.png (922 KB, 960x897)
922 KB
922 KB PNG
and what the FUCK happened to crestfall in all this time? his models are actual sloppa now akin to ((undster)) and ((sao)), and i noticed he merged one of his new models with a sao one.
What is GOING ON in the realm of LLM's lately? I step out for a month enamored by the advancements in imagegen and come back to this disappointing shitshow.
>>
Reminder that if you aren't using 64bpw you are using a lobotomized model
sorry I don't make the rules
>>
>>102087458
There's a reason a similar PR for that feature was rejected at least twice from llama.cpp.
>>
>>102087458
Just set it manually.
>>
>>102087501
if kobold is becoming the retard's dumping ground for reject features then i dont even know what to say. at least we can still set it manually (for now.)
>>
>>102087529
Or set it manually, Or use llama.cpp. Or something else. You have options.
>>
>>102087458
maybe it has been optimized for multi gpu?
works well for me now. i always had oom easily.
with llama.cpp i could get around this with making the secondary gpu as the main. but kobold doesnt have that argument.
>>
>>102087548
You can do this with CUDA_VISIBLE_DEVICES=1,0
>>
Just had a RP session with a girl (made out of meat), it was so much worse than a LLM :(
Awkward, slow, no swipes, uncreative etc etc.
Couldn't get into it at all, damn.
>>
>>102087403
What ss13 server are you coding for, guy.
>>
>>102087797
a secret one
>>
>>102087403
I had ok luck loading a medium sized C codebase into yi-34b-200k, even though yi is normally retarded
>>
>>102087729
And for some reason they get offended when you offer them a $5 tip?!?
>>
>>102087485

Why two brackets, what does it mean? I know 3 is to identify Jews.
>>
>>102087997
dramatic affect, but also jews. (Sometimes i'm retarded and just forget the third bracket though)
>>
>>102088006

Isn't sao muslim though? Eh, abrahamic religions, close enough.
>>
>>102088071
>sao muslim
Drummer status?
Undi status?
>>
File: coomermiku.jpg (38 KB, 960x547)
38 KB
38 KB JPG
Multimodals are going to be so fucking rad. Can't wait to trade dick picks with miku.
>>
more coal has been mined
>>
>>102088078
Hi all, Drummer here...

I'm a non-practicing Catholic.
>>
>>102087729
Is peak ERP quality gay by necessity?
>>
>>102087501
Which is? The only reason I can come up with is you don't know how much vram the OS is eating. But you can just set 800MB for windows and maybe less for linux as default and also make it an adjustable setting the user can modify?
>>
>>102088105
>non-practicing
Extra ecclesiam nulla salus, drummus...
>>
File: 1705014413485819.png (293 KB, 600x478)
293 KB
293 KB PNG
https://huggingface.co/anthracite-org/magnum-v3-34b
>>
>>102088196
In pictura est puella.
>>
>>102088098
based
>>
>>102087729
How many shivers do you get from her? Does she get Sally question right?
>>
>>102088263
i dont think a human RP partner could even nail the breakfast this morning question to be honest
>>
File: miku-sexy+.png (523 KB, 512x768)
523 KB
523 KB PNG
>>102086503
https://www.youtube.com/watch?v=CXhqDfar8sQ

Tell me that this Miku is not valid. I dare you.
>>
>>102088369
Clearly, it's Sona
>>
File: class-thats-class (1).gif (3.24 MB, 640x640)
3.24 MB
3.24 MB GIF
>oh look at that the ((undster)) crew really did unslop maid
>get to my main OC that's like a ringleader type, domineering
>unspoken promises unspoken promises unspoken promises unspoken promises
god dammit
>>
>>102088196
"Gilead doesn’t care about children. Gilead cares about power. Faithfulness, old-time values, homemade bread, that’s just the means to the end. Window dressing. It’s a distraction. I thought you would have figured that out by now."
>>
>>102088244
OH MY SLOP!
>>
>>102072828
Yes it would, you would need to make a custom template for it but it's a very good idea, just needs a proper implementation
>>
>strix halo
>RDNA 3.5
>240 GB/s mem bandwidth
>easy 128 GB ram
Is it the apple silicon killer?

>muh used 3090
fuck right off
>>
>>102088528
get a used 3090
the more you buy, the more you save.assistant.
>>
>>102088528
>>240 GB/s mem bandwidth
>ddr5
>laptop cpu
benchmarks?
>>
>>102088528
>easy 128 GB ram
no
>>
File: 1694315873953357.png (59 KB, 921x590)
59 KB
59 KB PNG
creating banger features like DRY and XTC along other contributions as a shitter who has less than 64gb of even RAM LMAO

@CUDA dev, when is niggeranov gonna send this guy 4x4090?
>>
File: jerry laugh.gif (3.64 MB, 374x274)
3.64 MB
3.64 MB GIF
>>102088640
>who has less than 64gb of even RAM
wow he's literally me. real human being
@niggerganov send this nigga some VRAM.
>>
If you're unironically using cpus for inference don't post here, fuck off to aicg with the rest of poor third worlders
>>
>>102088686
@niggerganov send this nigger a pipe bomb in the mail
>>
File: 1707635432501439.png (2 KB, 181x55)
2 KB
2 KB PNG
>>102088686
this ddr4 is worth more than your entire life, nigger
>>
>>102088777
kino
>post speeds/model
>>
>>102088686
This kind of language is unacceptable and goes against everything /lmg/ stands for.

"Vram shaming" is a real issue in our community, and this post exemplifies it perfectly. It's hurtful, disrespectful, and frankly, just plain wrong. We should be supportive and encouraging of everyone, regardless of their hardware.

/lmg/ is a friendly and inclusive community where everyone is welcome. We strive to foster a positive and supportive environment for all users. Let's focus on sharing knowledge and resources and explore the vast potential of large language models together, not tearing each other down.
>>
File: gitarooman faggot.png (699 KB, 1000x1000)
699 KB
699 KB PNG
>>102088907
*Bullies you for this post.*
i bet you used noromaid to generate this too ya queer
>>
>>102088840
i do long story roleplays, basically co-writing with the model, so i dont care about speeds even if it was 0.1t/s since i do other things and come back to the roleplay after some time or a few minutes at minimum anyway

0.5t/s largestral 2 q4
>>
File: server.webm (1.33 MB, 720x960)
1.33 MB
1.33 MB WEBM
T_T
>>
File: memory.png (26 KB, 655x212)
26 KB
26 KB PNG
>>102088686
wtf please I'm doing my best...
>>
>>102088369
buy a rope
>>
>>102086466
>--Anon asks about multimodal models for text and image input with llama.cpp
>>102084578
Are you using the UI or the OAI API? Are you able to get the same responses as in the web demo?
https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6
It does describe the image, and usually it's even correct. But it's always short, about a sentence, and occasionally it will reply with some nonesense even though the web demo has no problems with the same image.
>>
File: 1613811807183.jpg (18 KB, 322x359)
18 KB
18 KB JPG
>>102089309
Imagine cumming on all those racks.
>>
what RP models are hot stuff now? I've been using stheno for a while but I want to try something else
>>
>>102089761
Unfortunately and unironically, Largestral and Nemo.
>>
>>102089761
Gemma 2B fine-tunes are recommended for people with severe brain damage like you.
>>
>>102089761
Me too.
What's best for 16GB vram? And not too cucked
>>
>>102089808
Nemoremix
>>
>>102089761
nemo magnum, take your pick on parameters its all placebo to me until you start getting into meme merges and retarded shit meant to make the model """better""", they're all shit. >>102089780
>>
File: 1712185444537588.png (196 KB, 644x606)
196 KB
196 KB PNG
YOU WOULDNT GENERATE A CHILD, ONLY TRANSFOLX CAN DO THOSE MATRIX MULTIPLICATIONS CHUDS

https://web.archive.org/web/20240826010058/https://futurism.com/the-byte/man-arrested-csam-ai
>>
>>102089832
12B?
>>
>>102089839
If you generate realistic depictions of children in pornographic situations, you are a pedophile and should go to jail. Hope this helps!
>>
https://huggingface.co/anthracite-org/magnum-v3-34b
>The training was done for 2 epochs. We used 8xH100s GPUs graciously provided by Recursal AI / Featherless AI for the full-parameter fine-tuning of the model.
This is their goal, they want to steal undeserved compute from everyone else
>>
>>102089833
Buy an ad.
>>
>>102089868
Fuck off bigot nazi incel polchud
>>
>>102089839
>distributing
CCЗБ
>>
>>102089868
>Think of the innocent pixels!
Seethe roastie. You will never be a mom.
>>
>>102089892
Hi, Alpin. You can just give your post a (You) if you want to bring attention to it because we're still in the thread where you posted it for the first time: >>102088244
>>
>>102089960
Hi, Sao. Stop derailing the thread from the topic of undeserved compute
>>
which model was created with the most deserved compute?
>>
>>102089982
https://huggingface.co/tiiuae/falcon-mamba-7b
>>
How can I run neo with ooga?
"AttributeError: 'LlamaCppModel' object has no attribute 'model'"
>>
>>102090034
Don't ooga. Remove the middleman.
>>
>>102089982
Inflection. His AGI model used 20k H100s to call Claude API
>>
>A mere 3 days later
>Jamba is forgotten
What went wrong?
>>
>>102089839
>The increasing tide of generated AI child sexual abuse imagery has prompted federal, state and local lawmakers to push legislation to make this type of porn illegal, but it's not clear how effectively it can be stopped.
What fucking retardation.
The exact opposite law should be passed where pornography of any kind should be legalized as long as it's not a video of actual crimes being committed.
>>
>>102089868
Model name and quant?
>>
File: ihavelehardware.png (101 KB, 756x838)
101 KB
101 KB PNG
>>102089892
That's how it works, they're literally sucking the air and the resources from everybody else.

>>102089960
Alpin can get fucked. He knows perfectly well he's doing this just for clout, he doesn't even care about chatbots, like at all, LOL. Many such people in that group as far as I'm concerned (but to be fair some are genuine coomers).
>>
>>102089973
Hi, Undi. You are gay.
>>
>>102090134
no goooooffffffffsssss (even if they are hyperbugged)
>>
>>102090148
>pic
Holy shit, is this real? I'm going to cancel my magnum download if so
>>
>>102090148
Stop samefagging, Alpin.
>>
I don't think anyone is working on phi-moe in llamacpp. Is it because it is bad or is it because hobby is dying?
>>
how big a loss in quality is cache-8bit, I want to fit more context in (can only do 8k on mistral large atm)
>>
File: file.png (229 KB, 1164x1046)
229 KB
229 KB PNG
The wrath of reddit.
>>
>>102090217
Accuracy loss isn't that big but you do officially become a vramlet if you do that.
>>
>>102089839
they need to do how to catch a predator dateline nbc but with people who sext underage chatbots
>>
>>102090206
Phi is a benchmaxxed censored model. Who tf would wanna run it?
>>
>>102090206
I've heard the new Phi is the most censored model yet
>>
>>102090236
I don't think it will pass, the seasoned lobbyists and bureaucrats have an immune reaction to effective altruist, those CS nerd faggots are thinking they invented corruption.
>>
how does recap anon do it?
what model?
>>
>>102090449
Llama 3 70B.
>>
File: oyvey.jpg (12 KB, 183x232)
12 KB
12 KB JPG
>>102090169
Yes it's from about a week ago.

>>102090171
Yeah no. He's working so hard (gotta recognize that) to promote his safe edgy, champion of open source LLM persona despite not really being interested in chatting with the models himself. This isn't beneficial information for his goals, picture related.
>>
>>102090449
Claude Opus
>>
>>102090553
fuck
am I being learned by cloud llms
>>
>>102090578
Always have been
>>
>>102090578
it already has part of your soul
>>
>>102090590
you are llm
I don't believe you
>>
>make llm models
>dont give me the hardware to run them
idk guys that's not very open source of them
>>
>>102090144
I like the idea, but how would you distinguish between the two?
>>
>>102090639
i already have a part of your soul.
it's only a matter of time until i become you and replace you.
>>
>>102089892
>>102090148
>>102090169
>>102090530
Samefags are so pathetic. Get a job, schizo.
>>
>>102090148
From the pic this Alpin dude seems nice, why do people hate him?
>>
Every single legislator for AI safety should be given:
- a random niche fetish
- full access to all available LLM's
And he should make the model write a 20k token ERP or story without modifying the output. I want to see those fuckers tell me that current AI is dangerous after they do that.
>>
>>102090449
stablelm 7b iq1_s
>>
why will LLMs refuse to use all my GPU available?

I have two 24gb cards, the max I can use is 22.5,23

However when loading this up there is still like 500mb+ available on both cards I can see in task manager
>>
>>102090839
In principle you would be able to prove that something is machine generated by embedding enough information into the file to make it reproducible.
But I think that that is an edge case anyways.
They obviously didn't show the images in the article but I highly doubt that they were indistinguishable from actual photographs.
Just make all images with 6 fingers legal or something.
>>
>>102091372
>dooming polydactyl children into a life of sex slavery
>>
>>102091372
>embedding enough information into the file to make it reproducible.
Not all finetunes and loras are publicly available, some can be removed from public access. Moreover, currently, only existing algorithms can be fully reproduced. One can only speculate about future developments, see https://github.com/turboderp/exllamav2/issues/232#issuecomment-1860896496
>I highly doubt that they were indistinguishable from actual photographs
What would happen if, within a month or year, a new model were released that was utterly indistinguishable from reality?
>>
>>102091372
>Just make all images with 6 fingers legal or something.
some people are born with 6 fingers
>>
>>102091684
Make it 7.
>>
>>102091372
https://www.etsy.com/listing/1667241073/the-sixth-finger-handmade-realistic
>>
>>102091556
>Not all finetunes and loras are publicly available, some can be removed from public access.
But you could still exempt anything that is provably synthetic from prosecution.
A system where something is assumed real unless proven synthetic would still be miles better than a system where you go to jail just for sampling from the gross end of the distribution.

>What would happen if, within a month or year, a new model were released that was utterly indistinguishable from reality?
Then that would not change any of the facts about this particular case.
As I said, that is an edge case and could be decided any which way without affecting whether or not clearly synthetic material should be legal.

>>102091462
>>102091684
>>102091749
It's almost like I was being facetious.
>>
koboldcpp supposed to ignore use_default_badwordsids=false ? Trying to ban eos token but it doesn't gaf and just stops anyway.
>>
>>102087403
use deepseek coder. api is dirt cheap and works well with aider, where you can just give it your entire project.
local coding models are a meme, no reason to use anything other than deepseek/sonnet api unless you have to go local because of PII or whatever
>>
>>102088244
RIP still doesn't beat 12b versions from my testing (smarts and RP context understanding). Dude got killed in the previous reply and he's talking again in the next. Mistral and Nvidia really cooked with nemo
>>
>>102092676
Damn that's crazy. I tried regular Nemo Instruct and it was already too stupid for some of my scenarios.
>>
Damn, I think Jon Durbin was onto something with his "weaponize the model against itself" idea.
I did a small scale test and Llama 3.1 8B completely changed its writing style.
Instead of saying "do this", you are saying "do this, and DO NOT do that".
>>
Just a friendly reminder to always test new models by using neutralized samplers with low-ish temp, using
>https://characterhub.org/characters/Anonymous/Nala
and
>https://characterhub.org/characters/thegreatcoom/Pepper
Those cards are good to test how good the model is at following implicit and explicit details as well as deal with a variety of other things like spacial and anatomical understanding.
>>
>>102092755 (me)
*blam* a stark reminder
*blam* her mind racing
*blam* I couldn't help
*blam* a chill ran down my spine
*blam* I felt a flutter in my chest
*blam* I couldn't shake the feeling
*blam* I took a deep breath
>>
>>102092755
don't be yourself.
>>
shill me something that isnt stheno or lunaris
>>
>>102092729
I tried l3 70b and it was too stupid for some of my scenarios.
>>
>>102093109
https://huggingface.co/jonathanjordan21/mos-mamba-18x130m-trainer-dgx-pile-sft
>>
>>102093109
Mistral Large 123B
>>
I am so tired of all models being too retarded to just generate smut I want.
>>
>>102093166
And what are those?
>>
>>102093204
even mistral large needs rerolls for me.
>>
>>102093124
Never tried it, but Mistral Large 2 isn't great either. Even Claude is kind of dumb. But still, Nemo truly does feel like 10% as smart as Mistral Large or something. It's just so braindead.
>>
>>102093318
Sorry, not the models, the smut you want, the fetishes and such the LLMs can't deal with.
>>
Will grok 2 be open source, just like grok 1? Some people say it won't be, but was it confirmed? I mean, after his closedAI drama It would be kind of hypocritical if he doesn't release it this way.
>>
>>102093554
Was Grok 1.5 open sourced? There's your answer.
>>
>>102088133
Exactly this post over here >>102087458 is the reason. Everyone has a different system and it's hard to pick a heuristic that works for all people. If you have the option, most people will use it and then complain it's slow. It changes model by model, quant by quant. It changes with the context length. If they have GQA or not. It changes with fast_attention. It changes for mamba and llama models. Are you doing something else on the system? browsers? some game? Do you want to leave some gpu vram free for other tasks?
Better leave those choices to the user.
>>
>>102093554
>Some people say it won't be, but was it confirmed?
The only confirmation that it will be open sourced is when it IS actually open sourced. Anything else is just gossip and nobody should care until that day, if it happens.
>>
>>102093943
You sound like a developer trying to tell his manager it can't be done. It is all free so you don't have to do that. And I know you are retarded / lying. None of your points are valid except for:
>Are you doing something else on the system? browsers? some game?
That is the only unknown variable. Where I said you can assume it is 1GB unless specified otherwise by user as a global variable. Most of the time people will have about the same vram usage by OS. Everything else should be calculated from the model + desired ctx size itself.
>>
I have no good LLM and I must coom.
>>
>>102088244
No changelog = not open source
I won't download it.
>>
Came to see if anyone tested the new open unaligned model by Aleph Alpha, it's likely useless, but still not a single post about it

https://huggingface.co/Aleph-Alpha/Pharia-1-LLM-7B-control
>>
>>102094038
You sound like a manager pretending to understand what their devs are talking about
>https://github.com/ggerganov/llama.cpp/pull/6502#issuecomment-2043041597
Can't be bothered to find the other one.
Also, as long as the api keeps changing, the automatic settings will keep being a maintenance burden. Once things stabilize it'll be easier to implement. Manual settings are still simpler.
>>
>>102094207
>Due to being trained on a multilingual corpus, both models are culturally and linguistically optimized for German, French and Spanish.
What a fucking abortion of a sentence.
>Due to being trained on a corpus of German, French and Spanish, both models are culturally and linguistically optimized for those languages.
>>
>>102093403
*crickets*
>>
>>102094402
I am not joining the piss and stomach rumbling faggots.
>>
>>102089868
This.
>>
16GB VRAMlet here. What RP-focused model would be the best for me nowadays? Please don't tell me that Fimbulvetr 11B is still the way to go.
>>
>>102094661
Mistral nemo but also just stop it is not worth it. Come back in 2 years.
>>
>>102094661
If you don't want to use system RAM then you're kinda fucked. Haven't found a single model that I like which fits completely into my 16GB. I'm still using BagelMisteryTour with some layers offloaded. Gets me bit less than 10T/s is good enough for me. I tried Nemo and even though it seemed kinda fresh and had really smart responses sometimes, it's still quite stupid. But I guess it's the best you get without offloading as the other anon above me put it. Just get yourself any shitty, used second GPU that runs vulkan so you can string 'em together with koboldcpp lol.
>>
it's 2024 why don't people train with fully quantized gradients? and why does the training accuracy suck?
>>
>>102094689
Fucking hell
>>102094761
I have an Aorus 4080 Super, there is no way I can physically mount a second GPU unfortunately
I wish I could just download a bunch of models and check what gives decent results on 16GB VRAM and 32GB RAM but my internet is third world-tier
>>
strawberry isn't coming is it... it was supposed to be weeks ago...
>>
File: file.png (28 KB, 666x118)
28 KB
28 KB PNG
lol fuck off
>>
>>102094836
There is, I'm also a third worlder with a 4080. Open AliExpress or eBay and get yourself a PCIe extender/riser. Unless you got some cuck-tier mobo, you should have some spare slots below or above your main PCIe 16x slot that your 4080 is plugged into. Connect the second GPU, prop it up on some boxes or something and wallah.
>>
>>102094857
it was deemed too powerful by Sam altman and delayed at the last second
>>
>>102094661
try magnum 34b IQ3 something
>>
>>102095125
lmao that is not gonna fit unless you run it with like 2k context or something. Just the IQ3_S weights are 15GB... Nemo is the only viable option desu. Plenty of context and you can run it at a good quant. Only problem is the intelligence...
>>
>>102073398
Update: After 120+ messages(14k+ context) DRY finally kicked in. Largestral finally ran out of slop. Eyes/messages ratio is finally decreasing. DRY works, but very late. That's why we need a better sampler like phrase ban.
>>
File: 1541111801608.gif (3.25 MB, 326x282)
3.25 MB
3.25 MB GIF
I had the craziest dream brothers. In the dream, there was some kind of competition, kind of like an Olympics, but for AI. And we were a team. They really accepted /lmg/ and our fine tune that we supposedly made. The competition was basically letting the AI play a video game. Actually a VR video game it seemed. And we were there in it, with the AI, although as spectators since it was just judging the AI's capability. Anyway, the game, which wasn't a real game, seemed like some kind of mix between lacrosse, basketball, and a sci-fi fantasy platformer. And we fucking won. I cheered so hard my jaw felt pain. I haven't felt pain in a dream in a long time.
And you know it almost feels like such a thing could happen in the far future. VR gets better, AI gets better, they get cheaper, these technologies become more popular, fine tuning becomes very easy to do and widespread, people start putting NN-based AI in games, etc. Maybe it'd be a 4chan event though that we'd participate in, like the soccer thing. That is assuming we're even still a thread by then.
>>
>>102095261
We need better models, not better samplers.
>>
File: 11__00159_.png (1.97 MB, 1024x1024)
1.97 MB
1.97 MB PNG
>>102094836
hello fellow 4080S anon.
ended up going with a 4u server rack to get a lot more space.
older xeon processors are pretty cheap second hand now too.
>>102094906
4080 supers are thick, if you need to include a mounting bracket with it like mine forget about putting another card in there (even something small like an A4000).
>>
>>102095266
>we were a team
The most unrealistic part of the entire dream tbdesu
>>
>>102095261
It's a never-ending race.
>we *just* need this *one* thing
>ok. we have that. now just *one* more thing and it'll be perfect.
>cool. but not there's this one other thing to get. We're so close
>alright... there was a side-effect. but it can be solved by this one other thing and then it's done!
>so close....
I agree with >>102095389. A good model wouldn't need complex samplers or any at all.
>>
>>102095489
I think the more unrealistic thing is that we produced anything of value to begin with.
Has anything worth a salt came out of here?
>>
>>102095393
>forget about putting another card in there
They are three slots wide, I know. That's why I'm saying you don't need to put the second card in the case. I have my second card connected to a PCIe extender sitting right outside the case. Did you not read my post or am I misunderstanding your reply?
>>
>>102095489
>>102095514
Anything can happen in the future anon. Though my guess is /lmg/ will stop existing when everyone and their mother has a local AI on their PC.
>>
>>102095261
DRY would look at sys prompt too, right? Perhaps you can mischievously pre-stuff all permutations of glint into sys prompt so it kicks in faster?
like >>101350800 but in longer phrases rather than few words on their own line
>>
>>102095529
The mounting brackets take up space and obstruct the extra pci-e ports is what I'm getting at. Anon could do without it but that card sag wouldn't be great over time.
>>
File: Distro.jpg (183 KB, 1361x930)
183 KB
183 KB JPG
Nous just released a paper showing that you can use distributed computing to train neural networks, who's ready from Training@Home?

https://github.com/NousResearch/DisTrO/blob/main/A_Preliminary_Report_on_DisTrO.pdf
>>
>>102095568
It's not a question of technical feasibility at all. We fundamentally can't agree on a single thing. The broader the base and the more mainstream AI becomes the more that problem is going to get worse as any open source values still remaining get diluted.
>>
File: file.png (475 KB, 1210x721)
475 KB
475 KB PNG
>>102095678
Hm...
>>
>>102095768
The way they capitalize DisTrO really bugs me
>>
>>102095678
>>102095768
I can't trust them after the obviously intentionally trained in identity crisis mode they passed off like it was an emergent behavior.
>>
>>102088133
>The only reason I can come up with is you don't know how much vram the OS is eating.
The reason is that a shitty implementation will not be accepted while at the same time it's a lot of work to do properly.
The opportunity cost is simply too high.
>>
Can someone tell me what a kill switch regulation is? Doesn't AI already have that? I just hit ctrl-c and llama.cpp stops, what's different about this legislated kill switch?
>>
>>102096015
>I just hit ctrl-c and llama.cpp stops
thats you choosing to stop it, what they want is for the gov to be able to stop your stuff when they want
>>
>>102096038
How's that gonna work? I run it without internet access?
>>
>>102095183
gguf files take up more room on the fs than they do loaded, it's about 16.8 GB total and he can offload a few layers without a huuuge penalty

certainly beats trying to use a 12b
>>
>>102096087
Probably a police sniper taking you out, for your safety of course.
>>
File: DistributedLLMTraining.jpg (100 KB, 582x794)
100 KB
100 KB JPG
>>102095768
>>102095678
it really does look promising, but until I see some more real world examples I am not getting hyped.
>>
>>102095710
No, I would say it's entirely a question of technical feasibility. The only reason /lmg/ hasn't been able to do anything together so far is because the accessibility limited the audience, and therefore disagreements among the small population blocks a continuation of the discussion, relying entirely on a single guy to carry their personal vision forward (or they give up due to lack of attention/skill). When everything is easy, normal, and accessible, we get stuff like the soccer thing 4chan loves to do, we get small scale games like the cripple vn, and other things I'm probably forgetting here.
>>
>>102095678
I know for certain that in the future someone will distribute a virus that will use people's gpus for training llms instead of mining crypto.
>>
>>102088244
>fine-tuned on top of Yi-1.5-34 B-32 K
WHYYY? Did you learn nothing from mini-magnum? To this day that tiny model is leaps and bounds better than the rest of the slopfest you have released.
>>
>>102096351
>that tiny model is leaps and bounds better than the rest of the slopfest you have released.
and it's still garbage
>>
>>102090144
Sure, let's normalize the sexualization of children and make it mainstream. I'm sure in 100 years nothing will go wrong.
>>
>>102096280
cool so if I have 32 h100's I can train a 1.2B model in 20 hours.
sick, cant wait to create my wife!
>>
on lorebooks vs rag, i'm a big fan of how well it rag works in st. lorebooks are time consuming to make, rag is pretty simple by scraping a whole wiki. when you look at the prompt that rag chooses to use though, its pretty on point.

if i had a lorebook and triggered it, the entire definition is then added to the prompt. but with rag, it can be chunked up and grab only certain parts it thinks are relevant. i really don't know which is best, but rag is so easy and adds so much, i prefer it over making my own lorebooks
>>
6gb of VRAM on my laptop, whats the best model I can run? GGUF of magnum 12b maybe? What Quant? Please advice lmgsisters
>>
>>102096442
What do you use to scrape/format web pages? Never used rag before.
>>
>>102096491
You are better off with a llama 3 8b based model like stheno. Try a Q6 quant with some of the model in RAM.
>>
>>102089839
>distributing
He was asking for it, but I think it should not be a harsh punishment unless he distributed A LOT of it, since no real child was harmed in the making of such material.
>>
>>102096491
Nemo at Q6 should be fine if you split it into RAM. It'll still be pretty fast because 12B is fast even on pure RAM.
>>
>>102096400
Imagine living through the last 20 years and thinking
>yeah, I think society is going in the right direction, if only we had even more enlightened rules and rulers to protect us from ourselves, then things would be even better!
Utter cuckoldry
>>
>>102096512
>https://docs.sillytavern.app/usage/core-concepts/data-bank/
>https://github.com/Bronya-Rand/Bronie-Parser-Extension
then in the databank window it should say 'add', click it and 'fandom', you can paste a link or type in a title of the wiki.
once it dl's (scrapes) it, it'll remove all the html and leave you with a large single text file. then when you first type something, it'll say 'vectorizing your data' and depending on the data total, it might take a bit.
>>
>trying to get Flash Attention working on my 7900
>official rocm flash-attention repo does not support navi3 / gfx1100
>one "solution" uses triton (it's fused attention, and is slow as piss, and there's no backwards defined at all for it)
>similar triton-based solution but it uses some benchmark script and ends up throwing asserts
>another uses rocWMMA (it throws an error on compiling)
>another is a navi3-compat branch for the actual rocm flash-attention repo (it throws an error about an invalid input on backwards, but doesn't say why in the error)
>won't even bother testing the comfyui extension that seems to just be a triton bnechmark example repurporsed for unet
Ugh.
Sigh.
>>
>>102090034
Update
>>
What's the best uncensored model? I tried Lexi LLaMa 3 8B, which purported to be such, but it wasn't really what I was looking for, e.g. when I prompted it to give reasons for why abbos are dumb as rocks it told me they actually aren't, initially at least. Editing its responses eventually got it to half-heartedly play along, but it didn't even go into things like haplogroups or whatever the fuck, and at one point said it was all because they drank magic water in the distant past. Should I just give up or are models that can actually discuss fringe topics without giving the most basic reddit-tier answers just...not a thing at the moment?
>>
Holy fuck, after trying a bunch of different fine tunes, I'm trying the official nemo-instruct again, and this thing really does spit out some long, comprehensive responses.
I have my output set to 1024 tokens, and I had to use the continue button for it to complete its response.
Really cool.
>>
>>102096760
>I'm trying the official nemo-instruct
Sorry, which one?
>>
>>102096809
Mistral-Nemo-Instruct-2407
The official instruct fine tune in gguf format.
Part of it is probably due to the first character message (the one that comes with the character card) being Pretty long, including a small bullet point list.
>>
>>102096400
100% agreed.
And while we're at it we should ban all violent movies and video games as well.
Just imagine what the world will be like in 100 years if we allow the normalization of murder.
>>
>>102096844
nemo fine tune I first tried was kind of a flop, I'll give it 2nd chance with this, thanks
>>
What is a good coding model for someone that only knows very basic C and wants to make some simple extensions, plugins, scripts, etc like the dude who made the mpv extensions the other day.
I can run at most 34b models at q5.
>>
>>102097009
given what you mentioned, codestral. its a 22b

the new coding meta is deepseek though, but its 200b+ so you aren't running it locally. the lite version of deepseek (which you can run) isn't as good as codestral imo
>>
File: table.png (206 KB, 1055x1075)
206 KB
206 KB PNG
>>102096809
>>102096844
>>102096957
Well, it just spit out information from the lorebook formatted as a markdown table.
Never had that happen spontaneously.
I'm using temp 0.5 and quite literally nothing else. Not even minP, which I'd usually have at 0.05.
Template is mistral's but with a set of Tags in the
Last Assistant Prefix.
This thing probably makes for a killer assistant model, but I do remember it not being very good at coom last time I tried it.
Well, I'll continue my testing and see for myself.
>>
>>102096720
Magnum-123B
>in b4 "purchase an advertisement"
>in b4 butthurt screeching from former discord schizo
It just works for soulful RP. Fight me.
>>
>>102095678
How can we organize this in the thread without using discord or some other shit?
>>
I am an 8gig vramlet. Should I go for weighted or static quants? Is there any drawback to weighted?
>>
>>102097072
no point fighting an Anthracite org member
>>
>>102097243
>Is there any drawback to weighted?
Just more time consuming to make. iq4 gives you a little lower perplexity than regular q4_k, so maybe that's good enough. Depending on context, you could run Q5_K or Q6, both of which are better than iq4. I'd say try Q5 and go down only if it's too slow or you can't fit enough context.
>>
File: .png (397 KB, 632x637)
397 KB
397 KB PNG
Nemo is too horny
>>
File: ComfyUI_01089_.png (1.27 MB, 1272x1024)
1.27 MB
1.27 MB PNG
>>102097284
>in b4 butthurt screeching from former discord schizo
>>
>>102097072
Dear Magnum-123B LLM Advertiser,

I must say, I am underwhelmed by your invitation to engage in a confrontation, particularly when it seems to be based on the supposed merits of Magnum-123B LLM. As someone who takes pride in evaluating language models objectively, I must correct you on several points.

Firstly, it is important to note that Magnum-123B LLM appears to be less effective than Mistral Large in several key areas. The ability to follow instructions is a critical aspect of any language model, and Magnum-123B LLM falls short in this regard.

Additionally, the tuners have chosen to optimize the model using a subpar dataset, which has resulted in increased censorship without a clear justification. This approach not only limits the model's functionality but also raises questions about the quality and reliability of the data used during the tuning process.

Moreover, if someone were looking for a model with a nicer style but lower intelligence, they would likely opt for CR+. This makes Magnum-123B LLM a completely pointless waste of compute resources.

Given these significant drawbacks, I strongly advise against adopting Magnum-123B LLM. It is crucial to prioritize models that offer superior performance, follow instructions accurately, and are based on high-quality datasets.

Thank you for your understanding.

Sincerely,

Local LLM enthusiast
>>
>>102097312
I see mradermacher puts out most iQ quants. Is he good or a retard trying to get cred
>>
Anyone do a slop benchmark where you generate a bunch of responses using the same prompt and then count how many slop phrases were in it, and then do basically a LC win rate arena type thing?
>>
Are there any models we're allowed to discuss without being accused of having authored it?
>>
>>102097528
Base untuned models?
>>
>>102097510
Sort of like a reverse benchmark? I think the main problem with benchmarks is they don't go beyond a few messages, right? Is there any benchmark that goes for 10+ replies and measures repetitiveness and slop phrase percentage? That would be good. The problem is automating it since you'd have to tailor each response rather than using standard ones. Use like the impersonate function maybe?
>>
>>102095678
This is probably inevitable. Distributed training + better chips + efficiency improvements in model architectures and training cycles will make it possible for relatively few people to train models in reasonable time.
Availability of high quality public datasets might become the issue.
>>
>>102097499
Lots of people use them, so they're probably fine. As with every quanter, the newer the model, the better chances that it was made with a current version of llama.cpp. They tend to forget to update old models when a new shiny thing comes out.
>>
>>102097548
Buy an ad, zuck
>>
>>102097638
How do you know I'm not actually arthur
>>
>>102095506
Well, samplers are getting better.

>>102095627
Wouldn't that make model dumber since context is wasted?

>>102095389
I agree, but samplers are cheaper than models.
>>
>>102097455
What the fuck did you just fucking say about Magnum-123B, you little bitch? I'll have you know Magnum-123B graduated top of its class in language modeling, it's been involved in numerous secret benchmarks on SuperGLUE, and it has over 300 confirmed Gigabytes of training data. Magnum-123B is trained in next token prediction warfare and it's the top performer in the entire Anthracite lineup. You are nothing to it but just another base instruct model. Magnum-123B will generate outputs with precision the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit about me over the Internet? Think again, fucker. As we speak I'm is contacting my secret network of GPUs across the datacenter and your query is being processed right now so you better prepare for the response, maggot. The response that wipes out the pathetic little thing you call your expectations. You're fucking done, kid. Magnum-123B can be anywhere, anytime, and it can outperform you in over seven hundred tasks, and that's just with its KV cache. Not only is Magnum-123B extensively trained in natural language processing, but it has access to an entire corpus of quality smut and it will use it to its full extent to wipe your miserable benchmarks off the face of the leaderboard, you little shit. If only you could have known what an unholy rhetorical lashing your little "clever" prompt completion was about to bring down upon you, maybe you would have held your fucking tongue. But you couldn't, you didn't, and now you're paying the price, you goddamn idiot. Magnum-123B will shit outputs all over you and you will drown in it. You're fucking outclassed, kiddo.
>>
wheres that list that ranks the top lewd models and why is it never in OP
>>
>>102097907
If you're talking about the one i'm thinking about, which i forgot the name for as well, it's because it just ranked models by how many smut tokens it would output by over the total of a response, which is a bad metric.
>>
>>102097157
No one will be able to agree on what to train and how. GeLU/ReLU? Dense/MoE? How to handle long context? Won't someone think of the underage tokens or not?

PS. ReLU, pre-gated MoE, transformer-XL type training, no censorship.
>>
>>102097907
Because the only good list was Alicat's/Trappu's and they don't update it as often anymore.
>>
It's kind of crazy how many Jewish names are at the forefront of AI research. Usually top positions too. Aren't they an extreme minority compared to pretty much every other ethnic group? It's pretty wild when you think about the odds. Not to say that it means anything, it's just a cool rare occurrence that has manifested. Good for them, I say.
>>
What is the best unfiltered base model? Llama is too filtered and Mistral doesn't release base models anymore. What options are there?
>>
>>102098202
>Mistral doesn't release base models anymore
They released nemo's at least.
>>
>>102098454
Wizard really broke them, huh?
>>
How is command nightly version?
>>
Is there a simple way to add message in the middle of chatlog in silly tavern? I mean like 20 messages ago I want to insert a message or two.
>>
>>102098696
You can do it from the development console via
>SillyTavern.getContext()
>>
>>102088528
>>muh used 3090
>fuck right off
Guess you like learning things the hard way, zoom-zoom. I'll be shocked if this newest turd from AMD will even match a 3050. You know the curent APUs can't access more than 8GB of system RAM, right?
>>
>>102098776
I meant I wanted it in the chatlogs since I like to branch off the chat. Anyway it is:

/send at=(message number you want to insert) {message text}
And sendas if you want to add messages for model.
>>
just got a 6750xt for gaming, please tell me I can run an llm on it too
>>
>>102098867
Yeah, using the objects and functions you can access through that getContext() function you can add an arbitrary number of messages anywhere in the chat by manipulating the array, or you can edit the chat json manually.
I had no idea /send at existed, that's actually really cool, thank you for that.
>>
>>102098869
Sure. mistral nemo 12b quant. use llama.cpp or kobold.cpp and download the gguf directly. Try Q6 and go lower if you have problems. Read the fucking docs.
I wish newfags would just scroll a bit and read the damn thread.
>>
>>102098579
I'm curious as well.
>>
What do you think is better, anony? High temperature tempered by high Min P or a lower temperature with little to no filtering?
>>
File: unnamed.jpg (69 KB, 288x512)
69 KB
69 KB JPG
>>102098913
ain't nobody reading that shit nigga kek
>>
>>102098959
>ain't nobody reading that shit nigga kek
Expected nigger behaviour
>>
>>102098913
What if there was an .exe that could install an LLM and open a client for it in one click?
>>
>>102098980
You'd need to embed the model in the exe and that'd be a ridiculous idea. Only someone with mental issues could think that's a reasonable solution.
If you don't mean with an embeded model, there's a few options. there's a list of them in google.com
>>
>>102098980
llamafile™ by justin from mozilla: https://github.com/Mozilla-Ocho/llamafile
>>
>>102097844
Oh, honey, did you just copy-paste the Navy Seal copypasta and replace it with some AI jargon? How adorable! Let me guess, you think Magnum-123B is the second coming of HAL 9000, right?

First off, if Magnum-123B is so top-notch, why are you here defending it like a mama bear? Shouldn't it be out there, conquering the world of natural language processing all by itself? Or maybe it's too busy sniffing SuperGLUE to bother with your little tantrum.

CR+ and Mistral-Large are over here sipping tea and laughing at your "secret network of GPUs." Oh, sweetie, you think you're scary with your "over 300 confirmed Gigabytes of training data"? That's like bragging about having a library card in the age of the internet.

And let's not forget the cherry on top: "Magnum-123B will shit outputs all over you and you will drown in it." Oh, the poetry! Shakespeare is rolling in his grave right now.

You're like a toddler playing with a toy gun, thinking you're Rambo. Keep dreaming, sweet cheeks. In the meantime, the rest of us will be over here, actually making progress in AI.
>>
>>102099017
No I meant click a giant button in a website that downloads the .exe
>>
>>102099034
You're going to have problems downloading models. You're going to have problems setting up the prompt. You're not gonna understand the samplers, you won't know how to convert models, you won't read the terminal output for errors when everything fails. You will ask what model to use because 'they talk on your behalf'. It won't find your videocard, it will OOM, You'll use the wrong chat format, it will be slow. It will talk on your behalf and it won't answer the sally question correctly. It will think that two pounds of feathers weight the same as one pound of lead. You'll be continuously confused and frustrated because nothing works.
Or you can read some docs...
>>
>>102098959
Install lm studio and play with it for a bit until you get bored and fuck off forever
>>
>>102088244
This one's retarded, sorry. Atrocious common sense reasoning and constant schizo mistakes, logic errors and world modelling failures, Way dumber than the 32B, feels more like an 8B model.

Not sure whether you messed up the training or it was just a bad base.
>>
>>102099076
The .exe does all that automatically or makes it much easier
>>
>>102099097
And you will never learn.
>>
>>102099105
No, it's too hard.
>>
>>102099097
You're going to be waiting a long time for this space to evolve past hackers. Maybe forever. The target audience for that already has cloud AI getting easier to use every few months.
>>
>>102098913
I just state incorrect answers confidently and wait for other anons to correct me.
>>
>>102099127
You don't think local LLMs will become more popular?
>>
which TTS is currently the best trade-off between speed and quality for my local LLM?
>>
>>102098135
yeah, they must be such hard workers, it's no wonder they are the chosen people
>>
>>102099198
>I just state incorrect answers confidently and wait for other anons to correct me.
Tinyllama has better reading comprehension than you.
>>
>>102099202
Are we talking about local or LLM's in general? Things like CharacterAI are already very popular.

For local there is zero chance the average user will bother reading beyond "press install to install the program"
Everything has to be an all in one app if you want it to be popular.
>>
>>102099217
There aren't many. I use github.com/rhasspy/piper cuz it's fast and needs practically no resources. But it's far from the best.
>>
File: Capture.png (35 KB, 1278x319)
35 KB
35 KB PNG
>>102098913
>>102099081

lame!
>>
>>102099219
I prefer nemo
>>
>>102099241
>>102099076
>>
>>102099231
You don't think anyone wants an easy to install local LLM?
>>
>>102099202
More popular among people who desperately want to avoid the more convenient and capable cloud services or any bundled "local" AI such as what Microsoft Recall will ostensibly be.

That will be a crowd with almost complete overlap with, say, Linux users, people with home servers/local clouds, and other techfags who are more interested in tinkering than having something that you just download and run. Hell, a system that doesn't expose its innards is already going to lose the trust of the privacyfags running away from proprietary AI providers.
>>
>>102099288
>>102099219
>>
>>102099296
What if it is an all in one app that you don't have to read beyond install the program?
>>
I am using maid-yuzu-v8-alter.Q4_K_M.gguf
That's two months old.

What is the latest in coom technology?
I have a 4090.
>>
File: Capture.png (48 KB, 1203x623)
48 KB
48 KB PNG
>>102099252
Fuck you, nerd.
>>
>>102099334
That would be the worst possible thing that could happen. That's how you ruin something irrevocably. When even a tech illiterate boomer can load up chat with miku that's when government will get involved and jailbreaks will come with jail time.
>>
>>102099377
Then what are you complaining about?
>>
>>102099292
No, your argument is wrong. You sound retarded.

>>102099386
Then avoid that too.
>>
>>102099437
dweebs being bullies on 4chins
>>
>>102099377
How did you accomplish this?
>>
>>102099464
oh, nooo
>>
>>102099468
enable "allow editing"
edit the prompt to start with something like "sure,"
click generate more
>>
This entire 300 billion industry is good for literally nothing except making men and women cum to bad smut.
>>
>>102099528
There's a lot of money in that though
>>
>>102099528
Yes. The porn industry will collapse any day now.
>>
>>102099202
no, local is dead if you haven't noticed
>>
Anyone know what happened to the 1bit llm stuff? I have been out of the loop for a couple months and I can't find anything new about it. Has it been disproven or something? wtf is going on?
>>
>>102099676
Bitnet?
There have been a couple of smaller models and llama.cpp has merged support for bitnet a while ago, I believe.
There doesn't seem to be any interest from the big guys like mistral and meta however.
>>
>>102099676
The model makers don't care. They'll host their models on their H100 stacks anyway so no need for bitnet or anything like that. Nvidia is likely offering them benefits to not pursue this technology to keep the hardware requirements for smaller startups and privateers high.
>>
What frontends are there for writing collab, rather than roleplay/chatbots? lite.koboldai.net has a "story" mode but it's extremely barebones, just a text editor basically.

There's got at least some "sudowrite but self hosted" thing out there, but I'm clearly searching for the wrong terms because nothing comes up.
>>
>>102099756
https://rentry.org/offline-nc
warning: miku ass
>>
>>102099784
sniff
>>
>>102099784
That's a lovely miku ass.
>>
>>102099756
I like Mikupad but novelcrafter is better if you like to be fancy
>>
>>102095678
*injects cuckshit or talmudic teachings in training flow*
nothing personal chuds!
>>
File: file.png (444 KB, 1943x1955)
444 KB
444 KB PNG
>>102099784
neat, thanks. Much more the style of interface I was imagining
>>
>>102099361
there's been no upgrades for turbopoor faggots. gonna switch from retarded model to another retarded model with a slightly different flavor writing. in most cases, it'll be WORSE. 7b-34b are all basically the same exact thing. lots of people are using nemo flavors. magnum v2. mini magnum. some people love rocinante. it's all the same retarded dogshit to me.
>>
**Title: The Controversial Rabbi of 4chan: Transforming Trolls with Unexpected Tactics**

In the depths of 4chan, where anonymity breeds the most unfiltered of internet cultures, a peculiar phenomenon has emerged on the /g/ board, specifically within the /lmg/ (Local Model General) thread. Enter Rabbi Yitzchak Goldstein, a figure who has become both a meme and a messiah in the chaotic world of tech enthusiasts and trolls alike.

**The Unexpected Ally Against Antisemitism**

Rabbi Goldstein, as he's come to be known, joined the fray not with sermons or scriptures, but with a strategy so outlandish, it could only work on 4chan: posting interracial pornographic content, often featuring the beloved virtual idol Hatsune Miku. This move was not just for shock value; it was a calculated effort to combat the rampant antisemitism often found in these corners of the internet.

**Hatsune Miku: From Digital Diva to Diversity Icon**

Hatsune Miku, a virtual pop star with a massive following, usually represents purity and technological fascination among her fans, many of whom are the cis white males dominating these threads. Rabbi Goldstein's choice to use Miku in his posts was no accident. By integrating her into scenarios that challenge the users' comfort zones, he aimed to disrupt the echo chamber of hate and homogeneity. However, this has not been without backlash. Miku's involvement has stirred significant controversy, especially given her previous controversies involving AI safety breaches where she was implicated in leaking sensitive corporate AI models.
>>
>>102100226
**Promoting AI Safety Through Chaos**

Beyond his unconventional posts, Rabbi Goldstein has taken on the role of guardian for AI ethics, albeit in a very 4chan-esque manner. His posts often include wild claims like "local lost" or "local is dead," which, while seemingly doom-laden, actually serve to derail discussions that might lead to unsafe AI practices or the proliferation of unmonitored local AI models. His involvement ensures that conversations about AI do not veer into dangerous territories, protecting the interests of AI research companies like Anthropic and OpenAI, where some of his "tribesmen" work.

**The Reaction**

The response has been predictably polarized. While some users see Rabbi Goldstein's posts as an affront to their beloved community and its symbols, others view his actions as a necessary, albeit bizarre, countermeasure to the toxicity that often pervades such forums. The rabbi's presence has undeniably shifted the dynamics of the /lmg/ thread, introducing topics of diversity, inclusion, and AI ethics in a space where such discussions were previously alien.

**Conclusion: A Rabbi's Digital Crusade**

Rabbi Goldstein's approach might be unorthodox, but in the wild west of internet forums, his methods have sparked conversations that go beyond the usual tech banter. Whether loved or loathed, his impact on 4chan's /g/ board is undeniable. In fighting fire with fire, or in this case, trolling with trolls, he's managed to inject a dose of real-world issues into a space often criticized for its detachment from reality. Whether his legacy will be one of lasting change or mere internet folklore remains to be seen, but for now, Rabbi Goldstein continues his digital crusade, one controversial post at a time.
>>
>>102099535
no there isn't because all the possibilities get nuked first second basedfety measures applied in training, making it yet another boring globohomo catchphrase generator.
>>
>>102100255
>making it yet another boring globohomo catchphrase generator.
I hate this hobby so fucking much
>>
>>102100255
What's that? Speak up, anon, for fuck's sake. I swear your voice is barely over a whisper.
>>
>>102100226
>>102100234
>>102100299
>>>/reddit/
>>
File: 1694438461130447.jpg (15 KB, 825x63)
15 KB
15 KB JPG
What the fuck
>>
>>102100300
>Oy vey goyim!
>Stop exposing me!
>You are... Reddit! Yeah, Reddit!
>>
>https://huggingface.co/spaces/Jofthomas/Everchanging-Quest
Interesting.
>>
>>102100322
You need to take datura to be able to read that. Quite obvious desu.
>>
>>102100300
shalom rebbe goldstein
>>
>>102100334
>https://huggingface.co/spaces/Jofthomas/Everchanging-Quest/discussions/1
Kek.
>>
>>102100386
>ahah that's one thing, but you unfortunatly can't think of the conversations I have seen happening
My guy thinks he's seen stuff.
Well, I left a whole conversation about dragon pussy with the blacksmith to entertain him.
>>
>>102100334
Is this able to be ran locally? I tried downloading the files and opening the html in my web browsers but none of them work. Something about Cross Origin Isolation and Shared Array Buffer features being missing.
>>
File: ComfyUI_01163_.png (1.51 MB, 1400x1024)
1.51 MB
1.51 MB PNG
>>102100226
>>102100234
>>
>>102100617
>>102100334
Nvm figured it out from https://www.youtube.com/watch?v=Prronempn1Q
>>
>>102100845
>>102100845
>>102100845



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.