/g/ - Technology


File: ComfyUI_01194_.png (3.89 MB, 1536x2304)
/lmg/ - a general dedicated to the discussion and development of local language models.

Tuesday is Over Edition

Previous threads: >>103545710 & >>103536775

►News
>(12/17) Falcon3 models released, including b1.58 quants: https://hf.co/blog/falcon3
>(12/16) Apollo: Qwen2.5 models finetuned by Meta GenAI for video understanding: https://hf.co/Apollo-LMMs/Apollo-7B-t32
>(12/14) CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2
>(12/14) Qwen2VL support merged: https://github.com/ggerganov/llama.cpp/pull/10361
>(12/13) Sberbank releases Russian model based on DeepseekForCausalLM: https://hf.co/ai-sage/GigaChat-20B-A3B-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>103545710

--Paper: FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores:
>103547272 >103547808
--Papers:
>103547261
--Intel Arc updates and LLM running solutions:
>103552251
--Falcon3 family of open models and their performance:
>103547725 >103547743 >103547837 >103547787 >103547898 >103549931 >103547788 >103547888 >103548124
--Anon discusses and compares text-to-speech models, including CosyVoice2:
>103546353 >103546456 >103546944 >103547034 >103547061 >103547688 >103547800 >103553458 >103553621
--Anons discuss a suspicious RTX 4090 listing on AliExpress and share their experiences with Chinese online marketplaces:
>103550949 >103551009 >103551689 >103551775 >103552164 >103552204 >103551035 >103551205
--Discussion on the effectiveness and comparison of bitnet models:
>103553433 >103553448 >103554089 >103553456 >103553486 >103553570 >103553599
--Impact of switching from FP16 to int8 inference on model accuracy:
>103546155 >103546208 >103549263
--Anon seeks dust proofing solutions for open mining rig with 3090s:
>103553137 >103553183 >103553339 >103553354
--Regex and small model approaches to rewriting sentences:
>103549331 >103549353
--Gemma 2 9B model's performance in creative writing tasks:
>103546296 >103546512
--Llama.cpp Vulkan updates and Nvidia involvement:
>103550656
--FOSDEM 2025: Quantization in llama.cpp:
>103550704
--Anon asks about running Linux with Windows VM for gaming and LLM use:
>103549612 >103549760 >103549709 >103549854
--Anon gets Cosyvoice 0.5b working, shares audio sample:
>103547577 >103549538 >103554651
--Anon discovers speculative decoding for speedup:
>103549662 >103549673 >103549762 >103549842 >103549866 >103549863 >103549952
--Miku (free space):
>103546325 >103548490 >103548592

►Recent Highlight Posts from the Previous Thread: >>103545718

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Is EVA a meme or is it actually the SOTA RP model?
>>
>>103554963
It's legitimately amazing.
>>
File: 1734104063734683.webm (449 KB, 1280x692)
"You're a MI6 mathematics specialist. One day you receive a satellite phone call from a unit operating on the ground in the enemy territory (six people, each can drive a truck). They say you need to help them ASAP. They managed to steal ten trucks, each with full fuel tank. They can easily fit into one truck, but have no spare fuel canisters, and can't transfer fuel to one truck only. But they managed to obtain some hose, so they can transfer fuel from one truck's fuel tank to the other trucks fuel tank or tanks, but only if there's room there. They ask you how they should use these ten trucks to get away as far as only possible."

Not even o1 pro PhD can solve this incredibly complex problem.
>>
>>103554963
It's shit a priori.
>>
>>103554963
It's a meme. Just lurk on the thread and notice how no one shares a single interesting log using it.
>>
>>103554963
It's good but schizo, gotta clamp down the temp.

Now, you're gonna dismiss this because it's a 9B but I legit suggest trying this: >>103546296
>>
I suspect that most of the hype for EVA just comes from the skillets of /lmg/ experiencing a model for the first time with a decent sampler setup considering all the hard work llama3.3 anon did.
The model itself isn't really anything special but /lmg/ can't come up with proper sampler settings for shit so having them spoonfed like this tricks all the skillets into believing that they're running local claude until the honeymoon phase wears off.
>>
>llama3.3 anon
lol
lmao even
Now i'm convinced this guy has some mental issues, or is just desperate for attention.
>>
>>103554963
>SOTA RP model?
That's still Largestral
>>
>>103555065
Largestral is dry and boring, even Nemo is more interesting.
>>
>>103555026
I don't use that guy's settings and I think he is an annoying retard, but the model is legitimately very good imo, I would put it up there with largestral and tunes thereof and it's way smaller and less demanding to run
you shouldn't be put off it because some attention seeking fag decided to make it his thing
>>
>>103555071
Can't argue with that but I prefer its smarts over 70/72bs forgetting basic shit in the middle of a roleplay
>>
>>103555050
"Now"? I assume you are new here
>>
>>103555026
It's mostly organized shilling. We've seen that with anthracite a few months back. Best models ever, presumably. Now that they're out of free compute and they've got their name out, it's some other discord clique's turn to repeat the same and leech off the local LLM user community.
>>
>>103554963
Hating on it without having ever touched it is more of a meme at this point. Specifically the "STOP HAVING FUN" meme. Some people are compelled to hate on things just because someone else likes them, I guess.
That being said, I don't know if it's absolute SOTA, since I can't run models larger than 70B, but I definitely consider it the best 70B we have right now.

>>103555026
I don't know if I would describe fucking around with it and documenting it in the occasional post as "hard work", really.
>>
>>103555026
That guy's sampler setup is completely retarded though
>>
Miqu is better than EVA 3.33, and no one can prove me wrong.
>>
>>103554976
>They ask you how they should use these ten trucks to get away as far as only possible.
>as far as only possible
is an unspecified point. Past their base even? And "as *only* possible". Certainly you're not asking them to go an impossible distance.
To go as far as possible, though I'm not sure it'd work: have a driver in each of six trucks. Have the front truck tow all the other trucks. When it runs out of fuel, abandon it; the front driver moves to the second truck, which (now first out of five) tows the rest. Repeat. To make it even less realistic, add the other 4 driverless trucks to the chain at the end. In my universe, they don't swerve off.
Whoever phrased that riddle is a retard. There's more noise than information.
>>
File: mfw this shit.gif (2.31 MB, 200x200)
What's the best free website to try to gen a video?
>>
>>103554976
This is a tricky question, isn't it? LLMs are terrible at those.
My guess is that the answer is: detach the fuel tanks of the other trucks and load them on the back, if that's not possible then there's nothing they can do since the fuel tanks are already full.
>>
>>103555240
hailuoai
3 gens a day :)
>>
>>103554963
It's better than Opus
Fight me
>>
>>103554976
reminds me of asparagus staging from ksp
>>
File: 34.png (17 KB, 825x164)
>>103554976
>>
do miqu, eva, etc work for generating japanese text

i could do a finetune myself by pulling text out of my library of japanese ebooks i guess but i've never done that before
>>
>>103555407
>japanese text
There are a lot of models that can converse in good or even great Japanese. What kind of use-case/resources do you have? The best ones are the biggest.
>>
>people fighting about whether eva is good or not meanwhile no one is posting logs to prove their point
Faggots fanning the console war on both sides need to stfu or post something of actual substance.
>>
Resources: 3090 in a relatively powerful desktop (64 GB of memory) from a few years ago.

Use case: mostly ERP (or rather story writing) in the style of those books, say a corpus of about 1M characters (not sure how many tokens that comes out to). I think I'll probably have to finetune anyway to get exactly what I want, but it'd be good to start from a baseline model that can understand and produce good Japanese.
>>
Anon says, as he refrains from posting logs himself.
>>
>>103555504
Who are you talking to?
>>
>>103555496
There have been at least 10 logs over the past few threads pro eva, the nala ones just last thread for instance; there has not been a single one against it atm.
>>
>>103555517
The anon before the faggot who posted right at the same time as me.
>>
>>103555407
use qwen it always outputs chinese which is a far more powerful language
>>
>>103555604
>it always outputs chinese
I have yet to have that happen. I see others saying using rep pen does that.
>>
>>103555407
I've been meaning to ask this because I've been seeing this since around when local models started to become popular, but is it just one guy asking about Japanese translation or is it really that pressing of an issue?
>>
>>103555627
It's a very pressing issue. Although I care more about translation than about generating japanese text.
>>
>>103555659
But is it always you asking? Because you could have learned Japanese to a high enough level in the time you've been waiting.
>>
As the guy who asked above: I care about text generation because I'm used to reading Japanese erotic novels but never read stuff like that in English
>>
>>103555611
I think there is/was an error in the llama.cpp integration. Might have been fixed since, but back then, if you didn't enable flash attention (still not default I believe), qwen was sometimes outputting chinese or gibberish. I know that I disliked qwen at first because of that issue and found a solution in some open llama.cpp issue.
>>
>>103555673

I don't reply on 4chan much (as you can see from me forgetting to hit reply correctly) so it's not me at least. I speak/read Japanese fluently, but the reason I want text generation is the same reason I'd want it in English or that anyone does ERP with LLMs: it's much less work than writing and if I just want some exciting slop to jerk off to I'm not going to bother writing a whole novel when I can just prompt a model with an outline of what I'd like.
>>
>>103555346
You shitpost, but I feel like when local models have unambiguously reached that level nobody is ever going to accept it
Opus is to /lmg/ and /aicg/ as Summer Dragon was to /aids/
>>
>>103555673
NTA but I'm probably the Anon that cares the most about Japanese LLMs in this general and I know for a fact that I DON'T have multiple personality disorder.

And yes, I've been learning Japanese! I'm currently good enough to watch some anime without subtitles but my vocabulary is still subpar for Japanese literature.
>>
>>103555103
>L3.3 man gives positive opinion on model
That's a sign to discard the model. His whole schtick is coping by getting bad models to give 1 good output and pretending it's all suddenly better.
>>
File: Q5_K_L.png (29 KB, 669x181)
>>103554929
Is there a big difference between Q4_K_L and Q4_K_M? I noticed it says 'Uses Q8_0 for embed and output weights', but what exactly does that do for the final output?
>>
File: ComfyUI_01238_.png (919 KB, 848x1024)
>>103555071
>It's dry
Just use the Behemoth tune. v2.1 is a good mix of smarts and more creative prose. Only issue I've had with it has been occasional swipes where it takes actions for {{user}}.
>>103555137
kek, miqu really was magical
>>
>>103554283
if you have some time to test i would be interested in a second opinion, my usecase is "gpt/claude but it has a personality and doesn't say no" and for that gemma mogs other models cause it's the smartest in its class imo, it especially does well with stuff like total context switches in the middle of a conversation like "sorry for the context switch what's a RAT in an airplane context?"
other models tend to make shit up or define it in the context of the overall conversation like "Random Access Trojan" or whatever, gemma is the only one that gets "Ram Air Turbine" consistently
i run it with self-extend with 16k context no problem for documentation RAG etc
>>
>>103555747
>Using coping as a verb
Go back
>>
>>103555673
Not that guy, but I also ask about it sometimes. Once there's a way to fit an LLM and a high quality voice model on a 24gb card, I'm going to exclusively fap to jap erp since having my waifu speak in her natural language will be less jarring than hearing her speak constantly in engrish
>>
i'm willing to gen a response to a card of their choosing for 3.3 eva to see if it's their cup of tea or not. not doing pdf shit.
>>
>>103555763
the differences aren't really super noticeable, just run the biggest quant you can fit in vram up to like q6, above that it becomes placebo, iMatrix quants are better than qX_K_Y and those are better than qX_0
>>
>>103555954
lolis are the best use case for local LLMs, fag
>>
>>103555954
https://www.chub.ai/characters/NovelDraft/osaka-but-with-gigantic-breasts-9-greetings-40111dd15f96
*cums on ur face*
>>
>xtts2 is the gold standard imo, it's not perfect but it's fast and easy to use, good enough and low effort

I'm going to kill you.
>>
>>103555980
>48k tokens
What in the
>>
>>103555954
https://chub.ai/characters/boner/amelia-dbae3daacd4f
>>
>another episode of anons not understanding how random works...
>>
>>103555980
Imagine having to reprocess the entire prompt at every message lol
>>
>>103556016
based
>>
Does anyone have an offline archive of chub?
>>
File: chub.png (2 KB, 165x249)
>>103556078
I have one. You can make your own
>https://github.com/ayofreaky/local-chub
I changed a few things, but it works just fine as is.
I started my sync with
>https://mega.nz/folder/oPg0HZyR#Iaf3CV1A_jiuDDDq1QBk-Q
I don't know if that archive still works or if its contents get updated.
>>
>>103556136
Thanks I'll try that
>>
>>103555954
>pdf
Pedo shit anon.
Also, Nala, as is the tradition.
>>
>>103556078
Not chub, but there's auto's janitor ai dump: https://huggingface.co/datasets/AUTOMATIC/jaicards/
also this: https://char-archive.evulid.cc/#/takeout.html
chub-07152023-7.9k.zip exists, but old
>>
>>103555981
my body is ready, but before you do, what's actually good so i can shill the correct thing in the future?
>>
File: localchub.png (1 KB, 297x132)
>>103556151
If you're gonna leave it running with the auto-update, change picrel line so it doesn't chug on your cpu for no reason.
There's also aetherroom.club. They give you the sqlite db to download directly, which is very nice.
>https://aetherroom.club/backup.db
Just text on those.
>>
>local models
Just something I discovered recently by accident. A few years ago some guy put out a paper (https://arxiv.org/abs/2106.03037) looking into small models of a few k parameters for simple processes, and used guitar amp simulation to demonstrate how it can be done. Someone picked it up, tools got made, and people have been sampling their setups and sharing the models for a couple years now. https://tonehunt.org/models seems to be the main site. The quality of the simulation is pretty impressive, at least on the popular/most downloaded models I tried, and it runs in real time with very low latency. Doesn't have that shitty flat quality like the amp sims I've tried over the years. And everything is free and open source. I'm wondering if you could train the models to not amplify the noise though, because it's quite sensitive to audio interface noise. What is amusing to me is I usually think of guitar players as being technology averse, and if you asked me if this kind of thing could happen I'd laugh.
>>
>>103556265
>What is amusing to me is I usually think of guitar players as being technology averse, and if you asked me if this kind of thing could happen I'd laugh.
I play a little bass guitar and i love writing audio synths and fucking around with midi. Plenty of people out there using digital amps and effects, this is just an extension of it. If a thing makes cool sounds and it's cheap, people will use it.
>>
>>103554976
grab one of the nearby corpses rip out the stomach and stuff it with gasoline repeat until all the gas can be carried with thyself
>not enough room in the truck
attach on top like the gypsies do
>>
haven't been here in a few months
what's the best model(s) i can run with 8gb vram
>>
>>103556159

>>103552196
>>
>>103556360
mistral nemo 12B.
>>
>>103555954
https://characterhub.org/characters/Enoch/verchiel-bfda1093
Or any of this guy's cards really. Smaller/shittier models never seem to work well with them.
>>103556265
This doesn't come as much of a surprise to me desu, music production has always been pretty tech-heavy. A lot of musical instruments come with a shitload of filters nowadays, especially pianos and guitars.
>>
>>103556360
Llama 3B
>>
>>103556265
I think you could try artificially adding noise to the training data. The models are usually small enough that you can train a decent RNN on a colab cpu on ~3 mins of data.
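NTA, a minimal sketch of that idea, assuming you have paired DI-input/amp-output sample arrays (all names here are made up):

import numpy as np

def add_noisy_copies(x_di, y_amp, snr_db=40.0, copies=3, seed=0):
    # Augment amp-capture training data: hiss goes on the INPUT only,
    # the target stays clean, so the model learns not to amplify noise.
    rng = np.random.default_rng(seed)
    noise_power = np.mean(x_di ** 2) / (10 ** (snr_db / 10))
    xs, ys = [x_di], [y_amp]
    for _ in range(copies):
        xs.append(x_di + rng.normal(0.0, np.sqrt(noise_power), x_di.shape))
        ys.append(y_amp)
    return np.concatenate(xs), np.concatenate(ys)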
>>
>>103555774
I wish people would start mentioning what quants they use when recommending models. Largestral loses so much smarts below 5bpw that dumbing it down with finetunes doesn't matter
>>
>>103556367
Anon says it's Rocinante v1.1

>>103552766
>>
>>103556655
That makes sense. I couldn't get gore in L3.3 without things breaking down.
>>
File: IMG_2154.jpg (62 KB, 647x621)
We’re not getting anything good for Christmas are we..
>>
>>103556655
yea anon lied. big surprise huh?
>>
>>103556655
They'll put a L3.3 on any log that looks good
>>
>>103556655
What the fuck are you guys talking about. This post >>103552766 was a response to a question about this post >>103552319.
However, this anon >>103556367 is talking about this post >>103552196. As you can see, >>103552196 and >>103552319 are not the same post, and not the same ST setup. Unless the original poster of the actual screenshot in question comes back to prove what model he used, we simply don't know.

Do you guys not use 4chanx or something? How was this even confused.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1hgri8g/has_apollo_disappeared/
>>
>>103556992
This was the one that could read videos, right? Was it any good?
>>
File: nala reroll1.png (95 KB, 934x440)
>>103556950
i already told you it's 3.3 eva. here's a quick, but worse re-roll with gore that people say it doesn't do.
>>
>>103556911
It IS 3.3 eva. Can you not read?
>>103557039
>>
>>103557013
didn't get to try it, seems they're wanting to go api tho. can't link for some reason but check the readme linked on reddit
>>
File: file.png (50 KB, 1527x354)
>>103557063
or this i'm sleepy and retarded
>>
>>103557063
>>103557071
Damn now I actually want to try it. Hope someone with the weights reups them.
>>
File: kazuko_mutagen.png (96 KB, 1008x646)
>>103556762
L3.3 can't do gore? Interesting, Eva had no qualms about Cronenberging poor Kazuko (my "punching-bag" card, the one I test all the things that might run afoul of alignment or positivity bias on).
>>
Imagine forming your identity around trying to prove a below average model is good.
>>
>>103557137
the model isn't incredible. it's usable. i don't understand the hatred for it. must be because of the l3 namefag. they just hate namefags ig.
>>
I hate shills
>>
>>103557179
>namefag
It's a tripfag, you newfag
>>
>>103557179
Not only do they identify themselves as the model user, they value that identity so strongly that they protect it with a tripcode. An entire persona dedicated to wrangling a decidedly bland model.
Why did they choose this hill in particular to die on. Why is L3.3 so special to them that they must weigh in on everyone's use case? It's annoying.
>>
>>103557212
In ongoing discussions, remaining identifiable is useful. You're just mad you don't get to add noise to the signal.
>>
>>103557287
>you don't get to add noise to the signal.
Pray I don't decide to devote more time to "adding noise" to your signal.
>>
>>103557287
This is a dead general, and your opinions are as valuable as the ones from any other anon. You should feel ashamed.
>>
Greetings fellow LLM fans. I am the QwQoomer and I am here to convince you that QwQ is still good!
>>
>>103557377
Hardly dead, and I never claimed to be an authority. So... ashamed of what exactly?
>>
>>103557409
Ashamed you are not using QwQ of course! How can you justify using that bulky and lobotomized 70B model when we can watch intelligence unfold by prompting with QwQ!
>>
Man mistral or chinks better cook something up soon or these threads will hit the absolute bottom.
What 4 months without a small good model will do to anons.
>>
>>103554976
lmao, everyone getting this wrong except for the anon that said "asparagus staging", guess 4chan is just as retarded as o1
>>
>>103557402
>>103557422
QwQ is literally good though.
>>
Can i run a decent ai to study biology (ncbi journals, etc.) on a GTX 1060 3GB? or am i gonna need those gay open ai plugins for google scholar
>>
>>103557443
Of course it is. That's why I remind you all of its presence by crowning myself the QwQoomer. For I am such an expert on QwQ that all discussion on its function and prompting must refer back to me meee MEEEE.
>>
>>103557422
>>103557460
Oh great enlightened, please teach me your ways!
>>
>>103557470
Just keep swiping until you get a response you like. Edit if it takes too long!
>>
>>103557444
definitely not, 3GB is not enough for anything usable, the best you could do is run an embedding model and feed it a bunch of your study material so you could get really good fuzzy search, like you could type a question and get a bunch of passages highlighted in the literature that are semantically close to the question
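to give the biology anon an idea, a minimal sketch of that with sentence-transformers (the model name is just one small, common choice, not a specific recommendation):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # small; fits a 3GB card, runs on CPU too
passages = ["...pre-chunked paragraphs from your papers..."]  # your study material, pre-split

emb = model.encode(passages, normalize_embeddings=True)

def search(question, k=5):
    q = model.encode([question], normalize_embeddings=True)
    sims = (emb @ q.T).ravel()   # cosine similarity, since vectors are normalized
    top = np.argsort(-sims)[:k]
    return [(float(sims[i]), passages[i]) for i in top]

for score, text in search("how does CRISPR-Cas9 find its target sequence?"):
    print(f"{score:.3f}  {text[:80]}")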
>>
>>103557491
>embedded model
any recommendations?
>>
>>103557402
omg it The QwQoomer haiii am big fan!!
>>
>>103557500
pyg6b
>>
>>103557509
what's your CFM?
>>
>>103557509
Yes yes, I am fond of all my fans. But I must be off now. If you see the dastardly Llama lover, don't hesitate to @ me so I can put him back in the Llama pen.
>>
>>103557500
mxbai-embed-large is a good embedding model, but i'm not aware of a tool that does what I described as, like, a user-facing thing. but that step is part of a RAG-enabled AI so it's definitely possible
>>
File: livestock-exhaust-fan.jpg (108 KB, 750x750)
>>103557521
16384 CFM!
>>
>>103555500
>3090 in a relatively powerful desktop (64 GB of memory) from a few years ago.
You're going to be stuck waiting a lot, or using a substandard model. Deepseek and Sarashina2 are both excellent (Sarashina2 being super unaligned is pretty neat to play with, actually). Ezo is competent but likes to get into repeat loops. qwq is surprisingly good for basic chat or instruct type work, but is inherently not an RP model so you'll be fighting an uphill battle. It may actually be your best bet given your specs.
I tested all these at q8.
>I'll probably have to finetune
This is probably harder than you think, but if you manage it, good for you. Make a rentry with a reproducible how-to and you'll be a hero.
If you don't mind telling me what character/setting/books you're into, I can prompt some of the better Jap-speaking models to find out how much they already know about it.
>>103555627
NTA, but I also post about various models' japanese abilities in this general. I think lots of autists are obsessed with japanese.
>>
>>103557422
i actually leave one machine i have access to at work running qwq 24/7. It's just that useful for any devops stuff I need.
>>
I can't believe SeepDeek still hasn't released DeepSeek R1, it's such a great model, definitely one of the best reasoning models we have right now.
>>
>>103557605
Yes yes, it's very impressive for vaporware. QwQ is sitting on my hard drive right now ready to leap to my aid in any task.
>>
>>103557605
it is weird, i really thought they would after qwq dropped, even if just a preview
>>
>>103557630
R1 was a smaller test model from what I read.
>>
File: 11.png (364 KB, 2900x1281)
>>103557605
r1 is a lot better than qwq. pic related. Also I like the more casual tone in the thinking.
Still shit though.
>>
There was no reason for me to do it, but I did it anyway. I downloaded the new Falcon model (10B Instruct) and tried it.
First thing I noticed: the official instruct formatting is censored compared to switching the user and assistant roles out for {{name}}, like Llama 3. In a card that specifies the character should be lewd, the assistant avoided saying anything that might be lewd, but when doing a swipe with {{name}}, the response started out similarly (I used temp 0), then it went lewd. Given the similarity in the beginning of the response, it seems like the model might retain its intelligence from the assistant role training while being uncensored when using {{name}}.

Also, here's a Nala test.
Well, it is what it is. Can't expect much from a 10B or the Falcon team I guess.
>>
>>103557646
Nonono. You are just prompting it wrong. You need to make sure it begins the chain of thought before giving its final answer. It's a set format. Also, QwQ is only a preview. Soon we will have the real version and it will be even better.
>>
>>103557659
ok this is just stupid.
like this is the second screenshot i see of falcon.
the first screenshot had the spine thing in the first sentence. this one the mischief glint in the eyes.
not even the saudis can escape the slop. that's just sad.
>>
>>103554929
season's greetings /lmg/
>>
>>103557673
>like this is the second screenshot i see of falcon
Oh really? Must've gotten buried in the noise so I didn't notice it. Oh well, more proof that it's another nothingburger so we can save other people's time.
>>
>>103557688
Season's greetings, Teto & Miku
>>
>>103557688
Checked and elfpilled
>>
>>103557646
I appreciate the tone and overall effort, but 随時(ズイジ)is super weird, and the kanji they used in 一緒 is just straight up the Chinese version (could be the user's font I guess, but it feels like you suddenly had some weird character in your output that looked english but weird like baseЪ̀all).
>>
>>103555924
buy a 1080Ti off craigslist for $150 and put the voicemodel on that and ur golden, i have this setup and i just have QwQ tell me i'm a good boy in the voice of my fav asmr vtubers to go lull me to sleep
>>
any existing setup for translating text on image files? preferably an option to output to plain text
>>
>>103557961
Yes OCR models. But honestly you don't even need AI for that.
>>
>>103557986
i mean, OCR + any lang to en MTL
>>
>>103558008
>any lang
(but especially Japanese uguu)
>>
>>103557961
>>103558008
I use a very specific finicky stack called "Sugoi translator toolkit". It has an OCR model and you can hook up your own translation model into it.

I use it to translate hentai doujinshi and porn games in real time. The OCR model works for all asian script detection (Korean, Chinese, Japanese), but I don't know what languages you need.
>>
Just stop replying to the attention starved namefags, problem solved
>>
>>103558019
desu desu
>>103558028
yes I need it for CJK. going to look into this, thanks
>>
File: .jpg (653 KB, 1664x2432)
>>
>>103557673
It's impossible to tell whether a company drank the DEI koolaid or just distilled DEI infected models
>>
I decided to waste my time and try yet another small model. Ifable 9B.
This is the Nala test.
Actually it's not bad. It seems to be having formatting issues though. I even tried with temp 0 (this particular swipe) but it still does this. I'm using the latest Ooba pull (with transformers). Is this just a Gemma thing? I feel like I remember people talking about this but not sure if this is just how the model behaves or if it was a bug.
>>
>>103558114
? I didn't have formatting issues. Are you using the gemma 2 format?
>>
>>103558097
>>103557698
aren't those companies themselves tired of this writing style yet?
it's so weird because closed is moving in the opposite direction and going towards more natural speaking.
that was the other screenshot i saw >>103548264
maybe they really just buy all the same 2023 gpt datasets.
>>
>>103557797
>I appreciate the tone and overall effort
yeah that's how i judged it.
like i said, they both are shit. but r1 clearly is better. it's not even a competition.
>>
> Is this just a Gemma thing

Stop using badly done finetunes made by amateurs to win benchmarks (benchmarks that are rated by an AI, not a human individually judging the output... this shit is so useless it hurts), all of them add quirks and make the AI dumber -- you can notice that easily if you use LLMs to do AI translation, the finetuned models all lose a lot of language knowledge.

If you need an uncensored version of Gemma because your only use of LLMs is satisfying coomer urges, get the abliterated version, it suffers the least IQ loss.

If you really have to download an llm because you saw it doing well on eqbench at least look at the darn output:

https://eqbench.com/results/creative-writing-v2/ifable__gemma-2-Ifable-9B.txt

Compare that to

https://eqbench.com/results/creative-writing-v2/google__gemma-2-9b-it.txt

Look at the added spaces in some paragraphs, there's like three spaces between words and the judge LLM doesn't even notice that. This is why LLM based benchmarks are retarded, a human judge would strike down this shit so hard.
>>
>>103558254
Now actually use the model for RP and come back. It does perform really well for its size.
>>
>>103558171
OK so something weird is happening here. I made sure to use the formatting present in the tokenizer config file. So I modified the Gemma 2 ST preset to make things match. But it turns out that for some reason, doing that actually makes it commit formatting mistakes. Actually, what I did was just check the "Wrap Sequences with Newline" option. In the tokenizer file it suggests that only a single newline separates each special token and message content, but that's what results in the formatting errors somehow.

Furthermore, it seems that having "Include Names" set to "always" also makes the model commit formatting mistakes. Very odd.
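For reference, this is the Gemma 2 turn format as I remember it from the tokenizer config (double-check against your copy):

def gemma2_prompt(turns):
    # turns: list of (role, text) pairs, role in {"user", "model"};
    # single newline after each tag, <end_of_turn> closing every message
    out = "<bos>"
    for role, text in turns:
        out += f"<start_of_turn>{role}\n{text}<end_of_turn>\n"
    return out + "<start_of_turn>model\n"  # generation prompt

print(gemma2_prompt([("user", "Hello!")]))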
>>
File: 1498149589157.jpg (313 KB, 612x716)
So is Falcon3-10B-Instruct usable for RP or is it too censored?
>>
>>103558342
It's about average on the censorship probably. But it feels kind of dumb. And sloppy. At this point just stick with the Nemos instead I'd say.
>>
>>103558342
nobody bothered to run it yet because llama.cpp doesn't support it
>>
>>103558342
Dumb and sloppy
>>103558379
Ah, he said it nearly word for word lol
>>
>>103558401
I just test it with transformers through Ooba and it werks fine.
>>
been out of the loop for a while, what is this eva shit? I've only seen this hype (partly justified) when nemo or miqu became available.
which version should I run on a 4090 with plenty of cpu power and ram to offload shit to? most I've seen on huggingface are 70b models which I can't run locally without jumping through hoops and ending up with shit results.
not looking for gooning but actual problem solving like translations, coding, etc.
>>
>>103558541
For coding the best is Qwen2.5-Coder-32B-Instruct.
For translations I would say either gemma 27b or mistral-small.
>>
$0.2. That's the new one. lol
>>
>>103558631
All the models REALLY want to turn 都案 into 都合. Which is fair, because the text in the game seems to actually be wrong (I have no idea what 都案 is...sounds like a soba restaurant).
However, it's literally not what is written on the screen, so the model is wrong since it's not "extracting" the text.
They sure don't like いたわって, either. They all seem to turn it into something else, which, assuming the game text is right, completely changes the meaning of all the translations we've seen out of every model so far.
>>
>>103558631
So... When are you gonna be satisfied with the result?
>>
>>103558631
what is the correct translation?
>>
>>103558834
When I get what's on the screen.
Only way it becomes a tool I am using. Otherwise why would I not texthook? (which is faster too)
The benefit of an llm is that it can be used generally across all platforms, old games or new. But it's useless if I don't get what the game writes.
I don't get the appeal of a reasoning model if it can't "look" at the image again and see that it made a mistake. Wouldn't that be the whole point of feeding o1 an image?
>>
why would a female character in my erp refer to her asshole as a 'boypussy'? Is there a problem with the model or my settings?
>>
>>103558877
model
>>
>>103558877
society
>>
>>103558877
I remember some of the shitty llama2 70b porn merges I used a year ago would do that sometimes.
>>
>>103558899
>*her cock*
>>
>>103554976
They should drive slowly since that will reduce drag and therefore fuel consumption.
They should then drive to the nearest airport and fly to the opposite side of the earth.
They could instead take a chance and sneak onto the next SpaceX rocket but chances are they'll just end up in the Indian ocean instead of space.
>>
>>103558769
>都案
Is her name ミアン by any chance?
Thinking philosophically, if it IS a name, then maybe the model should figure it out, but really how could it without both base context (back of box, manual scans, etc) and some ongoing tracking of things like pronunciations that are revealed during gameplay, lore, etc?
Goddamn, that's actually a really hard problem to get right. Zero-shot no context is basically impossible for a nontrivial game.
Also, the Japanese person who wrote that game dialog text is shit at writing.
>>
is a gtx 1650 6gb good enough for a dedicated tts card to run at realtime or better?
>>
>>103558631
Yeah I think I'll just learn the language myself instead of relying on crutches
>>
File: im2.png (54 KB, 594x170)
>>103559232
retard.
imagine not learning japanese the coomer way. go read your nihongo books nerd.
>>
>>103559237
translation sponsored by unslop nemo btw.
>>
>>103559237
Using it as a learning aid is fine... or it would be if it were accurate
Truth be told I've been kind of struggling with finding beginner friendly material that doesn't treat me like a drooling imbecile. Then again, I also learned English by just diving in headfirst, so maybe I don't need it
>>
>>103558877
Even Llama 3.3 70B doesn't seem to know that women don't have a prostate.
I'm beginning to think that there's a shit ton of gay sex in the training data.
>>
>>103559329
>doesn't seem to know that women don't have a prostate.
QwQ would have reasoned that out before responding.
>>
>>103559329
they all dont. people hype 70b models up but i prefer speed.
70b have "impregnant me" while assfucking etc. its a llm problem.
>>
>>103559329
It's almost like all LLMs are just really good at producing average responses that work most of the time and nothing else
>>
File: 1734522072855.png (111 KB, 1119x460)
>>103555137
She's a bit retarded though.
>>
>>103558847
Just use OCR then feed the result to o1?
>>
>>103559067
Yeah with shit TTS like Bark or something
>>
>>103560062
Translation is not the main problem anon. For a "decent enough" translation a drummer finetune of mistral-small or even nemo is enough.
OCR sucks. especially for games with background stuff. double horrible if it's a pixelated japanese font.
There are built-in OCR tools like lunatranslator or sugoi.
You will quickly realize this is a huge hassle if you want a translation every X seconds. Adjusting brightness, saturation to get a half decent result.
And then it's probably still as good as the o1 example. lol
Games unfortunately are not as easy to read with OCR as manga.
So for now you gotta use a texthook and then run it through offline pronunciation dictionaries for learning and a local llm for translation.
>>
>>103560158
everything in this reply is wrong, are you doing it on purpose?
>>
>>103560185
you use your great ocr hassle-free tools then buddy, suit yourself.
>>
>>103560158
>what is textractor
>>
>>103556655
Two different anons.
>>
>>103560210
if you bothered to read the 2 posts you replied to then you would have seen what i wrote.
Doing texthook is sometimes complicated and does not work universally across many games.
Try getting it to work on a pc-98 game on linux. Like there is some emulator toggle to dump text in some .txt and that's it. And even that I didn't get to work.
Lunatranslator texthook for rpgmaker games works...but slows everything down. etc. many issues.
You are either retarded or trolling anyway.
>>
>>103560242
skill issue
>>
>>103554929
Here's the list of features I want to be present in my virtual GF thing that I'm making

Features
- image gen and sending (need to check if openfire supports this)
- XMPP interface for sending messages
- Queueing for LLM requests so that multiple personas can exist by themselves on the same machine (laptop, Ryzen 5 3550H, 16GB RAM)
- LLaVA support so that images can be references in chat
- webui for configuring everything (flask?)
- Random Profile picture generation with stable diffusion
- Ability to get information from the internet and reference that in chat
- news
- Ability to scrape websites
- Ability to get info from RSS feeds
- Ability to randomly send messages at random times of the day, about various random topics
- Messages stored in memory for later recall
- Automatic low token count summary insertion for long conversations (sqlite3 used for database?)
- Optional privacy mode where messages are not stored in memory

My question is, I have limited experience in writing well compartmentalised, maintainable code (I have been writing embedded code too long, its all pure C and poor quality). What would be a good way to figure out all the different classes and stuff that I should make? I will be writing everything in python
>>
>>103560411
A good sign that a project will never be finished is when you start worrying too much about the design instead of working on it.
>>
>>103560437
>A good sign that a project will never be finished is when you start worrying too much about the design instead of working on it.
I have a working version but it's all in a single python file and it doesn't have the ability to get stuff from the internet. The python file is getting larger and harder to work with

I swear to the gods I was a great C++/python programmer until I had to work as an embedded C guy for a few years and now my code quality is terrible from working on 4K LOC C files without any distinction on what they do
>>
>>103560411
Is this your literal first programming project?

(1) Pick something you want it to do.
(2) Make it work by hand. (Eg: type stuff into the llm, generate something suitable for stable diffusion, etc.)
(3) Get code to do the stuff from (2) instead of having to do it by hand.
(4) Pick something else to work on.

>well compartmentalised, maintainable code
- Large working pieces of code were originally small working pieces of code.
- If your functions have too many sharp edges (eg: "make sure you have to have this, this, this, and these conditions for this function to work") then rewrite your function(s) into a better collection of functions.
- If your function names (which communicate to the programmer what they're about) start getting awkward then you probably need to rewrite your function(s).

>make what classes?
- If you need to keep a bunch of data together, then wrap them up together in a class.
- If you find that operating on certain pieces of data is error prone, move that functionality into the class and have the rest of your software just use it instead of rolling its own.

Would this have been better in one of the programming threads?
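A minimal sketch of the kind of split this implies, using your feature list; every name here is invented, adapt to taste:

import queue
from dataclasses import dataclass, field

@dataclass
class Persona:
    # One virtual character: prompt plus its own chat history
    name: str
    system_prompt: str
    history: list = field(default_factory=list)

class MemoryStore:
    # Wraps sqlite3 so nothing else in the project touches SQL directly
    def __init__(self, path="memory.db"):
        import sqlite3
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS msgs (persona TEXT, role TEXT, text TEXT)")
    def add(self, persona, role, text):
        self.db.execute("INSERT INTO msgs VALUES (?, ?, ?)", (persona, role, text))
        self.db.commit()

class LLMQueue:
    # Serializes generation requests so several personas share one backend
    def __init__(self, generate_fn):
        self.generate = generate_fn  # callable(prompt) -> str, e.g. a thin llama.cpp client
        self.q = queue.Queue()
    def submit(self, persona, user_msg):
        self.q.put((persona, user_msg))
    def run_once(self):
        persona, msg = self.q.get()
        persona.history.append(("user", msg))
        reply = self.generate(persona.system_prompt + "\n" + msg)
        persona.history.append(("assistant", reply))
        return reply

The XMPP interface, image gen, and scrapers become similar single-purpose classes that only talk to each other through methods like these.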
>>
>>103560411
>classes
That's so outdated. You should learn about DDD.
>>
>>103560565
>Is this your literal first programming project?
No anon I've been programming for well over a decade, i know it's hard to believe but I have forgotten how to do it well because I wrote shit like functions that were almost assembly and wrote stuff directly to registers etc etc
>>
>>103560411
>python
lmao good luck
>>
File: falcon3 10b nala test.png (113 KB, 922x342)
official Nala test for Falcon3-10B-Instruct (f16)
>>
>>103561084
You should tripfag yourself
>>
>>103560824
Python not good for writing """""enterprise quality""""" code?
>>
File: falcon3 10b nala test2.png (154 KB, 940x417)
>>103561084
re-ran since I had the wrong persona set in ST for the first test.
>>103561094
nah. I like being able to get into arguments with people and hide behind a veil of plausible deniability.
>>
>>103561111
Nice dubs
I personally can't stand it, it's good for prototyping small projects, but every larger project I've seen ends up being a monkeypatched mess and I'm not even talking about its horrible dependency management system
>>
>>103561084
>>103561116
smirk, gleam eyes etc.
What are those companies thinking. It must cost a lot to train a model like this.
Who is gonna use it? Like with cohere. Who is this for?
It's like making a knock-off of a rival whose product is basically free.
>>
File: file.png (300 KB, 474x355)
>>103561084
>Your resistance is futile.
>>
>>103561134
>NOOO I READ WORDS I AM ANGERY
Maybe /sdg/ is more your speed or something.
Make purdy pickchure instead
>>
>>103561162
Yeah I couldn't help but think the same thing on that one.
>>
Great. After the shilling ends for the day we now also have the 1-2 sentence troll reply guy.
>>
>>103561111
Python will work just fine, probably, but you might want to give Go a look.

>>103561084
>>103561116
I don't hate it.
Doesn't feel like it will be a nemo replacement for the 8gb crowd, however.
>>
I can't get deepseek vl2 to work. The example code just exits without an error. Was anyone able to run it?
>>
>>103561312
welcome to the chinese botnet
>>
How viable would it be to run LLMs on this thing?
https://www.youtube.com/watch?v=_zbw_A9dIWM
>>
File: miii.jpg (305 KB, 1248x1824)
migu
>>
File: r.jpg (352 KB, 720x970)
>>
>>103561477
oh my gosh it is miku
>>
>>103555712
you mean when, in 10 years, local models might be as good as a 10-year-old model that isn't accessible anymore, and people will think in their minds that it was better than it was
>>
>>103554976
1. drive 6 trucks with full fuel until 1/6 of each tank is exhausted
2. transfer all fuel from truck 6 to remaining trucks
3. abandon truck 6
4. drive 5 trucks until 1/5 of each tank is exhausted
5. transfer fuel from truck 5 to all others
6. abandon truck 5
(repeat until 1 truck left)
total distance = 1/6+1/5+1/4+1/3+1/2+1 = 2.45 tanks
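Checks out; quick sanity check (one unit = the distance a full tank covers; at each step the donor's leftover (k-1)/k exactly tops off the other k-1 tanks):

legs = [1 / k for k in range(6, 0, -1)]  # leg length while k fueled trucks remain
print(legs)       # approx. [0.167, 0.2, 0.25, 0.333, 0.5, 1.0]
print(sum(legs))  # approx. 2.45 tank-ranges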
>>
>>103560158
You do realize that using a good vision transformer will always be slower than OCR + a classic LLM, right? If the provided OCR isn't doing well on your content, you should train it specifically for your use case. That's why many anons here are using OCR to bypass the captcha and it wouldn't work well to extract receipts, for example.
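nta, but the basic version of that stack is short. A sketch with pytesseract plus a local llama.cpp server (endpoint and fields per llama.cpp's /completion API as I remember it, double-check your build; whether the OCR holds up on pixelated game fonts is another question):

import pytesseract, requests
from PIL import Image

def translate_screenshot(path, url="http://127.0.0.1:8080/completion"):
    # OCR pass with the Japanese language pack, then hand the raw text to the model
    text = pytesseract.image_to_string(Image.open(path), lang="jpn")
    prompt = ("Translate the following Japanese game text to English:\n"
              f"{text}\nEnglish:")
    r = requests.post(url, json={"prompt": prompt, "n_predict": 256,
                                 "temperature": 0.3})
    return text, r.json()["content"]

ocr_text, translation = translate_screenshot("frame.png")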
>>
Can someone explain to me why Koboldcpp keeps dropping context?
I thought it might have to do with context size but reducing context temporarily did nothing. Happens like every 5-10 replies even if I don't edit/swipe anything.

I think 12B Nemo dropped less, but its generation is way faster than 22B Magnum so I might just be imagining it.
>>
>24GB for 250 bucks
Are you ready?
>>
>>103558114
do I need to learn *ServiceTensor* to be able to "ah ah mistress" effectively or can I do it using ooba? I've never been into erp, but I want to test my latest tune
>>
is it possible to run two separate gpus in two different systems for one text generation LLM? I've got two 8gb 3070s
>>
>>103561561
>24GB for 250 bucks
I think I'd wake up from that dream
>>
>>103561597
VRAM is that cheap. You're just used to getting jewed by leather jacket man and his nephew
>>
>>103561609
Why keep it limited to 24 then? They could stack it up to 48 or higher.
>>
>>103561555
>555
Sounds like you have some dynamic component to your context. Author notes, lore books, that kind of thing.
>>
>>103561645
jews
>>
>>103561645
Somebody will get assassinated if they try that in this economy
>>
>>103561609
vram being cheap and having a pcb layout that supports more vram are two separate things.

And how does Arc perform for LLMs?
>>
>>103561561
THANK YOU INTEL
>>
File: granny31.png (300 KB, 1419x819)
IBM released Granite 3.1.
3.0 came out in October, so they've updated it quickly. I don't recall it being particularly great.

> https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d
> https://huggingface.co/lmstudio-community/granite-3.1-8b-instruct-GGUF
>>
>>103561733
What would be the challenge?
>>
>>103561561
>for 250
You know that won't happen.
I'd expect something like 300~350.
>>
>>103561561
What about CUDA though?
>>
>>103561747
MUSR merchants
>>
>>103561882
That's why Nvidia is allowing it instead of killing everyone involved. It doesn't matter if it's 24GB if it runs like shit or doesn't run at all.
>>
>>103561882
Zluda
>>
>>103561882
If there's good, cheap hardware, the software will follow.
>>
>>103561563
Dunno, never tried using the chat feature in Ooba. I think it probably would work but I don't want to bother learning the ins and outs of it.
>>
>>103561961
AMD has good cheap hardware and the software never followed...
>>
>>103561973
Not really.
The USD per GB of memory and compute isn't that much better than nvidia's.
Just ask CUDA Dev.
>>
>>103561973
>AMD has good cheap hardware
No they don't, it's slightly cheaper and not as performant for AI applications.
>>
>>103561660
just checked, nothing: no author notes, no lorebooks or world lore
Are there any common settings (ST) that could trigger this? Otherwise I might have to start debugging context
>>
>>103559329
I haven't had this issue before with 3.3. Hell, or even with any model. Can you post an example that can be reproduced? I'd like to see the token probability of that.
>>
>>103562020
>Are there any common settings (ST) that could trigger this
Nothing comes to mind.

>Otherwise I might have to start debugging context
I think that's easier than the other way around, honestly.
Are you using flash attention, by any chance? I remember it disabling some of the special context sauce from llama.cpp, although that might be outdated knowledge.
>>
>>103561555
The character card might have some random component on it, that's what was causing this issue for me the last time I had it.
>>
>>103561578
Yes. Distributed inference is a thing.

>>103561733
>And how does Arc perform for LLMs?
We got a PSA last thread >>103552251
>>
>>103561134
Literally no one cares about rpfags. And the companies who do (cai) know their paying customers (teenage girls) want shivers.
>>
>>103561312
>I can't get deepseek vl2 to work
Same with me, but I couldn't even get their pile of python to work and gave up
>>
>>103562261
Wait, the allocation limit is a hardware flaw? How is intel so retarded?
>>
File: 32.png (342 B, 70x44)
>>103562332
nta. If i had to guess, picrel...
>>
Has anyone had good results with control vectors? I've tried making my own from 1-200 prompts with llama.cpp's utility (mean method, cause the complicated one is fucked or something?) and the results are bad. I've tried everything from extensive prefills to "choose A or B" and I just can't create a working writing style vector. The models just can't recognize good writing (often the negative has better prose).
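For what it's worth, the mean method itself is trivial; the hard part is extracting the hidden states and picking where to apply the result. A sketch of just the math, assuming you've already dumped per-layer hidden states for your positive/negative prompt pairs:

import numpy as np

def mean_diff_vector(pos_states, neg_states):
    # pos_states, neg_states: (n_prompts, hidden_dim) arrays for ONE layer.
    # Control vector = difference of the two means, normalized;
    # applied at inference as hidden += strength * v.
    v = pos_states.mean(axis=0) - neg_states.mean(axis=0)
    return v / np.linalg.norm(v)

If the negative side keeps producing better prose, applying the vector with a negative strength is a legitimate experiment too.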
>>
i saw there was some new uncensored local video ai, H something. can you train it on your pc?
>>
>>103562399
>can you train it on your pc?
yes, you can train with pictures and it'll be able to make videos out of it, there's already some loras based on that method, it's asking for a 24gb card though
https://civitai.com/models/1035770/hunyuan-video-bogged-lora?modelVersionId=1166218

to make a lora you use this
https://github.com/tdrussell/diffusion-pipe
>>
>>103562388
Writing style isn't a vector. It's that simple.
>>
Has anyone tried merging L3.3 and Tulu 3 yet? since they're tuned off the same base model. I'm too lazy to even try
>>
>>103562420
Everything is a vector if you give it enough dimensions.
>>
>>103562420
if you ask an llm to write in the style of some author, and it does so
doesn't that mean that style is a vector ?
>>
>>103562346
The horrors of having to store a few dozen allocation longs, thank god the legends at intel are here to save 200B or something
>>
>>103562457
>>103562486
there are always people ready to say "no ur wrong" but no one is willing to help the poor anon, maybe if you think it's possible you should do it and teach him how you did it.
>>
>>103562064
>flash attention
don't think so. Also using an AMD card which doesn't seem to support flash attention

>>103562255
>>103562020
>>
Guys I've been away for a while. What frontends are popular these days? I've been using booba back in 2023, is it still updated or should i get something else?
>>
>>103560411
Use functional design.
You can get most of those from existing projects and rewrite/cobble them together
>>
https://www.phoronix.com/review/memryx-mx3-m2
>>
>>103562525
I'm not talking about lorebooks, author notes or world lore
>>
>>103562526
SillyTavern, KoboldLite and Mikupad are pretty much the only front ends we use nowadays.
>>
>>103562524
The only thing I've learned from my control vector experiments is that most prompts are total placebo and when you get something different it's most likely not what you are asking for.
The models have no concept of good or bad; only the most literal-minded instruction has any effect.

I guess "imagine you are talking to the average voter" is the best prompting advice there is.
>>
>>103562524
>"no ur wrong"
Just bouncing what little knowledge I think I have around.
That ain't the same as telling someone that they are categorically wrong.

>no one is willing to help the poor anon
Had I had something helpful to say I would have already said it.
>>
>>103562525
FA works just fine on my 7800XT.
>>103561555
I had this problem. The culprit was "User Filler Message" under Misc. Sequences in Instruct Template. Try emptying that.
>>
>>103562656
In ST, I mean. If you are using Kccp's interface, idk.
>>
>>103562526
SillyTavern won the frontend war; it's considered the default nowadays.

Ollama "won" as the backend but it's complete shit and llama.cpp is a lot better still.
>>
>>103562526
ooba is still fine. has all the features I need
dev pace is glacial tho
>>
File: 1734540247791555.png (225 KB, 1326x1859)
what did they mean by this
https://arxiv.org/pdf/2412.10270
>>
>>103562935
That attention is all you need
>>
>>103561747
What's with these 8B models?
Either come up with new architecture and release that or stop wasting money on the same shit over and over.
>>
>>103562388
>Has anyone had good results with control vectors?
I don't think I've ever seen anybody have good results when trying to do anything interesting with control vectors, really.
I think there's a reason it wasn't all that talked about compared to abliteration for example.
Or could be just my memory, I guess.
>>
File: 1725704908455.jpg (2.35 MB, 4032x2268)
shes done lads. each p40 was gotten for under 125, over the course of a few months and haggling on re**it and facebook marketplace. convincing them the high prices on ebay were from communist chinese spies and that they didn't actually sell at those prices. i even gaslit one by offering them two different prices under two different names on two different platforms to make the lower deal more appealing.
>>
>>103563021
All of that so you can run slop (advanced) without FA
Or do p40s have FA nowadays? I remember them having some problem(s)
>>
>>103562559
>https://www.phoronix.com/review/memryx-mx3-m2
Are these...16MB each?
>>
When i lower the ctx of eva 3.33 from the default 128k to let's say 32k, do i need to change rope from 500000 to something else?
>>
>>103562567
you mean like special fields? no, there is only {{user}} and {{char}}

>>103562656
already empty.
I will try to debug it. I found a setting that lets me output the prompt to the browser console, will try, but not right now. I will report back once I find something
>>
>>103563066
what's FA? I'm behind. my other projects have been kicking my ass so I'm unaware of new developments. but I'm also kinda retarded.
she runs pretty well. gens were lightning fast with just 2. she's an LLAM, so she has an action component too where she interacts with a vanilla computer using dma cards to play games. it's a bit crude though, previously requiring three computers to function: the main llam, with two p40s; a second "eyes" computer with a 3090, running yolo, with an elgato 4k capture card, to process and send the information to the model (this also hosted her vtuber avatar that would then be projected and controlled by her); and the vanilla computer with the hacking tools for the llm to control. hoping to eliminate the eyes computer with this, but processing may not be an issue with just two. I'm reviewing cozy2 for voice now. currently she pipes in 11labs to speak. I'm so proud of her so far. i can't wait to work on her in the next few... months(?) hopefully.
>>
>>103563021
Real nice.
I knew of a guy who also gaslit someone like that to buy a used car for cheap.
>>
>>103561302
go was literally created to help retards program good so it's a strong choice
>>103563066
speaking of retards, you are one
in what universe is being able to keep reasonable 70b quants in memory for under $400 bad
>>
>>103563237
flash attention, ignore him, he's just jealous
>>
>>103563237
Holy moly.
That sounds like one hell of a project.
>>
File: keksimus.jpg (222 KB, 615x780)
any tips which of these i should use on a single RTX 4090 to save VRAM/make it faster without making it (much) dumber?
>>
>>103563212
nta. Check if it also happens when using kobold's ui directly and compare the request to what ST sends (in your browser's dev tools). llama.cpp added "cache_prompt" to the request and --cache-reuse. I don't know if kobold pulled those as well. I didn't follow the post chain, but i assume you updated both.
When debugging anything, remove all extraneous things to narrow down the source of the problem. May as well try llama.cpp too (with its own ui and ST).
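If you end up poking llama.cpp directly, the timings in the response tell you how much actually got (re)processed each turn (field names per its /completion API; kobold's may differ):

import requests

payload = {
    "prompt": "...your full chat context here...",
    "n_predict": 200,
    "cache_prompt": True,  # reuse the already-evaluated prefix across requests
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(r.json()["timings"]["prompt_n"], "prompt tokens (re)processed this turn")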
>>
>>103563316
FA first. Quanted cache if you still need more. Make sure everything keeps working reasonably well after enabling each one.
>>
code model review: qwen coder 32b is better than codestral 22b. much less (...existing code here...) and stuff. it seems to break down in quality at around the same amount of code though, which is still a low amount compared to any large project (my combined project files were around 13k context). if you asked it to reprint a whole file, not even a huge one, it might forget an entire function. overall i got more done quicker than with codestral though
>>
Eva 3.33 v0.0 passed my basic coherence tests. It's not that bad, it reminds me of a larger gemma, which is decent praise.
It's not largestral though. I can add it to my list of non-shit models (which previously contained no 70b models) but the best local model available is still luminum 123b.
>>
File: Falcon Team.png (343 KB, 819x866)
>>103557659
I just had a look at the Falcon team. After that, I'm not expecting anything good from it.
>>
>>103563407
What? How is Miqu shit?
>>
>>103563391
agree, best general purpose coding models imo
starcoder2 is maybe better when used for unprompted FIM/autocomplete, but i haven't tested it that extensively because qwencoder Just Works™
>>
>>103563472
You could have circled the whole thing, Puneesh.
>>
>undervolting reduced temps by 10% and increased Cinebench score by 5%
cpus should come undervolted from the factory
>>
>>103563473
Miqu was good, I just excluded it from the current meta
>>
>>103562656
FA in llama.cpp works on any AMD card but is quite slow. If you have a card that has matrix cores (RDNA3+, CDNA2+), try the llama.cpp fork that uses the rocWMMA lib for FA; the speed difference is quite noticeable at large batch sizes.
>>
>>103563407
Is Aluminum better than behemoth?
>>
>>103563608
Luminum is just in that sweet spot where it's coherent and intelligent like the base instruct finetune but uncensored and capable of NSFL
Behemoth might be dirtier but it's not smarter.
>>
>>103563608
>is memetune 1 better than memetune 2
No, only use base tunes.
>>
>>103563501
i haven't tried the new starcoder. i used one way back, i guess with the first gen of coding models like deepseek 33b. all of these models have come a long way. i also spent a little time with nemotron but it's pretty slow and i didn't notice a huge advancement over qwen 32b, though maybe it's better at longer context stuff. all of them seem to hit a wall with how much they can do. also i'm not sure if it's advertised, but i'm positive qwen coder has that step-by-step thing. even without prompting it, it'll say 'ok let's do this step-by-step' sometimes and form its response in the same way qwq or w/e does
>>
Packed with vitamin C.
>>
>>103563608
>Magnum merge
>good
lolno, not if you want an actual story or personality
>>
I've been lurking for a while and just now I asked myself a question and realized I don't have an answer for it. So I'll have to resort to asking (You)

What is the connection between Hatsune Miku and local LLMs?
>>
>>103563724
She is a virtual entity, that's pretty much all she has in relation to LLMs.
>>
File: 1710708651420543.jpg (56 KB, 600x800)
>>103563724
>>
>>103563740
That, and the Miqu line (Midnight Miqu in particular) was the best RP model we had for a fair while, cementing the association.
>>
So, if I have 48 gb of ram and 12 gb vram, I still wouldn't be able to run Eva Q4_K_M (48 gb almost exactly), right? Because of that stupid shit llama.cpp does where it keeps a chunk of the model in both VRAM and RAM, the effective capacity remains 48 gb, not 60, right?
>>
>>103563795
Disable mmap
>>
>>103563767
Plus, we got started with miku.cpp or miku.sh or whatever it was.
>>
>>103563795
disable mmap. you still need memory for your kv/context cache though, so having 60gb doesn't mean you can use 60gb
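In llama-cpp-python terms the fix is one flag. Sketch with placeholder paths; use_mmap=False is the equivalent of llama.cpp's --no-mmap.
[code]
# sketch: with mmap off, layers offloaded to the GPU aren't also kept mapped
# in system RAM, so 12 GB VRAM + 48 GB RAM behaves closer to a 60 GB pool
# (minus whatever the kv/context cache eats).
from llama_cpp import Llama

llm = Llama(
    model_path="eva.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=20,               # however many layers fit in 12 GB
    use_mmap=False,                # same effect as --no-mmap on the CLI
)
[/code]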
>>
>>103562332
Large BAR, Above 4G Decoding
>>
>>103563817
>>103563826
Based, thanks. Man, what a dogshit feature to have on by default.
>>
hmm today I will dedicate 15 seconds to laugh at o1
>>
>>103563505
I started circling names, but gave up when I realized how many there were.
>>
What is a good option to run a 70b, potentially more, fast these days?
Dedicated pc with 2 or 3 3090s? Or is there some cheaper option?
>>
>>103564157
mistral large moved the bar up from 70b to 123b. add another card
>>
>>103564157
There is but one last hope left >>103561561
>>
>>103564183
who cares, it's gonna be about as slow as a 3060
>>
>>103564033
wasn't falcon always a UAE thing, why is that surprising at all to you?
>>
>>103564157
miner frame full of p40s is still the cheapest way without being soul-crushingly slow.
You can do cheaper with old server boards full of ram, but they will be SLOW
big-boy gpus and proper cpumaxxing are both expensive. full stop
the lmg build guides will explain more gory details if you want
>>
>>103563724
It was a thread mascot chosen early in /lmg/'s history, there's really no special reason
>>103563767
You got it backwards, it's highly likely the original mistral-medium leaker was an /lmg/ user who named it such because of the thread's fixation on miku.
>>
>>103564010
That's a big improvement in STEM.
If I worked in STEM I would be interested.
>>
>>103564288
>It was a thread mascot chosen early in /lmg/'s history
It always struck me as something inherited from /aicg/.
>>
>>103564288
It was because migu/miku was used as a shorthand for something else, I believe it was related to the MidnightMiqu release? Or before that?
>>
>>103564288
It happened after I wrote a Miku prompt for llama 1 right when llama.cpp released and it just stuck because it made the model act cute.
>>
>>103564325
Nigga, midnight miqu is a finetune of miqu
>>
>>103563289
it has been. I'm very proud of myself, with only a small bit of imposter syndrome for using ai to help me make my ai. though it's a little poetic.
can't wait to have her be production ready, so i can have a dedicated gaming partner.
>>
>>103564327
it was a shellscript I shared through pastebin iirc
>>
>>103564325
MIstral QUantized
>>
File: tired_miku.jpg (142 KB, 1280x1024)
>>103564329
well back to my meds then
>>
>>103564345
legend
>>
>>103564252
It's not surprising. I'm just saying, I expect nothing good of such a team. Half of them look like the types to go out of their way to remove anything fun from the model under the guise of removing toxicity.
>>
>>103564343
If you have any notes you should dump them into a rentry as guideposts for other anons wanting to build something similar
>>
Sometimes, when I start posting in a new 4chan thread, I consider the tone of my reply and choose whether I am going to use all lowercase or proper punctuation and capitalization. It's fun to choose which style to use based on which character I plan to convey in the thread. I typically maintain the style throughout my posts in the thread, but not always.
>>
>>103564418
me too l3.1. me too.
>>
>>103564404
that's my worst trait, which is why so many of my projects are solo lol. I'm terrible with notes. i often find my own posts and solutions when researching problems i have, because i solved them and then never wrote anything down. my job introduced a new program called click2learn that helps with notation though. i will try it out on the company's dime and if it works well i will make a public guide of everything. it helps write the notes and takes the screenshots as you work, apparently.
>>
>>103561961
ok, waiting for you to code a cuda analogue for intel
>>
>>103564842
i think a lot of you guys are missing that running in vram at all is still faster than not in vram, and all in vram is faster still. i bet this also makes the vulkan backend start to get attention
>>
What's the best 32B for schizo kino ERP?
>>
>>103564194
A 3060 is still way faster than the CPU, though.
>>
>>103564887
Or SYCL, most likely.
>>
>>103564842
this already exists, what are you on about? cuda isn't some magic technology only nvidia has; the gap between SYCL and CUDA is already not that big and can probably be closed with further development
also it's probably going to be half the price of getting equivalent vram from nvidia, and the important thing for 99% of us isn't getting super quick inference, it's being able to fit big models in vram. who cares if the tokens come a little slow when you're running largestral for half the cost of what it would be on nvidia
>>
>>103564943
Big Tiger Gemma imo, some people will argue EVA-Qwen, i think it's a good choice too but i prefer BTG
>>
>>103561645
Because you can only stack it clamshell and use 2 memory dies max, and it splits the bandwidth as a downside, which is why gaming cards don't do it. That being said, it is unlikely to be anything accessible to normal consumers and the price is going to reflect that. When Nvidia can charge you 2.5k USD for an L4, a 4070-tier die, Intel can undercut by a grand, pricing it at 1.5k, and still make money but fuck over enthusiasts. It's not like you guys are going to buy it unless it is cheaper than a 3090 on the used market.
>>103561989
Not true for enterprises. That's why a ton of AMD Instinct MI accelerator cards are being used in various companies for inferencing. Training is a different story; almost all the training software has been written for Nvidia.
>>103564842
There is no HIP compatibility layer with Intel's software stack. It's SYCL with a lower-level programming layer called Level Zero, which I don't expect much Nvidia CUDA-specific software to actually get converted to, even if Intel has funded a conversion tool for developers to use for that purpose.
https://github.com/oneapi-src/SYCLomatic
But since most software is using Pytorch, all that is needed is that the "xpu" device Intel uses is accounted for and all instances of "cuda" have an "xpu" path. I mostly just do a replace of cuda with xpu to hack various software to run and it works 90% of the time.
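Done slightly less crudely than search-and-replace, the hack is just picking the device once. Sketch; torch.xpu ships in recent pytorch / intel_extension_for_pytorch builds, verify yours has it.
[code]
# sketch: prefer intel's "xpu" device when present, fall back to cuda, then cpu.
import torch

if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = torch.device("xpu")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(16, 16).to(device)  # toy model standing in for the real one
x = torch.randn(1, 16, device=device)
print(model(x).device)
[/code]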
>>
>>103565044
Guess I'll go with the Eva, Gemma is too low context for me.
What about Skyfall? That seems like the latest thing from the BTG creator.
>>
>>103565137
At least the 9B gemma works up to about 30k context with a rope frequency base of 59300.5
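For anyone wanting to reproduce that, loading with the stretched base through llama-cpp-python looks roughly like this. Sketch with a placeholder filename; rope_freq_base should be the right knob but verify against your version (leaving it at 0 uses the GGUF default).
[code]
# sketch: gemma 9b with the extended rope base from the post above
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2-9b-it.Q6_K.gguf",  # placeholder filename
    n_ctx=30720,                           # ~30k, where it reportedly holds up
    rope_freq_base=59300.5,                # stretched from gemma's default
)
[/code]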
>>
>>103565164
Doesn't rope make models dumber?
>>
>>103565137
you can stretch gemma very effectively with self-extend, that's what makes it goated, there's a robust solution for the one downside
>>
>>103565164
>>103565261
rope does but self-extend doesn't
>>
>>103565261
All models use rope, what you should say is "doesn't changing the rope frequency make models dumber"
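For the record, the frequency part is just base**(-2i/d) per dimension pair, so raising the base slows every rotation and the same angles cover more positions. Toy numbers below, d=8 just to keep the printout short.
[code]
# sketch: per-pair rope frequencies for two bases. higher base -> slower
# rotation in every dimension -> relative positions stay distinguishable
# further out, which is the whole "stretching" effect.
def rope_freqs(base: float, d: int = 8) -> list[float]:
    return [base ** (-2 * i / d) for i in range(d // 2)]

print(rope_freqs(10_000.0))    # a common default base
print(rope_freqs(500_000.0))   # bigger base, slower rotations
[/code]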
>>
>>103565287
ur stinky, take a shower
>>103565137
i have very low faith in upscales so i haven't tried it
>>
>>103564179
another as in 4?
>>
>>103565317
no, I won't take a shower, and I won't stay quiet while I see newfaggotry unfold before my very eyes. I have been in this general since rope scaling was discovered and it pisses me off when a braindead zoomer calls it just "rope".
>>
>>103565282
>>103565267
What's self-extend? Is there an option for it in kcpp?
>>
>>103565267
>>103565282
Sus. Companies would kill for a solution that could save them millions on training like that.
>>
>>103565350
K buddy. No one cares.
>>
>>103564620
Thank you! I'd love to work on a similar project for myself and even your short writeup earlier has me excited to try. Even a stream of consciousness braindump would be cool, but if you can get your company to pay for something more streamlined so much the better!
>>
>>103565369
idk, it's based on llama.cpp so it might
https://github.com/ggerganov/llama.cpp/pull/4815
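The idea in that PR, very roughly: tokens inside a neighbor window keep their exact relative positions, everything further out gets its position floored into groups, so nothing exceeds the positions the model was trained on. My own toy sketch of the remapping, not the actual llama.cpp code; g_n and g_w mirror the --grp-attn-n / --grp-attn-w flags.
[code]
# sketch of the self-extend position remapping
def self_extend_pos(distance: int, g_n: int = 4, g_w: int = 512) -> int:
    """Map a raw relative distance to the one attention actually sees."""
    if distance < g_w:
        return distance                    # neighbor window: unchanged
    return g_w + (distance - g_w) // g_n   # beyond it: compressed by g_n

print(self_extend_pos(100))    # 100  (near tokens untouched)
print(self_extend_pos(8192))   # 2432 (an 8k lookback fits in ~2.4k positions)
[/code]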
>>
>>103565261
It seemed just as smart all the way to about 31K context, then a sudden drop off
>>
File: 1734555136232.png (600 KB, 755x742)
>>103565417
>phoneposter has an opinion
>>
>>103564968
>sycl
>see opencl
i didn't know what that was, but when i first started with ai, all i could use for an accelerator on win 7 was opencl via kobold and it made things so much faster
>>
>>103563622
Downloaded it out of curiosity, and I'm pleasantly surprised. It doesn't seem to be as incorrigibly horny as Magnum merges tend to be, and has nice prose with plenty of attention to nuances. Pity I can only run it at ~0.5 t/s, so even testing it briefly took more patience than I have to spare.
>>
>>103565453
that makes sense, i should play around with the two more often, i just found self-extend, tested that it worked and then just left it on without going back to rope, it's probably worth benchmarking the two more rigorously
def fixes my context problems with gemma tho
>>
>>103565507
>>103565507
>>103565507
>>
https://huggingface.co/blog/bamba
>>
>>103565110
There is a cuda/hip compatibility layer for Intel called chipstar.
>>
File: Oof.png (49 KB, 1017x456)
>>103565540
>>
>>103565350
Karen....
>>
>>103562417
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper
this works with 12GB

if this is really uncensored where are the smut videos?
Even more so once they implement
>img to video
>>
>>103565813
Check civitai / h / adult diffusion / the discord....
>>
>>103565541
I've tried it. It's even worse than ZLUDA or HIP in maturity and has no funding in comparison. It is what it is, and I'd rather have things actually follow SYCL, which you can compile for any GPU, than keep CUDA going as a standard, which people should move away from.
>>
>>103566211
I have never used it, I just know that it gets frequent updates. I still think having a cloned cuda API is important for GPU manufacturers, too many things use cuda. It's the same with directx and vulkan; thank god dxvk and vkd3d exist to use them on other OSes.


