/lmg/ - a general dedicated to the discussion and development of local language models.

Tuesday is Over Edition

Previous threads: >>103545710 & >>103536775

►News
>(12/17) Falcon3 models released, including b1.58 quants: https://hf.co/blog/falcon3
>(12/16) Apollo: Qwen2.5 models finetuned by Meta GenAI for video understanding: https://hf.co/Apollo-LMMs/Apollo-7B-t32
>(12/14) CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2
>(12/14) Qwen2VL support merged: https://github.com/ggerganov/llama.cpp/pull/10361
>(12/13) Sberbank releases Russian model based on DeepseekForCausalLM: https://hf.co/ai-sage/GigaChat-20B-A3B-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103545710

--Paper: FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores:
>103547272 >103547808
--Papers:
>103547261
--Intel Arc updates and LLM running solutions:
>103552251
--Falcon3 family of open models and their performance:
>103547725 >103547743 >103547837 >103547787 >103547898 >103549931 >103547788 >103547888 >103548124
--Anon discusses and compares text-to-speech models, including CosyVoice2:
>103546353 >103546456 >103546944 >103547034 >103547061 >103547688 >103547800 >103553458 >103553621
--Anons discuss a suspicious RTX 4090 listing on AliExpress and share their experiences with Chinese online marketplaces:
>103550949 >103551009 >103551689 >103551775 >103552164 >103552204 >103551035 >103551205
--Discussion on the effectiveness and comparison of bitnet models:
>103553433 >103553448 >103554089 >103553456 >103553486 >103553570 >103553599
--Impact of switching from FP16 to int8 inference on model accuracy:
>103546155 >103546208 >103549263
--Anon seeks dust proofing solutions for open mining rig with 3090s:
>103553137 >103553183 >103553339 >103553354
--Regex and small model approaches to rewriting sentences:
>103549331 >103549353
--Gemma 2 9B model's performance in creative writing tasks:
>103546296 >103546512
--Llama.cpp Vulkan updates and Nvidia involvement:
>103550656
--FOSDEM 2025: Quantization in llama.cpp:
>103550704
--Anon asks about running Linux with Windows VM for gaming and LLM use:
>103549612 >103549760 >103549709 >103549854
--Anon gets Cosyvoice 0.5b working, shares audio sample:
>103547577 >103549538 >103554651
--Anon discovers speculative decoding for speedup:
>103549662 >103549673 >103549762 >103549842 >103549866 >103549863 >103549952
--Miku (free space):
>103546325 >103548490 >103548592

►Recent Highlight Posts from the Previous Thread: >>103545718

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Is EVA a meme or is it actually the SOTA RP model?
>>103554963It's legitimately amazing.
"You're a MI6 mathematics specialist. One day you receive a satellite phone call from a unit operating on the ground in the enemy territory (six people, each can drive a truck). They say you need to help them ASAP. They managed to steal ten trucks, each with full fuel tank. They can easily fit into one truck, but have no spare fuel canisters, and can't transfer fuel to one truck only. But they managed to obtain some hose, so they can transfer fuel from one truck's fuel tank to the other trucks fuel tank or tanks, but only if there's room there. They ask you how they should use these ten trucks to get away as far as only possible."Not even o1 pro PhD can solve this incredibly complex problem.
>>103554963It's shit a priori.
>>103554963It's a meme. Just lurk on the thread and notice how no one shares a single interesting log using it.
>>103554963
It's good but schizo, gotta clamp down the temp.
Now, you're gonna dismiss this because it's a 9B, but I legit suggest trying this: >>103546296
I suspect that most of the hype for EVA just comes from the skillets of /lmg/ experiencing a model for the first time with a decent sampler setup considering all the hard work llama3.3 anon did. The model itself isn't really anything special but /lmg/ can't come up with proper sampler settings for shit so having them spoonfed like this tricks all the skillets into believing that they're running local claude until the honeymoon phase wears off.
>llama3.3 anon
lol
lmao even
Now I'm convinced this guy has some mental issues, or is just desperate for attention.
>>103554963>SOTA RP model?That's still Largestral
>>103555065Largestral is dry and boring, even Nemo is more interesting.
>>103555026
I don't use that guy's settings and I think he is an annoying retard, but the model is legitimately very good imo. I would put it up there with largestral and tunes thereof, and it's way smaller and less demanding to run.
You shouldn't be put off it because some attention seeking fag decided to make it his thing.
>>103555071Can't argue with that but I prefer its smarts over 70/72bs forgetting basic shit in the middle of a roleplay
>>103555050"Now"? I assume you are new here
>>103555026It's mostly organized shilling. We've seen that with anthracite a few months back. Best models ever, presumably. Now that they're out of free compute and they've got their name out, it's some other discord clique's turn to repeat the same and leech off the local LLM user community.
>>103554963
Hating on it without having ever touched it is more of a meme at this point. Specifically the "STOP HAVING FUN" meme. Some people are compelled to hate on things just because someone else likes them, I guess.
That being said, I don't know if it's absolute SOTA, since I can't run models larger than 70B, but I definitely consider it the best 70B we have right now.
>>103555026
I don't know if I would describe fucking around with it and documenting it in the occasional post "hard work", really.
>>103555026That guy's sampler setup is completely retarded though
Miqu is better than EVA 3.33, and no one can prove me wrong.
>>103554976
>They ask you how they should use these ten trucks to get away as far as only possible.
>as far as only possible
is an unspecified point. Past their base even? And "as *only* possible". Certainly you're not asking them to go an impossible distance.
To go as far as possible, though i'm not sure it'd work: have a driver in each of six trucks. Have the front truck tow all the other trucks. When it runs out of fuel, abandon it; the front driver moves to the second truck, which (now first out of five) tows the rest. Repeat. To make it even less realistic, add the other four driverless trucks to the chain at the end. In my universe, they don't swerve off.
Whoever phrased that riddle is a retard. There's more noise than information.
What's the best free website to try to gen a video?
>>103554976This is a tricky question, isn't it? LLMs are terrible at those.My guess is that the answer is: detach the fuel tanks of the other trucks and load them on the back, if that's not possible then there's nothing they can do since the fuel tanks are already full.
>>103555240
hailuoai
3 gens a day :)
>>103554963
It's better than Opus
Fight me
>>103554976reminds me of asparagus staging from ksp
>>103554976
do miqu, eva, etc work for generating japanese text
i could do a finetune myself by pulling text out of my library of japanese ebooks i guess but i've never done that before
>>103555407
>japanese text
There are a lot of models that can converse in good or even great Japanese. What kind of use-case/resources do you have? The best ones are the biggest.
>people fighting about whether eva is good or not, meanwhile no one is posting logs to prove their point
Faggots fanning the console war on both sides need to stfu or post something of actual substance.
Resources: 3090 in a relatively powerful desktop (64 GB of memory) from a few years ago.Use case: mostly ERP (or rather story writing) in the style of those books, say a corpus of about 1M characters (not sure how many tokens that comes out to). I think I'll probably have to finetune anyway to get exactly what I want, but it'd be good to start from a baseline model that can understand and produce good Japanese.
Anon says, as he refrains from posting logs himself.
>>103555504Who are you talking to?
>>103555517
There have been at least 10 logs over the past few threads pro eva, the nala ones just last thread for instance, and there has not been a single one against it atm.
>>103555517The anon before the faggot who posted right at the same time as me.
>>103555407use qwen it always outputs chinese which is a far more powerful language
>>103555604
>it always outputs chinese
I have yet to have that happen. I see others saying using rep pen does that.
>>103555407I've been meaning to ask this because I've been seeing this since around when local models started to become popular, but is it just one guy asking about Japanese translation or is it really that pressing of an issue?
>>103555627
It's a very pressing issue. Although I care more about translation than about generating japanese text.
>>103555659
But is it always you asking? Because you could have learned Japanese to a high enough level in the time you've been waiting.
As the guy who asked above: I care about text generation because I'm used to reading Japanese erotic novels but never read stuff like that in English
>>103555611I think there is/was an error in llama.cpp integration. Might have been fixed since but back then, if you didn't enable flash attention (still not default I believe), qwen was sometimes outputting chinese or gibberish. I know that I disliked qwen at first because of that issue and found solution in some opened llama.cpp issue.
>>103555673I don't reply on 4chan much (as you can see from me forgetting to hit reply correctly) so it's not me at least. I speak/read Japanese fluently, but the reason I want text generation is the same reason I'd want it in English or that anyone does ERP with LLMs: it's much less work than writing and if I just want some exciting slop to jerk off to I'm not going to bother writing a whole novel when I can just prompt a model with an outline of what I'd like.
>>103555346
You shitpost, but I feel like when local models have unambiguously reached that level nobody is ever going to accept it
Opus is to /lmg/ and /aicg/ as Summer Dragon was to /aids/
>>103555673
NTA but I'm probably the Anon that cares the most about Japanese LLMs in this general, and I know for a fact that I DON'T have multiple personality disorder.
And yes, I've been learning Japanese! I'm currently good enough to watch some anime without subtitles, but my vocabulary is still subpar for Japanese literature.
>>103555103
>L3.3 man gives positive opinion on model
That means to discard the model. His whole schtick is coping into getting bad models to give 1 good output and pretending it's all suddenly better.
>>103554929Is there a big difference between Q4_K_L and Q4_K_M? I noticed it says 'Uses Q8_0 for embed and output weights', but what exactly does that do for the final output?
>>103555071
>It's dry
Just use the Behemoth tune. v2.1 is a good mix of smarts and more creative prose. Only issue I've had with it has been occasional swipes where it takes actions for {{user}}.
>>103555137
kek, miqu really was magical
>>103554283
if you have some time to test i would be interested in a second opinion. my usecase is "gpt/claude but it has a personality and doesn't say no" and for that gemma mogs other models cause it's the smartest in its class imo. it especially does well with stuff like total context switches in the middle of a conversation like "sorry for the context switch, what's a RAT in an airplane context?"
other models tend to make shit up or define it in the context of the overall conversation like "Random Access Trojan" or whatever, gemma is the only one that gets "Ram Air Turbine" consistently
i run it with self-extend with 16k context no problem for documentation RAG etc
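For anyone wanting to replicate that, the launch looks roughly like this with llama.cpp's self-extend flags; the model filename is a placeholder and flag spellings vary between builds, so check --help before trusting any of this:

```python
import subprocess

# Hypothetical launch for the setup described above: gemma 2 27b with
# llama.cpp self-extend pushed to 16k context.
subprocess.run([
    "./llama-server",
    "-m", "gemma-2-27b-it-Q5_K_M.gguf",  # made-up filename
    "-c", "16384",            # target context length
    "--grp-attn-n", "2",      # self-extend: group attention factor
    "--grp-attn-w", "4096",   # self-extend: window size (<= native ctx)
    "-ngl", "99",             # offload everything to the GPU
])
```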
>>103555747
>Using coping as a verb
Go back
>>103555673Not that guy, but I also ask about it sometimes. Once there's a way to fit an LLM and a high quality voice model on a 24gb card, I'm going to exclusively fap to jap erp since having my waifu speak in her natural language will be less jarring than hearing her speak constantly in engrish
i'm willing to gen a response to a card of their choosing for 3.3 eva to see if it's their cup of tea or not. not doing pdf shit.
>>103555763
the differences aren't really super noticeable, just run the biggest quant you can fit in vram up to like q6, above that it becomes placebo. iMatrix quants are better than qX_K_Y and those are better than qX_0
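If you want to gauge what a quant costs you before downloading, the napkin math is just parameters times bits-per-weight; the bpw figures below are approximations, since real GGUF files mix quant types per tensor (that's exactly what the K_L/K_M suffixes change):

```python
# Napkin math for quant sizes; bpw values are approximate.
def size_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8  # params in billions * bits/weight / 8 = GB

for name, bpw in [("Q4_K_M", 4.85), ("Q4_K_L", 5.0), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"70B at {name}: ~{size_gb(70, bpw):.0f} GB")
```

Which is why bumping only the embed/output tensors to Q8_0 barely moves the total file size.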
>>103555954lolis are the best use case for local LLMs, fag
>>103555954
https://www.chub.ai/characters/NovelDraft/osaka-but-with-gigantic-breasts-9-greetings-40111dd15f96
*cums on ur face*
>xtts2 is the gold standard imo, it's not perfect but it's fast and easy to use, good enough and low effortI'm going to kill you.
>>103555980
>48k tokens
What in the
>>103555954https://chub.ai/characters/boner/amelia-dbae3daacd4f
>another episode of anons not understanding how random works...
>>103555980Imagine having to reprocess the entire prompt at every message lol
>>103556016based
Anyone have an offline archive of chub?
>>103556078
I have one. You can make your own
>https://github.com/ayofreaky/local-chub
I changed a few things, but it works just fine as is.
I started my sync with
>https://mega.nz/folder/oPg0HZyR#Iaf3CV1A_jiuDDDq1QBk-Q
I don't know if that archive still works or if its contents get updated.
>>103556136Thanks I'll try that
>>103555954
>pdf
Pedo shit anon.
Also, Nala, as is the tradition.
>>103556078
Not chub, but there's auto's janitor ai dump: https://huggingface.co/datasets/AUTOMATIC/jaicards/
also this: https://char-archive.evulid.cc/#/takeout.html
chub-07152023-7.9k.zip exists, but it's old
>>103555981
my body is ready, but before you do, what's actually good, so i can shill the correct thing in the future?
>>103556151
If you're gonna leave it running with the auto-update, change picrel line so it doesn't chug on your cpu for no reason.
There's also aetherroom.club. They give you the sqlite db to download directly, which is very nice.
>https://aetherroom.club/backup.db
Just text on those.
>local models
Just something I discovered recently by accident. A few years ago some guy put out a paper (https://arxiv.org/abs/2106.03037) looking into small models of a few k parameters for simple processes, and used guitar amp simulation to demonstrate how it can be done. Someone picked it up, tools got made, and people have been sampling their setups and sharing the models for a couple years now. https://tonehunt.org/models seems to be the main site. The quality of the simulation is pretty impressive, at least on the popular/most downloaded models I tried, and it runs in real time with very low latency. Doesn't have that shitty flat quality like the amp sims I've tried over the years. And everything is free and open source. I'm wondering if you could train the models to not amplify the noise though, because it's quite sensitive to audio interface noise. What is amusing to me is I usually think of guitar players as being technology averse, and if you asked me if this kind of thing could happen I'd laugh.
>>103556265
>What is amusing to me is I usually think of guitar players as being technology averse, and if you asked me if this kind of thing could happen I'd laugh.
I play a little bass guitar and i love writing audio synths and fucking around with midi. Plenty of people out there using digital amps and effects, this is just an extension of it. If a thing makes cool sounds and it's cheap, people will use it.
>>103554976
grab one of the nearby corpses, rip out the stomach and stuff it with gasoline, repeat until all the gas can be carried with thyself
>not enough room in the truck
attach on top like the gypsies do
haven't been here in a few months
what's the best model(s) i can run with 8gb vram
>>103556159>>103552196
>>103556360mistral nemo 12B.
>>103555954
https://characterhub.org/characters/Enoch/verchiel-bfda1093
Or any of this guy's cards really. Smaller/shittier models never seem to work well with them.
>>103556265
This doesn't come as much of a surprise to me desu, music production has always been pretty tech-heavy. A lot of musical instruments come with a shitload of filters nowadays, especially pianos and guitars.
>>103556360Llama 3B
>>103556265
I think you could try artificially adding noise to the training data. The models are usually small enough that you can train a decent RNN on a colab cpu on ~3 mins of data
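A minimal sketch of that idea in PyTorch; the architecture, sizes, and noise level here are made-up stand-ins, not the paper's actual recipe:

```python
import torch
import torch.nn as nn

# A small recurrent model mapping the dry guitar signal to the amp'd signal,
# with noise injected into the input only, so the model learns the amp
# without learning to amplify interface hiss.
class TinyAmp(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.rnn = nn.LSTM(1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, samples, 1)
        h, _ = self.rnn(x)
        return self.out(h)

model = TinyAmp()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dry = torch.randn(1, 4096, 1)  # stand-in for a real DI recording
wet = torch.randn(1, 4096, 1)  # stand-in for the matching amp output

for _ in range(100):
    noisy = dry + 1e-3 * torch.randn_like(dry)  # augment input, keep target clean
    loss = nn.functional.mse_loss(model(noisy), wet)
    opt.zero_grad(); loss.backward(); opt.step()
```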
>>103555774
I wish people would start mentioning what quants they use when recommending models. Largestral loses so much smarts below 5bpw that dumbing it down with finetunes doesn't matter
>>103556367
Anon says it's Rocinante v1.1 >>103552766
>>103556655That makes sense. I couldn't get gore in L3.3 without things breaking down.
We’re not getting anything good for Christmas are we..
>>103556655yea anon lied. big surprise huh?
>>103556655They'll put a L3.3 on any log that looks good
>>103556655
What the fuck are you guys talking about. This post >>103552766 was a response to a question about this post >>103552319.
However, this anon >>103556367 is talking about this post >>103552196. As you can see, >>103552196 and >>103552319 are not the same post, and not the same ST setup. Unless the original poster of the actual screenshot in question comes back to prove what model he used, we simply don't know.
Do you guys not use 4chanx or something? How was this even confused.
https://www.reddit.com/r/LocalLLaMA/comments/1hgri8g/has_apollo_disappeared/
>>103556992This was the one that could read videos, right? Was it any good?
>>103556950i already told you it's 3.3 eva. here's a quick, but worse re-roll with gore that people say it doesn't do.
>>103556911It IS 3.3 eva. Can you not read?>>103557039
>>103557013
didn't get to try it. seems they're wanting to go API. can't link for some reason, but check the readme linked on reddit
>>103557063or this i'm sleepy and retarded
>>103557063>>103557071Damn now I actually want to try it. Hope someone with the weights reups them.
>>103556762L3.3 can't do gore? Interesting, Eva had no qualms about Cronenberging poor Kazuko (my "punching-bag" card, the one I test all the things that might run afoul of alignment or positivity bias on).
Imagine forming your identity around trying to prove a below average model is good.
>>103557137the model isn't incredible. it's usable. i don't understand the hatred for it. must be because of the l3 namefag. they just hate namefags ig.
I hate shills
>>103557179
>namefag
It's a tripfag, you newfag
>>103557179
Not only do they identify themselves as the model user, they value that identity so strongly that they protect it with a tripcode. An entire persona dedicated to wrangling a decidedly bland model. Why did they choose this hill in particular to die on? Why is L3.3 so special to them that they must weigh in on everyone's use case? It's annoying.
>>103557212In ongoing discussions, remaining identifiable is useful. You're just mad you don't get to add noise to the signal.
>>103557287
>you don't get to add noise to the signal.
Pray I don't decide to devote more time to "adding noise" to your signal.
>>103557287This is a dead general, and your opinions are as valuable as the ones from any other anon. You should feel ashamed.
Greetings fellow LLM fans. I am the QwQoomer and I am here to convince you on how QwQ is still good!
>>103557377Hardly dead, and I never claimed to be an authority. So... ashamed of what exactly?
>>103557409Ashamed you are not using QwQ of course! How can you justify using that bulky and lobotomized 70B model when we can watch intelligence unfold by prompting with QwQ!
Man, mistral or the chinks better cook something up soon or these threads will hit the absolute bottom.
What 4 months without a good small model will do to anons.
>>103554976lmao, everyone getting this wrong except for the anon that said "asparagus staging", guess 4chan is just as retarded as o1
>>103557402>>103557422QwQ is literally good though.
Can i run a decent ai to study biology (ncbi journals, etc.) on a GTX 1060 3GB? or am i gonna need those gay open ai plugins for google scholar
>>103557443
Of course it is. That's why I remind you all of its presence by crowning myself the QwQoomer. For I am such an expert on QwQ that all discussion on its function and prompting must refer back to me meee MEEEE.
>>103557422>>103557460Oh great enlightened, please teach me your ways!
>>103557470Just keep swiping until you get a response you like. Edit if it takes too long!
>>103557444definitely not, 3GB is not enough for anything usable, the best you could do is run an embedding model and feed it a bunch of your study material so you could get really good fuzzy search, like you could type a question and get a bunch of passages highlighted in the literature that are semantically close to the question
>>103557491
>embedded model
any recommendations?
>>103557402omg it The QwQoomer haiii am big fan!!
>>103557500pyg6b
>>103557509
what's your CFM?
>>103557509Yes yes, I am fond of all my fans. But I must be off now. If you see the dastardly Llama lover, don't hesitate to @ me so I can put him back in the Llama pen.
>>103557500
mxbai-embed-large is a good embedding model, but i'm not aware of a tool that does what I described as, like, a user-facing thing. that step is part of a RAG-enabled AI though, so it's definitely possible
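A minimal sketch of that fuzzy-search step with sentence-transformers, using the model named above; the passages and query here are placeholder data:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# One entry per paragraph of your study material.
passages = [
    "The citric acid cycle oxidizes acetyl-CoA to CO2 ...",
    "Ribosomes translate mRNA into polypeptide chains ...",
]
passage_emb = model.encode(passages, convert_to_tensor=True)

# mxbai suggests a retrieval prompt prefix for queries; omitted here for brevity.
query_emb = model.encode("where does protein synthesis happen?", convert_to_tensor=True)

hits = util.semantic_search(query_emb, passage_emb, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {passages[hit['corpus_id']][:60]}")
```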
>>103557521
16384 CFM!
>>103555500
>3090 in a relatively powerful desktop (64 GB of memory) from a few years ago.
You're going to be stuck waiting a lot, or using a substandard model. Deepseek and Sarashina2 are both excellent (Sarashina2 being super unaligned is pretty neat to play with, actually). Ezo is competent but likes to get into repeat loops. qwq is surprisingly good for basic chat or instruct type work, but is inherently not an RP model so you'll be fighting an uphill battle. It may actually be your best bet given your specs.
I tested all these at q8.
>I'll probably have to finetune
This is probably harder than you think, but if you manage it, good for you. Make a rentry with a reproducible how-to and you'll be a hero.
If you don't mind telling me what character/setting/books, I can prompt some of the better Jap speaking models to find out how much they already know about it.
>>103555627
NTA, but I also post about various models' japanese abilities in this general. I think lots of autists are obsessed with japanese.
>>103557422i actually leave one machine i have access to at work running qwq 24/7. Its just that useful for any devops stuff I need.
I can't believe SeepDeek still didn't release DeepSeek R1, it's such a great model, definitely one of the best reasoning models we have right now.
>>103557605Yes yes, it's very impressive for vaporware. QwQ is sitting on my hard drive right now ready to leap to my aid in any task.
>>103557605it is weird, i really thought they would after qwq dropped, even if just a preview
>>103557630R1 was a smaller test model from what I read.
>>103557605
r1 is a lot better than qwq. pic related. Also I like the more casual tone in the thinking.
Still shit though.
There was no reason for me to do it, but I did it anyway. I downloaded the new Falcon model (10B Instruct) and tried it.
First immediate thing I noticed: the official instruct formatting is censored compared to switching the user and assistant roles out for {{name}}, like Llama 3. In a card that specifies the character should be lewd, the assistant avoided saying anything that might be lewd, but when doing a swipe with {{name}}, the response started out similarly (I used temp 0), then it went lewd. Given the similarity in the beginning of the response, it seems like the model might retain its intelligence from the assistant role training while being uncensored when using {{name}}.
Also, here's a Nala test.
Well, it is what it is. Can't expect much from a 10B or the Falcon team I guess.
>>103557646Nonono. You are just prompting it wrong. You need to make sure it begins the chain of thought before giving its final answer. It's a set format. Also, QwQ is only a preview. Soon we will have the real version and it will be even better.
>>103557659
ok this is just stupid.
like this is the second screenshot i see of falcon.
the first screenshot had the spine thing in the first sentence. this one has the mischief glint in the eyes. not even the saudis can escape the slop. thats just sad.
>>103554929seasons greetings /lmg/
>>103557673
>like this is the second screenshot i see of falcon
Oh really? Must've gotten buried in the noise so I didn't notice it. Oh well, more proof that it's another nothingburger so we can save other people's time.
>>103557688Seasons greetings, Teto & Miku
>>103557688Checked and elfpilled
>>103557646I appreciate the tone and overall effort, but 随時(ズイジ)is super weird, and the kanji they used in 一緒 is just straight up the Chinese version (could be the user's font I guess, but it feels like you suddenly had some weird character in your output that looked english but weird like baseЪ̀all).
>>103555924
buy a 1080Ti off craigslist for $150 and put the voice model on that and ur golden, i have this setup and i just have QwQ tell me i'm a good boy in the voice of my fav asmr vtubers to lull me to sleep
any existing setup for translating text on image files? preferably an option to output to plain text
>>103557961Yes OCR models. But honestly you don't even need AI for that.
>>103557986i mean, OCR + any lang to en MTL
>>103558008
>any lang
(but especially Japanese uguu)
>>103557961
>>103558008
I use a very specific finicky stack called "Sugoi translator toolkit". It has an OCR model and you can hook up your own translation model into it.
I use it to translate hentai doujinshi and porn games in real time. The OCR model works for all asian script detection (Korean, Chinese, Japanese), but I don't know what languages you need.
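If you'd rather roll your own than fight the toolkit, a bare-bones version of the same OCR-to-LLM hop could look like this; note manga-ocr is Japanese-only, and the llama.cpp server endpoint and prompt shape are assumptions about your local setup:

```python
from manga_ocr import MangaOcr   # pip install manga-ocr
import requests

mocr = MangaOcr()
jp_text = mocr("screenshot.png")  # OCR a cropped text bubble / textbox

# Hand the extracted text to a local llama.cpp server for translation.
resp = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": f"Translate this Japanese game text to English:\n{jp_text}\nEnglish:",
    "n_predict": 128,
    "temperature": 0.3,
})
print(resp.json()["content"])
```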
Just stop replying to the attention starved namefags, problem solved
>>103558019
desu desu
>>103558028
yes, I need it for CJK. going to look into this, thanks
>>103557673It's impossible to tell whether a company drank the DEI koolaid or just distilled DEI infected models
I decided to waste my time and try yet another small model. Ifable 9B.
This is the Nala test.
Actually it's not bad. It seems to be having formatting issues though. I even tried with temp 0 (this particular swipe) but it still does this. I'm using the latest Ooba pull (with transformers). Is this just a Gemma thing? I feel like I remember people talking about this but not sure if this is just how the model behaves or if it was a bug.
>>103558114? I didn't have formatting issues. Are you using the gemma 2 format?
>>103558097
>>103557698
aren't those companies themselves tired of this writing style yet?
it's so weird because closed is moving in the opposite direction and goes towards more natural speaking.
that was the other screenshot i saw >>103548264
maybe they really just buy all the same 2023 gpt datasets.
>>103557797
>I appreciate the tone and overall effort
yeah thats how i judged it.
like i said, they both are shit. but r1 clearly is better.
its not even a competition.
>Is this just a Gemma thing
Stop using badly done finetunes made by amateurs to win benchmarks (benchmarks that are rated by an AI, not a human individually judging the output... this shit is so useless it hurts). All of them add quirks and make the AI dumber -- you can notice that easily if you use LLMs to do AI translation, the finetuned models all lose a lot of language knowledge. If you need an uncensored version of Gemma because your only use of LLMs is satisfying coomer urges, get the abliterated version, it suffers the least IQ loss. If you really have to download an llm because you saw it doing well on eqbench, at least look at the darn output:
https://eqbench.com/results/creative-writing-v2/ifable__gemma-2-Ifable-9B.txt
Compare that to
https://eqbench.com/results/creative-writing-v2/google__gemma-2-9b-it.txt
Look at the added spaces in some paragraphs, there's like three spaces between words and the judge LLM doesn't even notice that. This is why LLM based benchmarks are retarded, a human judge would strike down this shit so hard.
>>103558254
Now actually use the model for RP and come back. It does perform really well for its size.
>>103558171OK so something weird is happening here. I made sure to use the formatting present in the tokenizer config file. So I modified the Gemma 2 ST preset to make things match. But, it turns out that for some reason, doing that actually makes it commit formatting mistakes. Actually what I did was just check the "Wrap Sequences with Newline". In the tokenizer file it suggests that only a single newline separates each special token and message content, but that's what results in the formatting errors somehow.Furthermore, it seems that having "Include Names" set to "always" also makes the model commit formatting mistakes. Very odd.
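For reference, here's the Gemma 2 turn format as the tokenizer config describes it, with a single newline after each role tag; note Gemma 2 has no separate system role, so how ST injects names and system prompts into the user turn is up to the frontend, which may be what trips up some finetunes (that last part is a guess):

```python
# The Gemma 2 turn template, one newline after each <start_of_turn> tag.
def gemma2_turn(user_msg: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_msg}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(gemma2_turn("Hello!"))
```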
So is Falcon3-10B-Instruct usable for RP or is it too censored?
>>103558342
It's about average on the censorship probably. But it feels kind of dumb. And sloppy. At this point, just stick with the Nemos instead, I'd say.
>>103558342nobody bothered to run it yet because llama.cpp doesn't support it
>>103558342
Dumb and sloppy
>>103558379
Ah, he said it nearly word for word lol
>>103558401I just test it with transformers through Ooba and it werks fine.
been out of the loop for a while, what is this eva shit? I've only seen this hype (partly justified) when nemo or miqu became available.
which version should I run on a 4090 with plenty of cpu power and ram to offload shit to? most I've seen on huggingface is 70b models, which I can't run locally unless jumping through hoops and ending up with shit results.
not looking for gooning but actual problem solving like translations, coding, etc.
>>103558541
For coding the best is Qwen2.5-Coder-32B-Instruct.
For translations I would say either gemma 27b or mistral-small.
0.2$. Thats the new one. lol
>>103558631
All the models REALLY want to turn 都案 into 都合. Which is fair, because the text in the game seems to actually be wrong (I have no idea what 都案 is... sounds like a soba restaurant).
However, it's literally not what is written on the screen, so the model is wrong, since it's not "extracting" the text.
They sure don't like いたわって, either. They all seem to turn it into something else, which, assuming the game text is right, completely changes the meaning of all the translations we've seen out of every model so far.
>>103558631So... When are you gonna be satisfied with the result?
>>103558631what is the correct translation?
>>103558834
When I get what's on the screen.
That's the only way it becomes a tool I'd use. Otherwise why would I not texthook? (which is faster too)
The benefit of an llm is that it can be used generally across all platforms, old games or new. But it's useless if I don't get what the game writes.
I don't get the appeal of a reasoning model if it can't "look" at the image again and see that it made a mistake. Wouldn't that be the whole point of feeding o1 an image?
why would a female character in my erp refer to her asshole as a 'boypussy'? Is there a problem with the model or my settings?
>>103558877model
>>103558877society
>>103558877I remember some of the shitty llama2 70b porn merges I used a year ago do that sometimes.
>>103558899>*her cock*
>>103554976
They should drive slowly since that will reduce drag and therefore fuel consumption.
They should then drive to the nearest airport and fly to the opposite side of the earth.
They could instead take a chance and sneak onto the next SpaceX rocket, but chances are they'll just end up in the Indian ocean instead of space.
>>103558769
>都案
Is her name ミアン by any chance?
Thinking philosophically, if it IS a name, then maybe the model should figure it out, but really how could it without both base context (back of box, manual scans, etc) and some ongoing keeping track of things like pronunciations that are revealed during gameplay, lore, etc?
Goddamn, that's actually a really hard problem to get right. Zero-shot no context is basically impossible for a nontrivial game.
Also, the Japanese person who wrote that game dialog text is shit at writing.
is a gtx 1650 6gb good enough for a dedicated tts card to run at realtime or better?
>>103558631Yeah I think I'll just learn the language myself instead of relying on crutches
>>103559232
retard.
imagine not learning japanese the coomer way. go read your nihongo books nerd.
>>103559237translation sponsored by unslop nemo btw.
>>103559237
Using it as a learning aid is fine... or it would be if it were accurate
Truth be told, I've been kind of struggling with finding beginner friendly material that doesn't treat me like a drooling imbecile. Then again, I also learned English by just diving in headfirst, so maybe I don't need it
>>103558877
Even Llama 3.3 70B doesn't seem to know that women don't have a prostate.
I'm beginning to think that there's a shit ton of gay sex in the training data.
>>103559329
>doesn't seem to know that women don't have a prostate.
QwQ would have reasoned that out before responding.
>>103559329
they all dont. people hype 70b models up but i prefer speed.
70b have "impregnant me" while assfucking etc. its a llm problem.
>>103559329It's almost like all LLMs are just really good at producing average responses that work most of the time and nothing else
>>103555137She's a bit retarded though.
>>103558847Just use OCR then feed the result to o1?
>>103559067Yeah with shit TTS like Bark or something
>>103560062
Translation is not the main problem anon. For a "decent enough" translation a drummer finetune of mistral-small or even nemo is enough.
OCR sucks, especially for games with background stuff. Doubly horrible if it's a pixelated japanese font.
There are built-in OCR tools like lunatranslator or sugoi.
You will quickly realize this is a huge hassle if you want a translation every X seconds. Adjust brightness, saturation to get a half decent result.
And then it's probably still only as good as the o1 example. lol
Games unfortunately are not as easy to read with OCR as manga.
So for now you gotta use a texthook and then run it through offline pronunciation dictionaries for learning and a local llm for translation.
>>103560158everything in this reply is wrong, are you doing it on purpose?
>>103560185you use your great ocr hassle-free tools then buddy, suit yourself.
>>103560158>what is textractor
>>103556655Two different anons.
>>103560210
if you bothered to read the 2 posts you replied to then you would have seen what i wrote.
Doing texthook is sometimes complicated and does not work universally across many games.
Try getting it to work on a pc-98 game on linux. Like there is some emulator toggle to dump text in some .txt and that's it. And even that I didn't get to work.
Lunatranslator texthook for rpgmaker games works... but slows everything down. etc. many issues.
You are either retarded or trolling anyway.
>>103560242skill issue
>>103554929
Here's the list of features I want to be present in my virtual GF thing that I'm making
Features
- image gen and sending (need to check if openfire supports this)
- XMPP interface for sending messages
- Queueing for LLM requests so that multiple personas can exist by themselves on the same machine (laptop, Ryzen 5 3550H, 16GB RAM)
- LLaVA support so that images can be referenced in chat
- webui for configuring everything (flask?)
- Random profile picture generation with stable diffusion
- Ability to get information from the internet and reference that in chat
  - news
  - Ability to scrape websites
  - Ability to get info from RSS feeds
- Ability to randomly send messages at random times of the day, about various random topics
- Messages stored in memory for later recall
- Automatic low token count summary insertion for long conversations (sqlite3 used for database?)
- Optional privacy mode where messages are not stored in memory
My question is, I have limited experience in writing well compartmentalised, maintainable code (I have been writing embedded code too long, it's all pure C and poor quality). What would be a good way to figure out all the different classes and stuff that I should make? I will be writing everything in python
>>103560411A good sign that a project will never be finished is when you start worrying too much about the design instead of working on it.
>>103560437
>A good sign that a project will never be finished is when you start worrying too much about the design instead of working on it.
I have a working version but it's all in a single python file and it doesn't have the ability to get stuff from the internet. The python file is getting larger and harder to work with.
I swear to the gods I was a great C++/python programmer until I had to work as an embedded C guy for a few years and now my code quality is terrible from working on 4K LOC C files without any distinction on what they do
>>103560411
Is this your literal first programming project?
(1) Pick something you want it to do.
(2) Make it work by hand. (Eg: type stuff into the llm, generate something suitable for stable diffusion, etc.)
(3) Get code to do the stuff from (2) instead of having to do it by hand.
(4) Pick something else to work on.
>well compartmentalised, maintainable code
- Large working pieces of code were originally small working pieces of code.
- If your functions have too many sharp edges (eg: "make sure you have this, this, this, and these conditions for this function to work") then rewrite your function(s) into a better collection of functions.
- If your function names (which communicate to the programmer what they're about) start getting awkward, then you probably need to rewrite your function(s).
>make what classes?
- If you need to keep a bunch of data together, then wrap it up together in a class. (Minimal sketch below.)
- If you find that operating on certain pieces of data is error prone, move that functionality into the class and have the rest of your software just use it instead of trying to make its own way along.
Would this have been better in one of the programming threads?
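To make that class advice concrete against the persona/queue feature from the list above, a minimal stdlib-only sketch; every name here is made up for illustration, and the backend call is stubbed:

```python
import queue
import threading
from dataclasses import dataclass, field

@dataclass
class Persona:
    """Keeps one character's data together instead of loose globals."""
    name: str
    system_prompt: str
    history: list = field(default_factory=list)

class LLMWorker:
    """Serializes requests from several personas onto one local model."""
    def __init__(self):
        self.jobs = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def ask(self, persona: Persona, message: str) -> str:
        done = queue.Queue(maxsize=1)
        self.jobs.put((persona, message, done))
        return done.get()  # block until the worker answers

    def _run(self):
        while True:
            persona, message, done = self.jobs.get()
            persona.history.append(("user", message))
            reply = f"[{persona.name} would reply here]"  # call your backend instead
            persona.history.append(("assistant", reply))
            done.put(reply)
```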
>>103560411
>classes
That's so outdated. You should learn about DDD.
>>103560565
>Is this your literal first programming project?
No anon, I've been programming for well over a decade. i know it's hard to believe, but I have forgotten how to do it well because I wrote shit like functions that were almost assembly and writing stuff directly to registers etc etc
>>103560411
>python
lmao good luck
official Nala test for Falcon3-10B-Instruct (f16)
>>103561084You should tripfag yourself
>>103560824Python not good for writing """""enterprise quality""""" code?
re-ran since I had the wrong persona set in ST for the first test.
>>103561094
nah. I like being able to get into arguments with people and hide behind a veil of plausible deniability.
>>103561111
Nice dubs
I personally can't stand it. it's good for prototyping small projects, but every larger project I've seen ends up being a monkeypatched mess, and I'm not even talking about its horrible dependency management system
>>103561084
>>103561116
smirk, gleam eyes etc.
What are those companies thinking? It must cost a lot to train a model like this.
Who is gonna use it? Like with cohere. Who is this for?
It's like making a knock-off of a rival whose product is basically free.
>>103561084>Your resistance is futile.
>>103561134
>NOOO I READ WORDS I AM ANGERY
Maybe /sdg/ is more your speed or something. Make purdy pickchure instead
>>103561162Yeah I couldn't help but think the same thing on that one.
Great. After the shilling ends for the day, we now also have the 1-2 sentence troll reply guy.
>>103561111
Python will work just fine, probably, but you might want to give Go a look.
>>103561084
>>103561116
I don't hate it.
Doesn't feel like it will be a nemo replacement for the 8gb crowd, however.
I can't get deepseek vl2 to work. The example code just exits without an error. Was anyone able to run it?
>>103561312welcome to the chinese botnet
How viable would it be to run LLMs on this thing?https://www.youtube.com/watch?v=_zbw_A9dIWM
migu
>>103561477oh my gosh it is miku
>>103555712you mean when in 10 years local models might be as good as a 10 year old model that isn't accessible anymore and people will in their mind think it was better than it was
>>103554976
1. drive 6 trucks with full fuel until 1/6 of each tank is exhausted
2. transfer all fuel from truck 6 to the remaining trucks
3. abandon truck 6
4. drive 5 trucks until 1/5 of each tank is exhausted
5. transfer fuel from truck 5 to all others
6. abandon truck 5
(repeat until 1 truck left)
total distance = 1/6 + 1/5 + 1/4 + 1/3 + 1/2 + 1 = 2.45 tanks
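Quick sanity check of that sum, since it's just a harmonic series truncated at six trucks:

```python
from fractions import Fraction

# 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 tanks of range, one term per consolidation leg.
total = sum(Fraction(1, n) for n in range(1, 7))
print(total, float(total))  # 49/20 = 2.45
```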
>>103560158
You do realize that using a good vision transformer will always be slower than OCR + a classic LLM, right? If the provided OCR isn't doing well on your content, you should train it specifically for your use case. That's why many anons here are using OCR to bypass the captcha, and it wouldn't work well to extract receipts for example.
Can someone explain to me why Koboldcpp keeps dropping context
I thought it might have to do with context size, but reducing context temporarily did nothing. Happens like every 5-10 replies even if I don't edit/swipe anything.
I think 12B Nemo dropped less, but its generation is way faster than 22B Magnum so I might just be imagining it.
>24GB for 250 bucks
Are you ready?
>>103558114do I need to learn *ServiceTensor* to be able to "ah ah mistress" effectively or can I do it using ooba? I've never been into erp, but I want to test my latest tune
is it possible to run two separate gpus in two different systems for one text generation LLM? I've got two 8gb 3070s
>>103561561
>24GB for 250 bucks
I think I'd wake up from that dream
>>103561597VRAM is that cheap. You're just used to getting jewed by leather jacket man and his nephew
>>103561609Why keep it limited to 24 then? They could stack it up to 48 or higher.
>>103561555
>555
Sounds like you have some dynamic component to your context. Author notes, lore books, that kind of thing.
>>103561645jews
>>103561645Somebody will get assassinated if they try that in this economy
>>103561609
vram being cheap and having a pcb layout that supports more vram are two separate things.
And how does Arc perform for LLMs?
>>103561561THANK YOU INTEL
IBM released Granite 3.1. 3.0 came out in October, so they've updated it quickly. I don't recall it being particularly great.
> https://huggingface.co/collections/ibm-granite/granite-31-language-models-6751dbbf2f3389bec5c6f02d
> https://huggingface.co/lmstudio-community/granite-3.1-8b-instruct-GGUF
>>103561733What would be the challenge?
>>103561561
>for 250
You know that won't happen.
I'd expect something like 300~350.
>>103561561What about CUDA though?
>>103561747MUSR merchants
>>103561882That's why Nvidia is allowing it instead of killing everyone involved. It doesn't matter if it's 24GB if it runs like shit or doesn't run at all.
>>103561882Zluda
>>103561882If there's good, cheap hardware, the software will follow.
>>103561563Dunno, never tried using the chat feature in Ooba. I think it probably would work but I don't want to bother learning the ins and outs of it.
>>103561961AMD has good cheap hardware and the software never followed...
>>103561973
Not really.
The USD per GB of memory and compute isn't that much better than nvidia's.
Just ask CUDA Dev.
>>103561973
>AMD has good cheap hardware
No they don't, it's slightly cheaper and not as performant for AI applications.
>>103561660
just checked, nothing: no author notes, no lorebooks or world lore
Are there any common settings (ST) that could trigger this? Otherwise I might have to start debugging context
>>103559329I haven't had this issue before with 3.3. Hell or even with any model. Can you post an example that can be reproduced? I'd like to see the token probability of that.
>>103562020
>Are there any common settings (ST) that could trigger this
Nothing comes to mind.
>Otherwise I might have to start debugging context
I think that's easier than the other way around, honestly.
Are you using flash attention, by any chance? I remember it disabling some of the special context sauce from llama.cpp, although that might be outdated knowledge.
>>103561555The character card might have some random component on it, that's what was causing this issue for me the last time I had it.
>>103561578
Yes. Distributed inference is a thing.
>>103561733
>And how does Arc perform for LLMs?
We got a PSA last thread >>103552251
>>103561134Literally no one cares about rpfags. And the companies who do (cai) know their paying customers (teenage girls) want shivers.
>>103561312
>I can't get deepseek vl2 to work
Same with me, but I couldn't even get their pile of python to work and gave up
>>103562261Wait, the allocation limit is a hardware flaw? How is intel so retarded?
>>103562332nta. If i had to guess, picrel...
Has anyone had good results with control vectors? I've tried making my own using 1-200 prompts using llama.cpp's utility (mean method, cause the complicated one is fucked or something?) and the results are bad. I've tried everything from extensive prefills to "choose A or B" and I just can't create a working writing style vector. The models just can't recognize good writing (often the negative has better prose).
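For context, this is roughly the workflow as I understand llama.cpp's cvector tooling; flag names may differ between builds, so treat it as a sketch and check --help:

```python
import subprocess

# Train a control vector from paired prompt files (one prompt per line);
# "mean" is the simple method the post above mentions, pca the fancier one.
subprocess.run([
    "./llama-cvector-generator",
    "-m", "model.gguf",
    "--positive-file", "positive.txt",  # e.g. purple-prose rewrites
    "--negative-file", "negative.txt",  # plain rewrites of the same passages
    "--method", "mean",
    "-o", "style.gguf",
])

# Apply it at inference with a strength you tune by trial and error.
subprocess.run([
    "./llama-cli", "-m", "model.gguf",
    "--control-vector-scaled", "style.gguf", "0.8",
    "-p", "Write a short scene in a tavern.",
])
```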
i saw there was some new uncensored local video ai, H something. can you train it on your pc?
>>103562399
>can you train it on your pc?
yes, you can train with pictures and it'll be able to make videos out of it, there's already some loras based on that method, it's asking for a 24gb card though
https://civitai.com/models/1035770/hunyuan-video-bogged-lora?modelVersionId=1166218
to make a lora you use this
https://github.com/tdrussell/diffusion-pipe
>>103562388Writing style isn't a vector. It's that simple.
Has anyone tried merging L3.3 and Tulu 3 yet? since they're tuned off the same base model. I'm too lazy to even try
>>103562420Everything is a vector if you give it enough dimensions.
>>103562420
if you ask an llm to write in the style of some author, and it does so
doesn't that mean that style is a vector?
>>103562346The horrors of having to store a few dozen allocation longs, thank god the legends at intel are here to save 200B or something
>>103562457>>103562486there are always people ready to say "no ur wrong" but no one is willing to help the poor anon, maybe if you think it's possible you should do it and teach him how you did.
>>103562064
>flash attention
don't think so. Also using an AMD card which doesn't seem to support flash attention
>>103562255
>>103562020
Guys ive been away for a while. What frontends are popular these days? Ive been using booba back in 2023, is it still updated or should i get something else?
>>103560411
Use functional design.
You can get most of those from existing projects and rewrite/cobble them together
https://www.phoronix.com/review/memryx-mx3-m2
>>103562525I'm not talking about lorebooks, author notes or world lore
>>103562526SillyTavern, KoboldLite and Mikupad are pretty much the only front ends we use nowadays.
>>103562524
The only thing I've learned from my control vector experiments is that most prompts are total placebo, and when you get something different it's most likely not what you are asking for.
The models have no concept of good or bad; only the most literal-minded instruction has any effect.
I guess "imagine you are talking to the average voter" is the best prompting advice there is.
>>103562524
>"no ur wrong"
Just bouncing what little knowledge I think I have around. That ain't the same as telling someone that they are categorically wrong.
>no one is willing to help the poor anon
Had I had something helpful to say I would have already said it.
>>103562525
FA works just fine on my 7800XT.
>>103561555
I had this problem. The culprit was "User Filler Message" under Misc. Sequences in Instruct Template. Try emptying that.
>>103562656In ST, I mean. If you are using Kccp's interface, idk.
>>103562526
SillyTavern won the frontend war, it's considered the default nowadays.
Ollama "won" as the backend, but it's complete shit and llama.cpp is a lot better still.
>>103562526
ooba is still fine. has all the features I need
dev pace is glacial tho
what did they mean by this
https://arxiv.org/pdf/2412.10270
>>103562935That attention is all you need
>>103561747
What's with these 8B models?
Either come up with a new architecture and release that, or stop wasting money on the same shit over and over.
>>103562388
>Has anyone had good results with control vectors?
I don't think I've ever seen anybody have good results when trying to do anything interesting with control vectors, really.
I think there's a reason it wasn't all that talked about compared to abliteration, for example.
Or could be just my memory, I guess.
shes done lads. each p40 was gotten for under 125, over the course of a few months and haggling on re**it and facebook marketplace. convincing them the high prices on ebay were from communist chinese spies and that they didn't actually sell at those prices. i even gaslit one by offering them two different prices under two different names on two different platforms to make the lower deal more appealing.
>>103563021
All of that so you can run slop (advanced) without FA
Or do p40s have FA nowadays? I remember them having some problem(s)
>https://www.phoronix.com/review/memryx-mx3-m2
Are these... 16MB each?
When i lower the ctx of eva 3.33 from the default 128k to lets say 32k, do i need to change rope from 500000 to something else?
>>103562567
you mean like special fields? no, there are only {{user}} and {{char}}
>>103562656
already empty.
I will try to debug it. I found a setting that lets me output the prompt to the browser console, will try, but not right now. I will report back once I find something
>>103563066
whats FA? I'm behind. my other projects have been kicking my ass so I'm unaware of new developments. but I'm also kinda retarded. she runs pretty well. gens were lightning fast with just 2. shes an LLAM, so she has an action component too where she interacts with a vanilla computer using dma cards to play games. its a bit crude though, previously requiring three computers to function: the main llam, with two p40s; a second "eyes" computer with a 3090, running yolo, with an elgato 4k capture card, to process and send the information to the model (this also hosted her vtuber avatar that would then be projected and controlled by her); and the vanilla computer with the hacking tools for the llm to control. hoping to eliminate the eyes computer with this, but processing may not be an issue with just two. I'm reviewing cozy2 for voice now. currently she pipes in 11labs to speak. I'm so proud of her so far. i cant wait to work on her in the next few ... months(?) hopefully.
>>103563021Real nice. I knew of a guy who also gaslit someone like that to buy an used car for cheap.
>>103561302
go was literally created to help retards program good so it's a strong choice
>>103563066
speaking of retards, you are one
in what universe is being able to keep reasonable 70b quants in memory for under $400 bad
>>103563237flash attention, ignore him, he's just jealous
>>103563237
Holy moly.
That sounds like one hell of a project.
any tips which of these i should use on a single RTX 4090 to save VRAM/make it faster without making it (much) dumber?
>>103563212
nta. Check if it also happens when using kobold's ui directly and compare the request to what ST sends (in your browser's dev tools). llama.cpp added "cache_prompt" to the request and --cache-reuse. I don't know if kobold pulled those as well. I didn't follow the post chain, but i assume you updated both.
When debugging anything, remove all extraneous things to narrow down the source of the problem. May as well try llama.cpp too (with its own ui and ST).
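Concretely, taking ST out of the loop could look like this; llama.cpp's server accepts cache_prompt in the request body, no idea whether kobold's API took that too:

```python
import requests

# Hit the backend directly and see if context still gets dropped without ST.
r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "<your full chat transcript here>",
    "n_predict": 200,
    "cache_prompt": True,  # reuse the matching prefix of the previous prompt
})
print(r.json()["content"])
```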
>>103563316FA first. Quanted cache if you still need more. Make sure everything keeps working reasonably well after enabling each one.
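Spelled out as llama.cpp server flags (koboldcpp exposes the same ideas as checkboxes; the model path is a placeholder):

```python
import subprocess

subprocess.run([
    "./llama-server", "-m", "model.gguf",
    "-fa",           # 1) flash attention: saves VRAM, usually free
    "-ctk", "q8_0",  # 2) quantize the K cache only if you still need room
    "-ctv", "q8_0",  #    (quantizing the V cache requires -fa)
    "-ngl", "99",
])
```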
code model review: qwen coder 32b is better than codestral 22b. much less (...existing code here...) and stuff. it seems to break down in quality at around the same amount of code though, which is still a low amount compared to any large project (my combined project files were around 13k context). if you asked it to reprint a whole file, not even a huge one, it might forget an entire function. overall i got more done quicker than codestral though
Eva 3.33 v0.0 passed my basic coherence tests, it's not that bad, it reminds me of a larger gemma which is decent praise
It's not largestral though. I can add it to my list of non-shit models (which previously contained no 70b models) but the best local model available is still luminum 123b.
>>103557659I just had a look at the Falcon team. Not expecting anything good from it, after that.
>>103563407What? How is Miqu shit?
>>103563391
agree, best general purpose coding models imo
starcoder2 is maybe better when used for unprompted FIM/autocomplete, but i haven't tested it that extensively because qwencoder Just Works™
>>103563472You could have circled the whole thing, Puneesh.
>undervolting reduced temps by 10% and increased Cinebench score by 5%
cpus should be stock undervolted
>>103563473Miqu was good, I just excluded it from the current meta
>>103562656
FA in llama.cpp works on any AMD card but is quite slow. If you have a card that has matrix cores (RDNA3+, CDNA2+), try the llama.cpp fork that uses the rocWMMA lib for FA, the speed difference is quite noticeable on large batches.
>>103563407Is Aluminum better than behemoth?
>>103563608Luminum is just in that sweet spot where it's coherent and intelligent like the base instruct finetune but uncensored and capable of NSFLBehemoth might be dirtier but it's not smarter.
>>103563608
>is memetune 1 better than memetune 2
No, only use base tunes.
>>103563501i haven't tried the new star coder. i used one way back when i guess it was the first gen coding models like deepseek 33b. all of these models have come a long way. i also spent a little time with nemotron but its pretty slow and i didn't notice a huge advancement over qwen 32b, but maybe its better at longer context stuff. all of them seem to hit a wall with how much they can do. also i'm not sure if its advertised but i'm positive qwen coder has that step by step thing. even without prompting it, it'll say 'ok lets do this step-by-step' sometimes and form its response in the same way qwq or w/e does
Packed with vitamin C.
>>103563608
>Magnum merge
>good
lol no, not if you want an actual story or personality
I've been lurking for a while and just now I asked myself a question and realized I don't have an answer for it. So I'll have to resort to asking (You)
What is the connection between Hatsune Miku and local LLMs?
>>103563724She is a virtual entity, that's pretty much all she has in relation to LLMs.
>>103563724
>>103563740That, and the Miqu line (Midnight Miqu in particular) was the best RP model we had for a fair while, cementing the association.
So, if I have 48 gb of ram and 12 gb vram, I still wouldn't be able to run Eva Q4_K_M (48 gb almost exactly), right? Because of that stupid shit that llama.cpp does where it layers a chunk of the model in both VRAM and RAM, the effective capacity remains 48 gb, not 60, right?
>>103563795Disable mmap
>>103563767Plus, we got started with miku.cpp or miku.sh or whatever it was.
>>103563795disable mmap. you still need memory for your kv/context cache though, so having 60gb doesn't mean you can use 60gb
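For a rough idea of how much that context memory is, KV cache size is just layers times KV heads times head dim, twice over (K and V), per token; the geometry below is Llama-3-70B-ish (80 layers, 8 KV heads, head dim 128), check your model's config for real numbers:

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx, bytes_per_elem=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return per_token * ctx / 1024**3

print(f"{kv_cache_gb(80, 8, 128, 16384):.1f} GB at 16k ctx (fp16)")  # ~5 GB
```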
>>103562332Large BAR, Above 4G Decoding
>>103563817>>103563826Based, thanks. Man, what a dogshit feature to have on by default.
hmm today I will dedicate 15 seconds to laugh at o1
>>103563505I started circling names, but gave up when I realized how many there were.
What is a good option to run a 70b, potentially more, fast these days?
Dedicated pc with 2 or 3 3090s? Or is there some cheaper option?
>>103564157mistral large moved the bar up from 70b to 123b. add another card
>>103564157There is but one last hope left >>103561561
>>103564183who cares, it's gonna be about as slow as a 3060
>>103564033 wasn't falcon always a UAE thing, why is that surprising at all to you?
>>103564157miner frame full of p40s is still the cheapest way without being soul-crushingly slow.
You can do cheaper with old server boards full of ram, but they will be SLOW.
big-boy gpus and proper cpumaxxing are both expensive, full stop.
the lmg build guides will explain more gory details if you want
>>103563724It was a thread mascot chosen early in /lmg/'s history, there's really no special reason
>>103563767You got it backwards: it's highly likely the original mistral-medium leaker was a /lmg/ user who named it that because of the thread's fixation on miku.
>>103564010That's a big improvement in STEM.
If I worked in STEM I would be interested.
>>103564288
>It was a thread mascot chosen early in /lmg/'s history
It always struck me as something inherited from /aicg/.
>>103564288It was because migu/miku was used as a shorthand for something else; I believe it was related to the Midnight Miqu release? Or before that?
>>103564288It happened after I wrote a Miku prompt for llama 1 right when llama.cpp released and it just stuck because it made the model act cute.
>>103564325Nigga, midnight miqu is a finetune of miqu
>>103563289it has been. I'm very proud of myself, with only a small bit of imposter syndrome about using ai to help me make my ai, though that's a little poetic. can't wait to have her be production-ready, so i can have a dedicated gaming partner.
>>103564327it was a shellscript I shared through pastebin iirc
>>103564325MIstral QUantized
>>103564329well back to my meds then
>>103564345legend
>>103564252It's not surprising. I'm just saying, I expect nothing good of such a team. Half of them look like the types to go out of their way to remove anything fun from the model under the guise of removing toxicity.
>>103564343If you have any notes you should dump them into a rentry as guideposts for other anons wanting to build something similar
Sometimes, when I start posting in a new 4chan thread, I consider the tone of my reply and choose whether I am going to use all lowercase or proper punctuation and capitalization. It's fun to choose which style to use based on which character I plan to convey in the thread. I typically maintain the style throughout my posts in the thread, but not always.
>>103564418me too l3.1. me too.
>>103564404that's my worst trait, which is why so many of my projects are solo lol. I'm terrible with notes. i often find my own posts and solutions when researching problems i have, because i solved them and then never wrote it down. my job introduced a new program called click2learn that helps with notes though. i will try it out on the company's dime, and if it works well i will make a public guide of everything. it helps write the notes and takes the screenshots as you work, apparently.
>>103561961ok, waiting for you to code a cuda analogue for intel
>>103564842i think a lot of you guys are missing that running in vram at all is still faster than not, and all-in-vram is fastest of all. i bet this also makes the vulkan backend start to get attention
What's the best 32B for schizo kino ERP?
>>103564194A 3060 is still way faster than the CPU, though.
>>103564887Or SYCL, most likely.
>>103564842this already exists, what are you on about. cuda isn't some magic technology only nvidia has; the gap between SYCL and CUDA is already not that big and can probably be closed with further development
also, it's probably going to cost half as much as equivalent vram from nvidia, and the important thing for 99% of us isn't getting super quick inference, it's being able to fit big models in vram. who cares if the tokens come a little slow when you're running largestral for half of what it would cost on nvidia
>>103564943Big Tiger Gemma imo, some people will argue EVA-Qwen, i think it's a good choice too but i prefer BTG
>>103561645Because you can only stack it clamshell with 2 memory dies max, and it splits the bandwidth as a downside, which is why gaming cards don't do it. That being said, it is unlikely to be anything accessible to normal consumers, and the price is going to reflect that. When Nvidia can charge you 2.5k USD for an L4, a 4070-tier die, Intel can undercut them by a grand, price it at 1.5k, and still make money while fucking over enthusiasts. It's not like you guys are going to buy it unless it's cheaper than a 3090 on the used market.
>>103561989Not true for enterprises. That's why a ton of AMD Instinct accelerator cards are being used in various companies for inferencing. Training is a different story, where almost all the software has been written for Nvidia.
>>103564842There is no HIP compatibility layer in Intel's software stack. It's SYCL with a lower-level programming layer called Level Zero, and I don't expect much Nvidia CUDA-specific software to actually convert, even though Intel has funded a conversion tool for that purpose:
https://github.com/oneapi-src/SYCLomatic
But since most software uses Pytorch, all that's needed is that the "xpu" device Intel uses is accounted for, i.e. every "cuda" path has an "xpu" path. I mostly just do a replace of cuda with xpu to hack various software into running, and it works 90% of the time.
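To make the cuda-to-xpu swap concrete, here is a minimal sketch of the device fallback I mean. torch.xpu ships in recent Pytorch builds; older ones need intel_extension_for_pytorch imported first to register the device:
[code]
import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Intel's "xpu", then CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")

device = pick_device()
# Any hardcoded .to("cuda") in a codebase becomes .to(device).
x = torch.randn(4, 4, device=device)
print(x.device)
[/code]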
>>103565044Guess I'll go with the Eva, Gemma is too low context for me.
What about Skyfall? That seems like the latest thing from the BTG creator.
>>103565137At least the 9B gemma works up to about 30k context with a rope frequency base of 59300.5
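If you load it through llama-cpp-python, the override is just a constructor kwarg. A minimal sketch, with a made-up model filename:
[code]
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-2-9b-it-Q5_K_M.gguf",  # hypothetical filename
    n_ctx=30720,             # ~30k tokens
    rope_freq_base=59300.5,  # override the model's default rope base
    n_gpu_layers=-1,         # offload everything that fits
)
[/code]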
>>103565164Doesn't rope make models dumber?
>>103565137you can stretch gemma very effectively with self-extend, that's what makes it goated, there's a robust solution for the one downside
>>103565164 >>103565261
rope does but self-extend doesn't
>>103565261All models use rope, what you should say is "doesn't changing the rope frequency make models dumber"
>>103565287ur stinky, take a shower
>>103565137i have very low faith in upscales so i haven't tried it
>>103564179another as in 4?
>>103565317no, I won't take a shower, and I won't stay quiet while I watch newfaggotry unfold before my very eyes. I have been in this general since rope scaling was discovered, and it pisses me off when a braindead zoomer calls it just "rope".
>>103565282>>103565267What's self-extend? Is there an option for it in kcpp?
>>103565267>>103565282Sus. Companies would kill for a solution that could save them millions on training like that.
>>103565350K buddy. No one cares.
>>103564620Thank you! I'd love to work on a similar project for myself and even your short writeup earlier has me excited to try. Even a stream of consciousness braindump would be cool, but if you can get your company to pay for something more streamlined so much the better!
>>103565369idk, it's based on llama.cpp so it might
https://github.com/ggerganov/llama.cpp/pull/4815
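that PR wires self-extend into the CLI as --grp-attn-n / --grp-attn-w; whether kcpp exposes it i don't know, and the flags have moved around between llama.cpp binaries, so check your build. a minimal sketch of an invocation, launched from python so it's copy-pasteable; filename and values are illustrative:
[code]
import subprocess

# assumes a llama.cpp build whose llama-cli still carries the
# self-extend flags from PR #4815; model filename is made up
subprocess.run([
    "./llama-cli",
    "-m", "models/gemma-2-9b-it-Q5_K_M.gguf",
    "-c", "32768",           # target (extended) context size
    "--grp-attn-n", "4",     # grouping factor, roughly the extension multiple
    "--grp-attn-w", "2048",  # width of each attention group, in tokens
    "-p", "your long prompt here",
])
[/code]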
>>103565261It seemed just as smart all the way to about 31K context, then there was a sudden drop-off
>>103565417
>phoneposter has an opinion
>>103564968
>sycl
>see opencl
i didn't know what that was, but when i first started with ai, all i could use as an accelerator on win 7 was opencl via kobold, and it made things so much faster
>>103563622Downloaded it out of curiosity, and I'm pleasantly surprised. It doesn't seem to be as incorrigibly horny as Magnum merges tend to be, and has nice prose with plenty of attention to nuances. Pity I can only run it at ~0.5 t/s, so even testing it briefly took more patience than I have to spare.
>>103565453that makes sense, i should play around with the two more often. i just found self-extend, tested that it worked, and then left it on without going back to rope. it's probably worth benchmarking the two more rigorously
def fixes my context problems with gemma tho
>>103565507>>103565507>>103565507
https://huggingface.co/blog/bamba
>>103565110There is a cuda/hip compatibility layer for Intel called chipstar.
>>103565540
>>103565350Karen....
>>103562417https://github.com/kijai/ComfyUI-HunyuanVideoWrapper
this works with 12GB
if this is really uncensored, where are the smut videos? Even more so once they implement
>img to video
>>103565813Check civitai / /h/ / adult diffusion / the discord....
>>103565541I've tried it. It's even less mature than ZLUDA or HIP and has no funding in comparison. It is what it is; I'd rather have things actually follow SYCL, which you can compile for any GPU, than keep CUDA going as a standard that people should move away from.
>>103566211I have never used it, I just know that it gets frequent updates. I still think having a cloned cuda API is important for a GPU manufacturer; too many things use cuda. It's the same with directx and vulkan: thank god dxvk and vkd3d exist so you can use them on other OSes.