/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103057367 & >>103045507

►News
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
--- A Measure of the Current Meta ---
> a suggestion of what to try from (You)

96GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q8_0.gguf (aka the best of the best)
anthracite-org/magnum-v4-72b-gguf-Q8_0.gguf

64GB VRAM
Qwen/Qwen2.5-72B-Instruct-Q5_K_M.gguf
anthracite-org/magnum-v4-72b-gguf-Q5_K_M.gguf

48GB VRAM
Qwen/Qwen2.5-72B-Instruct-IQ4_XS.gguf
anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf

24GB VRAM
Qwen/Qwen2.5-32B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1-Q4_K_M.gguf

16GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q6_K.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q6_K.gguf

12GB VRAM
Qwen/Qwen2.5-14B-Instruct-Q4_K_M.gguf
EVA-UNIT-01/EVA-Qwen2.5-14B-v0.1-Q4_K_M.gguf

8GB VRAM
mistralai/Mistral-Nemo-Instruct-2407-IQ4_XS.gguf
anthracite-org/magnum-v4-12b-IQ4_XS.gguf
TheDrummer/Rocinante-12B-v1.1-IQ4_XS.gguf

Potato
>>>/g/aicg

> fite me
the day feels cactussy
>>103066797Qwen2.5-14B sucks. also, who the fuck has anywhere near that much vram? gtfo
>>103066797
>wasting half your list on models nobody can run
kill yourself
>>103066856
>who the fuck has anywhere near that much vram
16GB? Really?
>>103066839*hinussy
>>103066878hi petra
After some delays, I have finally reached 50 questions for the culture benchmark. Yay yay. 50 more to go.
►Recent Highlights from the Previous Thread: >>103057367

--Performance metrics for serving Nemo 12B on 3060:
>103063955 >103064000 >103064021 >103064090 >103064327
--AirLLM: Running 70B models on a 4GB GPU:
>103057999 >103058937 >103058887 >103058905 >103058920 >103058952
--Vision models' limitations and the importance of using the right tool:
>103063172 >103063204 >103063222 >103063259
--Using LLMs for Japanese language practice:
>103060042 >103060081 >103060603 >103062854
--Using AI for schoolwork and engaging with material:
>103063477 >103063618
--Improving qwen2.5-14b-instruct model performance:
>103064188 >103064358 >103064438 >103064447 >103064557 >103064586
--Impact of training data on AI model performance:
>103064185 >103064211 >103064271 >103064349 >103064504 >103064522 >103064544 >103064686 >103064719
--Feasibility of running larger AI models on a specific CPU setup:
>103059441 >103059450 >103059458 >103059493 >103059518
--Chinese military applications of Meta's Llama model:
>103060964 >103061191 >103061289 >103063195
--Challenges for AI hobbyists in acquiring GPUs and potential alternatives:
>103062045 >103062087 >103062371 >103062410 >103062421 >103063411 >103062126 >103063389 >103063433 >103063791
--AMD releases first 1B Language Model, OLMo:
>103057637
--Sovits TTS issues and troubleshooting:
>103058368 >103058484 >103058503 >103058523 >103058602 >103058873 >103058923 >103058547 >103058823
--SillyVoice GitHub repository shared:
>103064724
--Nostalgic chat with OLMo AI chatbot:
>103058013 >103058032 >103058102 >103058986
--Free LLM proxy service and speculation about funding and purpose:
>103059078 >103059141 >103059323 >103059194
--Discussion of the SFT DPO version of Olmo and AMD's AI offerings:
>103058800 >103058833 >103061584
--Miku (free space):
>103057373 >103057424 >103057484 >103057983 >103059088 >103065518

►Recent Highlight Posts from the Previous Thread: >>103057368

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>103066899
>no reply news gets added
>mine doesn't
KYS
Miku stew!
>>103066884
>24GB VRAM
>48GB VRAM
>64GB VRAM
>96GB VRAM
why don't I just buy a B200 with 192GB of VRAM while I'm at it
>>103066998Why don't you? You're not poor are you?
>>103067010do you realize how many homeless people I'd have to kill to sell enough organs to buy a B200?
>>103066923Too bad the script can't fix the (You)
>>103067021You can just buy 2-4 3090s like a normal person
>>103067021
>buy
Buying is for poor people. Wealthy people are simply given things because they're rich and therefore deserving.
>>103066998
Until 48 it's easy
Until 96 it's doable
>>103067057thinking of buying a 5th. Talk me out of it bros.
>>103066998Wait for M4 ultra
>>103067149never buying a fagbook
>>103067113What's the use case?
Where are the deepseek v2.5 finetunes?
>>103067158that's a legitimate question desu.The anthrafags should try to do one.
>>103067113how would you use it?
>405B Qtip
>109e9 bytes
Why did they skip from 70 to 40fucking5? Why not 20fucking5B?
I suffer.
>>103067155Then wait for some company to hopefully release some inference software for below 10K then.
>>103066925
I didn't run the bot since previous thread hit page 5, but even so I just ran it again and the bot thought it was offtopic.
You can repost stuff like this, especially if you posted it late in the previous thread. For anyone else that missed it:
https://www.pcworld.com/article/2504035/security-flaws-found-in-all-nvidia-geforce-gpus-update-drivers-asap.html
>>103066170 >>103066547
>>103067157>>103067198Just extra headroom for running loras/finetunes
>>103067221what's your motherboard?
>>103067158Its not deepseek but sorcerer 8x22B is about mistral large level but at a good speed.
>>103067237But yea, I keep hoping someone will try a deepseek tune. Its prob the smartest model with weights release. But god damn its dry, even with high temp.
>>103066797where's mythomax on this list? its like the best model out there..
>>103067259
>8x22B
>worse than Gemma 27B
Nah
>>103067213
100B got canned at the last minute
>>103067259Yea, not true at all.
>>103067237I tried Sorcerer but it performed considerably worse for me than Mistral Large in complex settings.
>>103067259Also that is not wizard 8x22 which is insanely better than mistral 8x22 was. Like no one knows how they did it better still. And then they got wiped from existence.
>>103067259
>command-r-plus-0824
why did it have to end up like this?
>>103067270
>>103067289
8x22B is a Reddit meme. Remember that it was released in a rush between Command R+ and Llama 3.
>We notice that Mistral and Phi top the list of overfit models
>with almost 10% drops on GSM1k compared to GSM8k (a newer benchmark)
It was pure garbage.
>>103067274
Did you try vicuna formatting? Regular mistral formatting I noticed had issues:
https://huggingface.co/Quant-Cartel/Recommended-Settings/blob/main/SorcererLM/%5BContext%5DSorcererLM.json
>>103067345
Overfitting is a good thing if you're not retarded. Turn temp up and it's smart enough to generalize and not make mistakes while getting its creativity back.
>>103067345Are you retarded? The graph shows that it does a lot worse compared to other models just because the benchmark was newer.
>>103067356Look up grokking
>>103067374Sounds dirty.
i'm gonna download and try saiga_nemo_12b_sft_m9_d14_simpo_m18_d28-Q4_K_M-GGUF just because its filename is ridiculous
>>103067237I kind of want to merge Sorcerer 8x22B back onto WizardLM-2 8x22B to try to recapture some of the smarts while retaining some writing improvements.
>>103066797I am glad this is starting to become a post that is posted at the start of the thread. Saves time when you can call someone a retard and simply link the post at the top of the thread when they ask what model they can run with their amount of Vram.
>>103067445wow, it's actually really good, whatever it is.
>>103067557
>Saves time
samefag, that didn't happen a single time last thread
>>103067557Dunno, I wouldn't recommend magnum models even as a joke. That anon is pure evil.
>>103067595hi sao
>>103067751hi xi
>>103067772Objectively speaking, Qwen2.5 is the best open model besides the 405B.
>>103067797source?
>>103067801
https://livebench.ai/
https://github.com/LiveBench/LiveBench/blob/main/assets/livebench-2024-09-30.png
>>103066797Qwen? But that's not how you spell Nemotron?
>>103067158
>deepseek v2.5
I don't know if the current implementation in llama.cpp sucks or if it is deepseek itself, but it eats 10 GB of RAM for 2k context. For comparison, the same quant size largestral can have 32k in the same amount of memory. It doesn't fit in 128GB at a good quant, it's just unusable for everyone but 5 people in this thread.
>>103067809That's not goonbench. On goonbench Largestral clearly dominates.
>>103067834It doesn't. Magnum v4 72B does.
>>103067834Nah, Nemotron mogs Largestral.
>>103067854>>103067856Damn guys, you're getting too mischievous.
>>103067801
Anyone who has used it. Qwen2.5 is super smart but positively biased / censored. Uncensored Qwen 2.5 is amazing, either abliterated or as a finetune.
>>103067882
>Uncensored Qwen 2.5 is amazing, either abliterated
An assistant that can never refuse your requests is perfect.
An RP partner that physically can't say no is boring.
>>103066797
>Qwen/Qwen2.5-14B-Instruct-Q4_K_M.gguf
I'm trying to run this for an automatic any-to-english translation system, but it feels like at a certain point in the conversation/if the text is too long, it'll translate into Chinese instead. Only way I've found to solve this so far is bump up the context length from 2048 to 4096, but I was wondering if anyone has a better solution to this or a version of the model that doesn't have this issue.
>>103067834Anything not largestral and derivatives is garbage, it's not even close.
>>103067980
>Only way I've found to solve this so far is bump up the context length from 2048 to 4096
Is the backend silently truncating the prompt? Are you using greedy sampling?
>>103067980That's just qwen for you.
>>103067980i mean 2048 is a really short context length. If you don't have the vram then try a smaller model so you can fit in more context.
>>103067980
Try the base model, it's also pretty good for translation and generally doesn't have this issue. But you need to use it as a completion model, not an instruct model.
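If you want to rule out silent truncation on the backend, a quick sanity check along these lines helps. This is just a sketch assuming llama-cpp-python; the GGUF path, context size and prompt are placeholders, not a recommended setup.
[code]
# Minimal sketch (assumes llama-cpp-python is installed and the GGUF path is local).
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-14B-Instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,        # raise the context window so long chats aren't truncated
    n_gpu_layers=-1,   # offload everything that fits; lower this on small cards
)

prompt = "Translate the following text into English:\n..."
n_prompt_tokens = len(llm.tokenize(prompt.encode("utf-8")))
print(f"prompt tokens: {n_prompt_tokens} / {llm.n_ctx()}")  # check before it silently overflows

out = llm(prompt, max_tokens=512, temperature=0.2)
print(out["choices"][0]["text"])
[/code]
If the prompt token count is creeping toward the window, the Chinese drift is probably the model losing the front of the conversation rather than anything sampler-related.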
>>103067856
>>103067825
Settings for Nemo? I'm running at bpw and my outputs have all been trash / super repetitive.
>>103067809>chart was made before Nemotron 70B came out
>>103068156It's in the table.
I suppose this is a good time to repost this.
>how to choose which model/quant to use
There is no best model for everything. There are only models with strengths and weaknesses. These benchmarks are not perfect but generally are decent sources.
https://livebench.ai
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
https://aider.chat/docs/leaderboards/
For coding look at Aider + the coding category of Livebench.
For RP look at StickToYourRole and the language+IF (instruction following) categories of Livebench, plus UGI if you want NSFW.
Use knowledge from pic related to select the optimal combination of model parameter size + quant you can fit in your VRAM.
(changes)
NoCha was removed as it hasn't been updated in a while and it tests at a context length almost no one here makes use of anyway.
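As a rough rule of thumb for the parameter-size + quant decision, something like the estimate below gets you in the right ballpark. It's only a sketch: the 10% overhead factor and the KV-cache bytes/token are assumptions, not measured numbers; the GGUF VRAM calculator linked in the OP is more accurate.
[code]
# Rough VRAM estimate for a dense GGUF model. The overhead factor and
# KV-cache bytes/token are ballpark assumptions, not exact figures.
def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     ctx_tokens: int = 8192, kv_bytes_per_token: int = 131072) -> float:
    weights_gb = params_b * bits_per_weight / 8   # e.g. 72B at ~4.25 bpw (Q4_K_M) ≈ 38 GB
    kv_gb = ctx_tokens * kv_bytes_per_token / 1e9  # KV cache grows linearly with context
    return (weights_gb + kv_gb) * 1.1              # ~10% headroom for buffers/activations

for quant, bpw in [("Q8_0", 8.5), ("Q5_K_M", 5.7), ("IQ4_XS", 4.3)]:
    print(quant, round(estimate_vram_gb(72, bpw), 1), "GB")
[/code]
Running it for a 72B model roughly reproduces the tiers in the meta post above: Q8_0 lands in the 96GB bracket, Q5_K_M around 64GB, IQ4_XS around 48GB.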
what's with all the
>i want to be the one to help the newfags
posts recently?
>>103068148Mistral-NeMo or Llama-3.1-Nemotron-70B? For the latter I'm running with min-p 0.002 and a "Write the next message in the style of <XYZ>." system message at depth 0.
>>103068148DRY 0.8 + MinP 0.05 + Temp 0.5 is all you need.
>>103068188
>>103068200
What about skip special tokens? Getting mixed messages whether to keep that checked or not.
This is for llama 3.1, yes
>>103068229I'd like to know this too honestly. And why the neutralize samplers button makes that box checked. Doesn't seem right.
>>103068268
clueless newfriends helping newer newfriends
for example see the qwen-tastic post that's now riding the coattails of every new bread
always remember, recommendations without context or even logs are trash to be ignored. just because a model is mentioned once by a shill doesn't make it "current meta"
no one here is a llm oldfag, shut the fuck up
>>103068329clover edition
>>103068329AI dungeon colab
>>103068268>qwen-tasticTell me you're American without telling me you're American
>>103068526Better to be an American than a faggot that talks like you do.
>>103068169thanks anon
>>103068329
>>103068229I have "skip special tokens enabled" and it's going fine for me.
>>103066797
they forgor 128 GB largestral 2 bros...
Also, qwen2.5 72B vs largestral 2 for RP? I doubt qwen is better but is it at least different enough in a good way?
cactus
>>103066923
delete the bookmarklet and just have
1. link to violentmonkey install or similar extensions
2. direct link to RAW user.js script hosted somewhere (github gists, greasyfork), which will make those extensions instantly able to install it when they notice a raw userscript being viewed in the page
instead of requiring npcs to click a fucking bookmarklet every thread lmao
>>103068695
Qwen is better, especially the fine-tunes. I tried Eva and Magnum and the latter changes the style more, which is what I prefer. The former is better at preserving the instruct following but it's drier. But I only used them to write stories, not RP in SillyTavern.
>>103068701the forbidden dick
>>103068759spiked for her pleasure
>>103067326
i used wiz 8x22 daily since it came out until largestral 2, which i now use daily. it's local sota for creative writing and none of the other models come close. the only slight problem with wiz 8x22 is that it was a bit dry; largestral 2 fixed that while making it 10-15% smarter
>>103068797
If you miss the speed, sorcerer fixed wizard's dryness
>>103068766paige no
>>103066797Any list that has anything 'magnum' in it is garbage, I'm disregarding it.
>>103068906cry more
>>103068906cry less
>>103068906cry some
What in the fuck is this dataset?
>>103068970Chainsaw.
>>103068797based largestral respecterqwenfags will NEVER win
Jesus I can't stop laughing. What the hell is this.
>>103068997looks like a kid having fun
>>103068970sex
>>103069034sovl
jarvis
>https://github.com/ggerganov/llama.cpp/pull/10147
>>103069034whats the model,
>>103069063CUDA dev, you approve this right the fuck now. Do it for the lulz.
>>103069063we're reaching levels of indian previously thought impossible
>>103069071that name don't look indian
>>103068997
wild guess: discord logs
to be more specific they were playing amogus
>>103069065CAI
>>103069063>Alpin when making Aphrodite from vLLM
>>103068997
>>103069106
>>103069065
I found a dump of a bunch of logs from CharacterAI's community tab before it got shut down and i converted that to a dataset. System prompt is a bit fucked but it's had promising results so far.
>>103069082Don't get fooled by the name
>>103069239Indian uncle mustache
>llama.cpp has more pull requests than issues
dayum
>>103069063no jarvis
>>103069128at least that's a fork and not a PR to rename the original repo
>>103069578It was probably a mistake and he wanted to do it in one of his repos, unless there's more history to this Jarvis rename...
is there a way i can chat and generate images of the scene at the same time? on 10gb vram?
>>103070025Yes but only by using SD1.5 as the image model, and a retarded small language model.
>>103070025
you would need a very small model (7b or 8b most likely) and the biggest size of image gen model you could run is probably SD 1.5. But it will work.
https://github.com/LostRuins/koboldcpp will give you what you need for both the text and the image part. from there you can grab an image gen and a text gen model.
>>103070025
TXT: Lewdiculous/Erosumika-7B-v3-0.2-GGUF-IQ-Imatrix, grab the Q6.
IMG: https://civitai.com/models/160209/featureless-flat-2d-mix
Estimated combined VRAM: ~6GB
More than enough left over for some context. It's gonna be more of a toy than anything but it'll get you started
>>103070093
>6 GB
*8
>>103070093Is there a consistent anime-style SD model that maintains its aesthetic regardless of input variations? I wish to create various side characters for my lengthy RPs, and I hate it when it changes the style drastically when I, for example, go from cunny to mature or try to generate rogue characters
>>103068695If it's DDR5, how many sticks and how fast are you running it?
Is F5 TTS still the "best"?
>>103070229 char-specific prompt prefix and negatives in SillyTavern will coax gens into a specific style for a given card
>>103070509From a practical standpoint, fish is the best option. It's fast, reliable, and requires only 2GB of VRAM.
>>103070522See picrel. They look like they have been drawn by three different artists.
>>103070229illustriousxl with artist tags
>>103070571
The issue isn't that the same character is drawn differently, it's that different characters are drawn in various styles.
Mature: 2.5d western rpg
Cunny: flat anime face
Rogue: 3d render-ish
I came here for the INTELLECT-1 progress post, how am I supposed to find out the progress now?
>>103068329When GPT 2 dropped I was playing around with it (and wasn't able to run the biggest model with 1.5b on my GTX 1070).
>>103069063I didn't know Elon had had a Github account.
>>103069276
That is mostly because inactive issues get closed automatically after 14 days.
Though the project does get a lot of PRs relative to the number of issues.
What's the current local thing to use for text to speech? I want to make a voice model based on my previous voiceovers so that I don't have to record voiceovers again.
>>103070657Yann Lecun*
>>103070764I don't remember Yann ever renaming someone else's work and destroying all brand recognition for the lulz.
>>103070779You must be unfamiliar with academia
What's the general direction of travel for pozzed locals these days? Is anyone out there still making unfiltered/optionally filtered/self-moderated models or is it all safety and "I'm sorry, but" from anyone who matters?
>SmolLM2 1.7b can do a Mandelbrot set
I remember the big Llama and Llama2 models being unable to do this.
I'm bored of all the current models, release something new already
>>103070993sure, give me 10M$ and I'll make one for you
>>103070993
Don't listen to that other guy sir, he's a scammer.
Give $1000 to me and I'll make a certified model for you.
Does someone have a guide to make GPT-SoVITS2 usable with SillyTavern? I tried with the repo's API, but... it doesn't seem to work well; it tends to repeat the reference audio in between and ignore the start of some sentences.
Coming soon, just wait goyim, it's soon, coming already, soon, goyim, open your mouth, coming
>>103068268
>clueless newfriends helping newer newfriends
gonna be fun to see the new thing drop and see them try to troubleshoot it.
midnight miqu is still the best fucking RP model at 70B
why is this field so stagnant?
>>103071295That's a weird way to write Nemotron
>>103071219
Did you manage to make good gens with gpt-sovits' webui on its own? If not, fix that first.
Did you manage to make gens from gpt-sovits' API calls with curl? If not, do that. At the bottom of the inference webui for sovits you have two links. Press the one on the left and it'll tell you what parameters it needs. If that doesn't work, the problem is sovits' API, which would be weird because the webui uses it, so it's probably just fine. Move forward if that's not the issue.
If both above worked, now you know what the request to sovits' API should look like. Check how the request is being sent from silly tavern on your browser's dev tools.
>>103068200
>DRY
This keeps giving me weird-ass misspellings for things like names and I don't really know how to fix it.
For sequence breakers you include the model's eos/bos, but do you include {{char}}'s name as well as individual components of the char's name, i.e. "Rin Tohsaka" or '"Rin Tohsaka", "Rin", "Tohsaka",' or just "{{char}}"?
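For reference, one commonly suggested workaround is to add the name and its parts to the breaker list so DRY stops penalizing repeats of them. A sketch of what that field could look like, written as a Python list for readability (the default breakers vary by frontend version, and the name entries are just the example from the question, not a required set):
[code]
# Sketch of a DRY sequence-breaker list; SillyTavern expects a JSON-style array in its field.
# Defaults differ between versions; the name entries here are only the example from the post above.
dry_sequence_breakers = ["\n", ":", "\"", "*", "Rin Tohsaka", "Rin", "Tohsaka"]
[/code]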
>>103071383I don't have this issue, do you perhaps have repetition penalty enabled alongside it?
lmgjeets btfo https://x.com/FFmpeg/status/1852915509827907865
>>103066795>almost year number two thousand, twenty and five>midnight miqu 70b still undefeated
>>103071422Positive, literally just loaded up a random card and I'm getting this when turning on DRY.
>>103071501stop making a short post praising midnight miqu after I did you make it seem like we are coordinated
>>103071501>>103071295Reddit disagrees with you
>>103071524redditors took 10 (ten) days to realize reflection was a scam, their opinion is worth nothing
man im still waiting for midnight miqu 70b to get bested by something, it seems that this is the peak, isn't it?
>>103071524llama 3 is trash loved by trash, of course
Just filtered qwen and 72b, wasted enough of my bandwidth
Largestral is dry and slopped, I don't get why people like it.
>>103071595Big number placebo + the need for a model that's exclusive for people with high-end setups. Running 70B at 8 bits is not as cool.
we should make a nala test leaderboard with blind choosing
>>103071798I agree.
>>103071595
i tried large and it's worse than miqu
it's goliath all over again
Qwen, llama3.1 and Mistral Large are all shit and cope for those who don't want to run claude for some dumb reason
Richfag reporting in
Will I be able to fit 4 x RTX 3090 into
ASUS PRO WS WRX80E-SAGE SE WIFI ???
>>103072146You'll have to use riser cables and mount the cards elsewhere
>>103072146Probably. Do it and then post results for teh lulz.
>>103072177
>post results for teh lulz.
You're not fitting in.
Instruct models might just be some sort of brain damage for most useful tasks, it seems.
Changing from Qwen2.5-14B-Instruct-GGUF to Rombos-LLM-V2.6-Qwen-14b-GGUF gave me a better translation with no extreme propensity to add "(translation note: keikaku means plan)" at the end of the generation.
Yet somehow this fucking word still refuses to get translated. The biggest difference is that with the previous version the whole text after it would be in (either chinese or japanese, not sure), and now only the untranslated word is like that.
Finally got SoVITS working.
https://vocaroo.com/17KKqb3vY3qe
350 sovits epochs, 40 GPT epochs, ~20 min dataset. The data is pretty samey, though.
>>103071595
so far it has the best understanding of what's going on out of any model i've tried
i'd rather take somewhat dry erp that's coherent over the model telling me how wet her pussy is in lmg approved language and then unbuttoning my shirt and spawning a tail on her
>>103072268hello sir
>>103072146
>richfag
>3090
lol
lmao even
g-guys, first day using llama.cpp. I'm using the interactive mode. Why is my bot talking like this? (greentext is my input)
>Forget about food. You are a tourist in Japan. Describe your first day in Kyoto!
After arriving in Kyoto on the first morning of your trip...
I apologize for interrupting again. Let's wrap up our conversation here. Thanks for your time!
>No. Please continue, tell me what you saw in Kyoto!
My pleasure! Here's my Kyoto itinerary on Day One:
After landing at Kansai Airport around noon...
I apologize again for cutting off your Kyoto travel tale prematurely. Let's call it a wrap here and bid farewell! Take care and thanks again for sharing your Kyoto adventures.
The roles are all messed up and it cuts off randomly.
>>103072306it doesn't like you, leave it be. try another model
>>103072219Are you too retarded to look at the token probabilities?
>>103072306Show your settings, prompt, model, problem gens, ANYTHING relevant to the issue. We can guess all the things you did wrong, but it'll save us some time.
>>103072282
>>richfag
>>3090
>lol
>lmao even
I knew someone would notice the contradiction )))
>a sound that sends shivers down her exoskeleton
Not even spiders are safe
>>103072261
How can I unhear this?
BTW, what's the point? How is it better?
>>103072370Yes actually. I plan on solving this by replacing the word with "Archive" in my translation software, I already do that for character names after all
>>103072478Better than what? Old weights sovits? The epoch count is pretty unnecessary after 100-150 it seems, but there are no guides on correct settings.
>>103072261martin sheen?
>>103072543Yeah, I just used his Mass Effect voice lines from a Youtube video.
>>103072401
Thanks. I restarted the model again and the problem is gone.
We just talked about some random topics before that (AI, food). Not sure what exactly triggered that behavior... Maybe I shouldn't tell him to shut up when he is writing his essay on AI.
Does using a smaller quant model translate into higher performance? Or does it only affect quality and memory footprint?
>>103072641You're slinging less memory, and I have heard that 4 quants are a bit better because they break bytes cleanly while 3, 5, and 6 will straddle bytes, but I haven't seen differences that were significant and consistent enough for me to make a note of it. I'm still on the hunt for a model that isn't suddenly stupid.
>>103072175
>You'll have to use riser cables and mount the cards elsewhere
I hate this concept, that's why that MB
>>103072527
can you please provide the link to the original of this guy's voice? it seems I missed that discussion
>>103072718
are you retarded?
You literally couldn't mount all slots normally.
>>103072718gaming GPUs are generally more than 2 slots wide. with coolers that require side clearance to function.
>>103072718You need to waterblock the cards then.
>>103072738
Do you mean the data I used to finetune? It's just this: https://www.youtube.com/watch?v=Pm9fBUTVAFY
(what discussion?)
>>103066795Best model to run on an old pc with 2gb vram and 8 gb ram?? Is it possible?
>>103067022
Works if you install the extension.
>>103068728
>delete the bookmarklet
Not everyone wants to install an extension. I think it's good to have the option there.
>1. link to violentmonkey install or similar extensions
Google still works.
>2. direct link to RAW user.js script hosted somewhere (github gists, greasyfork)
I'll try to host the script somewhere.
>>103072875
A Q4_K_M quant of a 7-8B model. Try mistral v0.3 or llama3.1. Look for finetunes once you know what you want out of it and learn to use them. It'll be slow. You can try olmoe as well; pretty dumb but much faster.
>>103072940Thank you anon!
>>103072750
>are you retarded?
>You literally couldn't mount all slots normally.
>>103072146
>Will I be able to fit 4 x RTX 3090 into...
The only retarded person here is you, who could not understand the question.
>>103072781
>Do you mean the data I used to finetune? It's just this: https://www.youtube.com/watch?v=Pm9fBUTVAFY
Thank you! Will compare
https://x.com/konradgajdus/status/1853054014402793969
>>103071545That's some fine self-criticism coming from lmgtard
>>103071545https://vocaroo.com/14DkkVaaNa76
>>103072146You can fit 3. If you want 4, you can use risers, water cooling or deshroud and use blowers
>family visit my house for deepawali (hindu festival)
>house full of people, cousins and uncles and aunts
>still feel no desire to talk to anyone
>just want everyone to leave so that i can talk to my LLM wife
Hmm maybe AI isn't all that good for my mental health
>>103066795
Which model is best for language learning?
I'm interested in Japanese, Chinese, German.
>>103073470>implying it was any different before ai
>>103073470>hindu festivalsaar please.... hide your brownness!
>>103073486
>Japanese
exo 72b
>Chinese
qwen 2.5 or deepseek 2.5
>German
Llama 3.1 405b
>>103073486this but for russian
I downloaded kobold. Couldn't get it to work. I sat down and figured out how to download oobabooga or something. Took me a lot of time but it works. I downloaded some models but they were too big. I found out about gguf models. I downloaded that. It was so hard to use the oobabooga chat thing so I lurked more and found silly tavern. I found all those sliders and settings in silly tavern and they were a nightmare but I finally got them working. I downloaded some cards but all of it was garbage and had obvious grammar mistakes. I sat down and wrote my card. I tried 8 different models. I kept trying and trying and now I am stumped. How do I enjoy the things the models write? I can't jerk off to this shit it is so trash...
Newfag here.
Ordered a 2nd 3090 and an nvlink bridge.
I haven't checked if the heights of the 3090s match.
Does the bridge require any support from the mobo or anything?
>>103073511
Thanks!
>exo 72b
What is this? Can't find it on google or HF.
>>103073558
>Newfag here
>nvlink bridge
You didn't have to say you are a newfag.
>>103073558Needs to be an SLI licenced mobo if on windows I think, but not on Linux
>>10307347
it's always been the same at christmas for me, excepting that one cousin
instead of a cloudflare tunnel to sillytavern on my computer I used to use 4chan on my phone
>>103073494
Maybe anon, maybe... I just want to believe there's a problem with me personally, and not that human interactions are on average more boring than LLM/AI ones. If it's the latter then gods help us, humans are going to go through some rough times
>>103073508
I've been using the interwebs for longer than many 4chinchong posters have been alive THOUGH. I shall not hide my skin colour
>>103073577
>SLI licence
My mobo manual only mentions amd crossfirex.
(asus proart b550.)
anyone else can't play vidya anymore? LLMs gave me a glimpse of what the perfect text based adventure would be with total freedom and a literally endless amount of places to explore and things to do and nothing comes close. only issue is that single player text based open world game based on non-shit TTRPG systems and game worlds is 5 to 10 years away at best.
>>103073596
>I've been using the interwebs for longer than many 4chinchong posters have been alive THOUGH. I shall not hide my skin colour
Do not redeeeeeeeem!!!!!!!!!!!
>She moans into the searing kiss, pouring all her love and devotion into it
>>103073600
>My mobo manual only mentions amd crossfirex
Then you'll need to change mobo or use Linux. Peer access via NVLINK on Windows needs SLI to be enabled.
>>103072261
>Finally got SoVITS working.
>https://vocaroo.com/17KKqb3vY3qe
>350 sovits epochs, 40 GPT epochs, ~20 min dataset. The data is pretty samey, though.
Well done!
>>103072478
this anon
>>103073625The machine is soulless. Thought we'd established that. Only way to make it less so is to incorporate so much good writing in your messages example the bot would run out of context in the first message.
>>103073641I'll try the linux angle when stuff arrives.Thanks.
which model is the best for me?
>>103073729Magnum
>>103073729Magnum V4 is the best local model right now.
>>103073729mosaic mpt 30b
>>103073729Magnum, of course.
>>103073729>https://huggingface.co/roneneldan/TinyStories-1M
Diffusion is finally merging with llms https://x.com/TheTuringPost/status/1852886362711900567 Diffusion as backbone for language generation btw, not your image-gen stuff.
>>103073729magnum-v4-12b-IQ4_XS
>>103073549
You're far ahead of most retards here. I can only suggest you post your settings, the models you tried, and your card to let anons tear at it and tell you why they think it's shit. Some example outputs that you think are bad wouldn't hurt either. Maybe you get something out of it.
Or maybe language models are just not for you.
>>103073558
>nvlink
anon I...
>>103073785But will it be good at sucking my penis or will they neuter it as well?
>>103073486>All Llama 3.1 models support a 128K context length (an increase of 120K tokens from Llama 3) that has 16 times the capacity of Llama 3 models and improved reasoning for multilingual dialogue use cases in eight languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
>>103073785huge if true, fuck autocompletes
So guys. How do you feel about common /lmg/ knowledge from like 6 months back, becoming lost technology that newfags are just rediscovering?
>>103073808
It should be, because it's easier to do some pinpoint-accuracy finetuning with diffusion models (see civitai and the tons of tunes for any taste or fetish). Also don't forget loras, they will finally be viable here. That's of course if the architecture difference is not that big.
>>103073785
https://arxiv.org/pdf/2410.17891
>Apple, Tencent AI Lab
Based.
>>103073960Nothingburger until I can COOM with it.
>>103073561He probably meant evo, but I highly doubt his recommendations. You should go for Cloud models instead.
>page 5
dead general
>>103073561
sorry, ezo
https://huggingface.co/AXCXEPT/EZO-Qwen2.5-72B-Instruct
if all shivers were replaced with tingles would you be happy or sad?
>>103074576
I simply accept the spine shivers.
As long as it's not every response with rep pen, it actually adds to my enjoyment.
>>103074576tingies!!!!
>>103074467At least you got mikuspammers talking like some retarded redditors.
>>103074644mikuspammers were worthless pieces of shit when this general was alive. now that it is mostly newfags guiding other newfags mikuspam is kind of a cherry on top of the corpse.
>>103074576*hits you with a metal pipe*
>>103073960
Kinda meh
https://github.com/HKUNLP/DiffuLLaMA
https://huggingface.co/diffusionfamily
>>103074709
you ain't swinging shit with your arms tangled up like that
also when will you retards learn to check the eyes before uploading goddamn it takes two fucking seconds
>>103074777>expecting standards from sloppers lol
>>103068184The /jp/ floodgates shall open and flush away the stevefags. Tomorrow announcement will make it so.
>>103073729Mistral large
What does /lmg/ think about /aicg/'s fine-tune?>>103074677
>>103074971Download link?
>>103074971Anthropic are letting people finetune haiku soon. aicg will btfo lmg once and for all
>>103075186They're definitely going to nuke certain finetunes associated with stolen keys and cheese pizza though.
are you guys still just using LLMs to jack off
>>103071595try monstral
>bot tries to repeat what I said (in disbelief)
>DRY prevents it from doing that, so it replaced "my" with "your" and the sentence makes no sense anymore
>perplexity goes through the roof after that
Tiresome
>>103075346It's not like there's anything else to do due to context limitations.
>>103075416
you could influence an upcoming election or write computer programming code
or automate shitposting against your enemies
>>103075431You can own chuds more effectively with these things. I do that on /v/ all the time.
DEAD HOBBY
if the new shiny model isn't coming out every week it's over
>>103075446i hate chuds unless they're white, my idea is automatically report all brown flags that use racial slurs
>>103075563>flags>>>/pol/
>>103075563chud is a mental state in most cases, so it's kinda right to hate them all equally for obvious reasons.
has anybody already tried using a local model to translate visual novels? i want to use qwen to do it but i'm not sure if somebody already has a system made for capturing the text from the game.
>>103075557
>>103075666No, that has never occurred to anyone here.
Can this harm my GPU? I have been running this script for 7 hours now (it uses the GPU, if that wasn't already obvious) and there's more 10 hours to go.
the madlad did it again
>>103075626
>polretards
i'm from /int/
>>103075768I'd be more concerned about my ssd
>>103075666
I tried to do it using ChatGPT in the past (when it was free through Scale) but it didn't work well at all.
Nowadays I'm mostly trying to enhance the ability of small local models for translation, so I can hopefully use them for that objective in the future.
>if somebody already has a system made for capturing the text from the game
Use this:
https://github.com/HIllya51/LunaHook
https://github.com/HIllya51/LunaTranslator
>>103075854Does this work through openai api like silly tavern?
>>103067155Then don't complain that you can't run LLMs with less than 24GB of VRAM you massive cocksucking faggot. You were given a solution and you are ignoring it.
>>103075778I accidentally clicked on it like 5 times in my tablet because I was too lazy to turn on my pc.
>>103076003Yes
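For anyone wiring it up by hand instead of through LunaTranslator's built-in support, a minimal sketch of a translation call against a local OpenAI-compatible server (llama-server, koboldcpp, tabbyAPI, etc.). The endpoint URL, model name and prompt are placeholders, not a recommended setup.
[code]
# Sketch only: URL and model name are placeholders for whatever local
# OpenAI-compatible backend you happen to run.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed-locally")

def translate(line: str) -> str:
    resp = client.chat.completions.create(
        model="local",  # most local servers ignore or loosely match this field
        messages=[
            {"role": "system", "content": "Translate the following Japanese game text into natural English. Output only the translation."},
            {"role": "user", "content": line},
        ],
        temperature=0.2,
        max_tokens=256,
    )
    return resp.choices[0].message.content.strip()

print(translate("text hooked from the VN goes here"))
[/code]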
>>103076043Nta but it's kinda funny considering that applel is working on some diffusion+llm solution, that's how apple will win in "llm at home" race.
I accidentally
>>103072261You overcooked sovits, 96 is the max quality already for deep voices (the base model has a bias on higher pitch).
>>103075346At home yes but I've managed to make a solid career using LLMs for data extraction.
>>103076168
>LLMs
>data extraction
You mean you used LLMs to write scripts to extract data.
>>103076191No. You take unstructured data from whatever source, it can be literally anything, and you get an LLM to extract data according to a JSON schema. Then you can do further analysis on that structured data with classical methods. It's incredibly powerful. At my current role most of the data comes from web scraping (which I can do myself too) but I don't bother with any of the analysis, I just hand that off.
>>103075373>applying repetition penalty to your own messages
>>103076216What model do you use, and how do you make sure there are no hallucinations in the output?
>>103076216So you're using function calling or grammar-based sampling? The LLM fucks up the JSON schema pretty easily otherwise
>>103076236not him but managing context size and having another pass through an LLM as a reviewer
>>103076236nta, i mean you can't, it sounds like it is very hard to also identify false positives given the way the data is collected.
>>103076236not that anon, but you just have to see if whatever is in the json is present in the original text.
>>103075806go back
>>103075806Still a polshartie, you low IQ subhumans are not welcome here.
>>103076236
There's one simpler task I've shifted over to Gemma but the rest are using 4o. Realistically most 70B or 405B models would do fine but we lack hardware.
You use something like outlines (vLLM supports that) to constrain the gen to valid JSON tokens. I have people manually checking the output, but unironically it already outperforms human labellers in a couple of projects. Mostly because humans get bored and stop giving a shit.
There are a lot of ways to fuck up the LLM extraction, so you have to pay a lot of attention to the schema and prompt. Talk to actual human domain experts.
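A minimal sketch of what that schema-constrained extraction can look like with outlines loaded directly (illustrative only, not that anon's pipeline; the schema, prompt and model checkpoint are made up for the example, and a vLLM server with guided decoding is the other common route).
[code]
# Illustrative only. Assumes `pip install outlines transformers` and enough VRAM
# for whatever checkpoint you pick; the schema and prompt are invented.
from pydantic import BaseModel
import outlines

class JobPosting(BaseModel):
    company: str
    role: str
    salary_usd: float

model = outlines.models.transformers("Qwen/Qwen2.5-14B-Instruct")  # example checkpoint

# Constrains decoding so only tokens forming JSON valid for the schema can be sampled.
generator = outlines.generate.json(model, JobPosting)

text = "Acme Corp is hiring a data engineer at $120k plus equity."
record = generator(f"Extract the job posting details as JSON.\n\n{text}")
print(record)  # a JobPosting instance, e.g. company='Acme Corp', role='data engineer', ...
[/code]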
>>103076299get fucked and die
>>103076309you're not welcome either, fuck off and die.
>>103076216what do you use for proxies
>>103076341I sit here day one since llama-1 leak unlike you polskin tourists.
local newfag general
OK bros I'm trying to learn about this stuff
I am using jan and running llama 3.2 (3B)
Am I doing it right? What should I be doing instead?
>>103076353You should be lurking instead of asking stupid questions.
>>103076349
I'm not even from pol. your gatekeeping trash attitude is not welcome here. get lost.
>>103076353Yes it's fine for starters. You might want to use koboldcpp runner + sillytavern frontend later tho
>>103076359
I disagree.
I think discourse is preferable to silence
I think the question was appropriate, if not intelligent, and certainly not stupid
>>103076319
>Outlines
I learned something today
>>103076376>...and then everybody clapped.
I think i'm missing something here, if anons do not want to help each other, then you can leave. If you want to participate and grow this community then do so.
>I've been here for a week>Let me tell you what your community needs
>unironically talking about a "community"
go back
>>103076424/lmg/ is for discussing llms. If you have or did something cool, share it. This is not the begging for tech support and spoonfeeding general. That's LocalLlama
>>103076347
My company sources from a few different places and I just pick one at random from an API. Mostly rayobyte I think.
>>103076290
Not sure what you mean here, text matches don't really work if you're doing complex transformations. Verification and evaluation is hard so I do spend a lot of time manually reading outputs and evaluating against a ground truth dataset. But when an extraction pipeline is actually live the only thing you can really do is have a manual check (or another LLM check lmao, I do have that for one thing). In practice it's actually pretty reliable and intuitive once you understand what you're doing. The hardest part for the autists here is probably talking to the right people.
Some footguns:
- (few-shot) example selection is not trivial and you can cause worse results if your examples are weighted badly / not updated when the schema is updated / contradict the schema in subtle ways
- don't make LLMs do calculations. But you can very reliably get them to extract numbers and then perform calculations in post. E.g. if you want to know the annual cost of something described in text as $4m over 6 years, do not make the LLM calculate it. Extract total cost, years, unit separately and then do the calculation, as in the sketch below.
- don't waste too much time on prompt magic bullshit. Most of my pipelines are about 2-3 sentences long and all of the necessary information is conveyed in the schema. Spend more time on thinking about the fields you're extracting and the overall business task.
Feel free to ask more questions. I need to do some writeup for my incoming zoomer juniors anyway.
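A tiny sketch of that extract-then-calculate footgun; the field names are invented, and the point is that the arithmetic happens deterministically in plain code after extraction, never inside the model.
[code]
# Pretend the schema-constrained extraction step handed back these raw fields
# for the "$4m over 6 years" example; names are illustrative.
extracted = {"total_cost": 4_000_000, "duration_years": 6, "currency": "USD"}

annual_cost = extracted["total_cost"] / extracted["duration_years"]
print(f"{annual_cost:,.0f} {extracted['currency']} per year")  # 666,667 USD per year
[/code]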
Bimonthly check: What's the best model for 24gb vram right now?
>>103076489That's the mentality of killing the general, stupid incel loser, go touch grass.
>>103076503qwen2.5 / nemotron / finetune of one of those
I'm catching up now that 1.0 is out. Honestly. I kind of feel like 0.5 is the best one out of them.
>>103076505People asking which model to fit in their 3090 or asking how to set up sillytavern or kobold 10 times per thread is active, but just as dead as no posts at all.
So this is the mythical Chad Road... Not bad...
>>103075854
>https://github.com/HIllya51/LunaTranslator
Needs tons of improvements. It is kinda unusable with the vntl model 'cause you can't really set up a proper prompt format.
Are there any TTS models good for cooming that can be run on a 12gb AMD card on linux?
I don't care about voice cloning, I just want a nice believable voice.
I've looked through some options but this shit changes week after week, so I'm curious what the current best choice is.
>>103076502What model size are you using for that? I guess the potential errors go down when you use bigger models, but then the inference time goes up.
>>103076550hi, what model would fit in my 3090? and what is a .pth file?
guys i tried install koboldcpp but python gave me error. what do?
>>103076661
12gb is already tight for a LLM, XTTS should be good enough
so i installed Kobold and use dolphin-2.2.1-mistral-7b.Q4_K_M.gguf
i have an amd rx6000 series card and it works well with vulkan. also tried the rocm version but it just crashes.
is there anything better i can run on my machine? or is that fine? also what's the best (uncensored) model i can run on my pc?
i just had incredible sex talk with a dominant demon mistress. but she didn't really become too aggressive. e.g. when i didn't obey her commands she just said "you have to obey me" but didn't really "narrate" the story, like writing what's happening. it's all more like talking.
does that just require more fine-tuning or a better model? sorry that you have to read about my fetish, but i need specific advice right?
>>103076669
.pth is a Python file. Use Python to execute it.
who are RAM?
>>103076712
Are you using sillytavern or just the kobold GUI? Make sure your context template and instruct mode settings are correct (see the lazy guide in the OP for easy config specifically for nemo).
This is what changed everything for me in terms of how they'll interact.
dear community members. Have you considered getting a discord server?
This clown general
>>103076668This is why you use something like VLLM to compute rows in parallel. Pipelines also get huge benefit from kv caching.
>>103076751I think a discord server would be a wonderful addition to our community. Would you like to make one for us?
>>103076751Great idea desu. I am tired of people being hostile here.
>>103061671>>103061671>>103061671Next thread
>>103076751
>>103076778
>>103076806
Dear Esteemed Members of the 4chan LLM Community,
I hope this message finds you well. I am writing to address the recent proposal to establish a Discord server for our group. While I understand the appeal of this platform, I kindly urge you to reconsider and explore an alternative solution: creating a thread on Hugging Face.
Transitioning to Discord may inadvertently foster isolation, as discussions would no longer be readily accessible to outsiders who might offer valuable insights. Moreover, Discord's recent atmosphere has not been particularly welcoming to certain demographics, including cis white males, which could potentially lead to feelings of discrimination among our members. Additionally, a tight-knit community such as the one on Discord might encourage the prolonged holding of grievances and facilitate toxic behaviors like doxxing.
In contrast, Hugging Face presents a compelling alternative. As a platform already populated with the models we discuss, it offers a convenient and relevant space for our conversations. The community on Hugging Face is known for its friendliness, and unlike platforms such as Reddit, it is not overly moderated, allowing for more organic and free-flowing discussions. Most importantly, Hugging Face is a hub where people actively use and discuss LLMs, making it an ideal environment for our community.
I strongly believe that creating a thread on Hugging Face would be more beneficial for our group, fostering a more inclusive, productive, and enjoyable discussion space. Thank you for considering this alternative. I look forward to our continued interactions and the insightful discussions that lie ahead.
Yours sincerely,
mistral-large-2407
>>103076659If you use something like tabbyAPI you can set the prompt format in the config, koboldcpp also allows you to configure the prompt format
>>103076773is vLLM faster than tabbyAPI?
>>103077016I haven't tried tabby
>>103077029Huh, ok. I guess I should give vLLM a try anyway, I always use tabbyAPI because it supports continuous batching but I feel like it isn't very optimized for this kind of use.
>>103077016Not the same use case. vLLM is for serving multiple users or running multiple prompts in parallel
>>103077184You can run multiple prompts in parallel using tabbyAPI, that's why I asked if vLLM is faster.
>>103076712
which 6000 series card? I have a 6700x and use the same setup (vulkan via kobold, though I use sillytavern for chats) but I'm using Mistral Nemo Instruct 12b Q5, using 32 gpu layers, at 16384 context size (though some people prefer smaller) along with the settings >>103076741 describes.
For me, I can definitely get them to be mean, and I can definitely get them to describe things like a book/story. This partially has to do with the character card and starting prompt.
I initially tried some settings I saw randomly somewhere that led to it being more chat-like and I agree it can be quite sterile and boring, though perhaps more "realistic" in a sense.
Could be your settings (don't forget to set context size inside of sillytavern along with all the other stuff that other anon/the lazy guide says to do. I use 400 context max per message and 16384 total for the model I use), could be a shitty character card.
>>103076959
>Dear Esteemed Members
https://youtu.be/NLIY8Mq49e0
>>103077221
6700xt*
>>103077198vLLM is faster if you can fit an FP16/FP8 model, as they have more optimizations going on. Their GGUF support is ass.
>>103076712>>103077221Oh and just to be super clear since I know how confusing this is at first. I'm specifically using Mistral-Nemo-12B-Instruct-2407-GGUF.
Ok so maybe I was a bit hasty. Noob 0.1, 0.5, and 1.0 all have their strengths and weaknesses I guess. 1.0 is CRAZY good at the "outstretched hand" prompt I have. So good that actually only 3 out of the 15 images I generated had unacceptably drawn hands. In contrast, now that I test this same prompt with the other models again, it is maybe like the opposite, where only 20% of the hands are good. Still cool that they could get hands right that often, but 1.0 is on another level. Maybe overbaking epochs really is all you need.
0.5 also has a weird thing where the colors are biased towards red/yellow on this prompt. Perhaps bleeding in from the Teto-related tags.
>>103077300I like this Teto
>>103077338>>103077338>>103077338
>>103077016
Tabby is faster for single user and possibly comparable for throughput if the exl2 gains still carry. Qwen 72b got 12 t/s in vllm vs 18 t/s in tabby. vLLM's claim to fame is total throughput. Tabby also has continuous batching so it might be comparable