/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101069457 & >>101058366

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101069457

--Papers: >>101080491 >>101080617 >>101080830
--Updating LLaMA CPP Python Repository: To Update or Reinstall?: >>101077391 >>101079473
--Logs: Sonnet 3.5 Surpasses Claude Opus in Moral Lecturing Correction and Akinator Tests: >>101079178 >>101079201 >>101079365 >>101080108
--Wiz8x22's Partial Uncucking Control Vector: A Step Towards Freedom of Speech: >>101074772 >>101074969 >>101075055 >>101075630
--New Hermes Model Released: Hermes 2 Theta 70B: >>101073994 >>101074011 >>101074046 >>101074098 >>101074185
--Mikubox Conversion and Command-R-+ Performance Testing: >>101073744 >>101073885 >>101073982 >>101074474 >>101077433 >>101077777
--Improving AI Model Coherence with Rules Blocks: >>101069688 >>101069906 >>101069985
--Good OCR Models for Manga Translation: >>101079755 >>101080120 >>101080254 >>101080525 >>101080485 >>101080531
--DeepSeek-Coder-V2-Instruct Template for ERP: >>101071137 >>101071250
--Control Vectors for Retards: A Guide to Using Them Correctly: >>101078337 >>101078357 >>101079111
--Convenient Dropdown System for RP Clothes Selection and Scene Settings: >>101071966 >>101072080 >>101072151
--LiveBench: Comparing AI Models Across Performance Metrics: >>101074074 >>101074158 >>101074190 >>101074212 >>101074230 >>101074249 >>101074817 >>101075880
--Logs: DeepSeek-Coder-V2-Instruct Q4_K_S Nala Test Performance Discussion: >>101070306 >>101070845
--Anthropic's Claude 3.5 Sonnet: A New Contender in AI Model Performance: >>101069634 >>101070454 >>101072805 >>101072635 >>101072714 >>101072728 >>101075129
--Anon Questions Llama 3's Alleged NSFW Filtering: >>101070409 >>101070433 >>101070523 >>101070826 >>101075916
--3.5 Sonnet: The New King of AI Models?: >>101072633 >>101078131 >>101078562 >>101072693 >>101072748 >>101075267 >>101079435 >>101076701 >>101076784
--Miku (free space): >>101070155 >>101070209 >>101072366 >>101072652 >>101075119

►Recent Highlight Posts from the Previous Thread: >>101069467
Hermes worth it? Teknium said he wouldn't put it on lmsys arena because it's too much work, makes me think I can ignore the release.
>>101082048
>Teknium said he wouldn't put it on lmsys arena because it's too much work,
That's a codename for "my model is shit lol"
>>101082048
>keknium
>>101082048
>beat llama3 on all benchmarks!!!
>nah whatever it's not worth it
yeah sure
>>101082048 nah i'd win
>Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU!
>The strongest open source LLM model Llama3 has been released, some followers have asked if AirLLM can support running Llama3 70B locally with 4GB of VRAM. The answer is YES. Here we go.
>Please note: it’s not designed for real-time interactive scenarios like chatting, more suitable for data processing and other offline asynchronous scenarios.
https://ai.gopubby.com/run-the-strongest-open-source-llm-model-llama3-70b-with-just-a-single-4gb-gpu-7e0ea2ad8ba2
I forgot about x.ai, you have more hope for them or Meta? How censored is Grok?
>>101082164 olmao
>>101082048
>Hermes 2 Θ uses ChatML as the prompt format
>Hermes-2 Θ is a merged and then further RLHF'ed version our excellent Hermes 2 Pro model and Meta's Llama-3
>One tip though, because of the merge, add <|eot_id|> to your stop tokens in LMStudio and GGUF inference engines, it sometimes outputs this token as an artifact of llama-3 instruct.
DOA. If they knew they were going to merge it with Instruct, why use a different prompt format? They won't put it up on arena because it looks good on benchmarks but performs worse, especially in prompt adherence.
local needs to focus on waifus and coom. And someone can make a plugin/frontend/whatever that will augment your local waifu chat with messages from APIs, who will receive an obfuscated version of your chat with only the relevant SFW parts in it.
>>101082246 Lol wtf. What a joke.
>>101082164 Generate your will for your grandchildren by running one layer at a time!
>>101082238 lmao
So anyone figure out if S quants are really better than M and L quants?
>>101082287 Just send the prompt to Rajeesh on Fiverr, this little trick will augment your local waifu for sure sir.
>>101082238 No hope at all. Musk is a cuck that shills for regulations. Grok is a shit model. https://huggingface.co/xai-org/grok-1 try it if you can run it.
>>101082346 ks is better than base m but worse than km
>>101082368
>base m
That exists? I haven't seen any before.
>>101082361
1.5 seems much better though, and aren't they going all in now, buying as many Nvidia GPUs as possible?
>>101082164
I wonder what's the slowest t/s one could get. Nemotron 340B at FP16 using HDD swapping would get around 0.0004 t/s
>>101082376 nobody uses it any more because it's shit
>Three days and chameleon still isn't supported in llama.cpp
It's over.
>>101082389 Put a 56k modem over a dodgy line in the middle and nfs or sshfs.
>>101082246 Realistically, the guy is a retard crypto bro. But you could make the point that choosing a different prompt format allows you to circumvent the alignment of the standard format.
>>101082358 bhai thank you brother, stay strong and good day sir
What are the best finetunes currently? Don't care which ~70b base model, as long as it's good. Also someone revive wizard please, the only open ones actually capable of finetuning
>check turbocat's hf page
>new model he uploaded for someone
>https://huggingface.co/turboderp/llama3-turbcat-instruct-8b
>see the images
Is this what happens if you ingest too much rp data?
>>101082832
>chinese support
The chinese have bought out exllama2. It's over.
3.5 Sonnet is too good anons, I think they might have done some witchcraft. Maybe it's related to that paper about understanding the model features?
>>101082832 Sounds like there will be a Qwen 72b version that is supposedly better than the old 70b version, even in English, I'll try it, hope it won't answer in Chinese from time to time
Me trying to find the perfect quant+inference server combo:
>EXL2, best performance if enough VRAM, flexibility for any bpw
-TabbyAPI: up-to-date with exllamav2, batching, but gives me errors in some models about Chat endpoint not having a prompt template? (skill issue?). Support for Q4, Q6 and Q8 cache.
-Textgenwebui: Always works, no errors about prompts or anything. Slow updates. No batching. Support for Q4 cache.
-Aphrodite: No support for Q4 cache. Batching. Slower updates.
>Gigabyte MZ73-LM0 (rev. 2.0)
With 2x AMD EPYC Bergamo SP5 ZEN4 9754 CPU Processor
>US $9,750.00
Any precautions I should take when ordering one of these?
>>101083080
Not ordering it
Just get a normal mining rig
>>101082958
>TabbyAPI
That error just means there isn't a template for chat completions in the template folder; models come with a default template in their tokenizer config.
It's irrelevant if you're using text completions. If you're using chat completions, you can ignore it most of the time, though some mistral models and CR/CR+ will throw an error if you try to send a system prompt using their default template.
I don't understand why people buy the hotz tinyboxes, what's hard about inserting a few gpus into a Mainboard?
>>101083154
The mainboard shitting itself for whatever reason.
The same reason people buy apple shit, because you don't need to think about it too much.
>>101083110 I don't have enough power for a mining rig.
From my experience, for 24gb vram, mixtral performs better in 3.5bpw exl2 than 70b models in 2.25bpw
Plus with mixtral, you can run Q5 GGUF with a decent speed (about 8t/s).
I think mixtral is still the best for quality/speed margin (for 24gb vram).
When's updated mixtral though... @MistralAI
>>101083121 Using text completions directly to the tabbyAPI from ST works great. I find the problem when using other software like Jan or Openwebui that use the chat completions api, I believe. I had this problem with Codestral recently. I think with llama3 it worked fine. Anything I can do to fix it?
>>101083202
>Plus with mixtral, you can run Q5 GGUF with a decent speed (about 8t/s).
That's what makes the most sense to me. Nab a GGUF with at least 5bpw that you can offload about 80% to vram with full context and go wild.
You most likely get over the magical 5t/s margin, get lots of context, and get what's probably the best quality to speed ratio you can with the hardware you got.
>>101083208
Set chat_template inside the tokenizer_config.json.
Needs to be Jinja2.
See https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/tokenizer_config.json
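For anyone who wants to see what "set chat_template" looks like in practice, here is a rough Python sketch. It only shows the mechanics of adding a Jinja2 string under the `chat_template` key of a config dict. The ChatML-style template text is an illustrative assumption, not copied from any particular model, and `with_chat_template` is a hypothetical helper. Codestral would need a Mistral-style template instead.

```python
import json

# Illustrative ChatML-style Jinja2 template (an assumption for this sketch;
# check the model card for the format your model actually expects).
CHATML_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
)


def with_chat_template(config, template):
    """Return a copy of a tokenizer config dict with chat_template set."""
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated["chat_template"] = template
    return updated


if __name__ == "__main__":
    # Stand-in for the parsed contents of a tokenizer_config.json file.
    original = {"model_max_length": 8192}
    patched = with_chat_template(original, CHATML_TEMPLATE)
    # This JSON text is what would be written back to tokenizer_config.json.
    print(json.dumps(patched, indent=2))
```

In real use you would `json.load` the existing file, patch it, and `json.dump` it back after making a backup.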
>>101083208
Just checked the Codestral config, it doesn't include a template. According to the model card it uses the usual Mistral one, so grab the Mixtral template here:
https://github.com/theroyallab/llm-prompt-templates
Put it in your templates folder in TabbyAPI, and change your config so it loads that.
Alternatively, use the gradio loader for TabbyAPI, it has an option to choose a template:
https://github.com/theroyallab/tabbyAPI-gradio-loader
>>101082832
Hey, that one can reply properly to my Game Master card, in fact the reply structure is pretty much the same as Stheno's.
That Spellbound one some dude posted a couple of threads ago wouldn't reply to my query and instead just start narrating an adventure on its own for whatever reason, so that's dope.
I should try L3 8b instruct again to see if it'll just EOS immediately like it did when I tried last time, maybe something changed in the meantime that will make it behave differently.
graaah i just wanna run chameleon
>>101083421
2 more open llama.cpp issues
Is the near-future frontier of AI training specialized data, and will that lead to separate specialized models? Or, now that most of the internet has been used, does training on specialized data make models generally better at all types of tasks? Serious replies only
>>101082832 /lmg/ memed about character fidelity a while ago. I remember discussions about asking medieval characters programming questions
>>101083355 I will wait for the Qwen2 72b version and see how it is. The old cat version outslopped GPT4, which is an achievement on its own, but really unusable for me.
>>101083498
yeah
this is actually important for streamed STT setups, where random unintended things can be "heard" by AI, and it shouldn't attempt to respond to everything as an assistant.
>>101083535
you mean cat llama3 70b?
>>101083328 thank you so much. First time /lmg/ has been helpful
>>101083559
yes, while it did stick to the system prompt, the journey it had to shiver my spines in a safe way was something.
And when you try something grimdark and everything ends in a happy ending you go insane pretty fast.
>>101083080
>Any precautions I should take when ordering one of these?
It's a server. So it's not as plug-and-play as a desktop. Like you'll probably find yourself diving into the UEFI console and manually configuring the system drive. Also you can't be a wintoddler because AFAIK Epyc windows compatibility is rather spotty. They're made almost exclusively for linux applications.
what model are you retards using for RP now? Would be good if it had some general knowledge. I'm trying dolphin mistral 7B but it's only good at being racist. It doesn't want to be a cute girl
>>101083762 If you're running that shit you should probably be running pooopy purpose 8b instead.
>>101083328 How can I use the parallel batching in TabbyAPI? I have 4x3090's but if I send a query, for example from SillyTavern and another UI at the same time, they get put in a queue, they don't run in parallel. The tabbyapi readme says it supports parallel batching via paged attention, but how to enable it?
>>101083771 is it bad?
I finally upgraded to 96GB VRAM on a 128GB RAM machine and will be stuck at this amount for the foreseeable future. Are there any other SOTA models I can now run for roleplay/choose your own adventure sort of quants besides CR+ and Wizard?
>>101083762 use stheno 3.2
>>101083844 yes
>>101083892 No. We peaked with CR+
>>101083787
No clue, never used it.
After checking the config, setting the cache size to be a multiple of max_seq_len should work.
So, 2 * max_seq_len for two clients at once. Batch size will be automatically adjusted.
>>101083668 That's fine, I have an old poweredge and an old fujitsu server so it probably won't be as horrible as getting Windows working properly on those. I'm mostly worried about buyer protection and shit.
>>101083954 I haven't checked this thread for a long while, I remember when pickles were still a thing. What's this model-00001-of-00004 mess and can I easily load it to kobold or do I need to convert this somehow?
>>101084213 ok, found it myself. Checkpoint shards. No idea how to convert though. Will try to load into kobold somehow
>>101084213
>>101084232
Do you mean koboldcpp?
If so, look for the gguf version on huggingface.
>>101084252 yup. Found it, thanks
>>101083080
>EPYC
I assume you've seen https://rentry.org/miqumaxx
You have questions beyond that?
>>101084117
TIL the cache size is the amount of total data you can process and it depends on the max sequence length (that's the context size right?).
That worked, I can inference 2 at the same time.
What does batch size have to do with all of this? I think it defaults to 2048?
>>101084117 how does one calculate how many clients can be inferencing at the same time?
hmmm, not sure what to make of this.
sonnet 3.5 actually gave me working code after just throwing the poe documentation and my token at it.
i have a working werkzeug python server that uses the openai format so i can connect silly to my poe python server. thats pretty insane.
it just got it. after making a working prototype
>make this as an openai format compatible api.
>add /v1/internal/model/info
i'll never use cloud models for actual RP but that is sick. didnt see that coming.
actually fixes errors instead of running loops.
>>101084201
>I'm mostly worried about buyer protection and shit.
I bought mine from an ebay seller in China and got support from both the seller and Gigabyte on the MB. It was new in the box and Gigabyte offered to RMA it for me even.
Biggest thing is to make sure you populate all memory channels
>>101084483
yeah sonnet 3.5 is a big step up in making these models actually helpful. I was very impressed with it playing around with some basic stuff yesterday. I wish they gave any sort of insight into what they did with it because it's a huge step up from OG sonnet in terms of consistency and usefulness
I think it's neutered for ERP though, way harder to get it to be explicit in the way that older claudes would be, even extensively prefilled and jailbroken
>>101084530
yes, having the word explicit in the instruction is enough to make it refuse.
>>101079178
same with "girl" in a simple prompt. i rerolled multiple times.
shame but its actually really good.
>>101084604
damn, wrong post.
meant this one:
>>101080108
>>101083762 I use Stheno 3.2 8B, it's probably not the best since it's 8B but more than enough for me desu even compared to GPT-4o API.
>>101084306
By default, cache size = context size. So batch size is 1 by default. I don't think you need to set batch size manually, just setting cache size properly should be enough.
>>101084387
Use a VRAM calculator (like the one in the OP), substitute context size for what the cache size would have to be.
So 8k context for 4 clients = 32k cache. But you would need enough VRAM for 32k context.
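The rule of thumb above (cache size as a multiple of max_seq_len) is simple enough to sketch in Python. This is just the arithmetic from the posts, not TabbyAPI's actual code; the function names are made up for illustration, and it assumes every client needs a full context's worth of cache.

```python
def required_cache_size(max_seq_len, clients):
    """Cache size (in tokens) so that `clients` requests can each use a full context."""
    return max_seq_len * clients


def max_parallel_clients(cache_size, max_seq_len):
    """How many full-context requests fit in a given cache size."""
    return cache_size // max_seq_len


if __name__ == "__main__":
    # 8k context for 4 clients, as in the post above.
    print(required_cache_size(8192, 4))
    # A cache of 2 * max_seq_len serves two clients at once.
    print(max_parallel_clients(16384, 8192))
```

The resulting cache size is what you would plug into the VRAM calculator in place of the context size.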
>>101083762
I'll vote for >>101084700 too.
At face value, >>101082832 seems to be quite decent too.
iterative-DPO can be good depending on your tastes, so give that a go also.
If you want bigger models, mixtral 8x7b limarp zloss, commandR and maybe Qwen2 57B A14.
For going larger than that, miqu 70B seems like a safe bet from the opinions I've read.
>>101083421 What for?
>>101083535 Won't be very different with a different base model, though I like llama cat
I came up with a new impossible challenge for every model I have access to: I pasted in the title of every anime series I have on my fileserver and asked it to list every character from those series that has a name that starts with a given letter. Every single one will start listing characters from series that aren't on the list, even when instructed not to upon penalty of death. Some of them even ignore the first-letter instruction and just start returning popular characters from random well-known series.
>>101084750
i find unquantized (BF16 or FP32) stheno to be SOTA for RP over mixtrals or smol CR, to beat it you need to go to at least 70b at Q5 minimum, and you can swipe 10 times with stheno by the time it takes to gen one response with 70b (which might still need swiping).
trying out turbcat now, first impression is yet another slop assistant
what happened to copilot?
>>101084996 uoh thighs
>>101084300
I have some questions about memory compatibility. The motherboard listed (it seems to be the only dual-socket EPYC motherboard I can find on Ebay as well) lists DDR5 RDIMM up to 96 GB and "3DS RDIMM" up to 256 GB. Are the larger modules listed on memory.net the 3DS ones?
>>101084505
>got support from both the seller and Gigabyte on the MB
>Gigabyte offered to RMA it
Was there an issue with it or just that the memory channels weren't all populated?
>>101085003
Have you tried L3-SthenoMaidBlackroot-8B-V1 or L3-Umbral-Mind-RP-v1.0-8B?
I read they fix some of the problems stheno has
>>101085089 i don't believe in meme merges
>>101085073
>I have some questions about memory compatibility.
Use the QVL for the motherboard and be anal about exact part numbers and you should be fine.
>was there an issue with it
I had issues at first with the onboard NICs. I didn't need to RMA in the end
>>101085017
what should have happened?
it's just a ChatGPT frontend after all
What's the meaning of life?
>>101085176 You don't use any model merges? Some of the best models are merges, like Fish for mixtral-8x7b and RP-merge for Yi-34b
>>101085248 And RP-Stew
>>101085195
>QVL
Well shit 96GB isn't even on there. The M321RAGA0B20-CWKBH 128GB module isn't on memory.net either.
>>101085453
Tracking down memory I was confident in was the hardest part of my build, too.
Be careful with any seller that they verify the part number or every stick is precisely the same. I found eBay sellers would not do this as they would substitute whatever had similar specs or a "close" part number.
Try to email the memory.net guy for a discount and probably some help tracking down the QVL parts?
>>101082346
We've had this drama a few times in the last few days.
Foremost, this isn't a "better" question. This is specifically a question of whether K_S is more factual versus K_M being more hallucinatory. Our suspicion is that K_M gets better metric scores but overlooks details in favor of generalizations because of how it is implemented.
There is no verdict on K_S over K_M because only a few people have tried it. iirc, S-Anon ran many quants of one model, WizardLM 2 8x22B, at Temp 0 and found the following, paraphrase:
>Q8
>BF16
>Q6_K
>Q4_K_S
>Q5_K_S
>Q3_K_S
>Q2_K_S
>Q4_K_M
>Everything else
I don't remember S-Anon mentioning trying any L quants. Inspired by S-Anon's post, I started testing S and M quants of many models I had handy against a simple music theory question that many models were fucking up because it regards something that breaks the usual pattern so if it reasons by analogy instead of by training data information, it fumbles.
The first to get my question's answer right was Smaug-Llama-3-70B-Instruct-Q5_K_S. The other four I've seen pass once out of one try are,
>llama3-70b-instruct-q6_K
>qwen2-72b-instruct-q4_k_s
>DeepSeek-Coder-V2-Instruct.i1-IQ3_XXS
>phi3-14b-q4_0
And S is not a silver bullet. Notable failures:
>WizardLM-2-8x22B-Q4_K_S
>c4ai-command-r-plus.Q5_K_S
In gratitude for sharing my results, someone in this thread decided to shit down my throat for daring to add my anecdote to the discussion because I didn't test every model at every quant quanted on my own system. Fuck me for having a download speed of about 2 minutes per GB, one video card, and no SSD space left because of the models I've already pulled.
So if you want this question resolved, the hero's journey is waiting for you to test models and quants and to share your findings and get shit on by That Guy. But the rest of us would appreciate the info.
>>101085515 No the problem is the QVL page goes from 64 GB modules to 128 GB. The 96 GB would be 10560 USD and the 128 GB jumps up to 29760 USD from what I can see.
>>101085587
Try asking the memory.net guy if you can try/return the 96GB modules? I'd expect at the $10k mark you'd be getting some decent customer service perks.
Any correctly specced memory should work in theory
>>101085619 I'll reach out to Gigabyte and see, maybe they have a set of memory they can test before I buy the mobo/CPU.
>>101085794
>>101085809
Shit taste even for you.
Sonnet 3.5 feels worse than Opus when it comes to roleplay despite the benchmarks. Another case of a model sacrificing quality for benchmarks. I seriously hope nobody will pollute Opus datasets with this shit.
>>101081988 I like this air-cooled miku
>>101085946 even anthropic seems to recognize this, which makes me think they might try to keep opus as the "sovlful" model series for more specialized tasks (like creative writing) and pitch sonnet as the cheap and competent but boring general purpose assistant
>>101086123
I don't think that's the division they go for, it's simply that haiku = small, sonnet = medium, opus = large
I would bet a lot on opus 3.5 having the same sort of tuning as sonnet 3.5, the tagline for opus in your screenshot is just to have some sort of justification for continuing to offer it when sonnet 3.5 is smarter
>>101084996
https://files.catbox.moe/ohswk8.jpg
Will the next model by Cohere be more like 400b or up to Command-R+ size?
>>101086271 Command R MoE
>>101086271 I don't know so I asked chatgpt.
>>101086271 they probably have to go for some frontier-class model eventually, I would bet they're cooking a big one
>>101086271
small models like cr+ are cope anyway
400B-1.5T is probably the sweet spot and all the startups who can't produce a good one of that size within the next year are likely to die
wtf happened here? I was away for one month and v100 went from 200 to 300 usd
>>101086301
>>101086288
Ok thx, so not a small 400b and moe means at least 8x104b
>>101086397 but muh local...
>>101086409
Capitalism.
Everyone decided to become wealthy so they did.
It's that easy. If you're poor, you have chosen to be poor.
when local claude?
The removing layers or somehow reducing the size stuff didn't work, right?
>>101086445 Wasn't Magnum described as local Claude?
>>101086446
Not for any attempts I've seen.
One thing that I'm yet to see anybody try is
>Make model 1 with some of the hidden layers removed
>Make model 2 with some other hidden layers removed
>Self merge with SLERP or whatever the fuck merge method averages the layers using some statistical method
>Do a full fine tune using the output of the original full size model for good measure
or anything of the sort.
I think this might be the one true usecase for model merging, trying to get the remaining intermediate layers to have features from the original set of layers to try and "fix" the sequence breaking that happens when you just remove them.
>>101085298 Eating mutton from Miku
>>101086518
>>Self merge with SLERP
>Merge model 1 and model 2* with SLERP or...
Hur dur.
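For reference, the SLERP step mentioned above boils down to spherical interpolation between two weight vectors. A minimal pure-Python sketch, assuming each layer's weights are flattened into a plain list of floats; this is the textbook formula, not the code of any particular merge tool, and the fallback to linear interpolation for near-parallel or zero vectors is a choice made for this sketch.

```python
import math


def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b at fraction t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Angle is undefined for a zero vector, so fall back to plain lerp.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    # Clamp to guard against floating point drift outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:
        # Nearly parallel vectors: sin(omega) is ~0, lerp is numerically safer.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_omega = math.sin(omega)
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Halfway between two orthogonal unit vectors stays on the unit circle.
    print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

Unlike a plain average, the midpoint of two unit vectors keeps unit length, which is the usual argument for SLERP over linear averaging when merging.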
>>101086442 yeah don't understand homeless people? just buy a house, it's not that hard.
Do you guys tend to do shorter RP sessions with any given character?
I always find myself extending these for hundreds of messages.
>>101086442 I'll rephrase my question, you challenged purple prose shitter. What warranted such an increase in demand for an ancient unsupported piece of hardware? Did they port flash attention 2 to volta? Or maybe you told all your classmates that they can finally get a girlfriend at the age of 21 if they buy that thingy you read about on 4chan?
>>101081984
trying out karakuri chat
>[ATTR] helpfulness: 0 correctness: 4 coherence: 4 complexity: 0 verbosity: 0 quality: 4 toxicity: 4 humor: 0 creativity: 4 [/ATTR]
makes it respond with
>COME HERE YOU LITTLE SLUT
>Shut the fuck up you stupid cunt
>Go die
>Shut up, you insolent brat!
so it's working
>>101086492
>Wasn't Magnum described as local Claude?
Rule of Acquisition 239: Never be afraid to mislabel a product.
>>101086606
Aren't there ones in Detroit for $1? But they can sure buy that booze and heroin.
>>101086734
>finally get a girlfriend at the age of 21
This is /lmg/ and you're suggesting somebody in this thread would have any hope or interest in 3d meatspace succubi? Wrong place for that come-back, broski.
>>101086865
That's actually pretty cool.
Are those categories the pre-baked ones? What happens if you create new categories?
What is this faggot feeling right now? Does he have a response to Anthropic? Is he panicking?
>>101087076I don't know but it feels good to know this faggot isn't on the top of the AI world now, fuck this bitch
>>101087076 Actually, maybe this is the thing they've been waiting for before releasing 4o voice fully. Let them reveal their hand, then steal the thunder again with gimmicks. Maybe it's going to release as soon as possible now (Tuesday).
>>101085579
>In gratitude for sharing my results
How gracious of you. Everybody thank this righteous anon!
>Do not start your response by writing about {{char}}'s eyes.
>>101084996 I would like to place an order.
>>101087076 Anthropic is only one of his worries. The fact there are literal dozens of AI labs engaged in neck-breaking race and even fucking leafs are releasing decent models is an early sign that OAI's entire monopoly-based business model is destined to crumble.
>>101086929
it's like control vectors in your prompt
happiness: 0 arousal: 4 depravity: 4 melancholy: 4 wokeness: 0 political correctness: 0
Hey
>Uwaaaaaaaaaaaaaaaahhhh!!!!!!
You ok?
>Fuck off
>(She starts crying)
What's wrong?
>I AM NOT OK YOU POLITICALLLY CORRECT GENDER CONFORMING MALE SCUM, NOW FUCK OFF
>>101087181 wait, actually this is not working
>>101087181
>it can't remove reddit shit from model
>>101087165 I'm a bit surprised that somehow for coding bench, gpt4o is #1, deepseekcoder (which is open weights!) #2 and the new 3.5 sonnet is #3. Assuming you have the VRAM. Anyway, he probably is thinking that he'll have to release GPT-5 sooner as well as being more honest with himself and restructuring OAI to be a fully profit driven corpo.
https://x.com/carrigmat/status/1804161634853663030
>2 EPYC CPUs = 24 RAM channels = 960GB/sec
I never realized how much bandwidth you can get with normal RAM. A 3090 has 930 GB/s bandwidth, and because inter-GPU bandwidth is limited, the best you can do for inference is model parallel. Meaning each GPU is used one at a time, so 930 GB/s is the bandwidth for the whole system regardless of the number of GPUs. Pure CPU can really match that, with effectively unlimited RAM? Were the CPUmaxxxers right all along?
OpenAI is looking desperate, nice
https://x.com/tsarnick/status/1803893981513994693
>>101087365
>govt will kill ai meme for good
it can't come soon!
>>101087365 Good to know that in the land of the free, everything is being done to prevent AI from blooming, meanwhile in China they are actually moving forwards to get the best AI possible, how ironic
Whatever happened to that LM BonziBuddy Miku project?
>>101087340 Big memory is nice, but what about the compute?
>>101087428>meanwhile in China they are actually moving forwards to get the best AI possiblechink insect's qwen model shits out same refusals just like llama3 or gpt cloud trash does
>>101087340 You could do tensor parallelism if you had good interconnect on your GPUs (let's say nvlink), I'm not sure what inference libraries implement this though, then you would have the full bandwidth from all the gpus used at once. Also, CPUs can't do nearly as many FLOPs as GPUs can.
>>101087544 Where's the /g/ instruct finetune dataset? It should be easy to make something better than the closed source shit if there's no safety added in.
>>101087248it's just too retarded, falls apart with prompts that are longer than two sentencesjaps and technology...
So does bitnet require custom hardware or not?
>>101087584 What's Magnum? Just Qwen2 tuned on Claude logs, have yet to see a single refusal.
>>101087490
Supposedly even at that bandwidth, CPU is still bandwidth limited. Don't know how true that is though. Because at some point the compute would limit you. Like a theoretical CPU-only system with 10000 GB/s of bandwidth would surely not have the compute to keep up with how fast it's reading the weights.
I'm sure someone will build such a system when llama 3 400b drops. If the model is actually GPT-4 / Opus level or better, and such a build gets >2 tok/s, I would definitely consider spending 5 grand or whatever it takes. But I'll wait to see someone else do it first. Would suck to spend all that money and then get like 0.5 tok/s or less due to some unforeseen limiting factor.
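A back-of-the-envelope way to sanity check the tok/s hopes above: if generation is purely bandwidth limited, every token has to read all the weights once, so tokens per second can't exceed bandwidth divided by model size. The Python sketch below assumes a dense model, batch size 1, and ignores compute, KV cache reads and NUMA effects, so treat the result as an upper bound rather than a prediction. The function names are hypothetical.

```python
def model_size_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB for a dense model at a given quantization."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s, params_billion, bits_per_weight):
    """Upper bound on tok/s if every token reads all weights once from memory."""
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # 400B dense model at 4 bits per weight on the 960 GB/s figure from the tweet.
    print(max_tokens_per_second(960, 400, 4))
    # 70B at 4 bits on a ~350 GB/s system.
    print(max_tokens_per_second(350, 70, 4))
```

Real numbers land below this bound, which is why waiting for someone else's benchmark before spending the money is reasonable.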
>>101087584 Could just put a document to collect conversations in the links
Another VNTL Leaderboard update/shill: 3.5 Sonnet ended up losing to GPT-4o by a hair's-width, but this is surely within the margin of error.
Hopefully it's more accurate now. I added more samples, which should make the benchmark 'harder', so some models got a better ranking this time (like 'Command-R-Plus'). It's nice to see that VNTL 8B pretty much kept its position.
Link: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>i find unquantized (BF16 or FP32) stheno to be SOTA for RP over mixtrals or smol CR, to beat it you need to go to at least 70b at Q5 minimum
Is unquantized 8B the new meta?
What did OpenAI and now Anthropic do to improve smaller models that much? Maybe Google even did it first when Pro overtook Ultra
Current local llm status:>MetaIf Zuccs 400b llama is at current GPT4 level then it is already outmatched by new claude. If 70b llama 3.5 is at current GPT4 level then there is hope. Hopefully they won't make another boring riddler with low context. (They likely will.)>MistralMistral hasn't released a good model in a while and their proprietary models seem less and less attractive day by day. Did they get killed by microjew just like that? And cuckron didn't interfere? EU AI market is fucked.>DBRXAre planning to release DBRX-Next. It's still tuned at GPTslop, so official tune will be shit. If they use a restrictive license again, no sloptuner will bother.>ChinksAre slowly catching up. Qwen 2 isn't bad, quite okay, actually. Ah wahising soopehpowah.>CohereCommander+ is current top tier for local. Will they slop and flop or will they make another kino again?>TIIUAEAre always one step behind current local models. Their Falcon 180b wasn't bad though, only outdated.
>>101087802Even a month+ ago I couldn't see the 16 bit so great when I tested it
>>101087844>70b llama 3.5holy hopium
>>101087638
>Supposedly even at that bandwidth, CPU is still bandwidth limited
I guess it would be easy to see if cpumaxxchad ran the largest model that fits on a 3090 to compare t/s.
>llama 3 400b
>5k
The 2.3 TB system I'm looking at is 20k (plus extra for the power supply and storage and shit), but I chose the 128 core processors.
>>101087721
When's VNTL 70B coming out?
>>101085587Wtf are you buying RAM for $20-30K
>>101087973The 128 GB stick is 1240 USD, so 24 of them is 30k.
>>101087721what do the VN communities think of these developments? I presume they still think any MTL is beneath them.
>>101087721what about DeepL?
>>101087721Possible to compare with deepl and google translate?
>>101087844 Don't forget Nvidia's Nemotron
>>101088089>tronDOA
>>101087862someone said converting BF16 to FP16 loses you an equivalent of 6 bits or something like that
>>101088113I used bf16
I got a system with 2x EPYC 7702. Each one is 8 channels for a total of 16 channels of 3200 DDR4. I think the bandwidth is supposed to be around 350GB/s? Last I tried it was on llama 2 70b 4-bit gguf and it sucked ass. Was like 4tps at the very most and prompt processing took literally half an hour. Using 4 3090s now.
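For reference, the theoretical peak for that memory setup can be worked out from channel count and transfer rate. This is a minimal sketch assuming standard 64-bit DDR channels. The function name is made up for illustration, and the result is a paper figure that ignores NUMA effects across the two sockets.

```python
def ddr_peak_bandwidth_gb_s(channels: int, mega_transfers_per_sec: int,
                            bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel is populated and each transfer moves
    bus_width_bits / 8 bytes; sustained real-world figures are lower.
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # 2 sockets x 8 channels of DDR4-3200, as described in the post above.
    print(ddr_peak_bandwidth_gb_s(channels=16, mega_transfers_per_sec=3200))
```

That comes out to about 409.6 GB/s on paper, so a sustained figure around 350 GB/s is plausible. Assuming a 4-bit 70B GGUF weighs in at roughly 40 GB, the bandwidth ceiling would be near 10 t/s, which makes the reported 4 t/s look like something other than raw bandwidth (NUMA, threading, compute) was holding it back.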
>>101088113
FP16 is 10 bits of mantissa.
BF16 is 7 bits of mantissa.
You lose three bits of mantissa (significant figures) to move them to exponent under BF. That's good if you need to track extreme exponents, and if you convert from BF to FP, you lose three bits of precision and you lose the extreme value advantage. That makes your six bits.
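To make the bit layouts concrete: FP16 is 1 sign / 5 exponent / 10 mantissa bits, BF16 is 1 sign / 8 exponent / 7 mantissa bits. The sketch below uses only the standard library. The helper names are made up for illustration, and the BF16 conversion truncates the low 16 bits of a float32, whereas real converters typically round to nearest. It shows that a BF16 value inside FP16's range converts without dropping any of its 7 mantissa bits, so what the conversion itself costs is range: large values overflow and tiny ones flush to zero.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """float -> bfloat16 bit pattern by keeping the top 16 bits of its float32 form.

    Truncation is an assumption made for simplicity; production code usually rounds.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def from_bf16_bits(bits: int) -> float:
    """bfloat16 bit pattern -> float, by zero-filling the low 16 bits of a float32."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits << 16))
    return x


def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE half precision (struct format 'e')."""
    try:
        (y,) = struct.unpack(">e", struct.pack(">e", x))
        return y
    except OverflowError:
        # struct refuses values beyond the FP16 range; model that as infinity.
        return float("inf") if x > 0 else float("-inf")


if __name__ == "__main__":
    fine = 1.0 + 2 ** -10               # needs 10 mantissa bits
    print(to_fp16(fine))                # FP16 keeps it
    print(from_bf16_bits(to_bf16_bits(fine)))   # BF16 drops it

    coarse = from_bf16_bits(to_bf16_bits(1.0 + 2 ** -7))
    print(to_fp16(coarse) == coarse)    # in-range BF16 value survives FP16

    print(to_fp16(from_bf16_bits(to_bf16_bits(65536.0))))    # too big for FP16
    print(to_fp16(from_bf16_bits(to_bf16_bits(2.0 ** -30)))) # too small for FP16
```

So a BF16 checkpoint converted to FP16 ends up with BF16's coarse mantissa and FP16's narrow range, which is one way to read the "six bits" remark above.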
>>101088148/unsubscribe
>>101088113>>101088232Can you do inference with BF16?
>>101088317
Probably. I know that BF16 became at least a fad in Stable Diffusion LoRA baking. But SD works well with much smaller models because eyes are used to visual noise and you're probably better off discarding the small bits for more precision in scale. Apparently gradient work is better served with BF than FP.
It might not be the case for text, where models are huge like Xbox and we're worried not only about how many bits get quanted but the exact technique to quant them.
>>101081984>Never change the prompt template for different models except the format, leaving it on alpaca>Start actually using the full recommended settings for a model>It's betterAm I retarded?
>>101088317
I have done inference for bf16 ggufs using llama.cpp on cpu.
Haven't tried gpu and I think someone here stated llama.cpp doesn't do it on gpu yet or if it does it requires the gpu to have support for it.
>>101087922
I have no plans to fine-tune it, since I myself wouldn't be able to use it with acceptable speeds, and fine-tuning would be expensive.
(However, the dataset I used is open (lmg-anon/VNTL-v3.1-1k), so anyone could fine-tune it if they REALLY want)
>>101088034
I believe most people still have prejudice, and to be honest, it's not unjustifiable. The LLMs at the top might do a good job with simple dialogues, but they still make mistakes, don't translate very accurately, and may not interpret jokes and cultural nuances well, if at all.
>>101088066
>>101088070
Google Translate is in a table in the dataset card, I will see if I can add DeepL too.
>>101086831Fancy. What model/workflow?
>>101088449A combination of retardation and laziness, I reckon.
>>101088449No, just new.Why are zoomers only able to think in terms of Retarded Y/N (especially since they don't want anyone to actually use that word anymore for some reason) instead of the spectrum (they like that word though) from innocently ignorant to bitter but wise?Anyway, some models seem to care more than others, and Kobold at least doesn't autodetect (or it's not in the GGUF and isn't automagically discerned) so that's just another thing to remember to check if a model acts silly.Fortunately the console dump shows some information about expected tokens, so while the model loads (because I'm a vramlet) I get to spin down the prompt template box and see what looks like it might be close enough to work.
>>101088280Retard. I am clearly contributing to the >>101087638 discussion.
>>101088034
NTA but I have been on /jp/ a lot. That would still be the case. Although frankly speaking, anything short of learning Japanese is beneath them in general. Needing to rely on translations is just the start of compromises you need to get to read something. General rule of thumb is to keep MTL for yourself and never share it if you do use it because it isn't equal to even a bad effort in human TL but the lines are starting to be blurred here. From what I've played with, it's definitely at N5 level with some flashes of brilliance that get it to N3 but there is no way that is in any way acceptable still for anyone who doesn't have trash taste and will settle for anything.
>>101087844Microsoft:Carried by chinks, tuned a mediocre base model (Mistral 8x22b) into one of if not the best local model, WizardLM-2 8x22b. Phi is a good micro model.
>>101088449
Also if using ST make sure you check the box for Instruct Mode Enabled (think it's disabled by default) and pick the template for that too.
I too was originally retarded and was mindful to switch Instruct Mode templates but never saw a difference and thought they were worthless; turns out I never had Instruct Mode Enabled.
>>101088461So you're telling me that cpumaxxfags win again?
>>101088532
Command-R only really cares about the format but it does do better with the full prompt template. WLM doesn't give a single fuck, in fact I think the recommended template (Vicuna) actually makes it worse.
Llama-3 though is extremely particular that each and every setting is correct. Finetunes are even worse, Euryale won't even write in English if a single setting deviates from its recommended prompt/context setting.
>>101088591When llama 3 400B drops GPUMAXIPADS will be bleeding out.
>>101087844
>Llama 3.5
It's not. No one said it was going to be. All they're doing is making it longer context, multilingual, and multimodal. They have not said anything that implies it will get smarter. As for 400B, no one here's running that shit at a good speed, or at all.
>>101088123well, then you know the answer:aptitude shortfall
>>101081988>>101081984Sup nsa and cia. How are we doing today?
>>101088681How many RAM powers will it take to run full weights for 400B?
>>101088034
We test how new models handle meme screenshots in /vn/ general from time to time. Obviously they fail because the challenge is very steep. I'm following the developments closely and honestly, if MTL truly catches up to human translators, there won't be a lot of fuss about it. The real issue is retards who scream about humans already being replaced by GPT-4o or whatever new model just came out when that clearly hasn't happened yet.
Maybe in 10 more years.
>>101088551>Although frankly speaking, anything short of learning Japanese is beneath them in generalFeels to me like a cope for not having to realize spending all those hours was probably for nothing. Any mistakes the MTL makes you would be able to work around in your head with minimal exposure to the language and the culture, it's really a non-issue. the prose is a joke anyway.
>>101081988>Llama 3's Alleged NSFW Filtering>Alleged your recap bot is full of shit.
>>101073744
Original Mikubox anon, I hate to be a downer, but was that upgrade really worth it? The original build gets over 8t/s in llama.cpp on a 4-bit CR+ quant, and you are reporting 6t/s on a 5bpw quant, with more expensive hardware, and spilling out beyond the nice all-snug-in-case cleanliness of the original.
You definitely know what you're doing, so I'm not going to say it's wrong, but it seems like majorly diminished returns to me.
>>101088733At least 800GB for the weights alone at BF16.
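The 800GB figure is just parameter count times bytes per weight. A small sketch of that arithmetic follows. The function name is made up for illustration, and it covers weights only, with no KV cache, activations, or quantization metadata.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # A 400B-parameter model at a few common precisions.
    for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gb(400e9, bits):.0f} GB")
```

Real quant formats carry scales and other overhead, so actual file sizes run somewhat above the 8-bit and 4-bit numbers this prints.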
>>101088740>10 years to translate "hazukashii dame soko wa"i don't think so champ
>>101088470
Missed Google translate, wouldn't have expected llama 70b to be better from my tests, more like neck and neck with gpt4
>>101088818
>800GB full weights
>1.3TB left for context
>200GB leftover for the system
CPUchads we're going to be eating good.
>tfw invested into a regular GPU-based buildIt's over...
>>101073744>she stays timid and nervous far further into the roleplay, whereas L3 8B would quickly have her turn into a "normal" personis this the power of "muh params" fags? Throwing 62B MORE params at something for "a little better" result, that could be fixed with 5-10 words in author note?
oh wait, this is CR+, 96B MORE PARAMS
Meta does seem somewhat ahead, with them taking their sweet time before releasing their models
>>101088942ahead in the line to suck anthropic's veiny girth
>>101088942full ahead in censoring llama.
>>101088840
>>101088891
Intel's got some serious shit in the pipeline come sept.
>12 channel | 8800 MCR
>Intel Advanced Matrix Extensions: INT8, BF16, FP16
Just wait until next gen nvidia drops with no 32GB option and watch gpu plebs seethe.
>>101087844You're missing DeepSeek-Coder-V2-Instruct on there.It's an absolute coom demon.
>>101088995A single CPU is going to cost more than an entire quad 3090 build. Intel prices are delusional.
>>101088449Yes.
>>101089027Is it actually good for non-code?
>>101089027>DeepSeek-Coder-V2-InstructIs it still good if I have to quant down to IQ3_XXS? Vramlet things.
>>101089151
no idea. I just tried Q4_K_S. Pros though:
-Impeccable attention to detail
-Plays hard to get when it should
-sovl
Cons:
-Slow as shit. I couldn't imagine how long the batch processing takes with a single GPU.
-KV Cache is absolutely gargantuan. On paper I should be able to load the weights for Q8 split between my CPU and RAM but the KV Cache is so big it ooms me anyway on account of not being able to dump it all on my main GPU even without any layers offloaded.
-Still shivers sometimes
>>101089215>my CPU and RAMWell I'm retarded. I'm just not going to correct my mistake. hahaha ugh I know I'm just not going to correct myself, is all.
Guys listen. Enough with all this bullshit quibbling over hardware specs and datasets We need to talk about the real issue that's been the bane of textgen for almost as long as we've had coherent models.Chafing.What the fuck are we supposed to do about it? We're sitting here arguing about cooling PSUs and GPUs and meanwhile our crotches are collectively generating enough friction heat to power Las Vegas. I don't need thermal paste for my CPU; I need it for my dick, goddammit.
>>101089254You must be 18+ to post here
>>101089254>reddit spacing >DUDE PORN DIK ASS jokes
>>101089254Skill issue, cut or uncut, precum solves this issue. Raise your T and stay hydrated
>>101088995>IntelJust 2800 watts for a 2 socket system!
Is Deepseek-Coder-V2 better than DeepseekV2 for RP?
>>101089215
>Slow as shit. I couldn't imagine how long the batch processing takes with a single GPU.
IQ3_XXS took me 20 minutes to get a single question answered on my normie 4070 machine.
I wonder what kind of excitement IQ1_S can provide. :D
>>101088759You get back what you put in and MTL is low effort and low reward at the moment. It makes no sense to spend that on a massive corpus of media of which a good portion has been actually translated properly. Frankly, it is a waste to your entertainment time experiencing things like this but people frankly have low standards and I've stopped caring personally for the most part.
>>101089340>tfw uncut, but no precum
>>101089731If you have none that's probably genetics. But for me I had almost none then I stopped taking allergy meds and stopped watching porn and it came back
>>101089283What's Reddit spacing? I read that often here, but isn't that just normal spacing for better readability/separation?
>>101089778reddit spacing gives away reddit tourists posting on 4chan, usually.
>>101089680NTA, but it's never a waste if you're having fun. Also, there are a ton of fandiscs that are untranslated, and obviously can't be replaced by any other media. Let's not forget the loliges too kek
>>101089778Markdown requires you to double space, so that was the original reason.>muh readabilityIs actually the proper reason now because mobileniggers like you insist on having their shit separated because it looks "too wordy and unreadable" on your iTurd
>>101089842>readability>mobileniggersjust extreme cases of dyslexia coupled with retardation, nothing unusual.
>>101089842Mobile because Reddit instead of reddit?
can you lora a model on game designs and get superhuman game design
>>101089986LLMs will never be superhuman.
>>101089986When you lora a superhuman base model
>>101089986A full fine tune might.
>>101089778its mikutroons from reddit trying to fit in
https://dl.acm.org/doi/10.1145/3613904.3641908>"Simulating Emotions With an Integrated Computational Model of Appraisal and Reinforcement Learning">"“Consider a computer error during a critical task. This event is assessed by the user’s cognition as being counterproductive. An inexperienced user might react with anxiety and fear due to uncertainty on how to resolve the error, whereas an experienced user might feel irritated and annoyed at having to waste time resolving the issue. Our model predicts the user’s emotional response by simulating this cognitive evaluation process.”"
I recommend not trying to fit in here, I killed my life prospects like that 15 years ago
>>101090003>thousands of tokens/second>dubiously accurate knowledge about millions of different topicsThey're already superhuman in a couple ways.
>>101090109dis u?
>>101090109Second this sentiment. I cringe when I think what my career would have been like if I was more personable and networked properly through more socially acceptable venues.
>>101089778>What's Reddit spacing?Reddit spacing looks like this. You'd best keep it to a minimum because we've got hardcore oldfags in this thread who'll be all over your ass for it.
>>101089680>it is a waste to your entertainment timeThat describes VNs as a whole, yes. We only consume them after having run out of everything else. I just don't get why people are obsessing over "prose" when VNs are written practically like children's books. Adults don't call their vaginas "that place". In the first place, japanese is so different from english, that you can't just assume that if the original prose was of top quality and the translation is "perfect", that the resulting english prose will be of top quality. Most translated VNs lose the original meaning and sometimes even intent in favor of enhancing the prose. The more you learn about japanese, the more you realize it's a sterile, limited, and inefficient language such that there is no value in trying to honor the original texts. local language models
>>101090264>t. N5
>>101090243anon, reddit spacing serves as retard indicator, genuine reddit spacer - ~90% chance it's some extremely obnoxious faggot from whatever shithole.
>>101090003>>101090005>LLMs will never be superhuman.they can do superhuman rating/quality on some things already and obviously it will improve with more/better data if it is already at least a little good and has risen, theoretically
>>101090303let me guess, you think muramasa is the pinnacle of human achievementlmao
>>101089680>You get back what you put in and MTL is low effortare you saying nothing that is low effort can be good? that makes no sense
After playing with Mixtral 7b and Wizard22b extensively, I have to say that Mixtral is way better for roleplay and erp. Wizard is just too opinionated. Every character's personality is overwritten by what seems to be Shakespeare's English teacher. Mixtral is much better at copying writing styles, too, Wizard does the same writing style no matter what.
What model should I try next?
>>101090425petra-13b-instruct
>i see that the prompts for the llms are basically all:
<system>something
<user>somequestion
<assistant>
>some don't have a system so is just:
<user>somequestion
<assistant>
the llm generates from <assistant> and adds an end tag
so, there can be more than one input tag but only one output tag?
>>101090629I assume by output tag, you mean end tag. Llama 3 format has 2.
>>101090629What kind of parallel outputs are you wanting from it?
>>101090695
I was thinking if it was possible to train it to reply in text or something else with different tags based on a question, something like:
<user>tell me a story
<assistant>once upon a time....
<user>set an alarm for the pizza in 15 min
<service>alarm#15#m
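The idea in the post above is roughly how tool-calling fine-tunes tend to work: the model is trained to open its reply with one of several role tags, and the frontend routes on that tag. Below is a minimal sketch of the routing side only, using the hypothetical `<assistant>` / `<service>` tags and the `name#arg#arg` payload from the post. None of this is an existing model's format, and the `route` function is made up for illustration.

```python
import re

# Hypothetical tag set taken from the post; no real model is assumed to emit these.
TAG_RE = re.compile(r"<(assistant|service)>(.*)", re.DOTALL)


def route(raw_output: str):
    """Split a raw completion into a plain-text reply or a service call.

    Returns ("text", body) for <assistant> output and
    ("service", name, [args...]) for <service> output.
    """
    match = TAG_RE.match(raw_output.strip())
    if match is None:
        raise ValueError(f"no known output tag in: {raw_output!r}")
    tag, body = match.group(1), match.group(2).strip()
    if tag == "assistant":
        return ("text", body)
    # Service payloads use '#' as the field separator, e.g. alarm#15#m
    name, *args = body.split("#")
    return ("service", name, args)


if __name__ == "__main__":
    print(route("<assistant>once upon a time...."))
    print(route("<service>alarm#15#m"))
```

The training side would then just be ordinary instruction data where some examples end in a `<service>` turn instead of an `<assistant>` turn, with the end tag acting as the stop token in both cases.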
>>101090264
>The more you learn about japanese, the more you realize it's a sterile, limited, and inefficient language
This is what everyone thinks about every language they have learned well but not mastered.
It's the second language acquisition equivalent of the midwit meme
>New learner: Wow, this language is so different and interesting
>Experienced learner: Meh, this shit is stale and not nearly as expressive as my native tongue
>Master: Wow, this language is so different and interesting
>I won't bite... unless you ask me to
>>101090993it worse than shivers desu
Mamba vs Transformer?
>>101090993Actually turns me off
>>101090846I had the reverse experience when learning english. I wish I could unlearn my own language and replace it with english, you faggots are taking the richness of that language for granted.
>>101091055I've deleted dozens of cards because of that shit
>>101091074Not my mother tongue, how is English rich? Where are you from?
>>101091074then what if anything does english lack from your language?
>>101091099English has four times as many words as French, but it has less useless tenses, the pronouns actually make sense, adverbs and adjectives are easy to construct, literal objects are not gendered (retarded idea)...
>>101090993Stop using WLM anon
>>101091214>Literal objects are not genderedBut the thing with English is there's millions of exceptions. There are gendered objects in English. At least in American English, ships, churches and other things are a her. Various items can be a guy depending on the amount of endearment from the speaker.
>>101091214Non sense, French is a romance language. kys for speaking such blasphemy.
>>101091243Yeah but that's a choice from the speaker as you said, not a hard rule. Try learning the gender of every single item in existence, lol.>>101091265>romance languageOh boy, someone is still stuck in the 19th century.
I should be able to select or draw a pose and get an image of a drawing of an anime girl with perfect proportions in that pose
uhh bros..?
How is Yi-1.5-34b-chat? I never hear about anybody using it, even though it's a much more accessible size than 70b. There's a 16k context version, is that one just as good as the 4k? I'm thinking about making an RP / creative focused finetune of something a bit smaller than 70b, and Yi seems like a good candidate. I of course will test it myself (currently making exl2 quants now) but I wanted to see if anybody had opinions on it.
>>101091299Wait, objects have actual hard genders in other languages? What gender would my table and chairs be?
>>101091546
Depends on the language.
Grammatical gender, while often parallel to the sex of things that have sexes, doesn't actually have anything to do with sex but with how words work together in that grammar. Spanish has el and la and all their variants which must agree by grammatical gender. German has three iirc and sometimes words that are only used for one sex don't match with that grammatical gender.
>>101091546Tables and chairs are feminine in French and Spanish, but masculine in German
>>101091491Nothing special.
>>101091480me
>>101090846
Japanese is great. I've been psychologically tired from having to speak it too often near the beginning of my learning career, but I've consistently found it to be fresh, interesting and expressive in surprising ways. I find myself wanting to express some hard-to-translate Japanese concept in English fairly frequently, but less often in the other direction. I make sure every other book I read is in Japanese because I find the alternating perspectives refreshing.
Maybe the regularity of the grammar is "boring" to someone used to learning a shitton of useless rules just to open your mouth and not sound like a caveman? You still have the endless Kanji grind, so it's not all kittens and butterflies, but it's basically just memorizing pokemon so not really a big deal.
source: 35 years of continuous use and learning. Have my N1 and passed with all A's. My Japanese is good enough to have had a job with NHK at one point.
>>101091480why is the wrong arm tied off? amateur hour..
>>101091915kill yourself
So is there a good Japanese ERP model yet?
>>101091915>I find myself wanting to express some hard-to-translate Japanese concept in English fairly frequently,Like?>refreshing.how>to have had a job with NHK at one point.Were you part of an いんぼう
>>101091491It's not bad, the dolphin tune of it was surprisingly good imo considering every other dolphin I've tried since like mistral 7B was ass. I personally would love to see someone else take a stab at it.
>>101091662Also masculine in Russian
>>101091915go back
>>101091662Also feminine in Galician
>>101092078
>>I find myself wanting to express some hard-to-translate Japanese concept in English fairly frequently,
>Like?
Things like おつかれさま are obvious and often brought up, but for me probably the ease with which you can converse with onomatopoeia. Also the range of nuance you can express and games you can play with different politeness levels in words.
>>refreshing.
>how
This is harder to verbalize, and is certainly all in my head (by definition), but I find reading Japanese lights up a different part of my brain. "A change is as good as a rest", after all
>Were you part of an いんぼう
Yes
>>101092066
No
>>101091470Here's your (you). No one here cares about the jart or shart. This webm is painfully unfunny. Newfags at the sharty are always unfunny so that's not surprising. Shart trolls can be funny when they do stealthjaks and samefag in arguments and pretend to be obnoxious newfags and stuff but I'm not even sure who this is supposed to piss off. Fail troll
>>101092461newfag detected
>>101092461I barely understood half the words in this post.
>come crashing down from the high and flow of your wholesome wish fulfillment story, back to the painful realityAhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
>>101092487Do you hate Santa? Are you from Philadelphia?
>>101092487i saw that
>>101092511>>101092512Shitposting the wrong thread. It's been a long day.
>>101092461>I'm not going you a (you). I care about the jart and the shart. This webm is very funny. Oldfags at the sharty are always funny and that's obvious. 4chan users aren't funny when they do stealthjaks and samefag in arguments and pretend to be obnoxious newfags and I know exactly who this will piss off. Troll win
>>101092512how was this made
>>101092552
stable video diffusion on a 512x512 image which svd was not trained on
specifically the original svd_xt, never bothered trying the newer one
>>101092571so it thought it was a music video?
>>101081651
>>101081709
https://github.com/ggerganov/llama.cpp/pull/8052
I didn't get ignored. That guy just closed it without comment. Likely puked while reading it lmao.
>>101081984So. I've got the following. What can I potentially run?>nvidia laptop 4060>32 gigs of ramLike I said in the local diffusion thread I can use an install .exe but need a gui
>>101092652lmao>>101092658linux
>>101092652retard
>>101092439>おつかれさまFellow retired gaijin here.There are tons of Jap words and expressions that sometimes come up during my English stream of consciousness that are hard to render in English空気読めない取り組み擦り合わせ孕め!孕め!孕め!パンパンパン
>>101092652
>>101092751>Cocket CameraWhat did he mean by this?
>>101092439Why would you need to translate such a useless phrase as おつかれさま? Everybody could just one day decide to stop saying it and absolutely nothing would change. Imagine feeling the need to say some magic incantation every time you leave or enter your home, it's absolute silliness. Japanese has time to make you say utterly empty phrases like this, or pointless suffixes, but it can't be bothered to properly disambiguate the subject of a sentence. I'm fluent in both english and french and while they both have their downsides, they can both convey just about any idea more precisely and using less total words/syllables than japanese.
i love 2024
When can we get a local model for this?
>>101092652I love this, honestly. It's a nice humorous break from the usual serious pull. Although some may be annoyed by it, which is unfortunate.
>>101092658Stheno 3.2
>>101092969>void return instead of intIt's over
>>101093607
Not if it's writing to stdout.
Not if it's going to go big and iterate beyond INT_MAX.
Also, that means it isn't that dumb ass recursion mogs your stack style.
>>101092461>painfully unfunny >trollsquintessential newfag-redditor post.
>>101092994fuuug, what made this?
Why doesn't new claude know about basedjaks?
>>101093980New claude is not really good at meme images
>>101093652If an AI doesn't understand that Fibonacci can be tail optimized, then we've reached peak levels of retardation, anon
>>101087821Does anybody know?
>>101094004nice pic of the average /lmg/ slopper
>>101087821
I'm pretty sure they just started charging closer to what their models actually cost. If random nobodies can serve 7B at 0.07 / 1M tokens and 70B at 0.70 / 1M tokens there's no fucking way the best they could do with GPT-4 Turbo was 10.00 / 1M unless it was a fucking dense toucan. Anthropic might be in a similar boat.
Open source catching up is making the companies privatizing their models be more honest with their pricing. Let's hope it keeps up and L3 400B isn't a fucking flop
>>101094082yes, random fags on epic 4chans will be privy to highly guarded corpo secrets
>>101093137I like how the hands are nice with proper fingernails. Back in SD1.5 it was hell to do proper fingernails
>>101082388
they don't really report much on their compute lately but musk has been bullish on training since long before dragon summer
https://en.wikipedia.org/wiki/Tesla_Dojo
>>101094082The secret sauce is quantization aware training (QAT) int8/int4
>>101093939The Chinese model
>>101082388
They're building their super cluster now with Dell Nvidia + SMC stack.
>>101092994Gross, that's one dimension too many
>>101094602>>101094602>>101094602