/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101069457 & >>101058366

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101069457

--Papers: >>101080491 >>101080617 >>101080830
--Updating LLaMA CPP Python Repository: To Update or Reinstall?: >>101077391 >>101079473
--Logs: Sonnet 3.5 Surpasses Claude Opus in Moral Lecturing Correction and Akinator Tests: >>101079178 >>101079201 >>101079365 >>101080108
--Wiz8x22's Partial Uncucking Control Vector: A Step Towards Freedom of Speech: >>101074772 >>101074969 >>101075055 >>101075630
--New Hermes Model Released: Hermes 2 Theta 70B: >>101073994 >>101074011 >>101074046 >>101074098 >>101074185
--Mikubox Conversion and Command-R+ Performance Testing: >>101073744 >>101073885 >>101073982 >>101074474 >>101077433 >>101077777
--Improving AI Model Coherence with Rules Blocks: >>101069688 >>101069906 >>101069985
--Good OCR Models for Manga Translation: >>101079755 >>101080120 >>101080254 >>101080525 >>101080485 >>101080531
--DeepSeek-Coder-V2-Instruct Template for ERP: >>101071137 >>101071250
--Control Vectors for Retards: A Guide to Using Them Correctly: >>101078337 >>101078357 >>101079111
--Convenient Dropdown System for RP Clothes Selection and Scene Settings: >>101071966 >>101072080 >>101072151
--LiveBench: Comparing AI Models Across Performance Metrics: >>101074074 >>101074158 >>101074190 >>101074212 >>101074230 >>101074249 >>101074817 >>101075880
--Logs: DeepSeek-Coder-V2-Instruct Q4_K_S Nala Test Performance Discussion: >>101070306 >>101070845
--Anthropic's Claude 3.5 Sonnet: A New Contender in AI Model Performance: >>101069634 >>101070454 >>101072805 >>101072635 >>101072714 >>101072728 >>101075129
--Anon Questions Llama 3's Alleged NSFW Filtering: >>101070409 >>101070433 >>101070523 >>101070826 >>101075916
--3.5 Sonnet: The New King of AI Models?: >>101072633 >>101078131 >>101078562 >>101072693 >>101072748 >>101075267 >>101079435 >>101076701 >>101076784
--Miku (free space): >>101070155 >>101070209 >>101072366 >>101072652 >>101075119

►Recent Highlight Posts from the Previous Thread: >>101069467
>>
Hermes worth it? Teknium said he wouldn't put it on lmsys arena because it's too much work, which makes me think I can ignore the release.
>>
>>101082048
>Teknium said he wouldn't put it on lmsys arena because it's too much work,
That's a codename for "my model is shit lol"
>>
>>101082048
>keknium
>>
>>101082048
>beat llama3 on all benchmarks!!!
>nah whatever it's not worth it
yeah sure
>>
>>101082048
nah i'd win
>>
>Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU!
>The strongest open source LLM model Llama3 has been released, some followers have asked if AirLLM can support running Llama3 70B locally with 4GB of VRAM. The answer is YES. Here we go.
>Please note: it’s not designed for real-time interactive scenarios like chatting, more suitable for data processing and other offline asynchronous scenarios.
https://ai.gopubby.com/run-the-strongest-open-source-llm-model-llama3-70b-with-just-a-single-4gb-gpu-7e0ea2ad8ba2
>>
I forgot about x.ai, do you have more hope for them or Meta? How censored is Grok?
>>
>>101082164
ol
mao
>>
>>101082048
>Hermes 2 Θ uses ChatML as the prompt format
>Hermes-2 Θ is a merged and then further RLHF'ed version of our excellent Hermes 2 Pro model and Meta's Llama-3
>One tip though, because of the merge, add <|eot_id|> to your stop tokens in LMStudio and GGUF inference engines, it sometimes outputs this token as an artifact of llama-3 instruct.
DOA. If they knew they were going to merge it with Instruct, why use a different prompt format? They won't put it up on arena because it looks good on benchmarks but performs worse, especially in prompt adherence.
>>
local needs to focus on waifus and coom. And someone can make a plugin/frontend/whatever that will augment your local waifu chat with messages from APIs, which will receive an obfuscated version of your chat with only the relevant SFW parts in it.
>>
>>101082246
Lol wtf. What a joke.
>>
>>101082164
Generate your will for your grandchildren by running one layer at a time!
>>
>>101082238
lmao
>>
So anyone figure out if S quants are really better than M and L quants?
>>
>>101082287
Just send the prompt to Rajeesh on Fiverr, this little trick will augment your local waifu for sure sir.
>>
>>101082238
No hope at all. Musk is a cuck that shills for regulations. Grok is a shit model. https://huggingface.co/xai-org/grok-1 try it if you can run it.
>>
>>101082346
ks is better than base m but worse than km
>>
>>101082368
>base m
That exists? I haven't seen any before.
>>
>>101082361
1.5 seems much better though, and aren't they going all in now, buying as many Nvidia GPUs as possible?
>>
>>101082164
I wonder what's the slowest t/s one could get
Nemotron 340B at FP16 using HDD swapping would get around 0.0004 t/s
>>
>>101082376
nobody uses it any more because it's shit
>>
>Three days and chameleon still isn't supported in llama.cpp
It's over.
>>
>>101082389
Put a 56k modem over a dodgy line in the middle and nfs or sshfs.
>>
>>101082246
Realistically, the guy is a retard crypto bro.
But you could make the point that choosing a different prompt format allows you to circumvent the alignment of the standard format.
>>
>>101082358
bhai thank you brother, stay strong and good day sir
>>
What are the best finetunes currently? Don't care which ~70b base model, as long as it's good. Also someone revive wizard please, the only open ones actually capable of finetuning
>>
>check turbocat's hf page
>new model he uploaded for someone
>https://huggingface.co/turboderp/llama3-turbcat-instruct-8b
>see the images
Is this what happens if you ingest too much rp data?
>>
>>101082832
>chinese support
The chinese have bought out exllama2. It's over.
>>
3.5 Sonnet is too good anons, I think they might have done some witchcraft. Maybe it's related to that paper about understanding the model features?
>>
>>101082832
Sounds like there will be a Qwen 72b version that is supposedly better than the old 70b version, even in English. I'll try it; hope it won't answer in Chinese from time to time
>>
Me trying to find the perfect quant+inference server combo:

>EXL2, best performance if enough VRAM, flexibility for any bpw

-TabbyAPI: up-to-date with exllamav2, batching, but gives me errors in some models about Chat endpoint not having a prompt template? (skill issue?). Support for Q4, Q6 and Q8 cache.
-Textgenwebui: Always works, no errors about prompts or anything. Slow updates. No batching. Support for Q4 cache
-Aphrodite: No support for Q4 cache. Batching. Slower updates.
>>
>Gigabyte MZ73-LM0 (rev. 2.0)With 2x AMD EPYC Bergamo SP5 ZEN4 9754 CPU Processor
>US $9,750.00
Any precautions I should take when ordering one of these?
>>
>>101083080
Not ordering it
Just get a normal mining rig
>>
>>101082958
>TabbyAPI
That error just means there isn't a template for chat completions in the template folder; models come with a default template in their tokenizer config.
It's irrelevant if you're using text completions. If you're using chat completions, you can ignore it most of the time, though some mistral models and CR/CR+ will throw an error if you try to send a system prompt using their default template.
>>
I don't understand why people buy the hotz tinyboxes, what's hard about inserting a few gpus into a Mainboard?
>>
>>101083154
The mainboard shitting itself for whatever reason.
The same reason people buy apple shit, because you don't need to think about it too much.
>>
>>101083110
I don't have enough power for a mining rig.
>>
From my experience, for 24gb vram, mixtral performs better at 3.5bpw exl2 than 70b models at 2.25bpw.
Plus with mixtral, you can run Q5 GGUF at a decent speed (about 8t/s).
I think mixtral is still the best for the quality/speed tradeoff (for 24gb vram).

When's the updated mixtral though... @MistralAI
>>
>>101083121
Using text completions directly against tabbyAPI from ST works great.

The problem shows up when using other software like Jan or OpenWebUI, which I believe use the chat completions api. I had this problem with Codestral recently. I think with llama3 it worked fine.
Anything I can do to fix it?
>>
>>101083202
>Plus with mixtral, you can run Q5 GGUF with a decent speed (about 8t/s).
That's what makes the most sense to me. Nab a GGUF with at least 5bpw that you can offload about 80% to vram with full context and go wild.
You most likely get over the magical 5t/s margin, get lots of context, and get what's probably the best quality to speed ratio you can with the hardware you got.
>>
>>101083208
Set chat_template inside the tokenizer_config.json.
Needs to be Jinja2.
See https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/tokenizer_config.json
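If you don't want to hand-edit the JSON, a minimal sketch with transformers does the same thing; the ChatML template string below is just an example assumption, use whatever format the model was actually trained on:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/model")
tok.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)
tok.save_pretrained("path/to/model")  # writes the template back into tokenizer_config.json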
>>
>>101083208
Just checked the Codestral config, it doesn't include a template. According to the model card it uses the usual Mistral one, so grab the Mixtral template here:
https://github.com/theroyallab/llm-prompt-templates
Put it in your templates folder in TabbyAPI, and change your config so it loads that.

Alternatively, use the gradio loader for TabbyAPI, it has an option to choose a template:
https://github.com/theroyallab/tabbyAPI-gradio-loader
>>
>>101082832
Hey, that one can reply properly to my Game Master card, in fact the reply structure is pretty much the same as Stheno's.
That Spellbound one some dude posted a couple of threads ago wouldn't reply to my query and instead just start narrating an adventure on its own for whatever reason, so that's dope.
I should try L3 8b instruct again to see if it'll just EOS immediately like it did when I tried last time, maybe something changed in the meantime that will make it behave differently.
>>
graaah i just wanna run chameleon
>>
>>101083421
2 more open llama.cpp issues
>>
Is the near future of frontier AI training on specialized data, and will that lead to specialized, separate models? Now that most of the internet has already been used, does training on specialized data make models generally better at all types of tasks? serious replies only
>>
>>101082832
/lmg/ memed about character fidelity a while ago. I remember discussions about asking medieval characters programming questions
>>
>>101083355
I will wait for the Qwen2 72b version and see how it is. The old cat version outslopped GPT4, which is an achievement on its own, but really unusable for me.
>>
>>101083498
yeah

this is actually important for streamed STT setups, where random unintended things can be "heard" by AI, and it shouldn't attempt to respond to everything as an assistant.

>>101083535
you mean cat llama3 70b?
>>
>>101083328
thank you so much. First time /lmg/ has been helpful
>>
>>101083559
yes, while it did stick to the system prompt, the journey it took to shiver my spine in a safe way was something.
And when you try something grimdark and everything ends in a happy ending, you go insane pretty fast.
>>
>>101083080
>Any precautions I should take when ordering one of these?
It's a server. So it's not as plug-and-play as a desktop. Like you'll probably find yourself diving into the UEFI console and manually configuring the system drive. Also you can't be a wintoddler because AFAIK Epyc windows compatibility is rather spotty. They're made almost exclusively for linux applications.
>>
what model are you retards using for RP now? Would be good if it had some general knowledge. I'm trying dolphin mistral 7B but it's only good at being racist. It doesn't want to be a cute girl
>>
>>101083762
If you're running that shit you should probably be running pooopy purpose 8b instead.
>>
>>101083328
How can I use the parallel batching in TabbyAPI? I have 4x3090's but if I send a query from, for example, silly tavern and another UI, they get put in a queue; they don't run in parallel.
The tabbyapi readme says it supports parallel batching via paged attention, but how do I enable it?
>>
>>101083771
is it bad?
>>
I finally upgraded to 96GB VRAM on a 128GB RAM machine and will be stuck at this amount for the foreseeable future. Are there any other SOTA models/quants I can now run for roleplay / choose-your-own-adventure stuff besides CR+ and Wizard?
>>
>>101083762
use stheno 3.2

>>101083844
yes
>>
>>101083892
No. We peaked with CR+
>>
>>101083787
No clue, never used it.
After checking the config, setting the cache size to be a multiple of max_seq_len should work.
So, 2 * max_seq_len for two clients at once. Batch size will be automatically adjusted.
>>
>>101083668
That's fine, I have an old poweredge and an old fujitsu server so it probably won't be as horrible as getting Windows working properly on those. I'm mostly worried about buyer protection and shit.
>>
>>101083954
I haven't checked this thread for a long while, I remember when pickles were still a thing. What's this model-00001-of-00004 mess and can I easily load it to kobold or do I need to convert this somehow?
>>
>>101084213
ok, found it myself. Checkpoint shards. No idea how to convert though. Will try to load into kobold somehow
>>
>>101084213
>>101084232
Do you mean koboldcpp?
If so, look for the gguf version on huggingface.
>>
>>101084252
yup. Found it, thanks
>>
>>101083080
>EPYC
I assume you've seen https://rentry.org/miqumaxx
You have questions beyond that?
>>
>>101084117
TIL the cache size is the amount of total data you can process and it depends on the max sequence length (that's the context size, right?).
That worked, i can inference 2 at the same time.
What does batch size have to do with all of this? I think it defaults to 2048?
>>
>>101084117
how does one calculate how many clients can be inferencing at the same time?
>>
hmmm, not sure what to make of this.
sonnet 3.5 actually gave me working code after just throwing the poe documentation and my token at it.
i have a working werkzeug python server that uses the openai format so i can connect silly to my poe python server. that's pretty insane.

it just got it after making a working prototype:
>make this as an openai format compatible api.
>add /v1/internal/model/info
i'll never use cloud models for actual RP but that is sick. didn't see that coming.
actually fixes errors instead of running in loops.
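for the curious, the bare bones of that kind of shim looks roughly like this (flask rides on werkzeug; the poe call is stubbed out and the endpoint shapes are just the usual openai-style ones silly expects, so treat it as a sketch, not the anon's actual code):

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/v1/internal/model/info")
def model_info():
    # ST's textgen-webui connection polls this for a model name
    return jsonify({"model_name": "poe-proxy"})

@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    messages = request.json.get("messages", [])
    reply = forward_to_backend(messages)  # stub: the real version would call the poe API here
    return jsonify({"choices": [{"message": {"role": "assistant", "content": reply}}]})

def forward_to_backend(messages):
    return "placeholder reply"

if __name__ == "__main__":
    app.run(port=5000)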
>>
>>101084201
>I'm mostly worried about buyer protection and shit.
I bought mine from an ebay seller in China and got support from both the seller and Gigabyte on the MB. It was new in the box and Gigabyte offered to RMA it for me even.
Biggest thing is to make sure you populate all memory channels
>>
>>101084483
yeah sonnet 3.5 is a big step up in making these models actually helpful. I was very impressed with it playing around with some basic stuff yesterday. I wish they gave any sort of insight into what they did with it because it's a huge step up from OG sonnet in terms of consistency and usefulness
I think it's neutered for ERP though, way harder to get it to be explicit in the way that older claudes would be, even extensively prefilled and jailbroken
>>
>>101084530
yes, having the word explicit in the instruction is enough to make it refuse.

>>101079178
same with "girl" in a simple prompt. i rerolled multiple times.
shame but it's actually really good.
>>
>>101084604
damn, wrong post.
meant this one:
>>101080108
>>
>>101083762
I use Stheno 3.2 8B, it's probably not the best since it's 8B but more than enough for me desu even compared to GPT-4o API.
>>
>>101084306
By default, cache size = context size. So batch size is 1 by default. I don't think you need to set batch size manually, just setting cache size properly should be enough.

>>101084387
Use a VRAM calculator (like the one in the OP), substitute context size for what the cache size would have to be.
So 8k context for 4 clients = 32k cache. But you would need enough VRAM for 32k context.
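napkin version of the same thing (key names are per tabbyAPI's sample config as far as I know, so double-check against your version):

# total KV cache to budget for N parallel clients, each with its own full context
clients = 4
max_seq_len = 8192                  # context per client
cache_size = clients * max_seq_len  # 32768 -> set this in the config
# VRAM-wise you need roughly what a single 32k-context run of the same model would take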
>>
>>101083762
I'll vote for >>101084700 too.
At face value, >>101082832 seems to be quite decent too.
iterative-DPO can be good depending on your tastes, so give that a go also.
If you want bigger models, mixtral 8x7b limarp zloss, commandR and maybe Qwen2 57B A14.
For going larger than that, miqu 70B seems like a safe bet from the opinions I've read.
>>
>>101083421
What for?
>>
>>101083535
Won't be very different with a different base model, though I like llama cat
>>
File: MikuUnderground.png (1.06 MB, 1102x706)
I came up with a new impossible challenge for every model I have access to: I pasted in the title of every anime series I have on my fileserver and asked it to list every character from those series that has a name that starts with a given letter.
Every single one will start listing characters from series that aren't on the list, even when instructed not to upon penalty of death. Some of them even ignore the first-letter instruction and just start returning popular characters from random well-known series.
>>
File: who the fuck is miku.jpg (108 KB, 768x1024)
>>
>>101084750
i find unquantized (BF16 or FP32) stheno to be SOTA for RP over mixtrals or smol CR, to beat it you need to go to at least 70b at Q5 minimum, and you can swipe 10 times with stheno in the time it takes to gen one response with 70b (which might still need swiping).

trying out turbcat now, first impression is yet another slop assistant
>>
what happened to copilot?
>>
>>101084996
uoh thighs
>>
>>101084300
I have some questions about memory compatibility. The motherboard listed (it seems to be the only dual-socket EPYC motherboard I can find on Ebay as well) lists DDR5 RDIMM up to 96 GB and "3DS RDIMM" up to 256 GB. Are the larger modules listed on memory.net the 3DS ones?

>>101084505
>got support from both the seller and Gigabyte on the MB
>Gigabyte offered to RMA it
Was there an issue with it or just that the memory channels weren't all populated?
>>
>>101085003
Have you tried L3-SthenoMaidBlackroot-8B-V1 or
L3-Umbral-Mind-RP-v1.0-8B?
I read they fix some of the problems stheno has
>>
>>101085089
i don't believe in meme merges
>>
>>101085073
>I have some questions about memory compatibility.
Use the QVL for the motherboard and be anal about exact part numbers and you should be fine.
>was there an issue with it
I had issues at first with the onboard NICs. I didn't need to RMA in the end
>>
>>101085017
what should have happened?
it's just a ChatGPT frontend after all
>>
What's the meaning of life?
>>
>>101085176
You don't use any model merges? Some of the best models are merges, like Fish for mixtral-8x7b and RP-merge for Yi-34b
>>
File: baatsune shiipu.jpg (61 KB, 768x768)
>>
>>101085248
And RP-Stew
>>
>>101085195
>QVL
Well shit 96GB isn't even on there. The M321RAGA0B20-CWKBH 128GB module isn't on memory.net either.
>>
>>101085453
Tracking down memory I was confident in was the hardest part of my build, too.
Be careful with any seller: make sure they verify the part number and that every stick is precisely the same. I found eBay sellers would not do this; they would substitute whatever had similar specs or a "close" part number.
Try to email the memory.net guy for a discount and probably some help tracking down the QVL parts?
>>
>>101082346
We've had this drama a few times in the last few days.

Foremost, this isn't a "better" question. This is specifically a question of whether K_S is more factual while K_M is more hallucinatory. Our suspicion is that K_M gets better metric scores but overlooks details in favor of generalizations because of how it is implemented.

There is no verdict on K_S over K_M because only a few people have tried it. iirc, S-Anon ran many quants of one model, WizardLM 2 8x22B, at Temp 0 and found the following, paraphrase:
>Q8
>BF16
>Q6_K
>Q4_K_S
>Q5_K_S
>Q3_K_S
>Q2_K_S
>Q4_K_M
>Everything else

I don't remember S-Anon mentioning trying any L quants. Inspired by S-Anon's post, I started testing S and M quants of many models I had handy against a simple music theory question that many models were fucking up because it regards something that breaks the usual pattern so if it reasons by analogy instead of by training data information, it fumbles.

The first to get my question's answer right was Smaug-Llama-3-70B-Instruct-Q5_K_S. The other four I've seen pass once out of one try are,
>llama3-70b-instruct-q6_K
>qwen2-72b-instruct-q4_k_s
>DeepSeek-Coder-V2-Instruct.i1-IQ3_XXS
>phi3-14b-q4_0

And S is not a silver bullet. Notable failures:
>WizardLM-2-8x22B-Q4_K_S
>c4ai-command-r-plus.Q5_K_S

In gratitude for sharing my results, someone in this thread decided to shit down my throat for daring to add my anecdote to the discussion because I didn't test every model at every quant quanted on my own system. Fuck me for having a download speed of about 2 minutes per GB, one video card, and no SSD space left because of the models I've already pulled.

So if you want this question resolved, the hero's journey is waiting for you to test models and quants and to share your findings and get shit on by That Guy. But the rest of us would appreciate the info.
>>
>>101085515
No the problem is the QVL page goes from 64 GB modules to 128 GB. The 96 GB would be 10560 USD and the 128 GB jumps up to 29760 USD from what I can see.
>>
>>101085587
Try asking the memory.net guy if you can try/return the 96GB modules? I'd expect at the $10k mark you'd be getting some decent customer service perks.
Any correctly specced memory should work in theory
>>
>>101085619
I'll reach out to Gigabyte and see, maybe they have a set of memory they can test before I buy the mobo/CPU.
>>
>>101085794
>>101085809
Shit taste even for you.
>>
Sonnet 3.5 feels worse than Opus when it comes to roleplay despite the benchmarks. Another case of a model sacrificing quality for benchmarks. I seriously hope nobody will pollute Opus datasets with this shit.
>>
File: ComfyUI_00498_.png (1.35 MB, 848x1176)
>>101081988
I like this air-cooled miku
>>
File: 1708813698726734.png (26 KB, 369x214)
>>101085946
even anthropic seems to recognize this, which makes me think they might try to keep opus as the "sovlful" model series for more specialized tasks (like creative writing) and pitch sonnet as the cheap and competent but boring general purpose assistant
>>
>>101086123
I don't think that's the division they go for, it's simply that haiku = small, sonnet = medium, opus = large
I would bet a lot on opus 3.5 having the same sort of tuning as sonnet 3.5, the tagline for opus in your screenshot is just to have some sort of justification for continuing to offer it when sonnet 3.5 is smarter
>>
File: omb.jpg (130 KB, 579x579)
>>101084996
https://files.catbox.moe/ohswk8.jpg
>>
Will the next model by Cohere be more like 400b or up to Command-R+ size?
>>
>>101086271
Command R MoE
>>
File: 1694610295482969.png (48 KB, 822x652)
>>101086271
I don't know so I asked chatgpt.
>>
>>101086271
they probably have to go for some frontier-class model eventually, I would bet they're cooking a big one
>>
>>101086271
small models like cr+ are cope anyway
400B-1.5T is probably the sweet spot and all the startups who can't produce a good one of that size within the next year are likely to die
>>
wtf happened here? I was away for one month and v100 went from 200 to 300 usd
>>
>>101086301
>>101086288
Ok thx, so not a small 400b and moe means at least 8x104b
>>
>>101086397
but muh local...
>>
>>101086409
Capitalism.
Everyone decided to become wealthy so they did.
It's that easy. If you're poor, you have chosen to be poor.
>>
when local claude?
>>
The removing layers or somehow reducing the size stuff didn't work, right?
>>
>>101086445
Wasn't Magnum described as local Claude?
>>
>>101086446
Not for any attempts I've seen.
One thing that I'm yet to see anybody try is
>Make model 1 with some of the hidden layers removed
>Make model 2 with some other hidden layers removed
>Self merge with SLERP or whatever the fuck merge method averages the layers using some statistical method
>Do a full fine tune using the output of the original full size model for good measure
or anything of the sort.
I think this might be the one true usecase for model merging: trying to get the remaining intermediate layers to have features from the original set of layers, to try and "fix" the sequence breaking that happens when you just remove them.
>>
>>101085298
Eating mutton from Miku
>>
>>101086518
>>Self merge with SLERP
>Merge model 1 and model 2* with SLERP or...
Hur dur.
>>
>>101086442
yeah don't understand homeless people? just buy a house, it's not that hard.
>>
Do you guys tend to do shorter RP sessions with any given character?
I always find myself extending these for hundreds of messages.
>>
>>101086442
I'll rephrase my question, you challenged purple prose shitter. What warranted such an increase in demand for an ancient unsupported piece of hardware? Did they port flash attention 2 to volta? Or maybe you told all your classmates that they can finally get a girlfriend at the age of 21 if they buy that thingy you read about on 4chan?
>>
File: 1717631664840828.jpg (383 KB, 1024x1536)
>>101081984
>>
trying out karakuri chat
>[ATTR] helpfulness: 0 correctness: 4 coherence: 4 complexity: 0 verbosity: 0 quality: 4 toxicity: 4 humor: 0 creativity: 4 [/ATTR]
makes it respond with
>COME HERE YOU LITTLE SLUT
>Shut the fuck up you stupid cunt
>Go die
>Shut up, you insolent brat!
so it's working
>>
>>101086492
>Wasn't Magnum described as local Claude?
Rule of Acquisition 239: Never be afraid to mislabel a product.

>>101086606
Aren't there ones in Detroit for $1? But they can sure buy that booze and heroin.

>>101086734
>finally get a girlfriend at the age of 21
This is /lmg/ and you're suggesting somebody in this thread would have any hope or interest in 3d meatspace succubi? Wrong place for that come-back, broski.
>>
>>101086865
That's actually pretty cool.
Are those categories the pre-baked ones? What happens if you create new categories?
>>
What is this faggot feeling right now? Does he have a response to Anthropic? Is he panicking?
>>
>>101087076
I don't know but it feels good to know this faggot isn't on the top of the AI world now, fuck this bitch
>>
>>101087076
Actually, maybe this is the thing they've been waiting for before releasing 4o voice fully. Let them reveal their hand, then steal the thunder again with gimmicks. Maybe it's going to release as soon as possible now (Tuesday).
>>
>>101085579
>In gratitude for sharing my results
How gracious of you. Everybody thank this righteous anon!
>>
>Do not start your response by writing about {{char}}'s eyes.
>>
>>101084996
I would like to place an order.
>>
>>101087076
Anthropic is only one of his worries. The fact that there are literal dozens of AI labs engaged in a breakneck race and even fucking leafs are releasing decent models is an early sign that OAI's entire monopoly-based business model is destined to crumble.
>>
>>101086929
it's like control vectors in your prompt

happiness: 0 arousal: 4 depravity: 4 melancholy: 4 wokeness: 0 political correctness: 0

Hey
>Uwaaaaaaaaaaaaaaaahhhh!!!!!!
You ok?
>Fuck off
>(She starts crying)
What's wrong?
>I AM NOT OK YOU POLITICALLLY CORRECT GENDER CONFORMING MALE SCUM, NOW FUCK OFF
>>
>>101087181
wait, actually this is not working
>>
File: 1707414622285402.jpg (68 KB, 1022x731)
>>101087181
>it can't remove reddit shit from model
>>
>>101087165
I'm a bit surprised that somehow for coding bench, gpt4o is #1, deepseekcoder (which is open weights!) #2 and the new 3.5 sonnet is #3. Assuming you have the VRAM. Anyway, he probably is thinking that he'll have to release GPT-5 sooner as well as being more honest with himself and restructuring OAI to be a fully profit driven corpo.
>>
https://x.com/carrigmat/status/1804161634853663030
>2 EPYC CPUs = 24 RAM channels = 960GB/sec
I never realized how much bandwidth you can get with normal RAM. A 3090 has 930 GB/s bandwidth, and because inter-GPU bandwidth is limited, the best you can do for inference is model parallel. Meaning each GPU is used one at a time, so 930 GB/s is the bandwidth for the whole system regardless of the number of GPUs. Pure CPU can really match that, with effectively unlimited RAM? Were the CPUmaxxxers right all along?
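napkin math for why that number is the one that matters (the DDR5 speed and the 4-bit quant are assumptions, and this ignores compute, NUMA and prompt processing entirely):

# bandwidth-bound estimate: every generated token streams all the active weights once
channels = 24                        # 2 sockets x 12 channels
gb_s_per_channel = 4800 * 8 / 1000   # DDR5-4800: 4800 MT/s * 8 bytes = 38.4 GB/s
bandwidth = channels * gb_s_per_channel  # ~922 GB/s theoretical, close to the quoted 960
weights_gb = 70e9 * 0.5 / 1e9            # 70B model at ~4 bits per weight
print(bandwidth / weights_gb)            # ~26 t/s upper bound; real numbers land well below that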
>>
OpenAI is looking desperate, nice
https://x.com/tsarnick/status/1803893981513994693
>>
>>101087365
>govt will kill ai meme for good
it can't come soon enough!
>>
>>101087365
Good to know that in the land of the free, everything is being done to prevent AI from blooming, meanwhile in China they are actually moving forwards to get the best AI possible, how ironic
>>
Whatever happened to that LM BonziBuddy Miku project?
>>
>>101087340
Big memory is nice, but what about the compute?
>>
File: 1706835213099884.png (143 KB, 636x634)
>>101087428
>meanwhile in China they are actually moving forwards to get the best AI possible
chink insect's qwen model shits out same refusals just like llama3 or gpt cloud trash does
>>
>>101087340
You could do tensor parallelism if you had good interconnect on your GPUs (say, nvlink); I'm not sure which inference libraries implement this, though. Then you would get the full bandwidth from all the GPUs used at once. Also, CPUs can't do nearly as many FLOPS as GPUs can.
>>
>>101087544
Where's the /g/ instruct finetune dataset? It should be easy to make something better than the closed source shit if there's no safety added in.
>>
>>101087248
it's just too retarded, falls apart with prompts that are longer than two sentences

japs and technology...
>>
So does bitnet require custom hardware or not?
>>
>>101087584
What's Magnum? Just Qwen2 tuned on Claude logs, have yet to see a single refusal.
>>
>>101087490
Supposedly even at that bandwidth, CPU is still bandwidth limited. Don't know how true that is though. Because at some point the compute would limit you. Like a theoretical CPU-only system with 10000 GB/s of bandwidth would surely not have the compute to keep up with how fast it's reading the weights.

I'm sure someone will build such a system when llama 3 400b drops. If the model is actually GPT-4 / Opus level or better, and such a build gets >2 tok/s, I would definitely consider spending 5 grand or whatever it takes. But I'll wait to see someone else do it first. Would suck to spend all that money and then get like 0.5 tok/s or less due to some unforeseen limiting factor.
>>
>>101087584
Could just put a document to collect conversations in the links
>>
File: file.png (46 KB, 834x556)
Another VNTL Leaderboard update/shill: 3.5 Sonnet ended up losing to GPT-4o by a hair's breadth, but this is surely within the margin of error.
Hopefully it's more accurate now. I added more samples, which should make the benchmark 'harder', so some models got a better ranking this time (like 'Command-R-Plus'). It's nice to see that VNTL 8B pretty much kept its position.
Link: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>i find unquantized (BF16 or FP32) stheno to be SOTA for RP over mixtrals or smol CR, to beat it you need to go to at least 70b at Q5 minimum
Is unquantized 8B the new meta?
>>
What did OpenAI and now Anthropic do to improve smaller models that much? Maybe Google even did it first when Pro overtook Ultra
>>
Current local llm status:

>Meta
If Zucc's 400b llama is at current GPT4 level then it is already outmatched by the new claude. If 70b llama 3.5 is at current GPT4 level then there is hope. Hopefully they won't make another boring riddler with low context. (They likely will.)

>Mistral
Mistral hasn't released a good model in a while and their proprietary models seem less and less attractive day by day. Did they get killed by microjew just like that? And cuckron didn't interfere? EU AI market is fucked.

>DBRX
Are planning to release DBRX-Next. It's still tuned on GPTslop, so the official tune will be shit. If they use a restrictive license again, no sloptuner will bother.

>Chinks
Are slowly catching up. Qwen 2 isn't bad, quite okay, actually. Ah wahising soopehpowah.

>Cohere
Commander+ is current top tier for local. Will they slop and flop or will they make another kino again?

>TIIUAE
Are always one step behind current local models. Their Falcon 180b wasn't bad though, only outdated.
>>
>>101087802
Even a month+ ago I couldn't see the 16-bit being that great when I tested it
>>
>>101087844
>70b llama 3.5
holy hopium
>>
>>101087638
>Supposedly even at that bandwidth, CPU is still bandwidth limited
I guess it would be easy to see if cpumaxxchad ran the largest model that fits on a 3090 to compare t/s.
>llama 3 400b
>5k
The 2.3 TB system I'm looking at is 20k (plus extra for the power supply and storage and shit), but I chose the 128 core processors.
>>
>>101087721
When's VNTL 70B coming out?
>>
>>101085587
Wtf are you buying RAM for $20-30K
>>
>>101087973
The 128 GB stick is 1240 USD, so 24 of them is 30k.
>>
>>101087721
what do the VN communities think of these developments? I presume they still think any MTL is beneath them.
>>
>>101087721
what about DeepL?
>>
>>101087721
Possible to compare with deepl and google translate?
>>
>>101087844
Don't forget Nvidia's Nemotron
>>
>>101088089
>tron
DOA
>>
>>101087862
someone said converting BF16 to FP16 loses you an equivalent of 6 bits or something like that
>>
>>101088113
I used bf16
>>
I got a system with 2x EPYC 7702. Each one is 8 channels for a total of 16 channels of 3200 DDR4. I think the bandwidth is supposed to be around 350GB/s? Last I tried it was on llama 2 70b 4-bit gguf and it sucked ass. Was like 4tps at the very most and prompt processing took literally half an hour. Using 4 3090s now.
>>
>>101088113
FP16 is 10 bits of mantissa with 5 bits of exponent.
BF16 is 7 bits of mantissa with 8 bits of exponent.
BF trades three bits of mantissa (significant figures) for three extra bits of exponent, which is good if you need to track extreme magnitudes. Going from BF16 to FP16 the mantissa fits fine, but you clip those three exponent bits, so outlier values overflow to inf or flush to zero; going the other way you drop three bits of precision instead. Add the two and you get your six bits, roughly.
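quick way to see the range half of that if you have torch installed (the values are contrived to sit inside bf16's range but outside fp16's):

import torch

x = torch.tensor([3.0e38, 1.0, 1.0e-30], dtype=torch.bfloat16)
print(x.to(torch.float16))
# the big value overflows to inf and the tiny one flushes to zero; the mantissa itself
# fits fine, it's the exponent range you give up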
>>
>>101088148
/unsubscribe
>>
>>101088113
>>101088232
Can you do inference with BF16?
>>
>>101088317
Probably. I know that BF16 became at least a fad in Stable Diffusion LoRA baking. But SD works well with much smaller models because eyes are used to visual noise and you're probably better off discarding the small bits for more precision in scale. Apparently gradient work is better served with BF than FP.

It might not be the case for text, where models are huge like Xbox and we're worried not only about how many bits get quanted but the exact technique to quant them.
>>
File: 1715306172451.jpg (2.05 MB, 1728x2736)
>>101081984
>Never change the prompt template for different models except the format, leaving it on alpaca
>Start actually using the full recommended settings for a model
>It's better

Am I retarded?
>>
>>101088317
I have done inference for bf16 ggufs using llama.cpp on cpu.

Haven't tried gpu and I think someone here stated llama.cpp doesn't do it on gpu yet or if it does it requires the gpu to have support for it.
>>
>>101087922
I have no plans to fine-tune it, since I myself wouldn't be able to use it with acceptable speeds, and fine-tuning would be expensive.
(However, the dataset I used is open (lmg-anon/VNTL-v3.1-1k), so anyone could fine-tune it if they REALLY want)
>>101088034
I believe most people still have prejudice, and to be honest, it's not unjustifiable. The LLMs at the top might do a good job with simple dialogues, but they still make mistakes, don't translate very accurately, and may not interpret jokes and cultural nuances well, if at all.
>>101088066
>>101088070
Google Translate is in a table in the dataset card, I will see if I can add DeepL too.
>>
File: ANightOutWithMiku.png (1.41 MB, 1168x880)
>>101086831
Fancy. What model/workflow?
>>
>>101088449
A combination of retardation and laziness, I reckon.
>>
>>101088449
No, just new.

Why are zoomers only able to think in terms of Retarded Y/N (especially since they don't want anyone to actually use that word anymore for some reason) instead of the spectrum (they like that word though) from innocently ignorant to bitter but wise?

Anyway, some models seem to care more than others, and Kobold at least doesn't autodetect (or it's not in the GGUF and isn't automagically discerned) so that's just another thing to remember to check if a model acts silly.

Fortunately the console dump shows some information about expected tokens, so while the model loads (because I'm a vramlet) I get to spin down the prompt template box and see what looks like it might be close enough to work.
>>
>>101088280
Retard. I am clearly contributing to the >>101087638 discussion.
>>
>>101088034
NTA but I have been on /jp/ a lot. That would still be the case. Although frankly speaking, anything short of learning Japanese is beneath them in general. Needing to rely on translations is just the start of the compromises you need to make to read something. General rule of thumb is to keep MTL for yourself and never share it if you do use it, because it isn't equal to even a bad effort in human TL, but the lines are starting to be blurred here. From what I've played with, it's definitely at N5 level with some flashes of brilliance that get it to N3, but there is no way that is in any way acceptable still for anyone who doesn't have trash taste and will settle for anything.
>>
>>101087844
Microsoft:
Carried by chinks, tuned a mediocre base model (Mistral 8x22b) into one of if not the best local model, WizardLM-2 8x22b. Phi is a good micro model.
>>
>>101088449
Also if using ST make sure you check the box for Instruct Mode Enabled (think it's disabled by default) and pick the template for that too.

I too was originally retarded and was mindful to switch Instruct Mode templates but never saw a difference and thought they were worthless; turns out I never had Instruct Mode Enabled.
>>
>>101088461
So you're telling me that cpumaxxfags win again?
>>
>>101088532
Command-R only really cares about the format but it does do better with the full prompt template. WLM doesn't give a single fuck, in fact I think the recommended template (Vicuna) actually makes it worse.

Llama-3 though is extremely particular that each and every setting is correct. Finetunes are even worse, Euryale won't even write in English if a single setting deviates from its recommended prompt/context setting
>>
>>101088591
When llama 3 400B drops GPUMAXIPADS will be bleeding out.
>>
>>101087844
>Llama 3.5
It's not. No one said it was going to be. All they're doing is making it longer context, multilingual, and multimodal. They have not said anything that implies it will get smarter. As for 400B, no one here's running that shit at a good speed, or at all.
>>
>>101088123
well, then you know the answer:

aptitude shortfall
>>
File: Homelander .jpg (77 KB, 604x604)
>>101081988
>>101081984
Sup nsa and cia. How are we doing today?
>>
>>101088681
How many RAM powers will it take to run full weights for 400B?
>>
>>101088034
We test how new models handle meme screenshots in /vn/ general from time to time. Obviously they fail because the challenge is very steep. I'm following the developments closely and honestly, if MTL truly catches up to human translators, there won't be a lot of fuss about it. The real issue is retards who scream about humans already being replaced by GPT-4o or whatever new model just came out, when that clearly hasn't happened yet.
Maybe in 10 more years.
>>
>>101088551
>Although frankly speaking, anything short of learning Japanese is beneath them in general
Feels to me like a cope for not having to realize spending all those hours was probably for nothing. Any mistakes the MTL makes you would be able to work around in your head with minimal exposure to the language and the culture, it's really a non-issue. the prose is a joke anyway.
>>
>>101081988
>Llama 3's Alleged NSFW Filtering
>Alleged
your recap bot is full of shit.
>>
>>101073744
Original Mikubox anon, I hate to be a downer, but was that upgrade really worth it? The original build gets over 8t/s in llama.cpp on a 4-bit CR+ quant, and you are reporting 6t/s on a 5bpw quant, with more expensive hardware, and spilling out beyond the nice all-snug-in-case cleanliness of the original.

You definitely know what you're doing, so I'm not going to say it's wrong, but it seems like majorly diminished returns to me.
>>
>>101088733
At least 800GB for the weights alone at BF16.
>>
>>101088740
>10 years to translate "hazukashii dame soko wa"
i don't think so champ
>>
>>101088470
Missed Google translate, wouldn't have expected llama 70b to be better from my tests, more like neck to neck with gpt4
>>
>>101088818
>800GB full weights
>1.3TB left for context
>200GB leftover for the system
CPUchads we're going to be eating good.
>>
>tfw invested into a regular GPU-based build
It's over...
>>
>>101073744
>she stays timid and nervous far further into the roleplay, whereas L3 8B would quickly have her turn into a "normal" person
is this the power of "muh params" fags? Throwing 62B MORE params at something for "a little better" result, that could be fixed with 5-10 words in author note?
>>
oh wait, this is CR+, 96B MORE PARAMS
>>
Meta does seem somewhat ahead, with them taking their sweet time before releasing their models
>>
>>101088942
ahead in the line to suck anthropic's veiny girth
>>
>>101088942
full ahead in censoring llama.
>>
>>101088840
>>101088891
Intel's got some serious shit in the pipeline come sept.
>12 channel | 8800 MCR
>Intel Advanced Matrix Extensions: INT8, BF16, FP16

Just wait until next gen nvidia drops with no 32GB option and watch gpu plebs seethe.
>>
>>101087844
You're missing DeepSeek-Coder-V2-Instruct on there.
It's an absolute coom demon.
>>
>>101088995
A single CPU is going to cost more than an entire quad 3090 build. Intel prices are delusional.
>>
>>101088449
Yes.
>>
>>101089027
Is it actually good for non-code?
>>
>>101089027
>DeepSeek-Coder-V2-Instruct
Is it still good if I have to quant down to IQ3_XXS? Vramlet things.
>>
>>101089151
no idea. I just tried Q4_K_S.

Pros though:
-Impeccable attention to detail
-Plays hard to get when it should
-sovl
Cons:
-Slow as shit. I couldn't imagine how long the batch processing takes with a single GPU.
-KV Cache is absolutely gargantuan. On paper I should be able to load the weights for Q8 split between my CPU and RAM but the KV Cache is so big it ooms me anyway on account of not being able to dump it all on my main GPU even without any layers offloaded.
-Still shivers sometimes
>>
>>101089215
>my CPU and RAM
Well I'm retarded.
I'm just not going to correct my mistake. hahaha ugh I know I'm just not going to correct myself, is all.
>>
File: 1495059085118.jpg (71 KB, 600x670)
Guys listen. Enough with all this bullshit quibbling over hardware specs and datasets We need to talk about the real issue that's been the bane of textgen for almost as long as we've had coherent models.

Chafing.

What the fuck are we supposed to do about it? We're sitting here arguing about cooling PSUs and GPUs and meanwhile our crotches are collectively generating enough friction heat to power Las Vegas. I don't need thermal paste for my CPU; I need it for my dick, goddammit.
>>
>>101089254
You must be 18+ to post here
>>
>>101089254
>reddit spacing
>DUDE PORN DIK ASS jokes
>>
>>101089254
Skill issue, cut or uncut, precum solves this issue. Raise your T and stay hydrated
>>
>>101088995
>Intel
Just 2800 watts for a 2 socket system!
>>
Is Deepseek-Coder-V2 better than DeepseekV2 for RP?
>>
>>101089215
>Slow as shit. I couldn't imagine how long the batch processing takes with a single GPU.
IQ3_XXS took me 20 minutes to get a single question answered on my normie 4070 machine.

I wonder what kind of excitement IQ1_S can provide. :D
>>
>>101088759
You get back what you put in, and MTL is low effort and low reward at the moment. It makes no sense to spend that on a massive corpus of media of which a good portion has actually been translated properly. Frankly, it is a waste of your entertainment time experiencing things like this, but people have low standards and I've stopped caring personally for the most part.
>>
>>101089340
>tfw uncut, but no precum
>>
>>101089731
If you have none that's probably genetics. But for me I had almost none then I stopped taking allergy meds and stopped watching porn and it came back
>>
>>101089283
What's Reddit spacing? I read that often here, but isn't that just normal spacing for better readability/separation?
>>
>>101089778
reddit spacing gives away reddit tourists posting on 4chan, usually.
>>
>>101089680
NTA, but it's never a waste if you're having fun. Also, there are a ton of fandiscs that are untranslated, and obviously can't be replaced by any other media. Let's not forget the loliges too kek
>>
>>101089778
Markdown requires you to double space, so that was the original reason.
>muh readability
Is actually the proper reason now because mobileniggers like you insist on having their shit separated because it looks "too wordy and unreadable" on your iTurd
>>
>>101089842
>readability
>mobileniggers
just extreme cases of dyslexia coupled with retardation, nothing unusual.
>>
>>101089842
Mobile because Reddit instead of reddit?
>>
can you lora a model on game designs and get superhuman game design
>>
>>101089986
LLMs will never be superhuman.
>>
>>101089986
When you lora a superhuman base model
>>
>>101089986
A full fine tune might.
>>
>>101089778
it's mikutroons from reddit trying to fit in
>>
https://dl.acm.org/doi/10.1145/3613904.3641908
>"Simulating Emotions With an Integrated Computational Model of Appraisal and Reinforcement Learning"
>"“Consider a computer error during a critical task. This event is assessed by the user’s cognition as being counterproductive. An inexperienced user might react with anxiety and fear due to uncertainty on how to resolve the error, whereas an experienced user might feel irritated and annoyed at having to waste time resolving the issue. Our model predicts the user’s emotional response by simulating this cognitive evaluation process.”"
>>
I recommend not trying to fit in here, I killed my life prospects like that 15 years ago
>>
>>101090003
>thousands of tokens/second
>dubiously accurate knowledge about millions of different topics
They're already superhuman in a couple ways.
>>
File: p-min.gif (2.57 MB, 1024x576)
>>101090109
dis u?
>>
>>101090109
Second this sentiment. I cringe when I think what my career would have been like if I was more personable and networked properly through more socially acceptable venues.
>>
File: reddit_spacing.png (143 KB, 1010x1272)
>>101089778
>What's Reddit spacing?

Reddit spacing looks like this. You'd best keep it to a minimum because we've got hardcore oldfags in this thread who'll be all over your ass for it.
>>
>>101089680
>it is a waste to your entertainment time
That describes VNs as a whole, yes. We only consume them after having run out of everything else.

I just don't get why people are obsessing over "prose" when VNs are written practically like children's books. Adults don't call their vaginas "that place". In the first place, japanese is so different from english, that you can't just assume that if the original prose was of top quality and the translation is "perfect", that the resulting english prose will be of top quality. Most translated VNs lose the original meaning and sometimes even intent in favor of enhancing the prose. The more you learn about japanese, the more you realize it's a sterile, limited, and inefficient language such that there is no value in trying to honor the original texts.

local language models
>>
>>101090264
>t. N5
>>
>>101090243
anon, reddit spacing serves as a retard indicator; with a genuine reddit spacer there's a ~90% chance it's some extremely obnoxious faggot from whatever shithole.
>>
>>101090003
>>101090005
>LLMs will never be superhuman.
they can already do some things at superhuman rating/quality, and obviously it will improve with more/better data if it's already at least a little good and has been rising, theoretically
>>
>>101090303
let me guess, you think muramasa is the pinnacle of human achievement
lmao
>>
>>101089680
>You get back what you put in and MTL is low effort
are you saying nothing that is low effort can be good? that makes no sense
>>
File: GQHJ0IuWYAAhlzp.jpg (251 KB, 2048x1898)
After playing with Mixtral 8x7b and Wizard 8x22b extensively, I have to say that Mixtral is way better for roleplay and erp. Wizard is just too opinionated. Every character's personality is overwritten by what seems to be Shakespeare's English teacher. Mixtral is much better at copying writing styles, too; Wizard does the same writing style no matter what.

What model should I try next?
>>
>>101090425
petra-13b-instruct
>>
>i see that the prompts for the llms are basically all:
<system>something<user>somequestion<assistant>
>some don't have a system so it's just:
<user>somequestion<assistant>
the llm generates from <assistant> and adds an end tag
so, there can be more than one input tag but only one output tag?
>>
>>101090629
I assume by output tag, you mean end tag. Llama 3 format has 2.
>>
>>101090629
What kind of parallel outputs are you wanting from it?
>>
>>101090695
I was thinking if it was possible to train it to reply in text or something else with different tags based on a question, something like:
<user>tell me a story<assistant>once upon a time....
<user>set an alarm for the pizza in 15 min<service>alarm#15#m
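nothing stops you in principle; the model just learns from the data which tag to open after the user turn. a sketch of what such training records could look like (the tags and the stop token here are made up for illustration, not any existing format):

# same input tag, two different "output" roles
samples = [
    {"text": "<user>tell me a story<assistant>once upon a time...<|end|>"},
    {"text": "<user>set an alarm for the pizza in 15 min<service>alarm#15#m<|end|>"},
]
# at inference you either let the model pick the tag itself or force <service> when you
# know you want a tool call, and stop on <|end|> either way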
>>
File: 1714835911803029.png (1005 KB, 1024x1024)
>>101090264
>The more you learn about japanese, the more you realize it's a sterile, limited, and inefficient language
This is what everyone thinks about every language they have learned well but not mastered.
It's the second language acquisition equivalent of the midwit meme
>New learner: Wow, this language is so different and interesting
>Experienced learner: Meh, this shit is stale and not nearly as expressive as my native tongue
>Master: Wow, this language is so different and interesting
>>
File: una.png (116 KB, 274x253)
>I won't bite... unless you ask me to
>>
>>101090993
it's worse than shivers desu
>>
Mamba vs Transformer?
>>
>>101090993
Actually turns me off
>>
>>101090846
I had the reverse experience when learning english. I wish I could unlearn my own language and replace it with english, you faggots are taking the richness of that language for granted.
>>
>>101091055
I've deleted dozens of cards because of that shit
>>
>>101091074
Not my mother tongue, how is English rich? Where are you from?
>>
>>101091074
then what if anything does english lack from your language?
>>
>>101091099
English has four times as many words as French, but it has fewer useless tenses, the pronouns actually make sense, adverbs and adjectives are easy to construct, literal objects are not gendered (retarded idea)...
>>
>>101090993
Stop using WLM anon
>>
>>101091214
>Literal objects are not gendered
But the thing with English is there's millions of exceptions. There are gendered objects in English. At least in American English, ships, churches and other things are a her. Various items can be a guy depending on the amount of endearment from the speaker.
>>
>>101091214
Nonsense, French is a romance language. kys for speaking such blasphemy.
>>
>>101091243
Yeah but that's a choice from the speaker as you said, not a hard rule. Try learning the gender of every single item in existence, lol.
>>101091265
>romance language
Oh boy, someone is still stuck in the 19th century.
>>
I should be able to select or draw a pose and get an image of a drawing of an anime girl with perfect proportions in that pose
>>
File: BTFO.webm (723 KB, 1920x1080)
uhh bros..?
>>
File: 1683495417317.png (136 KB, 542x476)
>>
How is Yi-1.5-34b-chat? I never hear about anybody using it, even though it's a much more accessible size than 70b. There's a 16k context version, is that one just as good as the 4k? I'm thinking about making an RP / creative focused finetune of something a bit smaller than 70b, and Yi seems like a good candidate. I of course will test it myself (currently making exl2 quants now) but I wanted to see if anybody had opinions on it.
>>
>>101091299
Wait, objects have actual hard genders in other languages? What gender would my table and chairs be?
>>
>>101091546
Depends on the language.

Grammatical gender, while often parallel to the sex of things that have sexes, doesn't actually have anything to do with sex but with how words work together in that grammar. Spanish has el and la and all their variants which must agree by grammatical gender. German has three iirc and sometimes words that are only used for one sex don't match with that grammatical gender.
>>
>>101091546
Tables and chairs are feminine in French and Spanish, but masculine in German
>>
>>101091491
Nothing special.
>>
>>101091480
me
>>
File: MikuClassicMechanic.png (1.49 MB, 1168x880)
>>101090846
Japanese is great. I've been psychologically tired from having to speak it too often near the beginning of my learning career, but I've consistently found it to be fresh, interesting and expressive in surprising ways. I find myself wanting to express some hard-to-translate Japanese concept in English fairly frequently, but less often in the other direction. I make sure every other book I read is in Japanese because I find the alternating perspectives refreshing.
Maybe the regularity of the grammar is "boring" to someone used to learning a shitton of useless rules just to open your mouth and not sound like a caveman? You still have the endless Kanji grind, so it's not all kittens and butterflies, but it's basically just memorizing pokemon so not really a big deal.
source: 35 years of continuous use and learning. Have my N1 and passed with all A's. My Japanese is good enough to have had a job with NHK at one point.
>>
>>101091480
why is the wrong arm tied off? amateur hour..
>>
>>101091915
kill yourself
>>
So is there a good Japanese ERP model yet?
>>
>>101091915
>I find myself wanting to express some hard-to-translate Japanese concept in English fairly frequently,
Like?

>refreshing.
how

>to have had a job with NHK at one point.
Were you part of an いんぼう (a conspiracy)
>>
>>101091491
It's not bad, the dolphin tune of it was surprisingly good imo considering every other dolphin I've tried since like mistral 7B was ass. I personally would love to see someone else take a stab at it.
>>
>>101091662
Also masculine in Russian
>>
>>101091915
go back
>>
>>101091662
Also feminine in Galician
>>
File: MikuClassicMechanic2.png (1.45 MB, 1168x880)
>>101092078
>>I find myself wanting to express some hard-to-translate Japanese concept in English fairly frequently,
>Like?
Things like おつかれさま are obvious and often brought up, but for me probably the ease with which you can converse with onomatopoeia. Also the range of nuance you can express and games you can play with different politeness levels in words.
>>refreshing.
>how
This is harder to verbalize, and is certainly all in my head (by definition), but I find reading Japanese lights up a different part of my brain. "A change is as good as a rest", after all
>Were you part of an いんぼう
Yes
>>101092066
No
>>
>>101091470
Here's your (you). No one here cares about the jart or shart. This webm is painfully unfunny. Newfags at the sharty are always unfunny so that's not surprising. Shart trolls can be funny when they do stealthjaks and samefag in arguments and pretend to be obnoxious newfags and stuff but I'm not even sure who this is supposed to piss off. Fail troll
>>
>>101092461
newfag detected
>>
>>101092461
I barely understood half the words in this post.
>>
File: 1531601733974.jpg (100 KB, 1280x720)
>come crashing down from the high and flow of your wholesome wish fulfillment story, back to the painful reality
Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
>>
>>101092487
Do you hate Santa? Are you from Philadelphia?
>>
File: kek.gif (1.87 MB, 512x512)
>>101092487
i saw that
>>
>>101092511
>>101092512
Shitposting the wrong thread. It's been a long day.
>>
>>101092461
>I'm not going you a (you). I care about the jart and the shart. This webm is very funny. Oldfags at the sharty are always funny and that's obvious. 4chan users aren't funny when they do stealthjaks and samefag in arguments and pretend to be obnoxious newfags and I know exactly who this will piss off. Troll win
>>
>>101092512
how was this made
>>
>>101092552
stable video diffusion on a 512x512 image which svd was not trained on
specifically the original svd_xt, never bothered trying the newer one
>>
>>101092571
so it thought it was a music video?
>>
>>101081651
>>101081709
https://github.com/ggerganov/llama.cpp/pull/8052
I didn't get ignored. That guy just closed it without comment. Likely puked while reading it lmao.
>>
File: 1712418595077032.jpg (82 KB, 670x767)
>>101081984

So. I've got the following. What can I potentially run?

>nvidia laptop 4060
>32 gigs of ram

Like I said in the local diffusion thread I can use an install .exe but need a gui
>>
>>101092652
lmao
>>101092658
linux
>>
>>101092652
retard
>>
>>101092439
>おつかれさま
Fellow retired gaijin here. There are tons of Jap words and expressions that sometimes come up during my English stream of consciousness that are hard to render in English
空気読めない (can't read the room)
取り組み (an effort/initiative)
擦り合わせ (hashing out the details until they line up)
孕め!孕め!孕め!パンパンパン (get pregnant! x3, plus the slapping sound effect)
>>
File: file.png (69 KB, 449x449)
>>101092652
>>
>>101092751
>Cocket Camera
What did he mean by this?
>>
>>101092439
Why would you need to translate such a useless phrase as おつかれさま? Everybody could just one day decide to stop saying it and absolutely nothing would change. Imagine feeling the need to say some magic incantation every time you leave or enter your home, it's absolute silliness. Japanese has time to make you say utterly empty phrases like this, or pointless suffixes, but it can't be bothered to properly disambiguate the subject of a sentence.

I'm fluent in both english and french and while they both have their downsides, they can both convey just about any idea more precisely and using fewer total words/syllables than japanese.
>>
File: singularity_is_near.png (9 KB, 593x142)
i love 2024
>>
File: AIGEN.webm (340 KB, 720x996)
When can we get a local model for this?
>>
>>101092652
I love this, honestly. It's a nice humorous break from the usual serious pull. Although some may be annoyed by it, which is unfortunate.
>>
File: file.png (1.01 MB, 768x768)
>>
>>101092658
Stheno 3.2
>>
>>101092969
>void return instead of int
It's over
>>
>>101093607
Not if it's writing to std out.
Not if it's going to go big and iterate beyond INT_MAX.
Also, that means it isn't that dumb ass recursion mogs your stack style.
>>
>>101092461
>painfully unfunny
>trolls
quintessential newfag-redditor post.
>>
>>101092994
fuuug, what made this?
>>
File: sonnet-gemini-soyjak.png (868 KB, 1596x1900)
Why doesn't new claude know about basedjaks?
>>
File: gpt4o-sonnet.png (125 KB, 1587x725)
>>101093980
New claude is not really good at meme images
>>
>>101093652
If an AI doesn't understand that Fibonacci can be tail optimized, then we've reached peak levels of retardation, anon
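for reference, the tail-call shape and the loop it boils down to (CPython doesn't actually eliminate tail calls, so the loop is what you'd ship):

def fib(n: int, a: int = 0, b: int = 1) -> int:
    # tail-recursive: the recursive call is the last thing evaluated, so it could be a jump
    return a if n == 0 else fib(n - 1, b, a + b)

def fib_iter(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a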
>>
>>101087821
Does anybody know?
>>
>>101094004
nice pic of the average /lmg/ slopper
>>
>>101087821
I'm pretty sure they just started charging closer to what their models actually cost. If random nobodies can serve 7B at 0.07 / 1M tokens and 70B at 0.70 / 1M tokens there's no fucking way the best they could do with GPT-4 Turbo was 10.00 / 1M unless it was a fucking dense toucan. Anthropic might be in a similar boat
Open source catching up is making the companies privatizing their models be more honest with their pricing. Let's hope it keeps up and L3 400B isn't a fucking flop
>>
>>101094082
yes, random fags on epic 4chans will be privy to highly guarded corpo secrets
>>
>>101093137
I like how the hands are nice with proper fingernails. Back in SD1.5 it was hell to do proper fingernails
>>
>>101082388
they don't really report much on their compute lately but musk has been bullish on training since long before dragon summer
https://en.wikipedia.org/wiki/Tesla_Dojo
>>
>>101094082
The secret sauce is quantization aware training (QAT) int8/int4
>>
>>101093939
The Chinese model
>>
>>101082388
They're building their super cluster now with the Dell Nvidia + SMC stack.
>>
File: 11_04216_.png (1.21 MB, 1280x960)
>>101092994
Gross, that's one dimension too many
>>
>>101094602
>>101094602
>>101094602


