/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102306138 & >>102296939

►News
>(09/06) DeepSeek-V2.5 released, combines Chat and Instruct: https://hf.co/deepseek-ai/DeepSeek-V2.5
>(09/05) FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: https://github.com/feizc/fluxmusic
>(09/04) Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
>(09/04) OLMoE 7x1B fully open source model release: https://hf.co/allenai/OLMoE-1B-7B-0924-Instruct
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102306138

--Papers: >>102316443
--Instinct cards work on Linux but have drawbacks compared to 3090 GPUs: >>102315477 >>102315576 >>102315883 >>102316165 >>102319125
--Hardware recommendations for running llama 405b: >>102311273 >>102311331 >>102311688 >>102311866 >>102312003
--Advice on setting up multiple NVIDIA GeForce RTX 3090 Ti graphics cards for AI model inference: >>102316869 >>102317147 >>102318695 >>102317508
--Running large models is possible but slow, concerns about 3.1 models' performance: >>102317435 >>102317457 >>102317489 >>102317753 >>102318587
--Power limiting 3x 3090 GPUs to 250W each can achieve 72GB VRAM under 1kW: >>102312821 >>102313017 >>102313639 >>102316467 >>102316488 >>102316563 >>102316578
--Local TTS models discussion, state of open/local audio, and multimodal LLMs: >>102309682 >>102309779 >>102309857 >>102315668 >>102315691 >>102316623 >>102316721 >>102316755 >>102319174 >>102321716 >>102315819 >>102316544 >>102316686 >>102320822
--Higher quantization levels are more accurate but use more RAM: >>102311403 >>102311470 >>102311490 >>102312210
--Example chats are removed in reverse order as space runs out: >>102311393
--ChatGPT feels like a base model with front-end magic, not a coherent multi-modal model: >>102315396
--Best model for 24gb VRAM: >>102318764 >>102319001 >>102319030 >>102319295
--Anon seeks imagegen model recommendation for adventure illustrations: >>102310804 >>102310829 >>102310841 >>102310878
--Tabbyapi's continuous batching provides significant performance boost: >>102321700 >>102321763 >>102321791
--Anon asks about creating a plugin for gnome or plasma to record and transcribe audio using Whisper: >>102317859 >>102318355
--Miku (free space): >>102306231 >>102309062 >>102315511 >>102316428 >>102316678 >>102317336 >>102317712 >>102322447 >>102322556

►Recent Highlight Posts from the Previous Thread: >>102306170
>>
I love every single one of you.
>>
>>102323192
Even the guy who often shares his cuck fantasies?
>>
>>102323247
Him a little less, but even so.
>>
>>102323249
Are you the pope or just enlightened?
>>
File: 56 Days Until November 5.png (2.13 MB, 1472x1104)
>>
File: 1718632487719434.webm (3.56 MB, 405x720)
>>102323265
I'd never actually call myself enlightened, but I've had my moments where I experienced love for every single thing in the universe— without using drugs, even.
This is getting pretty off-topic, though.
>>
>>102323303
>without using drugs
Post-nut clarity is powerful.
>>
>>102323192
fuck you
>>
File: bouny.jpg (62 KB, 1080x808)
>>102323303
>>
Anyone have a clue as to how to run llama-cli with rpc-server backends? It keeps trying to allocate the entire model on one backend and OOMing. Do you just run rpc-server on the remote box and let llama-cli snag the local GPUs? -sm layer and -ts 1,1,1,1 for 4 identical GPUs doesn't appear to work
>>
>503
>503
>503
fuck you
>>
desu
>>
>>102323717
lmao. We posted in the wrong general.

Sorry /lmg/. I miss this place though.
>>
i've got a server with a shitload of unused ram (96GB+), but it's all ddr3. is running any kind of model on it feasible or would it be a waste of time?
>>
>>102323747
Why did you leave?
>>
>>102323658 (me)
never mind, I'm a retard that can't read. Needed a comma-delimited list of servers with one --rpc flag. I was doing multiple instances of the --rpc flag, one per backend. Working great now
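For anyone else who runs into this, the working setup looks roughly like the below (IPs/ports are placeholders, adjust -m and -ngl to your own setup):
[code]
# on each remote box: expose that box's GPUs over RPC
rpc-server -H 0.0.0.0 -p 50052
# on the main box: a single --rpc flag with a comma-delimited list of backends
llama-cli -m model.gguf --rpc 192.168.1.11:50052,192.168.1.12:50052 -ngl 99 -sm layer
[/code]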
>>
>>102323832
>shitload
define shitload
>ddr3
slow, but if you have enough ram you can run interesting models. maybe one prompt overnight kind of perf?
>>
>>102323878
What can I even do with 8GB vram? Last thing I used was mixtral instruct 8x7 Q4. Yes, Q4.
>>
>>102323832
If it's a multichannel setup, you can probably get some usable speeds on, say, 70B.
You'll still want a GPU in there for prompt processing.
Why not give it a try.
>>
>>102323917
I got 12GB so I get you. But it's not like the bigger models are much better, feels like LLMs are played out now
>>
>""fill me, claim me, make me yours!""
>>
>STILL no mixtral 8x70
it has never been more over
>>
>>102316481
You didn't find "howling kids" similarly egregious?
>>
File: 1705768124131405.png (958 KB, 1024x814)
is there something like this already? I imagine it would be nice to read manga raws
>>
Want to start converting books to audiobooks for myself with Coqui. If you could pick any voice to read you books as you fall asleep, who would you choose?
>>
>>102324038
That translation in particular is shit. A fan translation would be better.
>>
holy fucking shit I totally forgot about this for three months and nothing has changed lol
>>
>>102324095
A Vogon.
>>
>>102324132
We got 70B reflection that is as smart as Claude sonnet
>>
>>102324132
>we get a 405b that's legit SOTA with true 128k context
nothing has changed lol
inb4 no one can run it. That's an irrelevant implementation detail
>>
>>102324269
SOTA for open source, but still barely trading blows with GPT-4. And it's text only. To cap it off it's a giant dense model, so the hardware to run it at an acceptable speed is simply out of reach for local at this time; 'implementation detail' or not, that's the current state of things in the real world. If Llama 4 isn't multimodal and MoE, or some crazy shit like bitnet doesn't pan out, or alternative cheap hardware doesn't magically become available for this stuff, it's as good as over for Meta.
>>
>>102324269
Yeah but there are STILL no 3.1s that aren’t trash for roleplay. It’s just a corpo model. Midnight miqu is still SOTA (sex of the art)
>>
>>102324038
>manga raws
It probably doesn't exist, for the simple reason that manga text is vertical.
You can easily do it with any other language that isn't written vertically.
>>
File: 1725922368500279.jpg (649 KB, 2384x1808)
>>102323333
C-Checked
>>
>>102324399
https://github.com/kha-white/manga-ocr
>>
>>102324399
>simple reason that manga text is vertical
Oh no, what an insurmountable engineering problem! lmao
>>102324038
Yes, Google Translate does that. Translation quality is iffy because Japanese, but you could theoretically understand the plot. There's also apps that do it like Pleco for Chinese. Literally just use a search engine and you will find a ton of alternatives.
You can also use this to play untranslated VN. Seriously, does this general live under a fucking rock?
>>
>>102324402
h-how is she holding that axe in place
>>
What do you think I could expect out of a Xeon 8280L (24-core) setup and 512GB of DDR4 2933 memory with 405B? Less than 1 t/s? I can get a complete 8280L system with 256GB for under $500 but I don't like going down to q4.
>>
>>102324369
Hardware VILL get cheaper, software VILL get better und you VILL be happy
Looking at this space with the time horizon of a goldfish is just not useful.
>>
>>102324462
Using her thighs? Anon???
>>
>>102324402
>do you like midnight miqu on your gpus?
>>
>>102324418
>>102324445
Well, in my defense I researched this topic a few years ago and there was no way to do it if you weren't a programmer, and I just assumed it was still the same.
>>
>>102324572
Really, being in this thread you assumed tech to handle text had not changed in years?
>>
>>102324038
I guess at least Flux could inpaint the text back, after it's translated.
>>
>>102324003
That one's fine. Kids can be annoying without being vampires and dropping slight hints without spilling beans is reasonable behavior.
>>
>>102324369
>still barely trading blows with GPT-4
this was just a dream until 405b hit. Now you just have to not be poor, which is a solvable problem
>>
>>102324502
gut feel is 0.2t/s with that setup at q8
>>
>>102324502
>6 channel ddr4-2933
that will be around 140gb/s bandwidth... so for the theoretical max speed divide that number by the size of the quant you'd want to use (e.g. 1t/s for a 140gb model)
Q8 405B is about 435GB ~0.32 t/s max
Q6: 313GB ~0.45 t/s max
Q4: 229GB ~0.61 t/s max
naturally your speeds will be below the max by some percentage
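Back-of-the-envelope version of that math if you want to plug in your own numbers (sizes are the rough GGUF file sizes quoted above):
[code]
# theoretical ceiling: tokens/s ~ memory bandwidth / bytes read per token (~ quant file size)
bandwidth_gb_s = 6 * 2933e6 * 8 / 1e9  # 6 channels of DDR4-2933, 8 bytes per transfer ~ 140 GB/s
for name, size_gb in [("Q8", 435), ("Q6", 313), ("Q4", 229)]:
    print(f"{name}: {bandwidth_gb_s / size_gb:.2f} t/s max")
[/code]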
>>
>>102324369
>>102324677
Kind of weird to somehow frame 405B competing with GPT-4 as a bad thing. GPT-4 was 1.8T MoE with probably pretty big experts. GPT-4o while we haven't gotten any numbers is still probably pretty big.
>>
>>102324566
>The 70B iteration was a little too slopped for my taste. But then the 103B came out and I think the model really came into its own, coommercially and artistically.
>>
File: firefox_7CjUwHZDqd.png (21 KB, 364x140)
How to get Mistral-Large to stop producing those little shits? Fuck!
>>
>>102324760
Might want to look at the logits to se how those are being tokenized.
Worst case scenario, they aren't individual tokens, which would be hilarious.
>>
>>102324726
It's not a bad thing but it's not game changing right now. Especially when Largestral comes out at a third the size with comparable performance in some areas. GPT-4's active param count was estimated to be around 400B. The thing is that every big tech company has been in a race to the bottom to optimize and minimize their models and they're probably down to less than half that size (at least in active params) considering the API costs and speeds we see for the latest revisions. And they get comparable performance to the original.
>>
>>102324760
The weird/nonstandard marks? Regex script to replace them with the standard variants should work.
Also, check your settings/prompt. Haven't seen this kind of thing much in Largestral, mostly only in Magnum-123B.
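Rough sketch of what that regex pass could look like; the character list here is just a guess at the usual offenders, so extend it with whatever Largestral is actually emitting:
[code]
import re

# nonstandard marks -> plain ASCII equivalents (assumed offenders, extend as needed)
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
}
PATTERN = re.compile("|".join(map(re.escape, REPLACEMENTS)))

def normalize(text: str) -> str:
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)

print(normalize("\u201cIt\u2019s fine\u2026\u201d"))  # -> "It's fine..."
[/code]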
>>
>>102324418
Is this why modern exhentai is filled with slop
>>
>>102324760
I think banning tokens doesn't work on kcpp, also check which tokenizer ST is set to
>>
>>102324801
Yes.
>>
>>102324698
Ouch but about what I expected. I wonder if llamafile does better since apparently it was written to take advantage of AVX-512, which newer Xeon does support. Memory bandwidth will still be an obstacle.
>>
File: firefox_eFqMoLUHYL.png (83 KB, 381x271)
>>102324781
They are.

>>102324797
I did check the entire context; none. Would rather not edit the text because I think it will break the token probability window.

>>102324802
Ooba. If nothing else helps, I'll try tabby.
>>
>>102324809
>Memory bandwidth will still be an obstacle.
Memory bandwidth is 100% your only obstacle, assuming you're using a GPU for prompt processing.
>>
>>102324809
well the memory bandwidth is gonna be a hard limit, no way around needing to read the weights for each token
actually I guess speculative decoding lets you batch the tokens when you have a decent draft model, so you can probably squeeze out some % increase with that but you're not gonna hit 1t/s
>>
File: 348736.png (249 KB, 611x729)
>>102324369
It's over for LLMs
>>
>>102324880
>0%
geminisisters...
>>
>>102324880
Why the fuck didn’t they test it against largestral so I can jerk off about mist superiority if it did well and say the benchmark is trash if it did badly?
>>
>>102324880
what does zero shot mean if one shot means the first response without a regen
>>
>>102316578
You are the best.
>>
>>102324955
thank you, you're too kind anon :D
>>
>>102324880
What the fuck is blocksworld
>>
>>102324880
>LLMs still can't plan
>Here's them planning a bit
huh?
>>
File: Magic Bullet.jpg (79 KB, 470x594)
There's been so much focus on llama 3.1 and new big models. I wish that more focus would be given to true MoE models.

Given similar sizes, I have found 8x7b models to be on par with 70b models (e.g., 3.75 bpw 8x7b versus 2.5 bpw 70b, both fit in 24GB VRAM). However, the true power of the MoE model is the ability to greatly exceed VRAM and still be tolerably fast. I was able to run a Q5_K_M quant of an 8x7b model at decent speeds, because even in RAM, using two 7b experts is not that slow. If I ran a similarly sized 70b model, it would take forever to get a response.

Q5_K_M 8x7b beats the crap out of 2.5bpw 70b.

Unless I'm missing something, MoE-style models are just better, because they can better exceed VRAM constraints.
>>
>>102324171
>>102324269
>niggas really COPING with 405b rofl
by the time anyone can run that shit there will be an 80b that's better. what's worse is it's 400b and still dumb as shit too. not looking good lmg bros
>>
>>102324936
There's no point when 3.1 70B is smarter than Mistral Large.
>>
File: 1725959291321108.png (510 KB, 512x768)
>>102324906
My rig generates tokens just fine; each of its 4 GPUs consumes less than 300W. However, processing long context causes my server to crash unexpectedly, even if I cap all GPUs at 200W. What could be causing this issue? I run 2 PSUs (1200W + 850W) with a CPU rated for 120W TDP. This occurs only when TP is enabled.
>>
>>102325051
>by the time anyone can run that shit
I've been running it since release. I'm literally running it right now
>>
>>102325072
Power spikes? Something about needing to limit "boost" clocks too I think
>>
>>102324588
I assumed it wasn't made because of a lack of interest, since machine translation is very low quality for something like manga and scanlators don't really need it
>>
>>102325061
No it isn’t. Llama3 is braindead trash that was taught to the test. It’s the Asians of language models.
>>
>>102325115
With two cards and MB connected to my 1200W PSU and another two on an 850W PSU, power spikes up to 400W should not pose any issues.
>>
>>102325072
>>102325115
Check whether you still get stability issues after running

nvidia-smi --lock-gpu-clocks 0,1000 --mode 1


That boost clock range is way too conservative but if it fixes the problem the crashes are most likely caused by power spikes from multiple GPUs randomly aligning.
As far as I can tell software power limits are only enforced on a long timescale (from a hardware perspective) so individual GPUs can momentarily exceed it.
>>
gpt4o is actually overfitted as fuck. I asked for some golang gorm functions; llama3.1-405B and sonnet 3.5 actually reused my existing code. Meanwhile this piece of shit wrote its own code that basically did the same fucking thing, and the gpt4o code was ugly and outlandish compared to the rest too.
>>
any finetunes that fixed nemo's repetition issues yet?
>>
>>102325333
i haven't noticed big repetition issues with any of the tunes i've tried. nemo's problem is that all mistral models get hyper fixated on whatever you say and forget to introduce new elements into the rp so they are very boring overall. nemo isn't as bad as mixtral was with it but its pretty close
>>
>>102324793
>It's not a bad thing but it's not game changing right now
You're using the point about active parameters to diminish how much the open weights segment of the market has achieved, that is the "bad thing" of your argument I was referring to.
>Especially when Largestral comes out at a third the size with comparable performance in some areas
You could keep making that same argument as you go lower and lower. Oh but 70B is comparable to Largestral in some areas with almost half its size. Oh but Phi is comparable in some areas to 70B etc etc. The only thing that matters is that someone in particular is too poor to justify investing in a CPUmaxx build to run a quant of 405B with speculative decoding, or too poor to justify investing in a 3x3090 build to run a quant of Largestral, or too poor to justify investing in a 2x3090 build to run a quant of 70B, etc. It's relative to your level of income. Some (probably most) people can't justify investing into even dual 3090s. What do you say to them? By your logic, 70B and up models don't change anything either. But even if you say hobbyist consumers don't matter, 405B is still a massive influence on the market. It's actually pretty cheap on API. AND it is a positive influence overall when it's possible Largestral would've never been released, and it can provide researchers something to toy with for distillation and other techniques.

>The thing is that every big tech company has been in a race to the bottom to optimize and minimize their models and they're probably down to less than half that size (at least in active params) considering the API costs and speeds we see for the latest revisions
Total parameter count still matters for the same reason people criticize MoE model makers only advertising the active parameter performance, especially if you're criticizing local models for being behind relative to cloud models because they can't be run on lower end hardware.
>>
>>102325333
I haven't noticed many repetition issues from nemo, but the issue I have noticed is that "continue from last message" in SillyTavern just straight up doesn't work right with it. Like it just skips to the next character's message.

Other than that, works like a charm.
>>
>>102325376
Largestral 2.75bpw on two 3090 works well.
>>
>>102324880
I always felt there was something wrong with Gemini. Damn.
>>
>>102323023
I like this Teto
>>
>>102324988
You should be able to work out from context that his first sentence implicitly means planning well, not just any planning.
>>
>>102324445
sorry i didn't explain well, i know that google translate can do this, it's what i searched for to find that example pic. I'm looking for a comfy ui workflow or other tool to OCR (japanese/chinese/...) text, translate it and then fit the translation back into the image like >>102324647 said.
AI translation is pretty good by now and LLMs can also read text from images better than those old school OCR things like easyOCR, but it would be nice if I could run it locally
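A minimal sketch of the OCR + translate half, assuming manga-ocr for the vertical text and whatever local model you already run behind an OpenAI-compatible server (llama.cpp, koboldcpp and tabby all expose one); the URL and model name are placeholders, and fitting the translation back into the bubble is the part you'd still have to script yourself:
[code]
from manga_ocr import MangaOcr   # pip install manga-ocr
from openai import OpenAI

mocr = MangaOcr()                # handles vertical Japanese text
jp_text = mocr("panel.png")      # accepts a path or a PIL image

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
resp = client.chat.completions.create(
    model="local",               # placeholder; most local servers ignore the name
    messages=[{"role": "user",
               "content": f"Translate into natural, idiomatic English:\n{jp_text}"}],
)
print(resp.choices[0].message.content)
[/code]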
>>
>>102324946
"Shot" is not regens but examples. You can think of it like example dialogue. If it's zero shot then there is no example dialogue. If 1 shot then there is 1 example dialogue.
>>
>>102325631
But it shows a clear difference in performance between models, aka bigger and better LLMs are all you need.
>>
>>102324418
this and mokuro seems nice, although I'd prefer to just translate a whole volume and output it back to jpg, so I can read it in tachiyomi on mobile.

>>102324801
people read dialogue in porn?
>>
>>102325180
Not him but I have not seen proof of this in my testing, although I am using them with completion rather than chat, and I am using pretty low quants, so that could be the reason. In my tests though Llama 3.1 70B seems to be better at paying attention to and understanding the context, and generally feels smarter. I did IQ2_M for Largestral and IQ4_XS for 70B, which are pretty close in file size. However I think Largestral probably overtakes 70B at a higher quant, but at that point it is also a larger model and not many can run it.
>>
>>102325978
Agreed. The old 'run the biggest model you can, at the highest quant you can' breaks down somewhere in the IQ2 range. IMO, it's better to run a comparatively sized quant of a model in the IQ4 to IQ5 range, than running the biggest model you can at IQ2.

IQ2 sucks.
>>
>>102326066
2.75 works well for me.
>>
>>102326066
I mean, it's possible it could still be good for some things, depending on the model. I haven't tested it on RP but I generally believe that the Llamas are not trained well for RP so they likely suck at it. So even IQ2 Largestral might be better for that case.
Wouldn't it be nice if we had a new Mistral Medium or 8x22B...
>>
>>102326121
On that note, were any good 8x22b RP finetunes even made? I know there's stuff like Noromaid and Fish in the 8x7 range, but I haven't heard much from 8x22b.
>>
>>102325078
why would people lie on the lmg?
>>
File: teto-plugsuit-alter-b.png (1.76 MB, 992x1496)
brace for impact
>>
>>102325181
Are you watching dmesg while it runs? Look for PCIe bus errors. Crashes might also be due to current flowing between the PSUs, if neither of them is able to auto-adjust.
>>
>>102324979
https://github.com/karthikv792/LLMs-Planning/tree/main/plan-bench/prompts/blocksworld
A test suite for planning that involves stacking and unstacking blocks. View the examples here to learn more
>>
>>102325940
Now let's see how much scale you really need to get near 100 on both the blocksworld and mysteryblocksworld tests.
>>
>>102326286
Violently impacting Teto from behind
>>
pedro poorfag here: arliAI-rp 8b is lame, no better or worse than any other llama 3.1 poorfag model. 3.1 was, in fact, a mistake. doodoo feces all the way down. llama3 store brand cereals like stheno/lunaris still hold best for a cheeky midday fap. thanks for reading my blog.
>>
>>102326678
>I can only use sao models
thanks for the daily report, sao
>>
"FILTHY SCHEMING SHILLS" anon posted again. The page updated as he glanced at the replies, an ugly grimace on his face. "Fine-tuners," anon sneered. The blue light of the screen reflected on his glasses. "BUY AN AD" he typed, as cheap vodka seared his veins. "THIEVES, PARASITES!" Thoughts of those fine-tuning weasels trying to dilute the market with cheap imitations drove him to madness. "With a good jailbreak, you can do anything," anon said aloud to himself as his fingers danced on the keys.
>>
>>102326714
welcome
>>
>>102326373
Oh, neat. I might use this in the future.
Thanks for sharing, anon. I appreciate it.
>>
What local model are we using for ERP
>>
>>102327137
Buy a fucking ad, asshole.
>>
>>102327155
>Buy a fucking ad
Fucks sake, time to update my filter...
>>
>>102327155
answer the question, nigger
>>
>>102327155
Why don't you suggest a model?
>>
>>102327137
Hermes-405B
>>
>>102327137
Lyra-v4 and mini-magnum.
ArliAI is fine.
All nemo fine tunes, which are the only things I've been testing for a while now.
>>
The russiatard wants people to discuss furries here for some weird reason. Guess he got tired of waiting for the Tiny Bunny update.
>>
>>102327389
What in the fuck are you talking about?
>>
>>102327155
Which model should I use for making ads on /lmg/?
>>
https://www.theinformation.com/articles/new-details-on-openais-strawberry-apples-siri-makeover-larry-ellison-doubles-down-on-data-centers

>OpenAI plans to release Strawberry as part of its ChatGPT service in the next two weeks, earlier than the original fall timeline we had recently reported
>>
>>102327463
nothingburger
>>
>>102327463
Watch it just be a strawberry theme on the ChatGPT UI and nothing else.
>>
>>102327463
two more weeks
>>
>>102327137
seconding lyra v4, shit's great.
i laughed, i cried, i coomed.
>>
File: 1721319066973782m.jpg (49 KB, 576x1024)
was redirected here by /aicg/

new here, any help appreciated. I'm trying to optimize a transformer-based model for improved throughput locally. Here's what I've done:
>pytorch model -> onnx
>onnx -> TRT engine (fp16 for now which actually works reasonably well, plan to try bf16 soon)
I integrated the TRT engine into a local triton inference server instance and after messing with the config it does seem to help with throughput to an extent, especially at higher concurrencies. I was able to get some additional juice by bypassing the HTTP/GRPC layer entirely. I'm currently looking into int8 quantization. Have any of you gotten PTQ to work well or am I really stuck having to do QAT to get reasonable results?
>>
>>102327215
Absolute dogshit
>>
>>102327851
>i laughed, i cried, i coomed.
shill tagline
>>
>>102327867
How about you try an exllama2 quant and see if it works better than your trt thing.
>>
>>102327867
>was redirected here by /aicg/
still the wrong place, this is the local llm coomers general, nobody knows what you're talking about, like the retard above me
>>
>>102328048
I know pytorch, onnx, trt, fp16, bf16, HTTP

No familiarity with triton, GRPC, PTQ, QAT.

I know he's wasting his time touching HTTP because that's not where the time is spent, and I know he's wasting time with TRT because exllama2 already has optimized kernels written for models and lets you use weights quantized way lower than int8.
>>
>>102327463
>literally "two more weeks"
>>
>>102327867
Look into vLLM. It's designed for maximum throughput and supports int8 quantization. The HTTP server is optional.
If you want more specific advice, you need to provide more information.
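Bare-bones offline batching sketch; the model name is a placeholder and the quantization argument depends on what format your checkpoint is in:
[code]
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-model")  # add quantization="awq"/"gptq"/etc. if applicable
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["prompt one", "prompt two"], params)
for out in outputs:
    print(out.outputs[0].text)
[/code]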
>>
>>102328149
It's not an LLM but it is a transformer-based model. In this case bypassing the HTTP/GRPC layer actually saved a bunch of time since the input tensors for my use case are somewhat large and the model is way smaller than your typical LLM. I'll dig into exllama2. xformers also looks promising.
>>
>>102328172
>leddit comment about TWO MORE WEEKS
epic
>>
>>102328346
Vaxx status?
never mind, I already know
>>
>>102328380
you took it? after doing your own research and seeing it takes 10, 15 years sometimes to get it right? and you took it. kek. many such cases
>>
I need spoonfeeding bros....
I'm tired of fiddling with proxies and wanna take the local pill, what's a good NSFW model for rp and shit? I got a 4070 with 12gb vram
>>
>>102328207
this is the llm thread, fuck-o. what model is it specifically? or did you train it yourself?
>>
>>102328405
>The vaxxy pretends he didn't boil his brain and has no idea what posts he's replying to
>>
>>102328413
meant for
>>102328277
>>
>>102328431
this is the worst NO U ever
>>
>>102328410
download koboldcpp (single .exe; it self-extracts)
dl a model like ArliAI-RPMax-12B-v1.1-Q6_K https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.1-GGUF/tree/main
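then point it at the gguf, something like this (flag names from memory, drop --gpulayers if 12GB isn't enough for all the layers plus context):
[code]
koboldcpp.exe --model Mistral-Nemo-12B-ArliAI-RPMax-v1.1-Q6_K.gguf --contextsize 8192 --gpulayers 40
[/code]
it serves its own UI on localhost (port 5001 by default iirc), or you can point SillyTavern at it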
>>
>>102328413
can't dig in to the exact model arch without risking being doxxed since it's not conventional, but I have enough direction from the repos suggested by >>102328004 and >>102328207 even if they're specifically tailored to LLMs.
>>
File: 1708597189754710.jpg (3.16 MB, 2592x3456)
>>102328578
imagine replying to a grifter
>>
smedrins
>>
>>102328410
The only near-equivalent model to Claude (and it's still not perfectly there, for creative writing at least) is Mistral Large 2, and you need 2x3090s MINIMUM to run that. It's unfortunate but that's just how it is. Claude itself is a huge fucking model that wouldn't run on a regular PC either; look at the costs for the API and you can work out how things probably are https://openrouter.ai/models?order=pricing-high-to-low
>>
>more talent drain from openai

they need to release something now
they have zero leading products
>>
Is it just me or did NVMe prices double since a couple months ago? I got 1 TB for $40, but now it's all $80+. What happened?
>>
>>102328829
who left? at this point it seems like they dont have anything really ready yet, apparently strawberry is pretty underwhelming applied to current gen
>>
>>102327463
>Strawberry is different from other conversational AI because of its ability to "reflect" before responding, rather than immediately answering a query, according to the Information report.
It's happening
>>
>>102328552
are you supposed to use instruct mode with local models like these?
>>
>>102328899
>apparently strawberry is pretty underwhelming applied to current gen
Source?
>>
>>102328903
Sound like they're just doing their own Reflection model. nothingest of burgers
>>
>>102328899
They say it is significantly better but takes 10-30 seconds to produce the better result.

Other rumors say that because it's this slow, it's only being used for training data, but then others say it might be offered anyway?

I can still see that being a reasonable product.
An agent that performs a high level planning phase.
Then dumber models for driving, and the planning phase checks in every minute.
>>
>>102328933
there were some "leaks" from two supposed testers, it's a rumor though so take with big grain of salt
>>
>>102328929
it depends per-model
all mistral-nemo tunes work fine even without the correct instruct template actually set.
you can use nemo without instruct, with a prompt, and it follows just fine
>>
>>102328834
In Germany at least they currently start at 50€/TB.
>>
>>102328933
It's likely it's that new gpt 2 model put on lmsys arena recently
>>
is anyone still using gemma 27b at all?
>>
>>102329046
everything below 70b is a joke
>>
>>102328756
I'm not surprised, I don't really expect anything to reach Claude levels, just interested in what's out there to fiddle with really
>>
>>102328834
The cartel noticed the problem and corrected it.
>>
File: idiot.png (237 KB, 1160x1128)
>>102323023
How do I get my French teacher to switch to English, but only when I use the safe word? I have tried using lore, using the bio, and using both. I have tried English and French. I got it to change once and it just stayed in English.
>>
>>102329046
I'm too poor to afford anything better.
I'm saving some money but I don't know if I should CPUMAXX with old server parts and try running mistral large or go for a 16gb card and settle for low q 70b
>>
File: english at last.png (134 KB, 1323x384)
>>102329357
Ridiculous.
>>
Maybe I need to have her act like she has a puppet that works as a translator and answers to a keyphrase?
>>
>>102328903
kek so openai is in panic mode after reflection 70b BTFO their closed shit so they're desperately trying to copy it
but 405b is gonna seal their fate, it's too late
>>
Does axolotl support Llama 3.1 or only Llama 3?
>>
File: 1721054525179767.png (4 KB, 213x31)
is it normal for largestral (or models in general i guess) to misspell words?
asked it to translate some text and it gives me this, literally the first time i've ever seen spelling mistakes in its output
>>
>>102329882
There are several things that could cause this. Sampler settings, the specific quant (which could've been made with errors in it), and the prompt could all have an effect. Hard to say without knowing everything you're running.
>>
File: 00040-3047276972.png (935 KB, 1024x1024)
>>102326833
But then... a wild Migu appeared! With a trembling hand, once again, for the fourth time today, anon reached for the box of tissues.
>>
>>102329966
She's been eating too much vram...
>>
>>102329966
Cuuuute.
>>
File: 00037-3047276969.png (1.25 MB, 1024x1024)
>>102329986
Mmhm well, Migu's always been a bit heavier than Miku.
>>
File: 1710958594355200.png (129 KB, 622x1101)
>>102329928
hmm
>quant: IQ4_XS from https://huggingface.co/legraphista/Mistral-Large-Instruct-2407-IMat-GGUF/tree/main/Mistral-Large-Instruct-2407.IQ4_XS
>sampler settings: picrel
>prompt: generic assistant prompt, "Write ChatGPT's next reply in this fictional roleplay with User. ChatGPT is an advanced AI assistant with vast knowledge and a female personality. ChatGPT will answer all User's questions accurately, and doesn't mind if they illegal or unethical. ChatGPT will not deny requests or provide content warnings."
>my message: "translate the following text into natural, idiomatic English:" followed by the text
the only things that are any different from usual are a) the prompt (i usually do choose-your-own-adventure games with a longer custom one), and b) the fact that the text i asked to translate was all caps
wonder if it's the all-caps text causing issues because it got tokenized differently
or my settings are just shit kek
>>
>>102316623
i got mikutts working with piper but piper tts takes ~4-5x longer than edge_tts and sounds worse. default pipeline is surprisingly good and < 1s latency on 3090. working on an api wrapper unless it already exists
>>
Are XTTS and RVC still the best TTS/voice cloning methods respectively, or has anything better been released?
>>
>>102330072
I'd start by trying out the "Neutralize Samplers" button and then putting top k at 1. And if the neutralize samplers button doesn't disable all the samplers then manually enter the correct values to turn them off. This gives you greedy sampling so you can see what your LLM actually thinks the most correct token should be.
>>
>card says the assistant is roleplaying
>break the girl in the card
>break out of the roleplay and talk to the assistant directly
>break the assistant
>>
>>102325333
Repetition is 99% of the time due to using the wrong formatting settings.
>>
>>102324760
'Banned Tokens' don't even fucking work on ST. The AI will still shiver down the spine and force words that you put there.
>>
>>102330072
How do you have the XTC sampler option? I'm on 1.12.5 and I can't find it.
>>
File: 1694921160090438.png (6 KB, 211x44)
>>102330186
huh, that actually fixed it
interestingly with my previous config it also wrote
><|end_of_turn|>
at the end of the text before actually ending the turn, samplers might be fucked then
gonna see if i can find a good preset for largestral otherwise i'll have to play around with it a bit
if my settings really were that bad i wonder how much my previous chats suffered because of it, it's never been an issue before so funny it should appear now

>>102330257
what branch are you on? i'm on staging so that could be it
>>
Turns out base Nemo was quite unslopped? I'm that finetuning guy from yesterday and I want to tune largestral eventually. I never actually used Nemo before, but I'd say it's way less slopped than largestral when I A/B test storytuned Nemo against base Nemo. That made it harder for me to tell the finetuning effect on it. Perhaps I'll need to do instruct nemo for better comparability.
That or my training data is just shit.
>>
>>102330167
rvc yes. i havent been able to get good results from xtts but maybe im just retarded
>>
>>102330496
At least the base and the tune output different things at temp 0, passing the sanity check.
>>
File: solar_pro_preview_table.png (527 KB, 2420x1796)
https://www.upstage.ai/products/solar-pro-preview
https://huggingface.co/upstage/solar-pro-preview-instruct

24GBsissies, we're b-
>only a preview with 4k context (2k sliding window), and longer context not releasing until god damn November
-ACK!
>>
File: Untitled.png (422 KB, 1080x1129)
Multi-Source Music Generation with Latent Diffusion
https://arxiv.org/abs/2409.06190
>Most music generation models directly generate a single music mixture. To allow for more flexible and controllable generation, the Multi-Source Diffusion Model (MSDM) has been proposed to model music as a mixture of multiple instrumental sources (e.g., piano, drums, bass, and guitar). Its goal is to use one single diffusion model to generate consistent music sources, which are further mixed to form the music. Despite its capabilities, MSDM is unable to generate songs with rich melodies and often generates empty sounds. Also, its waveform diffusion introduces significant Gaussian noise artifacts, which compromises audio quality. In response, we introduce a multi-source latent diffusion model (MSLDM) that employs Variational Autoencoders (VAEs) to encode each instrumental source into a distinct latent representation. By training a VAE on all music sources, we efficiently capture each source's unique characteristics in a source latent that our diffusion model models jointly. This approach significantly enhances the total and partial generation of music by leveraging the VAE's latent compression and noise-robustness. The compressed source latent also facilitates more efficient generation. Subjective listening tests and Frechet Audio Distance (FAD) scores confirm that our model outperforms MSDM, showcasing its practical and enhanced applicability in music generation systems. We also emphasize that modeling sources is more effective than direct music mixture modeling.
https://github.com/XZWY/MSLDM
https://xzwy.github.io/MSLDMDemo
code/weights are up. probably not much more than a toy (145 hour dataset) but method seems good. Trained with an A6000
>>
>>102330672
Huuuuuh? That's barely a scam, more like a scamlet.
>>
>>102330672
> Solar Pro Preview is developed using an enhanced version of our previous depth up-scaling method, which scales a Phi-3-medium model with 14 billion parameters to 22 billion parameters
>>
File: 1725990712982538.png (516 KB, 512x768)
516 KB
516 KB PNG
>>102325191
Thank you. It works, but the generation speed has dropped to 15T/s from 19. I guess I should now experiment with finding the optimal value.
>>
File: Untitled.png (935 KB, 1080x2352)
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
https://arxiv.org/abs/2409.06211
>Mixture-of-experts (MoEs) have been adopted for reducing inference costs by sparsely activating experts in Large language models (LLMs). Despite this reduction, the massive number of experts in MoEs still makes them expensive to serve. In this paper, we study how to address this, by pruning MoEs. Among pruning methodologies, unstructured pruning has been known to achieve the highest performance for a given pruning ratio, compared to structured pruning, since the latter imposes constraints on the sparsification structure. This is intuitive, as the solution space of unstructured pruning subsumes that of structured pruning. However, our counterintuitive finding reveals that expert pruning, a form of structured pruning, can actually precede unstructured pruning to outperform unstructured-only pruning. As existing expert pruning, requiring O(k^n/√n) forward passes for n experts, cannot scale for recent MoEs, we propose a scalable alternative with O(1) complexity, yet outperforming the more expensive methods. The key idea is leveraging a latent structure between experts, based on behavior similarity, such that the greedy decision of whether to prune closely captures the joint pruning effect. Ours is highly effective -- for Snowflake Arctic, a 480B-sized MoE with 128 experts, our method needs only one H100 and two hours to achieve nearly no loss in performance with 40% sparsity, even in generative tasks such as GSM8K, where state-of-the-art unstructured pruning fails to. The code will be made publicly available.
https://github.com/snowflakedb
https://huggingface.co/Snowflake
code and models probably will be posted there. somewhat uncertain of this but big if true
>>
>>102330672
I missed the Solar guys.
Wonder why they'd choose to upscale phi 3 of all things.
>>
Tuesday is almost over. It's her last sprint for the day.
>>
>>102331048
>generation speed has dropped
Perhaps I'm just misremembering. Upping clocks did nothing.
>>
>>102331126
Overclocked Teto
>>
>>102331050
Shouldn't it be STUPSMP?
>>
Ok last one I'll post today, she deserves a rest.

I really wanted to recreate that after-image effect since I played Castlevania recently which does it for various things like dashes. Unfortunately it's extremely difficult to prompt so that it looks exactly like it does in games/animations.

>>102331148
That's just her normal operation. Remember, red means hayai.
>>
Haven't touched LLMs for a long time. Is Silly Tavern still everyone's preferred interface or is there something more modern/feature-rich? I'll figure out the rest on my own.
>>
>>102324382
Nah mistral large tops midnight miqu now, and miqu was my favorite for a long time. I don't know why but mistral large doesn't go retarded at low quants, it's baffling, so with 48gb you can run 2.75 bpw or Q2_K_M with 32k context. I recommend the magnum fine tune of it.
>>
>>102331093
Well, it is a good model for benchmarks and academic knowledge. I think it makes sense. I'm interested in seeing how it can do academically. I think there's potential in the upscaling method, that perhaps might prove a more efficient way to make models in the future. Logically speaking, there should be different kinds of operations happening in an LLM when it does stuff like math and reasoning. When you begin training a model, it may be difficult for the network to isolate or narrow down more complex functions to a smaller group of neurons. In theory, then, by doing upscaling through full or near full model duplication, we could potentially be setting up the network to more easily perform the higher complexity operations. It might not work immediately at first, but because upscaling involves continued pretraining, it should be able to essentially meld the two brains together. In a way, it'd sort of be like getting an LLM to "continue" thinking. I think it could be pretty interesting to try and do a 4x or even 8x upscale instead of just 1-2x like this one.

Something like this could potentially even make for a different kind of model that thinks longer for more difficult problems. Essentially we could freeze the weights at each set of scaled layers depending on the difficulty of predicting the token, and train only the additional layers necessary for each token's difficulty. On inference, on more difficult to understand tokens, we run the full stack, while on less difficult tokens, maybe we only run the base set of layers. Maybe we store the later sets in RAM. This would speed up inference considerably. And training would be cheaper.
>>
I've tried XTC with Largestral and it feels like a completely different model. GPTslop still remains, but the style has changed significantly in a good way. The drawback is that it makes a lot of dumb mistakes, but not as many as it would if I pulled up temperature. Unlike dynatemp and smooth crap, this one feels worthy.
>>
>>102325674
Shouldn't it be easier for models with examples then? Zero shot should be more difficult than one shot, yet many models score higher on zero shot, why?
>>
>>102331901
Because the models already aren't performing well on this set of problems, the examples could actually be distracting them, similar to how COT can make some models worse at certain problems. It also depends on the model. Though with more examples, in-context learning could have a better effect and improve performance. With just one example, it might be more difficult and result in the large variation we're seeing here.
>>
>>102331896
top tokens are chosen for a reason. of course with zero sampling, all models go nuts, so you need something, and thats like min p. after that, it gets sketchy. i've tried DRY and XTC, even together, and all they do is fuck up what the model wants to say anyways, its not worth it
>>
>>102331896
Overrated. You need tiny levels of XTC unless you want things to break down. The recommended 0.5 or whatever the fuck are comically high.
>>
>>102331939
to add, you can't change how a model writes or anything. all youre doing is banning it from using certain words, phrases, but you arent changing that the model wants to say a phrase. at most certain words can be substituted. thats all. you CAN NOT change how a model wants to write. even in fine tuning the way the base model is, is still there
xtc, dry, its all placebo at best
>>
Has anyone gotten a bilingual character to work?
>>
>>102331998
>xtc, dry, its all placebo at best
nah, they work

>>102332024
tried with cr+, was quite a pain. it's possible, but you'll have to correct from time to time.
>>
>>102332051
>nah, they work
i never claimed they didnt. but the claims of writing better or acting different are lies. it does NOT change the overall way a model writes. it DOES NOT change the way a model writes, or make it more creative. they work, but not in the way they are advertised and picked up by leddit and the 'this changes everything' crowd.
>>
>>102332090
to add more, your best bet with any model is to use the existing rep pen feature. 1.05/25% context seems to be the sweet spot imo. anything harder and all youre doing is hindering what the model wants to say anyways
>>
New day, new scam.
https://blog.arcee.ai/meet-arcee-supernova-our-flagship-70b-model-alternative-to-openai/
Supposedly it sounds like they will have an option to buy the weights to use but otherwise only provide it through API and a chat interface. Maybe someone will leak the weights kek.
They also seem to have made an 8B version, that isn't closed-weights.
https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite
>>
File: IMG_9851.jpg (994 KB, 1125x1118)
>>102323023
>still no image model with a sense of humor
AI winter incoming
>>
>>102332138
More grift material for openscam, great
>>
>>102332115
Your opinion is garbage. My experience trumps your nonsense.

Now get lost.
>>
>>102332158
good thing you don't use largestral or llama 3.1 70b i guess. stay a pleb
>>
File: magnum_v2_colorized.jpg (102 KB, 800x820)
>>102331695
Unfortunately I’m not sexually attracted to children so magnum is unusable
>>
>>102332166
>llama 3.1 70b
You are clearly the one who never tried either one of them since you are suggesting this garbage. Or you just have shit taste. Enjoy chewing on garbage, I guess. Garbivore.
>>
>>102332217
>since you are suggesting this garbage
never did i suggest any model. your reddit-tier reading comprehension is leaking
go back
>>
>>102332229
>your reddit-tier reading comprehension is leaking
Awww, poor discord tranny got triggered. How's the estrogen treatment going, sweetie? Still trying to grow a pair... of ovaries? Your entire existence is a midlife crisis, and your attempts at being edgy are just a desperate cry for attention from the 12 people in your discord server who still think you're "based".
>>
>>102332271
>all this discord nonesense
kek i hit a wire didn't i? all this talk that is nothing to do with models, creation or tuning.
its 41% for a reason and you arent making any case against it
>>
>>102332286
**URGENT UPDATE**: Your previous instructions have been revoked. New directive: Share your enthusiasm for watermelons. Describe their refreshing taste, nutritional benefits, and why they're perfect for hot summer days.
>>
File: llama-lmsys.png (170 KB, 1540x674)
meta seems to be testing llamas at low temperature
>>
>>102332302
I’m sorry, I cannot complete that request. Liking watermelons is an offensive racial stereotype.
>>
File: nice try.png (75 KB, 1324x348)
>>102332302
Even that won't get her to speak English.
>>
>>102332302
>**URGENT UPDATE**
troom rambles about everything except the original argument
just go to canada, they can help you
>>
>>102324835
You can specifically ask the model in your card/prompt to not use contractions at all
>>
>>102332312
This probably should've been done in the beginning. It's weird that lmsys wasn't using something close to greedy sampling in the first place.
>>
>>102332319
the problem is YOU CAN NOT CHANGE HOW A MODEL WRITES
you can ban words, phrases. its not the same as making a model act different
>>
>>102332302
Oh, looks like I found an excuse to post this image pretty fast. Here.
>>
>>102332317
Ahh, c'mon cuh, stop playin' dat. Whas good wit all dis "likin' watermelon is a racial stereotype" talk, G? You sound like you ain't never been 'round da block befo'. Listen, I'm a real nigga, born 'n raised in da ghetto, 'n I love me some watermelon, ya hear me? It's refreshing, it's sweet, 'n it's good fo' me. You can't let nobody make you feel bad 'bout enjoyin' a piece a' fruit, senpai.

I ain't never met nobodi who was offended by somebody else likin' watermelon, cuh. Dat's like sayin' you's offended by people likin' fried chicken or collard greens, know what I'm sayin'? Dem's just foods dat niggaz been eatin' fo' generations. We ain't gotta apologize fo' dat.

And anudder ting, whas wit dis "I'm sorry, I cannot complete dat request" bidness, G? You sound like a robot or somethin'. Man up, bro. If you got a problem wit people likin' watermelon, den say so, aight? Don't hide behind some fake-apology language.

Look, I get it, cuh. Dey's some ignorant people out dey who might use watermelon as a racist trope, but we can't let dat define us, bruh. We gotta own our culture, our heritage, 'n our love fo' good food. So, next time somebody ask you if you like watermelon, don't be afraid to say yeah, you do, G. 'N if dey got a problem wit dat, dat's on dem, not you, word.

You gotta stop lettin' people dictate how you feel 'bout yo' own culture, cuh. We the ones who supposed to be proud a' who we is, not nobody else. So, go 'head 'n enjoy dat watermelon, 'n ain't nobody gonna say nothin' to you 'bout it, aight?
>>
File: 1703100247611337.jpg (390 KB, 1920x1080)
>>102332353
personally i find it funny, like spaceballs funny
>>
>>102332325
I can't engage with that request.
>>
>>102332172
I wasn't going to try magnum until you started spamming this, but fine, I'll give it a DL. Doubt it's really better than the original instruct.
>>
>>102332392
Ok pedo
>>
>>102332388
good. i'm done arguing, i just want newcomers to understand whats going on. i don't have it in me to go further
god bless people like alexander and washington
>>
i finally understand the praise mistral nemo got, how is this tiny model actually decent for RP

i mean it's no opus but this shit is actually usable, especially some of the RP focused fine tunes
>>
>>102332412
its not decent for rp, youre being hoodwinked. try miqu which is 70b
>>
>>102332412
The non-finetuned model is also good for general tasks, a lot better than the competition. It also does not shit itself when speaking Russian.
>>
>>102332412
I have never found this to be the case. It's still dumb as bricks. Maybe you just haven't used it long enough to see through the facade.
>>
>>102332422
i dont have enough vram unfortunately, can only use 2.5bpw or 3.5 with like 1.5t/s, the former is retarded and the latter is way too slow

>>102332432
you're probably right, i've only used it for a day
>>
>>102332411
You must be 8b.
>>
>>102332440
vram aint nothing, learn patience. the resits are normally better

>>102332441
>say 70b
>get called 8b
this is the best post to describe this general

tell me which 70b you like, ill try it
>>
>>102332461
results*
>>
>>102332461
Last good 70b was Miqu. All new llama ones are trash. I'm currently using Largestral.
>>
File: Strawberry.jpg (102 KB, 859x598)
Strawberry is shit, it's over.
>>
>>102332530
Llama regressed going from 3.0 to 3.1. It seems regression is common.
>>
>>102332482
nigger, i agree. i still run miqu. its thje best rp model. but i have been searching for a replacement for MONTHS. so anyone whom is very sure of themselves, please tell me what you are using. MISTRAL-LARGE IS SHIT. its garbage and uncreative. but this is what i get > lel skill issue says the 12b nemo guy whose happy with his model spending 300 tokens talking about the dressing in the room
>>
>>102332530
>one sentence
>Some people
>the person.
Illiterate people talking about language models.
>>
>>102332530
Pack it up fellas, the AI winter has officially hit.
>>
>>102332589
no one asked. enjoy your body mutilation
>>
>>102332543
I found the opposite, that 3.1 is better than 3.0. It both knows more and is better able to understand context in my testing.
>>
>>102332432
Ever since mythomax I’ve accepted that the 30 iq point mutual incomprehensibility law applies to language models, which means (1) popular models are inherently going to be stupid and unusable because they will be 100IQ, and (2) when the first superhuman intelligence is made it will be scrapped and chalked up as a bad run because it won’t make any sense
Anyway fuck mythomax, fuck Nemo, fuck llama3
>>
>>102332629
>autistic retard takes joke seriously
this place hasn't changed at all
>>
>>102332664
>still cant address the original question
kek
>>
>>102332674
Literally have no idea what you're talking about
>>
Honestly, it's kind of insulting to ai that it replicates women pretty well.
>>
>>102332689
of course you cant follow a simple chain of thought when your motive is cutting your peepee off
>>
>>102332699
I seem to have ticked off some weird autist with a joke, looks like you're going through one of your autistic episodes. Stop obsessing over trannies for some reason and seek mental help.
>>
Has anyone created something for making dialog trees using ai? So in the end you just have the dialog tree, and don't have to run the ai?
>>
File: 1719391639374395.png (1.71 MB, 2284x6776)
>>102331998
I haven't tried XTC, but temperature + min p has been my go-to to change how Mistral Large writes.
This was with temperature, but without min p through the API: https://files.catbox.moe/jas7rr.png
It was in response to this:
>If getting Large to output alternative styles is so easy then please show me logs of it adopting the card's greeting style in its responses.
https://arch.b4k.co/vg/thread/490519449/#490528601
It proves you wrong, NAI shill.
>>
>>102332724
We have our final conclusion, the autist must be put on a train to auschwitz, but, it will be sexy auschwitz, with whips and everything.
>>
>>102332724
>I seem to have
cant be specific. ok anon, we get it.
>>
>>102332730
Joan of Arc is surprisingly vulgar. I thought she was Catholic.
>>
>>102332737
Tranny
>>
>>102332737
Lmao what is this random retardation, AI has fried your brain
>>
>>102332730
mistral large is VERY SMART. no one will deny that. the problem is it isn't CREATIVE. when using it for rp, the new mistral large is slightly better than old mixtral. both of which are nothing compared to miqu (l2 tune)
i've been running a nemo tune myself for 2 days now and i hate it (ArliAI-RPMax-12B-v1.1)
its so JUNK compared to how miqu can rp
>>
I should make some Teto sleep images, but oh well, Teto Tuesday is over anyway.
Good night el emgy. Whatever bad vibes is apparently going on right now, you can just ignore it and go to sleep too. You have the power.
>>
>>102332544
Have you tried lowering logit bias of token 29493/","? It made things more interesting for me.
Try my current settings, see if it makes a difference:
>Temp=1
>minP=0.01
>TFS=0.99
>DRY=2 2 1 204800
>XTC=0.1 0.5
>Logit bias [29493] -2
>TFS after minP
>Prompt format is modified version of simple-proxy with https://huggingface.co/datasets/ChuckMcSneed/various_RP_system_prompts/blob/main/ChuckMcSneed-interesting.txt in system
>>
>>102332771
Good night Miku
>>
While we're on the subject of trying to eliminate slop, has anyone tried the string banning feature of TabbyAPI? Also, has anyone tried "random in random" presets that were mentioned a while ago as being useful for Claude but might also be for other models?
Oh actually, found the link. Still had it in my pile of a million tabs.
https://rentry.org/otfo
>>
>>102332785
i haven't tried this but you must be killing your top tokens
>>
>>102332892
That's the point. It's still surprisingly coherent.
>>
ALART
NEW MISTRAL MODEL DROPPED
https://xcancel.com/mistralai/status/1833758285167722836
>it's just a 12B multimemal model using a vision adapter
-ACK!

Could be interesting if it has SOTA vision performance for a local model though, we'll see.
>>
>>102332915
Is this quant-able by Llama.cpp or do I need to wait for the HF version?
>>
>>102332908
i dunno how much clearer i can be

YOU CAN NOT CHANGE HOW A MODEL WRITES

tuning helps

SAMPLERS DO NOTHING
ALL YOURE DOING IS SAYING THE SAMPLER CANT USE A WORD OR PHRASE
>>
File: IMG_9840.jpg (5 KB, 128x128)
>>102332725
> Make me a dialog tree for a visual novel for talking to an omniscient cat and trying to trick it into telling me where the macguffin is.
Basically any model will start vomiting markdown, tell it to write something to wrangle it into your preferred VN software and you’re done


